Hadoop Cluster Setup:
1. How will you add/delete a node to the existing cluster?
A) Add: Add the host name/IP address in the dfs.hosts/slaves file and refresh the cluster with $ hadoop dfsadmin -refreshNodes
Delete: Add the host name/IP address to dfs.hosts.exclude, remove the entry from the slaves file, and refresh the cluster with $ hadoop dfsadmin -refreshNodes
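The decommissioning flow above can be sketched as a short shell session; the exclude-file path and hostname below are illustrative assumptions, not fixed Hadoop defaults, and the commands need a running cluster:

```shell
# Hypothetical node and path; adjust to your cluster layout.
echo "datanode5.example.com" >> /etc/hadoop/conf/dfs.hosts.exclude

# Ask the namenode to re-read dfs.hosts / dfs.hosts.exclude
hadoop dfsadmin -refreshNodes

# The node should show as "Decommission in progress", then "Decommissioned"
hadoop dfsadmin -report
```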
2. What is SSH? What is the use of it in Hadoop?
A) Secure Shell. Hadoop's start/stop scripts use SSH to launch and stop the daemons on the slave nodes listed in the slaves file.
3. How will you set up passwordless SSH?
A) Search on this site.
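A minimal sketch of the usual setup, assuming OpenSSH and a placeholder hostname:

```shell
# Generate a key pair with an empty passphrase (skip if ~/.ssh/id_rsa exists)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Copy the public key to the slave node (user and hostname are placeholders)
ssh-copy-id user@slave1

# Verify: this should run without prompting for a password
ssh user@slave1 hostname
```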
4. How will you format the HDFS? How frequently will it be done?
A) $ hadoop namenode -format
Note: Formatting has to be done only once, during initial cluster setup.
5. How will you manage the log files generated in the Hadoop cluster?
A)
6. Do you know about cron jobs? How will you set them up?
A) In Ubuntu, go to the terminal and type:
$ crontab -e
This will open our personal crontab (cron configuration file); the first line in that file explains it all. In every line we can define one command to run, and the format is quite simple. So the structure is:
minute hour day-of-month month day-of-week command
For all the numbers you can use lists, e.g. 5,34,55 in the first field will mean run at 5 past, 34 past and 55 past whatever hour is defined.
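As an illustration of the five-field format described above, a hypothetical crontab entry that runs a cleanup script at 5, 34 and 55 minutes past every hour would look like:

```crontab
# minute hour day-of-month month day-of-week command
5,34,55 * * * * /home/user/cleanup-logs.sh
```

The script path is an assumption; any command can go in the last field.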
7. What is the role of the /etc/hosts file in setting up an HDFS cluster?
A) For host name to IP address mapping.
8. What is the dfsadmin command in Hadoop?
9. If one of the datanodes fails to start on the cluster, how will you come to know? And what are the necessary actions to be taken now?
A) Via the HDFS web UI, we can see the number of decommissioned/dead nodes, and we need to rebalance the cluster now.
10. What is the impact if the namenode fails, and what are the necessary action items now?
A) The entire HDFS will be down, and we need to restart the namenode after copying the fsimage and edits from the secondary NN.
11. What is Log4j?
A) A logging framework.
12. How do we set the logging level for Hadoop daemons/commands?
A) In log4j.properties or in the hadoop-env.sh file: hadoop.root.logger=INFO,console (or WARN,DRFA).
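For instance, the stock setting and a per-daemon override in log4j.properties look like the fragment below; the logger name on the second line is just an example:

```properties
hadoop.root.logger=INFO,console
# Example override for one subsystem only:
log4j.logger.org.apache.hadoop.hdfs=DEBUG
```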
13. Is there any impact on mapreduce jobs if there is no mapred-site.xml file created in the HADOOP_HOME/conf directory but all the necessary properties are defined in yarn-site.xml?
A) No.
14. How does Hadoop's CLASSPATH play a vital role in starting or stopping Hadoop daemons?
A) The CLASSPATH contains the list of directories with the jar files required to start/stop the daemons; for example, HADOOP_HOME/share/hadoop/common/lib contains all the common utility jar files.
15. What is the default logging level in hadoop?
A) hadoop.root.logger=INFO,console.
16. What does the hadoop.tmp.dir configuration parameter default to?
A) It defaults to /tmp/hadoop-${user.name}. We need a directory that a user can write to and that also does not interfere with other users. If we didn't include the user name, then different users would share the same tmp directory. This can cause authorization problems, if folks' default umask doesn't permit write by others. It can also result in folks stomping on each other, when they're, e.g., playing with HDFS and reformat their filesystem.
17. How do we verify the status and health of the cluster?
A) Either by the HDFS web UI at http://namenode:50070/ or by $ hadoop dfsadmin -report.
18. What is the reason for the frequent "connection refused" exception in hadoop?
A) If there is no configuration error at the client machine or namenode machine, a common cause is that the Hadoop service isn't running. Also check that there isn't an entry for our hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts.
19. How do we set a configuration property to be unique/constant across the cluster nodes so that no slave nodes can override it?
A) We can achieve this by defining the property in the core/hdfs/mapred/yarn-site.xml file on the namenode with the final tag, as shown below.
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>512</value>
  <final>true</final>
</property>
20. Does the namenode stay in safe mode till all under-replicated files are fully replicated?
A) No. The namenode waits until all or a majority of datanodes report their blocks, and it stays in safe mode until a specific percentage of blocks of the system is minimally replicated. Minimally replicated is not the same as fully replicated.
More Hadoop Interview Questions at below links:
http://hadooptutorial.info/category/interview-questions/hadoop-interview-questions-for-experienced-and-freshers/
http://hadooptutorial.info/category/interview-questions/mapreduce-interview-questions/
http://hadooptutorial.info/category/interview-questions/hbase-interview-questions-for-experienced-freshers/
http://hadooptutorial.info/category/interview-questions/hive-interview-questions/
http://hadooptutorial.info/category/interview-questions/pig-interview-questions-for-experienced-and-freshers/
http://hadooptutorial.info/category/interview-questions/sqoop-interview-questions-and-answers/
HDFS Interview Questions and Answers:
1. What is the default replication factor and how will you change it at file level?
2. Why do we need a replication factor > 1 in a production Hadoop cluster?
3. How will you combine the 4 part-r files of a mapreduce job?
A) Using hadoop fs -getmerge
4. What are the compression techniques in HDFS and which is the best one and why?
5. How will you view the compressed files via HDFS command?
A) hadoop fs -text
6. What is the Secondary Namenode and its functionalities? Why do we need it?
7. What is the Backup node and how is it different from the Secondary namenode?
8. What are the FSimage and edit logs and how are they related?
9. What is the default block size in HDFS? And why is it so large?
10. How will you copy a large file of 50 GB into HDFS in parallel?
A) distcp
11. What is balancing in HDFS?
12. What is expunge in HDFS?
A) Trash empty
13. What is the default URI for the HDFS web UI? Can we create files via the HDFS web UI?
A) namenode:50070. No, it is read-only.
14. How can we check the existence of a non-zero length file with HDFS commands?
A) The hadoop fs -test command
15. What is IOUtils in the HDFS API and how is it useful?
16. Can we archive files in HDFS? If yes, how can we do that?
A) hadoop archive -archiveName NAME -p <parent path> src dest
17. What is safe mode in Hadoop and what are the restrictions during safe mode?
18. What is rack awareness in hadoop?
19. Can we come out of safe mode manually? If yes, how?
A) $ hadoop dfsadmin -safemode enter/get/leave
20. Why is the block size in hadoop maintained as very big compared to the traditional block size?
21. What are sequence files and how are they different from text files?
22. What is the limitation of sequence files?
A) They support only Java; no other API.
23. What are Avro files?
24. Can an avro file created in Java on machine 1 be read on a machine with the Ruby API?
A) Yes
25. Where is the schema of an Avro file stored if the file is transferred from one host to another?
A) In the same file itself, as a header section.
26. How do we handle small files in HDFS?
A) Merge them into sequence/avro files or archive them into har files.
27. What is a delegation token in Hadoop and why is it important?
28. What is fsck in Hadoop?
29. Can we append data records to an existing file in HDFS?
A) Yes, by the command $ hdfs dfs -appendToFile. It appends a single src, or multiple srcs, from the local file system to the destination file system. It also reads input from stdin and appends it to the destination file system.
30. Can we get a count of files in a directory on HDFS via the command line?
A) Yes, by using the command $ hdfs dfs -count hdfs://NN/file1
31. How do we achieve security on a Hadoop cluster?
A) With Kerberos.
32. Can we create multiple files in HDFS with different block sizes?
A) Yes. HDFS provides an API to specify the block size at the time of file creation. Below is the method signature:
public FSDataOutputStream create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException
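The same per-file override is also available from the command line without writing Java; the size and paths here are illustrative, and the command assumes a running cluster:

```shell
# Write a file with a 256 MB block size instead of the cluster default
hdfs dfs -D dfs.blocksize=268435456 -put localfile.txt /user/hadoop/bigblocks.txt
```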
33. What is the importance of dfs.namenode.name.dir?
A) It contains the fsimage file for the namenode. It should be configured to write to at least two filesystems on different physical hosts (namenode and secondary namenode), because if we lose the fsimage file we lose the entire HDFS filesystem; there is no other recovery mechanism if no fsimage file is available.
34. What is the need for fsck in hadoop?
A) It can be used to determine the files with missing blocks.
35. Do HDFS block boundaries fall between records or across records?
A) HDFS does not provide record-oriented boundaries, so blocks can end in the middle of a record.
Mapreduce Interview Questions and Answers:
1. What is speculative execution?
2. What is the Distributed Cache?
3. Workflow of a MapReduce job?
A) map, combiner, partitioner, shuffle, reducer
4. How will you globally sort the output of a mapreduce job?
A) TotalOrderPartitioner
5. Difference between map-side and reduce-side join?
6. What is MapReduce chaining?
7. How will you pass parameters to a mapper or reducer?
8. How will you create custom key and value types?
9. Sorting based on any column other than the key?
10. How will you create custom input formats?
11. How will you process a huge number of small files in an MR job?
A) After converting them into a sequence file/avro file.
12. Can we run a reducer without a mapper?
A) Yes; in this case the identity mapper will be run in the background to copy the input to the reducer.
13. Do mapper and reducer tasks run in parallel? If no, why do we sometimes see (map 80%, reduce 10%)?
A) No; it is due to the data copy phase.
14. How will you set up a custom counter to detect bad records in the input?
A) context.getCounter(enum value)
15. How will you schedule mapreduce jobs?
A) Through Oozie or Azkaban.
16. What is a combiner? Tell me one scenario where it is not suitable?
A) It is used for aggregate functions; it is not suitable for non-associative functions such as average.
17. How will you submit a mapreduce job through the command line?
18. How will you kill a running mapreduce job?
19. For a failed mapreduce job, how will you trace the root cause?
A) YARN web UI > logs > userlogs > application ID container > syserr/syslog
20. What will you do if a mapreduce job failed with a "Java heap space" error message?
A) Increase the -Xmx property in HADOOP_CLIENT_OPTS or JAVA_CHILD_OPTS.
21. How many map tasks & reduce tasks will run on each datanode by default?
A) 2 map tasks and 1 reduce task.
22) What is the minimum RAM capacity needed for this datanode?
A) As there are 3 JVMs running for the 3 tasks, and 1 datanode daemon also runs, at least 4 GB RAM is needed, assuming that at least 1 GB can be assigned for each YARN task.
22. What is the difference between Mapreduce and YARN?
23. What is the Tez framework?
A) An alternative framework to mapreduce; it can be used in YARN in place of mapreduce.
24. What is the difference between Tez and Mapreduce?
A) Tez is at least 2 times faster than Mapreduce.
25. What are input split, input format and record reader in Mapreduce programming?
26. Does Mapreduce support processing of Avro files? If yes, what are the main classes of the API?
27. How will you process a dataset in JSON format in a mapreduce job?
A) The JSONObject class can be used to parse the JSON records in the dataset.
28. Can we create a multilevel directory structure (year/month/date) in Mapreduce based on the input data?
A) Yes, by using MultipleOutputs.
29. What is the relation between TextOutputFormat and KeyValueTextInputFormat?
A) The second one is used to read the files created by the first one.
30. What is LazyOutputFormat in Mapreduce and why do we need it?
A) It creates output files only if data is present.
31. How do we prevent file splitting in Mapreduce?
A) By returning false from the isSplitable method of our custom InputFormat class.
32. What is the difference between the Writable and WritableComparable interfaces? And what is sufficient for a value type in an MR job?
A) Writable.
33. What is the role of the ApplicationMaster in running a Mapreduce job through YARN?
34. What is an Uber task?
35. What are the IdentityMapper & IdentityReducer classes?
36. How do we create a jar file with the .class files in a directory through the command line?
37. What is the default port for the YARN web UI?
A) 8088
38. How can we distribute our application's jars to all of the nodes in the YARN cluster that need them?
39. How do we include native libraries in YARN jobs?
A) By using the -Djava.library.path option on the command line, or else by setting LD_LIBRARY_PATH in the .bashrc file.
40. What is the default scheduler inside the YARN framework for starting tasks?
A) CapacityScheduler
41. How do we handle record boundaries in text files or sequence files in Mapreduce input splits?
A) In Mapreduce, the InputSplit's RecordReader will start and end at a record boundary. In sequence files, every 2k bytes has a 20-byte sync mark between the records. These sync marks allow the RecordReader to seek to the start of the InputSplit, which contains a file, offset and length, and find the first sync mark after the start of the split. The RecordReader continues processing records until it reaches the first sync mark after the end of the split. Text files are handled similarly, using newlines instead of sync marks.
42. Sometimes mapreduce jobs will fail if we submit the same jobs from a different user. What is the cause and how do we fix it?
A) It might be due to mapreduce.jobtracker.system.dir not being set.
43. How do we change the default location of the mapreduce jobs' intermediate data?
A) By changing the value of mapreduce.cluster.local.dir.
44. If a map task fails once during mapreduce job execution, will the job fail immediately?
A) No, it will try restarting the task up to the max attempts allowed on map/reduce tasks; by default it is 4.
Pig Interview Questions and Answers:
1. How will you load a file into Pig?
2. What are the complex data types in Pig?
3. What is an outer bag?
4. Load an emp table file with columns id, name, deptid, description. Display name and id where deptid =
5. How will you write custom UDFs?
6. What is the difference between an inner bag and an outer bag?
7. What is a tuple?
8. What is the difference between FOREACH and FILTER?
9. What is the difference between local mode and mapreduce mode?
10. What is the difference between GROUP BY and JOIN BY in Pig?
11. How many reduce tasks will be run if we specify both GROUP BY and ORDER BY clauses in the same Pig script?
12. What is the DISTINCT operator?
13. Difference between UNION, JOIN and CROSS?
14. How do we sort records in descending order in a dataset in Pig? (ORDER ... DESC/ASC)
15. What is the difference between GROUP and COGROUP?
16. What is the difference between the STORE and DUMP commands?
17. How will you debug a Pig script?
A) set debug on
18. Can we run basic Hadoop fs commands in the Grunt shell?
A) Yes
19. Can we run Unix shell commands from the Grunt shell itself?
A) Yes, by using the sh command.
20. Can we submit Pig scripts in batch mode from the Grunt shell?
A) Yes, by using the run/exec commands.
21. What is the difference between the run and exec commands in the Grunt shell?
A) run will execute the Pig script in the same Grunt shell, but exec will submit it in a new Grunt shell.
22. What are diagnostic operators in Pig?
23. What is the difference between EXPLAIN, ILLUSTRATE and DESCRIBE?
24. How do we access a custom UDF function created in Pig?
A) By using REGISTER and DEFINE, it will be available in the Pig session.
25. What is the DIFF function in Pig?
26. Can we do random sampling from a large dataset in Pig?
A) The SAMPLE command.
27. How can we divide the records of a single dataset into multiple datasets by using any criteria, like country-wise?
A) Using the SPLIT command.
28. What is the difference between the COUNT and COUNT_STAR functions in Pig?
A) COUNT_STAR includes null values in the counting, whereas COUNT does not.
29. What are PigStorage & HBaseStorage?
30. What is the use of LIMIT in Pig?
31. What is the difference between Mapreduce and Pig, and can we use Pig in all scenarios where we can write MR jobs?
A) No.
Hive Interview Questions and Answers:
1. Does hive support record-level operations?
2. In a hive table, can we change a string DT to an int DT?
3. Can we rename a table in Hive? If yes, how?
4. What is the metastore? How will you start the service?
5. What is a SerDe in Hive? Example?
6. Difference between Hive and Hbase?
7. How to print the column names of a table in a hive query result?
8. How will you know whether a table is external or managed? (desc extended)
9. What is the Hive thrift server?
10. What is the difference between a local metastore and an embedded metastore?
11. How do we load data into a Hive table with SequenceFile format from a text file on the local file system?
12. What is HCatalog?
13. How is HCatalog different from Hive?
14. What is WebHCat?
15. How do we import XML data into Hive?
16. How do we import CSV data into Hive?
17. How do we import JSON data into Hive?
18. What are dynamic partitions?
19. Can a Hive table contain data in more than one format?
20. How do I import Avro data into Hive?
21. Does Hive have an ODBC driver?
A) Yes, Cloudera provides ODBC drivers for the Hive server.
22. Is HiveQL case sensitive?
A) No
23. Does Hive support Unicode?
A) Yes, we can use Unicode strings in data/comments, but we cannot use them for database/table/column names.
24. Can a Hive table contain data in more than one format?
25. Is it possible to set the data format on a per-partition basis?
26. What are dynamic partitions?
27. Does Hive have a JDBC driver?
A) Yes. The driver is org.apache.hadoop.hive.jdbc.HiveDriver.
It supports two modes: a local mode and a remote one.
In the remote mode it connects to the hive server through its Thrift API. The JDBC url to use should be of the form jdbc:hive://hostname:port/databasename. In the local mode Hive is embedded, and the JDBC url to use should be jdbc:hive://.
28. How can we import fixed-width data into Hive?
29. How can we import ASCII log files (HTTP, etc.) into Hive?
30. When running a JOIN query, what is the idea to solve out-of-memory errors?
A) This is usually caused by the order of the JOIN tables. Instead of FROM tableA a JOIN tableB b ON ..., try FROM tableB b JOIN tableA a ON ... Note that if we are using LEFT OUTER JOIN, we might want to change to RIGHT OUTER JOIN. This trick usually solves the problem; the rule of thumb is: always put the table with a lot of rows having the same value in the join key on the rightmost side of the JOIN.
31. How many times faster does the Tez engine run than the MR engine in Hive?
32. How much time will each Tez session be active?
Hbase Interview Questions and Answers:
1. What are the catalog tables in Hbase?
2. What is Zookeeper's role in the hbase architecture?
3. How will you drop a table in Hbase?
4. Do you know Hive on hbase? How will you achieve it? (Hbase storage handler). If we delete a table from hive, will it affect the hbase table?
A) Yes
5. How will you load bulk data of a 50 GB file into an Hbase table?
6. Limitations of Hbase? (no support for SQL syntax, indexing, joins, ...)
7. Difference between Hbase and Hdfs?
8. How do we integrate HBase and Hive?
9. How can we add/remove a node to an HBase cluster?
A) By adding/removing an entry in the HBASE_CONF_DIR/regionservers file.
10. Can we safely move the hbase rootdir in hdfs?
A) Yes. HBase must be down for the move. After the move, update hbase-site.xml across the cluster and restart.
11. Can we safely move the master from node A to node B?
A) Yes. HBase must be down for the move. After the move, update hbase-site.xml across the cluster and restart.
12. How do we fix OutOfMemoryExceptions in hbase?
A) Hbase uses a default of 1 GB heap size. By increasing this via the HBASE_HEAPSIZE environment variable in ${HBASE_HOME}/conf/hbase-env.sh, we can resolve these error messages.
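The fix described above is a one-line change in hbase-env.sh; the 4 GB figure below is just an example value:

```shell
# ${HBASE_HOME}/conf/hbase-env.sh
export HBASE_HEAPSIZE=4096   # in megabytes; size this to your workload
```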
13. How can we change the logging level in HBase?
A) In the log4j.properties file we can set the logging level to DEBUG (log4j.logger.org.apache.hadoop.hbase=DEBUG) and restart our cluster, or set it in the hbase-env.sh file.
14. What ports does HBase use?
A) Hbase runs the master and its informational http server at 60000 and 60010 respectively, and region servers at 60020 with their informational http server at 60030.
15. Sometimes HBase ignores HDFS client configuration such as dfs.replication. What is the cause?
A) If we made an HDFS client configuration change on our hadoop cluster, HBase will not see this configuration unless:
we add a pointer to HADOOP_CONF_DIR to the CLASSPATH in hbase-env.sh, or symlink our hadoop-site.xml from the hbase conf directory, or
add a copy of hadoop-site.xml to ${HBASE_HOME}/conf, or, if it is only a small set of HDFS client configurations, add them to hbase-site.xml.
The first option is the better of the three since it avoids duplication.
16. What is the maximum recommended cell size?
A) A rough rule of thumb, with little empirical validation, is to keep the data in HDFS and store pointers to the data in HBase if you expect the cell size to be consistently above 10 MB. If you do expect large cell values and you still plan to use HBase for the storage of cell contents, you'll want to increase the block size and the maximum region size for the table to keep the index size reasonable and the split frequency acceptable.
17. Why can't I iterate through the rows of a table in reverse order?
A) Because of the way HFile works: for efficiency, column values are put on disk with the length of the value written first and then the bytes of the actual value written second. To navigate through these values in reverse order, these length values would need to be stored twice (at the end as well) or in a side file. A robust secondary index implementation is the likely solution here to ensure the primary use case remains fast.
18. What is Phoenix?
A) Phoenix is an SQL layer on hbase.
19. How fast is Phoenix? Why is it so fast?
A) Phoenix is fast. A full table scan of 100M rows usually completes in 20 seconds (narrow table on a medium sized cluster). This time comes down to a few milliseconds if the query contains a filter on key columns. For filters on non-key columns or non-leading key columns, you can add an index on these columns, which leads to performance equivalent to filtering on a key column by making a copy of the table with the indexed column(s) part of the key.
20) Why is Phoenix fast even when doing a full scan?
A) Phoenix chunks up your query using the region boundaries and runs the chunks in parallel on the client using a configurable number of threads. The aggregation will be done in a coprocessor on the server side, collapsing the amount of data that gets returned back to the client rather than returning it all.
Sqoop Interview Questions and Answers:
1. How will you get data from an RDBMS into HDFS?
2. Can we store mysql table data as a sequence file in hdfs via sqoop?
3. Does sqoop support compression techniques for storing data in HDFS?
4. Can we load all the tables in a database into hdfs in a single shot?
A) import-all-tables
5. Can we copy a subset of data from a table in an RDBMS into HDFS? (based on some criteria)
A) Using a --where "country = 'us'" condition in the import command.
6. How many reduce tasks will be run by default for a sqoop import command? How many mappers?
A) 0 reduce tasks, 4 mappers.
7. If we get a java heap space error and we have already given the maximum memory, what is the possible solution?
A) Increase the number of mappers, e.g. with -m 100.
8. What is the default port for connecting to a MySQL server?
A) 3306
9. How can we resolve a Communications Link Failure when connecting to MySQL?
A) Verify that we can connect to the database from the node where we are running Sqoop:
$ mysql --host= --database=test --user= --password=
Add the network port for the server to your my.cnf file. Set up a user account to connect via Sqoop. Grant permissions to the user to access the database over the network:
Log into MySQL as root: mysql -u root -p
Issue the following commands:
mysql> grant all privileges on *.* to 'user'@'%' identified by 'testpassword';
mysql> grant all privileges on *.* to 'user'@ ... identified by 'testpassword';
10. Can we provide SQL queries in the Sqoop import command?
Flume Interview Questions and Answers:
1. Can we load data directly into Hbase?
A) Yes
2. How will you create directories in HDFS based on the timestamp present in the input file?
A) hdfs.path = /user/%y%m%d/%H%M%S (format escape sequences)
3. What will happen if no timestamps are present in the input file?
A) It will throw an exception; to solve this, set hdfs.useLocalTimeStamp = true.
4. Workflow of flume?
5. What are the channel types in Flume? (Memory, JDBC, File channel) Which one is faster? Memory.
6. How will you start a flume agent from the command line?
7. What are interceptors in flume?
8. We are getting a NumberFormatException when using format escape sequences for date & time (%Y %M %D etc.) in the HDFS sink. How can we solve this exception?
A) To use date escape sequences in Flume, there should be a timestamp present in the header of the source record. If there is no timestamp in the source file, we can solve this exception in two ways:
i) By adding a Timestamp interceptor on the source as shown below:
a1.sources.tail.interceptors = ts
a1.sources.tail.interceptors.ts.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
ii) Or by adding the useLocalTimeStamp = true parameter in the configuration properties of the agent's HDFS sink.
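Putting the two options together, the relevant agent properties look like the fragment below; the agent, source and sink names (a1, tail, k1) and the path are illustrative:

```properties
# Option i: stamp events at the source
a1.sources.tail.interceptors = ts
a1.sources.tail.interceptors.ts.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Option ii: let the HDFS sink use the local time instead
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.path = /user/flume/%y%m%d/%H%M%S
```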
9. What is the bridge mechanism used for a multi-hop agent setup in Flume?
A) Avro RPC
9. Which is the reliable channel to make sure there is no data loss (JDBC, File, Memory)?
A) The file channel is reliable.
10. What is fan-out flow in Flume?
11. What are the event serializers available in Flume?
A) Text, Avro
12. How do we collect records in JSON format directly through Flume?
A) By using the JSONHandler.
13. What is the difference between the File Sink and the File Roll Sink?
14. Difference between the AsyncHBaseSink and HBaseSink types?
15. If we need to test the functionalities of a custom source and channel and we do not need any sink, can we set up this kind of agent?
A) Yes, with the sink type as null.
16. Can we perform real-time analysis on the data collected by Flume directly? If yes, how?
A) Yes; by using MorphlineSolrSink we can extract data from Flume events, transform it, and load it in near real time into Apache Solr servers, which in turn serve queries to end users or search applications.
17. If we need to get the speed of a memory channel and the data reliability of a file channel in a single agent channel, then how can we achieve this?
A) Use the SpillableMemoryChannel for this purpose.
18. What are multiplexing selectors in flume?
19. What are replicating selectors in flume?
20. What is the use of the Host interceptor in flume?
21. What is the advantage of the UUID interceptor in flume?
22. In defining the type of sources or sinks in flume, is it mandatory to provide the full class name?
A) No, we can also provide the alias names. For example, we use hdfs as the sink.type in place of org.apache.flume.sink.hdfs.HDFSEventSink.
Splunk Interview Questions and Answers:
1. What is Splunk and what is Hunk?
2. How do we connect to HDFS in Hunk?
3. Is there any connector for the Hive server to directly load Hive tables into Hunk?
4. What is the Hive Split generator in the Hive provider?
5. Do we need to keep the Hive thrift server and Hive metastore services running to retrieve hive tables into Hunk? (yes)
6. Can we create dashboards in Hunk with visualization charts embedded in them?
7. Does Hunk support reading of compressed files (.gz, .bz2) on Hadoop?
8. Does Hunk support reading of snappy compressed files on hadoop?
9. Where can we look for the error messages or exceptions of a search query in Hunk? (the search.log file under the dispatcher folder in the Hunk distribution)
10. What is the default port for accessing the Hunk web UI? (8000)
Tableau Interview Questions and Answers:
1. Can we use Tableau on a Linux server? (no, it supports only Windows and Mac)
2. What is the difference between Hunk and Tableau?
3. How do we connect to the Hive server from Tableau?
4. How can we connect to MySQL from Tableau?
5. Can we perform data blending of two different sources in Tableau?
6. Do we need to write queries to perform joins or filters in Tableau?
7. Does Tableau fire any mapreduce jobs in the backend to pull data from hive?
Oozie & Azkaban Interview Questions and Answers:
1. What is the job scheduler you use in your production cluster?
2. Does Oozie support time-based triggering of jobs? (yes)
3. Does Azkaban support time-based triggering of jobs? (yes)
4. Does Oozie support data-based triggering of jobs? (yes)
5. Does Azkaban support time-based triggering of jobs? (yes)
6. Can we define dependencies between jobs in Azkaban flows? (yes)
7. What is the difference between Oozie and Azkaban?
8. How do we create properties files in Azkaban?
9. How do we create properties files in Oozie?
Unix Interview Questions and Answers:
1. How do you know what processes are running in Unix?
$ ps lists all the unix system processes
$ jps lists all the java processes
$ jobs lists all the processes that were suspended or are running in the background. Because the jobs command is a foreground process, it cannot show us active foreground processes.
2. How will you stop a process forcibly in Unix?
Use the below command to kill/stop a process forcibly.
$ kill -9 processid
Here the option -9 denotes force killing.
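The "force kill" above can be demonstrated with a throwaway background process; the exit status 137 is 128 + 9, the usual convention for a process killed by signal 9:

```shell
sleep 30 &              # start a long-running background process
pid=$!                  # remember its process id
kill -9 "$pid"          # force-kill it; SIGKILL cannot be caught or ignored
wait "$pid"             # collect its exit status
echo "exit status: $?"  # 128 + 9 = 137 for a process killed by signal 9
```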
3. Will the below commands result in the same output?
TEST="hello world"
$ echo $TEST
$ echo TEST
Ans) No. The first command will print hello world on the console, and the second one will print TEST on the console.
4) How can we define constants in Unix shell scripting?
Ans) We can achieve this with the help of readonly variables in Unix shell scripting.
For example, consider the following commands:
$ TEST1=hello
$ readonly TEST1
$ echo $TEST1
hello
$ TEST1=world
The last command results in an error message:
/bin/sh: TEST1: This variable is read only.
5) Can we unset variables in Unix?
Yes, we can release the variable names by using the unset command.
For example,
$ unset TEST
will release the variable TEST, and it will no longer reference the "hello world" string. But we cannot use the unset command to unset variables that are marked readonly. For example,
$ unset TEST1
will result in an error message.
6) What are environment variables in Unix?
An environment variable is a variable that is available to any child process of the shell. We make a variable environmental by using the export command.
Syntax for declaring environment variables:
$ name=value; export name
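A quick way to see the difference export makes: a child shell only sees the variable after it is exported (the variable name here is arbitrary):

```shell
GREETING="hello"
sh -c 'echo "before export: $GREETING"'   # child sees an empty value
export GREETING
sh -c 'echo "after export: $GREETING"'    # child now sees: hello
```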
7) What are shell variables in Unix?
A shell variable is a special variable that is set by the shell and is required by the shell in order to function correctly. Some of these variables are environment variables, whereas others are local variables.
These are the variables that the shell sets during initialization and uses internally. Examples are:
PWD indicates the current working directory as set by the cd command.
UID expands to the numeric user ID of the current user, initialized at shell startup.
PATH indicates the search path for commands. It is a colon-separated list of directories in which the shell looks for commands. A common value is
HOME indicates the home directory of the current user: the default argument for the cd built-in command.
8) What does $@ represent in Unix? (all the arguments of a command)
9) What is $? in Unix? (It is the status of the last executed command.)
10) What is sed? And why do we use it? (It is a stream editor; it can be used for replacing a set of characters with another set.)
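The substitute command mentioned above, in one runnable line:

```shell
echo "hello world" | sed 's/world/unix/'   # prints: hello unix
```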
Java Interview Questions and answers:
1. what is Java
2. about JVM,JRE,JDK
3. oops concept with realtime examples
4. String and string pool concept
5. diff between String, StringBuilder, StringBuffer
6. diff between final and finally
7. diff between equals and hashcode
8. comparison concept in Set, HashMap, Hashtable
9. accessing methods and variables using superclass reference and subclass object
10. what is abstract class and interface
11. what is is-a ,has-a ,uses-a relation in java
12. diff b/w comparator and comparable interface in java and its methods
13. what are mutable objects and immutable objects and how to create immutable object in java
14. what is the default access modifier for a variable in an interface ===== public static final
15. what are adapters classes in java
16. what is abstractfactory,singleton and facade design pattern in java
17. diff between anonymous,innerclass and nested class
18. how to create object for innerclass and nested class
19. what is exception
20. diff between checked and uncheckedexception
21. concepts of throw,throws,try,catch,finally
22. diff between ClassCastException, ClassNotFoundException, NoClassDefFoundError
23. what is collection
24. diff between arraylist and linkedlist, hashmap and linkedhashmap
25. about dictionary,vector,hashtable,properties
26. how to create stack using two arraylist
27. difference between java5,java6,java7,java8
28. what is autoboxing and unboxing (feature from 1.5)
29. what is wrapper class
30. what is multithreading,
31. stages in multithreading
32. diff between sleep,wait,join methods
33. when do we get InterruptedException and IllegalMonitorStateException
34. what is deadlock in java
35. what is synchronization
} finally {
    return val;
}
================== does the below program compile?
class A {
    public void f1() {
    }
}
class B extends A {
    protected void f1() {
    }
}
(No: an overriding method cannot reduce the visibility of the overridden method.)
================== does the below program compile?
class A {
    protected void f1() {
    }
}
class B extends A {
    public void f1() {
    }
}
(Yes: an overriding method may widen the visibility of the overridden method.)
================== does the below program compile?
class A {
    public void f1() throws NullPointerException {
    }
}
class B extends A {
    public void f1() throws Exception {
    }
}
(No: an overriding method cannot declare a broader checked exception than the overridden method; NullPointerException is unchecked, Exception is checked.)
================== scenario
class A {
}
77. What is Connection? Is it a class or an interface? If it is an interface, where is the implementation class?
78. write your own logic of connection pooling
79. check with array declaration for types int,float,boolean,double.
80. learn narrowing and widening conversions.
81. difference between int[] a,b and int a[],b
82. concept of superclass object casting
83. instanceof example using the inheritance concept
84. switch statement allows only byte,char,int,string literals
85. difference between fail-safe iterator and fail-fast iterator;
86. brief copyonwritearraylist,concurrenthashmap
87. what is tight coupling and loose coupling in java
88. what is interthread communication and an example (producer and consumer example)
89. what is blockingqueue and linked blockingqueue
90. Why doesn't Collection extend the Cloneable and Serializable interfaces?
91. What do you know about the big-O notation and can you give some examples with respect to different data structures
92. What is the tradeoff between using an unordered array versus an ordered array ?
93. What is the difference between Serial and Throughput Garbage collector ?
94. what is connection pooling in java?
95. Explain Serialization and Deserialization.
96. Why wait, notify and notifyAll is defined in Object Class and not on Thread class in Java
97. Why wait notify and notifyAll called from synchronized block or method in Java
98. what is varargs
250 Hadoop Interview Questions and answers for Experienced Hadoop developers