
request in this forum to get answers to any particular question.

Hadoop Cluster Setup:
1. How will you add/delete a node to the existing cluster?
A) Add: Add the host name/IP address to the dfs.hosts/slaves file and refresh the cluster with $ hadoop dfsadmin -refreshNodes
Delete: Add the host name/IP address to dfs.hosts.exclude/remove the entry from the slaves file and refresh the cluster with $ hadoop dfsadmin -refreshNodes
2. What is SSH? What is its use in Hadoop?
A) Secure Shell.
3. How will you set up passwordless SSH?
A) Search in this site.
4. How will you format HDFS? How frequently will it be done?
A) $ hadoop namenode -format
Note: Formatting has to be done only once, and that too during the initial cluster setup.
5. How will you manage the log files generated in a Hadoop cluster?
A)
6. Do you know about cron jobs? How will you set one up?
A) In Ubuntu, go to the terminal and type:

$ crontab -e
This will open our personal crontab (cron configuration file). The first line in that file explains it all. In every line we can define one command to run, and the format is quite simple. The structure is:
minute hour day-of-month month day-of-week command
For all the numbers you can use lists, e.g. 5,34,55 in the first field will mean run at 5 past, 34 past and 55 past whatever hour is defined.
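As a rough illustration of how a list in one crontab field is read, the Java sketch below checks whether a given minute is selected by a comma-separated list such as 5,34,55. The class and method names (CronListField, matches) are made up for this example, and only plain lists and * are handled, not ranges or step values.

```java
import java.util.Arrays;

// Hypothetical helper, not part of cron or Hadoop: models one crontab field
// that is either "*" or a comma-separated list of numbers.
class CronListField {
    // Returns true when `value` (e.g. the current minute) is selected by `field`.
    static boolean matches(String field, int value) {
        if (field.trim().equals("*")) {
            return true; // "*" selects every value
        }
        return Arrays.stream(field.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .anyMatch(v -> v == value);
    }

    public static void main(String[] args) {
        String minuteField = "5,34,55";
        for (int minute : new int[] {5, 6, 34, 55}) {
            System.out.println(minute + " -> " + matches(minuteField, minute));
        }
    }
}
```

With the minute field 5,34,55, the check succeeds for minutes 5, 34 and 55 and fails for any other minute.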
7. What is the role of the /etc/hosts file in setting up an HDFS cluster?
A) For hostname to IP address mapping.
8. What is the dfsadmin command in Hadoop?
9. If one of the datanodes fails to start on the cluster, how will you come to know? And what are the necessary actions to be taken now?
A) Via the HDFS web UI, we can see the number of dead nodes, and we need to rebalance the cluster now.
10. What is the impact if the namenode fails, and what are the necessary action items now?
A) The entire HDFS will be down, and we need to restart the namenode after copying fsimage and edits from the secondary NN.
11. What is Log4j?
A) A logging framework.
12. How do we set the logging level for Hadoop daemons/commands?
A) In the log4j.properties or hadoop-env.sh file: hadoop.root.logger=INFO,console (WARN,DRFA)
13. Is there any impact on MapReduce jobs if there is no mapred-site.xml file created in the HADOOP_HOME/conf directory but all the necessary properties are defined in yarn-site.xml?
A) No.
14. How does Hadoop's CLASSPATH play a vital role in starting or stopping Hadoop daemons?
A) The classpath contains the list of directories containing jar files required to start/stop daemons. For example, HADOOP_HOME/share/hadoop/common/lib contains all the common utility jar files.
15. What is the default logging level in Hadoop?
A) hadoop.root.logger=INFO,console.
16. What does the hadoop.tmp.dir configuration parameter default to?
A) It is /tmp/hadoop-${user.name}. We need a directory that a user can write to and that does not interfere with other users. If we didn't include the username, then different users would share the same tmp directory. This can cause authorization problems if folks' default umask doesn't permit write by others. It can also result in folks stomping on each other when they're, e.g., playing with HDFS and reformat their filesystem.
17. How do we verify the status and health of the cluster?
A) Either by the HDFS web UI at http://namenode:50070/ or by $ hadoop dfsadmin -report.
18. What is the reason for the frequent "connection refused" exception in Hadoop?
A) If there is no configuration error at the client machine or namenode machine, a common cause for this is that the Hadoop service isn't running. Also check that there isn't an entry for our hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts.
19. How do we set a configuration property to be unique/constant across the cluster nodes so that no slave node can override it?
A) We can achieve this by defining the property in the core/hdfs/mapred/yarn-site.xml file on the namenode with the final tag, as shown below.
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
<final>true</final>
</property>
20. Does the namenode stay in safe mode till all under-replicated files are fully replicated?
A) No. The namenode waits until all or a majority of datanodes report their blocks. But the namenode will stay in safe mode until a specific percentage of blocks of the system is minimally replicated. Minimally replicated is not fully replicated.
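The distinction between "minimally replicated" and "fully replicated" can be sketched in a few lines of Java. This is a simplified model, not NameNode code: the class SafeModeCheck, its method, and the threshold of 0.999 used in main are assumptions for illustration; on a real cluster the percentage and the minimum replica count come from configuration.

```java
// Simplified, hypothetical model of the safe mode exit condition described above.
class SafeModeCheck {
    // A block counts as "minimally replicated" once at least minReplication
    // replicas have been reported, even if that is below the target replication factor.
    static boolean canLeaveSafeMode(int[] reportedReplicas, int minReplication, double thresholdPct) {
        if (reportedReplicas.length == 0) {
            return true; // no blocks to wait for
        }
        long safeBlocks = 0;
        for (int replicas : reportedReplicas) {
            if (replicas >= minReplication) {
                safeBlocks++;
            }
        }
        return (double) safeBlocks / reportedReplicas.length >= thresholdPct;
    }

    public static void main(String[] args) {
        // Four blocks whose target replication is 3: every block has at least one
        // reported replica, so all are minimally replicated though none is fully replicated.
        System.out.println(canLeaveSafeMode(new int[] {1, 2, 1, 2}, 1, 0.999));
        // One block has no reported replica yet, so only 75% of blocks are safe.
        System.out.println(canLeaveSafeMode(new int[] {1, 0, 1, 2}, 1, 0.999));
    }
}
```

In this model the first call allows leaving safe mode even though no block has reached its target replication, which is the point of the answer above.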
More Hadoop Interview Questions at below links:
http://hadooptutorial.info/category/interview-questions/hadoop-interview-questions-for-experienced-and-freshers/
http://hadooptutorial.info/category/interview-questions/mapreduce-interview-questions/
http://hadooptutorial.info/category/interview-questions/hbase-interview-questions-for-experienced-freshers/
http://hadooptutorial.info/category/interview-questions/hive-interview-questions/
http://hadooptutorial.info/category/interview-questions/pig-interview-questions-for-experienced-and-freshers/
http://hadooptutorial.info/category/interview-questions/sqoop-interview-questions-and-answers/

HDFS Interview Questions and Answers:
1. What is the default replication factor, and how will you change it at file level?
2. Why do we need a replication factor > 1 in a production Hadoop cluster?
3. How will you combine the 4 part-r files of a MapReduce job?
A) Using hadoop fs -getmerge
4. What are the compression techniques in HDFS, and which is the best one and why?
5. How will you view compressed files via an HDFS command?
A) hadoop fs -text
6. What is the Secondary Namenode and its functionalities? Why do we need it?
7. What is the Backup node and how is it different from the Secondary Namenode?
8. What are FSimage and edit logs, and how are they related?
9. What is the default block size in HDFS, and why is it so large?
10. How will you copy a large file of 50 GB into HDFS in parallel?
A) distcp
11. What is balancing in HDFS?
12. What is expunge in HDFS?
A) It empties the trash.
13. What is the default URI for the HDFS web UI? Can we create files via the HDFS web UI?
A) namenode:50070. No, it is read-only.
14. How can we check the existence of a non-zero-length file with HDFS commands?
A) hadoop fs -test command

15. What is IOUtils in the HDFS API and how is it useful?
16. Can we archive files in HDFS? If yes, how can we do that?
A) hadoop archive -archiveName NAME -p <parent path> src dest
17. What is safe mode in Hadoop and what are the restrictions during safe mode?
18. What is rack awareness in Hadoop?
19. Can we come out of safe mode manually? If yes, how?
A) $ hadoop dfsadmin -safemode enter/get/leave
20. Why is the block size in Hadoop maintained as very big compared to the traditional block size?
21. What are Sequence files and how are they different from text files?
22. What is the limitation of Sequence files?
A) They support only Java; no other API.
23. What are Avro files?
24. Can an Avro file created in Java on machine 1 be read on another machine with the Ruby API?
A) Yes
25. Where is the schema of an Avro file stored if the file is transferred from one host to another?
A) In the same file itself, as a header section.
26. How do we handle small files in HDFS?
A) Merge them into a sequence/Avro file or archive them into har files.

27. What is a delegation token in Hadoop and why is it important?
28. What is fsck in Hadoop?
29. Can we append data records to an existing file in HDFS?
A) Yes, by the command $ hdfs dfs -appendToFile. It appends a single src, or multiple srcs, from the local file system to the destination file system. It also reads input from stdin and appends to the destination file system.
30. Can we get the count of files in a directory on HDFS via the command line?
A) Yes, by using the command $ hdfs dfs -count hdfs://NN/file1
31. How do we achieve security on a Hadoop cluster?
A) With Kerberos.
32. Can we create multiple files in HDFS with different block sizes?
A) Yes. HDFS provides an API to specify the block size at the time of file creation. Below is the method signature:
public FSDataOutputStream create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException
33. What is the importance of dfs.namenode.name.dir?
A) It contains the fsimage file for the namenode. It should be configured to write to at least two filesystems on different physical hosts (namenode and secondary namenode), because if we lose the fsimage file we will lose the entire HDFS file system, and there is no other recovery mechanism if there is no fsimage file available.
34. What is the need for fsck in Hadoop?
A) It can be used to determine the files with missing blocks.
35. Are HDFS block boundaries between records or across records?
A) No, HDFS does not provide record-oriented boundaries, so blocks can end in the middle of a record.

MapReduce Interview Questions and Answers:
1. What is speculative execution?
2. What is the Distributed Cache?
3. Workflow of a MapReduce job?
A) map, combiner, partitioner, shuffle, reducer
4. How will you globally sort the output of a MapReduce job?
A) Total order partitioner.
5. Difference between map-side and reduce-side join?
6. What is MapReduce chaining?
7. How will you pass parameters to a mapper or reducer?
8. How will you create custom key and value types?
9. Sorting based on any column other than the key?
10. How will you create custom input formats?
11. How will you process a huge number of small files in an MR job?
A) After converting them into a sequence file/Avro file.
12. Can we run a reducer without a mapper?
A) Yes. In this case the identity mapper will run in the background to copy the input to the reducer.
13. Do mapper and reducer tasks run in parallel? If not, why do we sometimes see (map 80%, reduce 10%)?
A) No, it's due to the data copy phase.
14. How will you set up a custom counter to detect bad records in the input?
A) context.getCounter(enumValue)
15. How will you schedule MapReduce jobs?
A) Through Oozie or Azkaban.
16. What is a combiner? Tell me one scenario where it is not suitable.
A) For aggregate functions.
17. How will you submit a MapReduce job through the command line?
18. How will you kill a running MapReduce job?
19. For a failed MapReduce job, how will you trace the root cause?
A) YARN web UI > logs > userlogs > application ID > container > syserr/syslog
20. What will you do if a MapReduce job failed with a "Java heap space" error message?
A) In HADOOP_CLIENT_OPTS or JAVA_CHILD_OPTS, increase the -Xmx property.
21. How many map tasks and reduce tasks will run on each datanode by default?
A) 2 map tasks and 1 reduce task.
22) What is the minimum RAM capacity needed for this datanode?
A) As there are 3 JVMs running for 3 tasks, and 1 datanode daemon also runs, at least 4 GB RAM is needed, assuming that at least 1 GB can be assigned to each YARN task.
22. What is the difference between MapReduce and YARN?
23. What is the Tez framework?
A) An alternative framework to MapReduce; it can be used in YARN in place of MapReduce.
24. What is the difference between Tez and MapReduce?
A) Tez is at least 2 times faster than MapReduce.
25. What are input split, input format and record reader in MapReduce programming?
26. Does MapReduce support processing of Avro files? If yes, what are the main classes of the API?
27. How will you process a dataset in JSON format in a MapReduce job?
A) The JSONObject class can be used to parse the JSON records in the dataset.
28. Can we create a multi-level directory structure (year/month/date) in MapReduce based on the input data?
A) Yes, by using MultipleOutputs.
29. What is the relation between TextOutputFormat and KeyValueTextInputFormat?
A) The second one is used to read the files created by the first one.
30. What is LazyOutputFormat in MapReduce and why do we need it?
A) It creates output files only if data is present.
31. How do we prevent file splitting in MapReduce?
A) By returning false from the isSplitable method of our custom InputFormat class.
32. What is the difference between the Writable and WritableComparable interfaces? And which is sufficient for the value type in an MR job?
A) Writable.
33. What is the role of the ApplicationMaster in running a MapReduce job through YARN?
34. What is an uber task?
35. What are the IdentityMapper & IdentityReducer classes?
36. How do we create a jar file with .class files in a directory through the command line?
37. What is the default port for the YARN web UI?
A) 8088
38. How can we distribute our application's jars to all of the nodes in the YARN cluster that need it?
39. How do we include native libraries in YARN jobs?
A) By using the -Djava.library.path option on the command, or else by setting LD_LIBRARY_PATH in the .bashrc file.
40. What is the default scheduler inside the YARN framework for starting tasks?
A) CapacityScheduler
41. How do we handle record boundaries in text files or Sequence files in MapReduce InputSplits?
A) In MapReduce, an InputSplit's RecordReader will start and end at a record boundary. In SequenceFiles, every 2 KB has a 20-byte sync mark between the records. These sync marks allow the RecordReader to seek to the start of the InputSplit, which contains a file, offset and length, and find the first sync mark after the start of the split. The RecordReader continues processing records until it reaches the first sync mark after the end of the split. Text files are handled similarly, using newlines instead of sync marks.
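The text-file case can be sketched in plain Java. The code below is a simplified model of a line-oriented record reader, not Hadoop's actual implementation; the class LineSplitReader and its methods are hypothetical. It assumes the whole file is available as a byte array and that records end with '\n'. A split that does not start at offset 0 skips ahead to the next newline, and every split keeps reading until it has passed its own end, so a line that straddles a boundary is read exactly once, by the split in which it starts.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how a line reader handles split boundaries.
class LineSplitReader {
    // Returns the lines owned by the split [start, start + length).
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // The line containing `start` belongs to the previous split: skip it.
            pos = nextLineStart(data, pos);
        }
        List<String> lines = new ArrayList<>();
        // Read every line that begins at or before `end`, even if it finishes after it.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }

    // Index just past the next '\n' at or after `from`, or data.length if there is none.
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    public static void main(String[] args) {
        byte[] file = "alpha\nbravo\ncharlie\n".getBytes(StandardCharsets.UTF_8);
        // Three 8-byte splits; the boundaries at 8 and 16 fall inside "bravo" and "charlie".
        System.out.println(readSplit(file, 0, 8));
        System.out.println(readSplit(file, 8, 8));
        System.out.println(readSplit(file, 16, 4));
    }
}
```

For the 20-byte sample, the first split yields alpha and bravo, the second yields charlie, and the third yields nothing, so no record is lost or duplicated.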
42. Sometimes MapReduce jobs fail if we submit the same jobs from a different user. What is the cause and how do we fix it?
A) It might be due to mapreduce.jobtracker.system.dir not being set.
43. How do we change the default location of a MapReduce job's intermediate data?
A) By changing the value of mapreduce.cluster.local.dir.
44. If a map task fails once during MapReduce job execution, will the job fail immediately?
A) No, it will try restarting the task up to the maximum attempts allowed for map/reduce tasks; by default it is 4.

Pig Interview Questions and Answers:
1. How will you load a file into Pig?
2. What are the complex data types in Pig?
3. What is an outer bag?
4. Load an emp table file with columns id, name, deptid, description. Display name and id where deptid =
5. How will you write custom UDFs?
6. What is the difference between an inner bag and an outer bag?
7. What is a tuple?
8. What is the difference between FOREACH and FILTER?
9. What is the difference between local mode and mapreduce mode?
10. What is the difference between GROUP BY and JOIN BY in Pig?
11. How many reduce tasks will be run if we specify both GROUP BY and ORDER BY clauses in the same Pig script?
12. What is the DISTINCT operator?
13. Difference between UNION, JOIN and CROSS?
14. How do we sort records in descending order in a dataset in Pig? (ORDER DESC/ASC)
15. What is the difference between GROUP and COGROUP?
16. What is the difference between the STORE and DUMP commands?
17. How will you debug a Pig script?
A) set debug on
18. Can we run basic Hadoop fs commands in the Grunt shell?
A) Yes
19. Can we run Unix shell commands from the Grunt shell itself?
A) Yes, by using the sh command.
20. Can we submit Pig scripts in batch mode from the Grunt shell?
A) Yes, by using the run/exec command.
21. What is the difference between the run and exec commands in the Grunt shell?
A) run will execute the Pig script in the same Grunt shell, but exec will submit it in a new Grunt shell.
22. What are diagnostic operators in Pig?
23. What is the difference between EXPLAIN, ILLUSTRATE and DESCRIBE?
24. How do we access a custom UDF function created in Pig?
A) By using REGISTER and DEFINE, it will be available in the Pig session.
25. What is the DIFF function in Pig?
26. Can we do random sampling from a large dataset in Pig?
A) SAMPLE command
27. How can we divide the records of a single dataset into multiple datasets by any criteria, e.g. country-wise?
A) Using the SPLIT command.
28. What is the difference between the COUNT and COUNT_STAR functions in Pig?
A) COUNT_STAR includes null values in counting, whereas COUNT does not.
29. What are PigStorage & HBaseStorage?
30. What is the use of LIMIT in Pig?
31. What is the difference between MapReduce and Pig, and can we use Pig in all scenarios where we can write MR jobs?
A) No

Hive Interview Questions and Answers:
1. Does Hive support record-level operations?
2. In a Hive table, can we change a string data type to an int data type?
3. Can we rename a table in Hive? If yes, how?
4. What is the metastore? How will you start the service?
5. What is SerDe in Hive? Example?
6. Difference between Hive and HBase?
7. How to print the column names of a table in a Hive query result?
8. How will you know whether a table is external or managed? (desc extended)
9. What is the Hive Thrift server?
10. What is the difference between local metastore and embedded metastore?
11. How do we load data into a Hive table with SequenceFile format from a text file on the local file system?
12. What is HCatalog?
13. How is HCatalog different from Hive?
14. What is WebHCat?
15. How do we import XML data into Hive?
16. How do we import CSV data into Hive?
17. How do we import JSON data into Hive?
18. What are dynamic partitions?
19. Can a Hive table contain data in more than one format?
20. How do I import Avro data into Hive?
21. Does Hive have an ODBC driver?
A) Yes, Cloudera provides ODBC drivers for the Hive server.
22. Is HiveQL case sensitive?
A) No
23. Does Hive support Unicode?
A) Yes, we can use Unicode strings in data/comments, but cannot use them for database/table/column names.
24. Can a Hive table contain data in more than one format?
25. Is it possible to set the data format on a per-partition basis?
26. What are dynamic partitions?
27. Does Hive have a JDBC driver?
A) Yes. The driver is org.apache.hadoop.hive.jdbc.HiveDriver.
It supports two modes: a local mode and a remote one.
In the remote mode it connects to the Hive server through its Thrift API. The JDBC URL to use should be of the form: jdbc:hive://hostname:port/databasename. In the local mode Hive is embedded. The JDBC URL to use should be jdbc:hive://.
28. How can we import fixed-width data into Hive?
29. How can we import ASCII log files (HTTP, etc.) into Hive?
30. When running a JOIN query, what is the idea to solve out-of-memory errors?
A) This is usually caused by the order of the JOIN tables. Instead of FROM tableA a JOIN tableB b ON ..., try FROM tableB b JOIN tableA a ON .... Note that if we are using LEFT OUTER JOIN, we might want to change to RIGHT OUTER JOIN. This trick usually solves the problem. The rule of thumb is: always put the table with a lot of rows having the same value in the join key on the rightmost side of the JOIN.
31. How many times faster does the Tez engine run than the MR engine in Hive?
32. How long will each Tez session be active?

HBase Interview Questions and Answers:

1. What are the catalog tables in HBase?
2. What is ZooKeeper's role in the HBase architecture?
3. How will you drop a table in HBase?
4. Do you know Hive on HBase? How will you achieve it? (HBase storage handler). If we delete a table from Hive, will it affect the HBase table?
A) Yes
5. How will you load bulk data of a 50 GB file into an HBase table?
6. Limitations of HBase? (no support for SQL syntax, indexing, joins, ..)
7. Difference between HBase and HDFS?
8. How do we integrate HBase and Hive?
9. How can we add/remove a node to an HBase cluster?
A) By adding/removing an entry in the HBASE_CONF_DIR/regionservers file.
10. Can we safely move the hbase rootdir in HDFS?
A) Yes. HBase must be down for the move. After the move, update hbase-site.xml across the cluster and restart.
11. Can we safely move the master from node A to node B?
A) Yes. HBase must be down for the move. After the move, update hbase-site.xml across the cluster and restart.
12. How do we fix OutOfMemoryExceptions in HBase?
A) HBase uses a default heap size of 1 GB. By increasing this via the HBASE_HEAPSIZE environment variable in ${HBASE_HOME}/conf/hbase-env.sh we can solve these error messages.
13. How can we change the logging level in HBase?
A) In the log4j.properties file we can set the logging level to DEBUG with log4j.logger.org.apache.hadoop.hbase=DEBUG and restart our cluster, or set it in the hbase-env.sh file.
14. What ports does HBase use?
A) HBase runs the master and its informational HTTP server at 60000 and 60010 respectively, and regionservers at 60020 and their informational HTTP server at 60030.
15. Sometimes HBase is ignoring HDFS client configuration such as dfs.replication. What is the cause?
A) If we made HDFS client configuration on our Hadoop cluster, HBase will not see this configuration unless we do one of the following:
i) Add a pointer to HADOOP_CONF_DIR to CLASSPATH in hbase-env.sh, or symlink your hadoop-site.xml from the hbase conf directory.
ii) Add a copy of hadoop-site.xml to ${HBASE_HOME}/conf.
iii) If there is only a small set of HDFS client configurations, add them to hbase-site.xml.
The first option is the better of the three since it avoids duplication.
16. What is the maximum recommended cell size?
A) A rough rule of thumb, with little empirical validation, is to keep the data in HDFS and store pointers to the data in HBase if you expect the cell size to be consistently above 10 MB. If you do expect large cell values and you still plan to use HBase for the storage of cell contents, you'll want to increase the block size and the maximum region size for the table to keep the index size reasonable and the split frequency acceptable.
17. Why can't I iterate through the rows of a table in reverse order?
A) Because of the way HFile works: for efficiency, column values are put on disk with the length of the value written first and then the bytes of the actual value written second. To navigate through these values in reverse order, these length values would need to be stored twice (at the end as well) or in a side file. A robust secondary index implementation is the likely solution here to ensure the primary use case remains fast.
18. What is Phoenix?
A) Phoenix is an SQL layer on HBase.
19. How fast is Phoenix? Why is it so fast?
A) Phoenix is fast. A full table scan of 100M rows usually completes in 20 seconds (narrow table on a medium-sized cluster). This time comes down to a few milliseconds if the query contains a filter on key columns. For filters on non-key columns or non-leading key columns, you can add an index on these columns, which leads to performance equivalent to filtering on a key column by making a copy of the table with the indexed column(s) part of the key.
20) Why is Phoenix fast even when doing a full scan?
A) Phoenix chunks up your query using the region boundaries and runs the chunks in parallel on the client using a configurable number of threads.
The aggregation will be done in a coprocessor on the server side, collapsing the amount of data that gets returned back to the client rather than returning it all.

Sqoop Interview Questions and Answers:
1. How will you get data from an RDBMS into HDFS?
2. Can we store MySQL table data as a sequence file in HDFS via Sqoop?
3. Does Sqoop support compression techniques to store data in HDFS?
4. Can we load all the tables in a database into HDFS in a single shot?
A) import-all-tables
5. Can we copy a subset of data from a table in an RDBMS into HDFS? (based on some criteria)
A) Using a --where "country='us'" condition in the import command.
6. How many reduce tasks will be run by default for a Sqoop import command? How many mappers?
A) 0, 4
7. If we get a Java heap space error and we have already given the maximum memory, what is the possible solution?
A) Increase mappers with -m 100
8. What is the default port for connecting to a MySQL server?
A) 3306
9. How can we resolve a Communications Link Failure when connecting to MySQL?
A) Verify that we can connect to the database from the node where we are running Sqoop:
$ mysql --host= --database=test --user= --password=
Add the network port for the server to your my.cnf file. Set up a user account to connect via Sqoop. Grant permissions to the user to access the database over the network:
Log in to MySQL as root: mysql -u root -p
Issue the following commands:
mysql> grant all privileges on *.* to 'user'@'%' identified by 'testpassword'
mysql> grant all privileges on *.* to 'user'@'' identified by 'testpassword'
10. Can we provide SQL queries in the Sqoop import command?

Flume Interview Questions and Answers:
1. Can we load data directly into HBase?
A) Yes
2. How will you create directories in HDFS based on the timestamp present in the input file?
A) hdfs.path = /user/%y%m%d/%H%M%S (format escape sequences)
3. What will happen if no timestamps are present in the input file?
A) It will throw an exception. To solve this, set hdfs.useLocalTimeStamp = true
4. Workflow of Flume?
5. What are the channel types in Flume? (Memory, JDBC, File channel) Which one is faster? Memory.
6. How will you start a Flume agent from the command line?
7. What are interceptors in Flume?
8. We are getting a NumberFormatException when using format escape sequences for date & time (%Y %M %D etc.) in the HDFS sink. How can we solve this exception?
A) To use date escape sequences in Flume, there should be a timestamp present in the header of the source record. If there is no timestamp in the source file, we can solve this exception in two ways:
i) By adding a Timestamp interceptor in the source as shown below
a1.sources.tail.interceptors = ts
a1.sources.tail.interceptors.ts.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
ii) Or by adding the useLocalTimeStamp = true parameter in the configuration properties of the agent for the HDFS sink.

9. What is the bridge mechanism used for multi-hop agent setup in Flume?
A) Avro RPC
9. Which is the reliable channel to make sure there is no data loss (JDBC, File, Memory)?
A) The file channel is reliable.
10. What is fan-out flow in Flume?
11. What are the event serializers available in Flume?
A) Text, Avro
12. How do we collect records in JSON format directly through Flume?
A) By using JSONHandler.
13. What is the difference between FileSink and FileRollSink?
14. Difference between the AsyncHBaseSink and HBaseSink types?
15. If we need to test the functionalities of a custom source and channel and we do not need any sink, can we set up this kind of agent?
A) Yes, with the sink type as null.
16. Can we perform real-time analysis on the data collected by Flume directly? If yes, how?
A) Yes. By using MorphlineSolrSink we can extract data from Flume events, transform it, and load it in near real time into Apache Solr servers, which in turn serve queries to end users or search applications.
17. If we need the speed of the memory channel and the data reliability of the file channel in a single agent channel, how can we achieve this?
A) Use SpillableMemoryChannel for this purpose.
18. What are multiplexing selectors in Flume?
19. What are replicating selectors in Flume?
20. What is the use of HostInterceptor in Flume?
21. What is the advantage of UUIDInterceptor in Flume?
22. In defining the type of sources or sinks in Flume, is it mandatory to provide the full class name?
A) No, we can also provide the alias names. For example, we use hdfs as sink.type in place of org.apache.flume.sink.hdfs.HDFSEventSink

Splunk Interview Questions and Answers:
1. What is Splunk and what is Hunk?
2. How do we connect to HDFS in Hunk?
3. Is there any connector for the Hive server to directly load Hive tables into Hunk?
4. What is the Hive split generator in the Hive provider?
5. Do we need to keep the Hive Thrift server and Hive metastore services running to retrieve Hive tables into Hunk? Yes
6. Can we create dashboards in Hunk with visualization charts embedded in them?
7. Does Hunk support reading of compressed files (.gz, .bz2) on Hadoop?
8. Does Hunk support reading of Snappy-compressed files on Hadoop?
9. Where can we look for the error messages or exceptions of a search query in Hunk? (search.log file under the dispatcher folder in the Hunk distribution)
10. What is the default port for accessing the Hunk web UI? (8000)

Tableau Interview Questions and Answers:
1. Can we use Tableau on a Linux server? (No, it supports only Windows and Mac)
2. What is the difference between Hunk and Tableau?
3. How do we connect to the Hive server from Tableau?
4. How can we connect to MySQL from Tableau?
5. Can we perform data blending of two different sources in Tableau?
6. Do we need to write queries to perform joins or filters in Tableau?
7. Does Tableau fire any MapReduce jobs in the backend to pull data from Hive?

Oozie & Azkaban Interview Questions and Answers:

1. What is the job scheduler you use in your production cluster?
2. Does Oozie support time-based triggering of jobs? (yes)
3. Does Azkaban support time-based triggering of jobs? (yes)
4. Does Oozie support data-based triggering of jobs? (yes)
5. Does Azkaban support time-based triggering of jobs? (yes)
6. Can we define dependencies between jobs in Azkaban flows? (yes)
7. What is the difference between Oozie and Azkaban?
8. How do we create properties files in Azkaban?
9. How do we create properties files in Oozie?

Unix Interview Questions and Answers:
1. How do you know what processes are running in Unix?
$ ps lists all the Unix system processes
$ jps lists all the Java processes
$ jobs lists all the processes that were suspended and running in the background. Because the jobs command is a foreground process, it cannot show us active foreground processes.
2. How will you stop a process forcibly in Unix?
Use the below command to kill/stop a process forcibly.

$ kill -9 processid
Here option -9 denotes force killing.
3. Will the below commands result in the same output?
TEST=helloworld
$ echo $TEST
$ echo TEST
Ans) No. The first command will print helloworld on the console and the second one will print TEST on the console.
4) How can we define constants in Unix shell scripting?
Ans) We can achieve this with the help of read-only variables in Unix shell scripting.
For example, consider the following commands:
$ TEST1=hello
$ readonly TEST1
$ echo $TEST1
hello
$ TEST1=world
The last command results in an error message:
/bin/sh: TEST1: This variable is read only.
5) Can we unset variables in Unix?
Yes, we can release the variable names by using the unset command.
For example,
$ unset TEST
will release the variable TEST, and it no longer references the helloworld string. But we cannot use the unset command to unset variables that are marked read-only. For example,

$ unset TEST1
will result in an error message.
6) What are environment variables in Unix?
An environment variable is a variable that is available to any child process of the shell. We make a variable environmental by using the export command.
Syntax for declaring environment variables:
$ name=value; export name
7) What are shell variables in Unix?
A shell variable is a special variable that is set by the shell and is required by the shell in order to function correctly. Some of these variables are environment variables whereas others are local variables.
These are the variables that the shell sets during initialization and uses internally. Examples are:
PWD: Indicates the current working directory as set by the cd command.
UID: Expands to the numeric user ID of the current user, initialized at shell startup.
PATH: Indicates the search path for commands. It is a colon-separated list of directories in which the shell looks for commands. A common value is
HOME: Indicates the home directory of the current user: the default argument for the cd built-in command.
8) What does $@ represent in Unix? (All arguments of the command)
9) What is $? in Unix? (It is the status of the last executed command)
10) What is sed, and why do we use it? (It is a stream editor; it can be used for replacing a set of characters with another set)
Java Interview Questions and Answers:

1. what is java
2. about JVM, JRE, JDK
3. oops concepts with real-time examples
4. String and string pool concept
5. diff between String, StringBuilder, StringBuffer
6. diff between final and finally
7. diff between equals and hashCode
8. comparison concept in Set, HashMap, Hashtable
9. accessing methods and variables using superclass reference and subclass object
10. what is abstract class and interface
11. what is is-a, has-a, uses-a relation in java
12. diff b/w Comparator and Comparable interface in java and its methods
13. what are mutable objects and immutable objects and how to create immutable object in java
14. what is the default access modifier for a variable in an interface ===== public static final
15. what are adapter classes in java
16. what is abstract factory, singleton and facade design pattern in java
17. diff between anonymous, inner class and nested class
18. how to create object for inner class and nested class
19. what is exception
20. diff between checked and unchecked exception
21. concepts of throw, throws, try, catch, finally
22. diff between classcastexception, classnotfound exception, nomethoddeff exception
23. what is collection
24. diff between ArrayList and LinkedList, HashMap and LinkedHashMap
25. about Dictionary, Vector, Hashtable, Properties
26. how to create stack using two ArrayLists
27. difference between java5, java6, java7, java8
28. what is autoboxing and unboxing (feature from 1.5)
29. what is wrapper class
30. what is multithreading
31. stages in multithreading
32. diff between sleep, wait, join methods
33. when we get InterruptedException and IllegalMonitorStateException
34. what is deadlock in java
35. what is synchronization
36. how many ways to create object in java
37. what is Serializable
38. what is transient keyword in java
39. diff b/w interpreter and compiler
40. what are the methods in java.lang.Object class
41. what is Externalizable
42. how many interfaces are there in collections
43. what is Collections class in collections
44. what are applets and its advantages and disadvantages
45. socket communication in java (java.net.*;)
scenario
============
classA temp=new classA();
temp.i=5;
ClassB temp1=temp;
temp1.i=10;
sysout(temp.i and temp1.i);
temp1=null;
sysout(temp.i);
sysout(temp1.i);
==========================
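The scenario above does not compile as written, since classA and ClassB are never defined. The sketch below is a runnable variant under the assumption that both variables share one hypothetical type, Holder, with an int field i. It shows that assigning one reference to another copies the reference and not the object, and that reading a field through a null reference throws NullPointerException.

```java
// Hypothetical stand-in for classA/ClassB in the scenario.
class Holder {
    int i;
}

class AliasDemo {
    // Returns temp.i after the object was modified through the alias temp1.
    static int valueSeenThroughOriginal() {
        Holder temp = new Holder();
        temp.i = 5;
        Holder temp1 = temp; // copies the reference; both names point at one object
        temp1.i = 10;
        return temp.i;
    }

    // Returns true when reading a field through a null reference throws NullPointerException.
    static boolean nullAccessThrows() {
        Holder temp1 = null; // only this variable is cleared; the object itself is untouched
        try {
            int unused = temp1.i;
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenThroughOriginal()); // 10
        System.out.println(nullAccessThrows());         // true
    }
}
```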
46. how to find number of days between two date objects date1 and date2
47. Read all methods in java.lang.Math class
48. e.g. Math.random(), Math.ceil, Math.round, Math.abs
49. what is pojo and poji
50. how to get a connection from database
51. diff between ResultSet and RowSet
52. what is updatable ResultSet
53. what is metadata
54. how to get metadata of a table in java
55. how many ways to create string
56. differentiate below statements
=====
String s1="hai";
String s2=new String("hai");
================
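A small Java sketch of the difference between the two statements, using == to compare references. The class name StringPoolDemo is made up for this example. A string literal refers to an instance in the string pool, new String(...) always creates a separate object, and intern() returns the pooled instance again.

```java
class StringPoolDemo {
    // Two identical literals refer to the same pooled instance.
    static boolean literalsShareInstance() {
        String s1 = "hai";
        String s2 = "hai";
        return s1 == s2;
    }

    // new String(...) creates a distinct object even though the contents are equal.
    static boolean newStringIsSameInstance() {
        String s1 = "hai";
        String s2 = new String("hai");
        return s1 == s2;
    }

    // intern() maps the new object back to the pooled instance.
    static boolean internedIsSameInstance() {
        String s1 = "hai";
        String s2 = new String("hai").intern();
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());   // true
        System.out.println(newStringIsSameInstance()); // false
        System.out.println(internedIsSameInstance());  // true
    }
}
```

In all three cases s1.equals(s2) is true; only the reference comparison differs.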
57. what is the instanceof keyword in java and how does it check
58. what is min, normal, highest priority of a thread
59. which thread has least and highest priority in java
60. what are marker interfaces and examples
61. concept about access specifiers public, private, protected, default and its other names
62. what access modifiers can we place for a variable inside a method (check it only final and static)
63. what is default access specifier in java
64. what are the access specifiers that can be used for a class in java
65. why java does not support multiple inheritance and how java can achieve this?
66. what is static binding and dynamic binding?
67. how many ways to create threads
68. when we go for creating thread using extends and by implementing Runnable interface
69. what is garbage collector concept in java, how we can invoke it programmatically
70. what is finalize method and when it is invoked
71. concept of System.out.println() method
72. concept of List, Set, Map interfaces and its sub classes
73. difference between for each and iterator
74. which one will be executed below if no exception occurs
try{
return val;
}catch(Exception e){


}finally{
return val;
}
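To make the behaviour visible, the sketch below uses different values in the try and finally blocks (an assumption for illustration; the snippet above returns the same variable in both). The finally block always runs, and a return inside finally replaces the value that the try block was about to return.

```java
class FinallyReturnDemo {
    @SuppressWarnings("finally")
    static int value() {
        try {
            return 1; // evaluated first, but not yet handed to the caller
        } catch (Exception e) {
            return -1; // not reached, since no exception is thrown
        } finally {
            return 2; // runs last and overrides the pending return value
        }
    }

    public static void main(String[] args) {
        System.out.println(value()); // 2
    }
}
```

Returning from finally is legal but usually discouraged, because it also silently discards any exception thrown in the try block.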
================== does the below program compile and execute?
class A{
public void f1(){
}
}
class B extends A{
protected void f1(){
}
}
================== does the below program compile and execute?
class A{
protected void f1(){
}
}
class B extends A{
public void f1(){
}
}
================== does the below program compile and execute?
class A{
public void f1() throws NullPointerException{
}
}
class B extends A{
public void f1() throws Exception{
}
}
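Of the three programs above, only the second compiles. An overriding method may widen access (protected to public) but may not narrow it (public to protected), and it may not declare a checked exception such as Exception that the overridden method does not allow. The sketch below shows the legal direction for both rules; the class names Base, Derived and OverrideDemo are made up for this example.

```java
import java.io.IOException;

class Base {
    protected String f1() throws IOException {
        return "Base";
    }
}

class Derived extends Base {
    // Legal override: access is widened from protected to public, and the
    // throws clause is narrowed (no checked exception is declared at all).
    @Override
    public String f1() {
        return "Derived";
    }
}

class OverrideDemo {
    static String call() {
        Base b = new Derived(); // dynamic dispatch selects Derived.f1()
        try {
            return b.f1(); // the static type Base still declares IOException
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call()); // Derived
    }
}
```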

75. what is volatile variable in java?
76. what is race condition in java?
scenario===
set s=new SortedSet();
s.add(1);
s.add(3);
s.add(2);
then which order it prints==ans===ClassCastException error
scenario====
Object arr[]={new Integer(5),new String("1"),new Double(2)};
then in which order does it sort using Arrays.sort(arr);====Error
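The error in the Arrays.sort scenario is a runtime ClassCastException: sorting an Object[] relies on each element's compareTo, and an Integer, a String and a Double cannot be compared with each other. The sketch below demonstrates this and, for contrast, sorts an array of a single type. The class name MixedSortDemo is made up, and valueOf and a string literal are used in place of the wrapper constructors from the scenario.

```java
import java.util.Arrays;

class MixedSortDemo {
    // Returns true when sorting an array of mutually incomparable objects fails.
    static boolean mixedSortThrowsClassCast() {
        Object[] arr = {Integer.valueOf(5), "1", Double.valueOf(2)};
        try {
            Arrays.sort(arr); // natural ordering needs comparable element types
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Elements of one Comparable type sort without problems.
    static Integer[] sortSameType() {
        Integer[] arr = {5, 1, 2};
        Arrays.sort(arr);
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(mixedSortThrowsClassCast());        // true
        System.out.println(Arrays.toString(sortSameType()));   // [1, 2, 5]
    }
}
```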

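scenario======
class A{
// overloads are made static so that main can call them directly
static void s(short n){
System.out.println("short");
}
static void s(long n){
System.out.println("long");
}
static void s(int n){
System.out.println("int");
}
public static void main(String[] args){
short a=1;
long b=10;
s(a);
s(b);
}
}
===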
77. What is Connection? is it a class or an interface, and if it is an interface, where is the implementation class
78. write your own logic of connection pooling
79. check with array declaration for types int, float, boolean, double.
80. learn narrowing and widening conversions.
81. difference between int[] a,b and int a[],b
82. concept of superclass object casting
83. instanceof example using inheritance concept
84. switch statement allows only byte, char, int, string literals
85. difference between fail-safe iterator and fail-fast iterator
86. brief CopyOnWriteArrayList, ConcurrentHashMap
87. what is tight coupling and loose coupling in java
88. what is inter-thread communication and example (producer and consumer example)
89. what is BlockingQueue and LinkedBlockingQueue
90. Why doesn't Collection extend the Cloneable and Serializable interfaces?
91. What do you know about the big-O notation and can you give some examples with respect to different data structures
92. What is the tradeoff between using an unordered array versus an ordered array?
93. What is the difference between the Serial and Throughput garbage collectors?
94. what is connection pooling in java?
95. Explain Serialization and Deserialization.
96. Why are wait, notify and notifyAll defined in the Object class and not in the Thread class in Java
97. Why are wait, notify and notifyAll called from a synchronized block or method in Java
98. what is varargs
