
16/06/2016

21 Must-Know Data Science Interview Questions and Answers

http://www.kdnuggets.com/2016/02/21datascienceinterviewquestionsanswers.html?utm_content=bufferc54a6&utm_medium=social&utm_source=facebook

Tags: Bootstrap sampling, Data Science, Interview questions, Kirk D. Borne, Precision, Recall, Regularization, Yann LeCun

KDnuggets Editors bring you the answers to 20 Questions to Detect Fake Data Scientists, including what is regularization, Data Scientists we admire, model validation, and more.
By Gregory Piatetsky, KDnuggets.
The recent post on KDnuggets, 20 Questions to Detect Fake Data Scientists, has been very popular: it was the most viewed post in the month of January.
However, these questions were lacking answers, so KDnuggets Editors got together and wrote the answers to these questions. I also added one more critical question, number 21, which was omitted from the 20 questions post.
Because of the length, here are the answers to the first 11 questions, and here is part 2.

Q1. Explain what regularization is and why it is useful.
Answer by Matthew Mayo.
Regularization is the process of adding a tuning parameter to a model to induce smoothness in order to prevent overfitting (see also KDnuggets posts on Overfitting).
This is most often done by adding a constant multiple of a norm of the existing weight vector to the loss function. This norm is often either the L1 (Lasso) or L2 (ridge), but can in actuality be any norm. The model should then minimize the mean of the loss function, including this penalty, calculated on the training set.
Xavier Amatriain presents a good comparison of L1 and L2 regularization here, for those interested.
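
To make the shrinking effect of the penalty concrete, here is a minimal sketch for the simplest possible case: a single weight w in the model y ≈ w*x, with no intercept. The function names, the toy data, and the exact scaling of the penalty (0.5*lam*w^2 for ridge, lam*|w| for Lasso) are assumptions chosen for illustration; libraries often scale the penalty differently.

```python
import math


def _sums(xs, ys):
    """Return (sum of x*y, sum of x*x) for paired samples."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy, sxx


def ols_weight(xs, ys):
    """Unregularized least-squares weight for y ~ w*x."""
    sxy, sxx = _sums(xs, ys)
    return sxy / sxx


def ridge_weight(xs, ys, lam):
    """Minimizer of 0.5*sum((y - w*x)^2) + 0.5*lam*w^2 (L2 penalty)."""
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + lam)


def lasso_weight(xs, ys, lam):
    """Minimizer of 0.5*sum((y - w*x)^2) + lam*|w| (L1 penalty).

    For a single weight the solution is a soft-threshold of sum(x*y).
    """
    sxy, sxx = _sums(xs, ys)
    shrunk = max(abs(sxy) - lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Toy data (made up for this sketch): y is exactly 2*x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 30.0):
    print(lam, ridge_weight(xs, ys, lam), lasso_weight(xs, ys, lam))
```

In this one-weight setting, both penalties pull the weight toward zero as lam grows, but only the L1 penalty sets it exactly to zero once lam is large enough, which is why Lasso is often associated with sparse models.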


Fig 1: Lp ball: As the value of p decreases, the size of the corresponding Lp space also decreases.
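
The statement in the caption can be checked numerically. The sketch below assumes the usual definition of the Lp "norm" (for p below 1 it is not a true norm, but the same formula is commonly used), and the helper name and test point are made up for illustration. A point that lies inside the unit ball for a large p can fall outside it for a smaller p, which is what it means for the ball to shrink.

```python
def lp_norm(vec, p):
    """(sum |v_i|^p)^(1/p); a true norm only for p >= 1."""
    return sum(abs(v) ** p for v in vec) ** (1.0 / p)


def in_unit_ball(vec, p):
    """True if vec lies inside the Lp unit ball {x : ||x||_p <= 1}."""
    return lp_norm(vec, p) <= 1.0


# Hypothetical test point in two dimensions.
point = (0.6, 0.6)

for p in (0.5, 1, 2, 4):
    print(p, lp_norm(point, p), in_unit_ball(point, p))
```

For this point the value of lp_norm grows as p decreases, so the point is inside the unit ball for p = 2 and p = 4 but outside it for p = 1 and p = 0.5.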

Q2. Which data scientists do you admire most? Which startups?
Answer by Gregory Piatetsky:
This question does not have a correct answer, but here is my personal list of 12 Data Scientists I most admire, not in any particular order.


Geoff Hinton, Yann LeCun, and Yoshua Bengio, for persevering with Neural Nets and starting the current Deep Learning revolution.
Demis Hassabis, for his amazing work on DeepMind, which achieved human or superhuman performance on Atari games and recently Go.
Jake Porway from DataKind and Rayid Ghani from U. Chicago / DSSG, for enabling data science contributions to social good.
DJ Patil, First US Chief Data Scientist, for using Data Science to make US government work better.
Kirk D. Borne, for his influence and leadership on social media.
Claudia Perlich, for brilliant work on the ad ecosystem and serving as a great KDD 2014 chair.
Hilary Mason, for great work at Bitly and inspiring others as a Big Data Rock Star.
Usama Fayyad, for showing leadership and setting high goals for KDD and Data Science, which helped inspire me and many thousands of others to do their best.
Hadley Wickham, for his fantastic work on Data Science and Data Visualization in R, including dplyr, ggplot2, and RStudio.
There are many excellent startups in the Data Science area, but I will not list them here, to avoid a conflict of interest.
Here is some of our previous coverage of startups.

Q3. How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression?

Answer by Matthew Mayo.
Proposed methods for model validation:
If the values predicted by the model are far outside of the response variable range, this would immediately indicate poor estimation or model inaccuracy.
If the values seem to be reasonable, examine the parameters; any of the following would indicate poor estimation or multicollinearity: signs opposite to expectations, unusually large or small values, or observed inconsistency when the model is fed new data.
Use the model for prediction by feeding it new data, and use the coefficient of determination (R squared) as a model validity measure.
Use data splitting to form a separate dataset for estimating model parameters, and another for validating predictions.
Use jackknife resampling if the dataset contains a small number of instances, and measure validity with R squared and mean squared error (MSE).
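
The last three methods can be sketched in a few lines. To stay self-contained, the sketch below fits a regression with a single predictor rather than a full multiple regression; that simplification, the toy data, and all function names are assumptions made for illustration. The delete-one loop is a jackknife-style resampling used here as leave-one-out validation: each point is predicted by a model fitted without it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to new inputs."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def mse(ys, preds):
    """Mean squared error between observed and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def holdout_scores(xs, ys, n_train):
    """Data splitting: fit on the first n_train points, score on the rest."""
    model = fit_line(xs[:n_train], ys[:n_train])
    preds = predict(model, xs[n_train:])
    return r_squared(ys[n_train:], preds), mse(ys[n_train:], preds)


def leave_one_out_mse(xs, ys):
    """Delete-one resampling: refit without each point, then predict it."""
    squared_errors = []
    for i in range(len(xs)):
        model = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        pred = predict(model, [xs[i]])[0]
        squared_errors.append((ys[i] - pred) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Toy data (made up): roughly y = 1 + 2*x with small noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1]

print("hold-out (R squared, MSE):", holdout_scores(xs, ys, n_train=6))
print("leave-one-out MSE:", leave_one_out_mse(xs, ys))
```

A real dataset should be shuffled before splitting, and the same scoring functions apply unchanged once fit_line and predict are replaced by a multiple-regression fit.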