
Supervised and Unsupervised Learning
Ciro Donalek
Ay/Bi 199, April 2011

Summary
- KDD and Data Mining Tasks
- Finding the optimal approach
- Supervised Models
  - Neural Networks
  - Multi-Layer Perceptron
  - Decision Trees
- Unsupervised Models
  - Different Types of Clustering
  - Distances and Normalization
  - K-means
  - Self-Organizing Maps
- Combining different models
  - Committee Machines
  - Introducing a Priori Knowledge
  - Sleeping Expert Framework

Knowledge Discovery in Databases
KDD may be defined as: "The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data".
KDD is an interactive and iterative process involving several steps.

You got your data: what's next?
What kind of analysis do you need? Which model is more appropriate for it?

Clean your data!
Data preprocessing transforms the raw data into a format that will be more easily and effectively processed for the purpose of the user.
Some tasks:
- sampling: selects a representative subset from a large population of data;
- noise treatment;
- strategies to handle missing data: sometimes your rows will be incomplete, as not all parameters are measured for all samples;
- normalization;
- feature extraction: pulls out specified data that is significant in some particular context.
Use standard formats!

Missing Data
Missing data are a part of almost all research, and we all have to decide how to deal with them.
- Complete Case Analysis: use only rows with all the values.
- Available Case Analysis.
- Substitution:
  - Mean Value: replace the missing value with the mean value for that particular attribute;
  - Regression Substitution: replace the missing value with a historical value from similar cases;
  - Matching Imputation: for each unit with a missing y, find a unit with similar values of x in the observed data and take its y value;
  - Maximum Likelihood, EM, etc.
Some DM models can deal with missing data better than others. Which technique to adopt really depends on your data.
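Mean-value substitution, the simplest of the techniques above, can be sketched as follows (a minimal illustration using NumPy, with missing entries encoded as NaN):

```python
import numpy as np

def mean_impute(data):
    """Replace NaNs in each column with that column's observed mean."""
    data = data.astype(float).copy()
    for col in range(data.shape[1]):
        column = data[:, col]
        observed = column[~np.isnan(column)]
        column[np.isnan(column)] = observed.mean()
    return data

X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
print(mean_impute(X))  # NaNs become the column means: 2.0 and 3.0
```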

Data Mining
Data Mining is a crucial task within the KDD process: it is about automating the process of searching for patterns in the data.
In more detail, the most relevant DM tasks are:
- association
- sequence or path analysis
- clustering
- classification
- regression
- visualization

Finding Solutions via Purposes
You have your data: what kind of analysis do you need?
Regression
- predict new values based on the past, inference;
- compute the new values for a dependent variable based on the values of one or more measured attributes.
Classification
- divide samples into classes;
- use a training set of previously labeled data.
Clustering
- partition a data set into subsets (clusters), so that the data in each subset ideally share some common characteristics.
Classification is in some ways similar to clustering, but it requires that the analyst know ahead of time how the classes are defined.

Cluster Analysis
How many clusters do you expect?
Search for outliers.

Classification
A data mining technique used to predict group membership for data instances. There are two ways to assign a new value to a given class.
Crisp classification
- given an input, the classifier returns its label.
Probabilistic classification
- given an input, the classifier returns its probabilities of belonging to each class;
- useful when some mistakes can be more costly than others (e.g., give me only data with probability > 90%);
- winner-takes-all and other rules:
  - assign the object to the class with the highest probability (WTA);
  - but only if its probability is greater than 40% (WTA with thresholds).
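The WTA-with-threshold rule above can be sketched in a few lines (the class names and probabilities are illustrative; the 40% threshold is the one quoted in the slide):

```python
def wta_with_threshold(probs, threshold=0.40):
    """Return the most probable class, or None if below the threshold."""
    best_class = max(probs, key=probs.get)
    if probs[best_class] > threshold:
        return best_class
    return None  # too uncertain: leave the object unclassified

print(wta_with_threshold({"star": 0.55, "galaxy": 0.30, "artifact": 0.15}))  # star
print(wta_with_threshold({"star": 0.35, "galaxy": 0.33, "artifact": 0.32}))  # None
```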

Regression / Forecasting
Data table statistical correlation:
- mapping without any prior assumption on the functional form of the data distribution;
- machine learning algorithms are well suited for this.
Curve fitting:
- find a well defined and known function underlying your data;
- theory / expertise can help.

Machine Learning
To learn: to get knowledge of by study, experience, or being taught.
Types of Learning:
- Supervised
- Unsupervised

Unsupervised Learning
- The model is not provided with the correct results during the training.
- It can be used to cluster the input data into classes on the basis of their statistical properties only.
- Cluster significance and labeling.
- The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes.

Supervised Learning
- Training data include both the input and the desired results.
- For some examples the correct results (targets) are known and are given in input to the model during the learning process.
- The construction of a proper training, validation and test set (Bok) is crucial.
- These methods are usually fast and accurate.
- They have to be able to generalize: give the correct results when new data are given in input without knowing a priori the target.

Generalization
Refers to the ability to produce reasonable outputs for inputs not encountered during the training.
In other words: NO PANIC when "never seen before" data are given in input!

A common problem: OVERFITTING
- Learning the data and not the underlying function.
- Performs well on the data used during the training and poorly with new data.
How to avoid it: use proper subsets, early stopping.

Datasets
- Training set: a set of examples used for learning, where the target value is known.
- Validation set: a set of examples used to tune the architecture of a classifier and estimate the error.
- Test set: used only to assess the performance of a classifier. It is never used during the training process, so that the error on the test set provides an unbiased estimate of the generalization error.
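A minimal sketch of carving a dataset into the three subsets above, using NumPy (the 60/20/20 proportions and the shuffling seed are illustrative choices, not from the slides):

```python
import numpy as np

def split_dataset(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the data, then carve off validation and test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test, val, train = idx[:n_test], idx[n_test:n_test + n_val], idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

X = np.arange(100).reshape(50, 2)
y = np.arange(50)
train, val, test = split_dataset(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 30 10 10
```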

IRIS dataset
IRIS:
- consists of 3 classes, 50 instances each;
- 4 numerical attributes (sepal and petal length and width, in cm);
- each class refers to a type of Iris plant (Setosa, Versicolor, Virginica);
- the first class is linearly separable from the other two, while the 2nd and the 3rd are not linearly separable.

Artifacts Dataset
PQ Artifacts:
- 2 main classes and 4 numerical attributes;
- the classes are: true objects, artifacts.

Data Selection
- Garbage in, garbage out: training, validation and test data must be representative of the underlying model.
- All eventualities must be covered.
- Unbalanced datasets:
  - since the network minimizes the overall error, the proportion of types of data in the set is critical;
  - inclusion of a loss matrix (Bishop, 1995);
  - often, the best approach is to ensure even representation of different cases, then to interpret the network's decisions accordingly.

Artificial Neural Network
An Artificial Neural Network is an information processing paradigm that is inspired by the way biological nervous systems process information: a large number of highly interconnected simple processing elements (neurons) working together to solve specific problems.

A simple artificial neuron
The basic computational element is often called a node or unit. It receives input from some other units, or from an external source. Each input has an associated weight w, which can be modified so as to model synaptic learning.
The unit computes some function of the weighted sum of its inputs:

    y = f( sum_i w_i x_i )
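The neuron just described can be sketched directly: a weighted sum of the inputs passed through an activation function (a sigmoid is assumed here for illustration; the slides do not fix a particular choice):

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Weighted sum of inputs plus bias, squashed by a sigmoid."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

print(neuron([1.0, 0.5], [0.4, -0.2]))  # sigmoid(0.3) ≈ 0.574
```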

Neural Networks
A Neural Network is usually structured into an input layer of neurons, one or more hidden layers and one output layer.
Neurons belonging to adjacent layers are usually fully connected, and the various types and architectures are identified both by the different topologies adopted for the connections as well as by the choice of the activation function.
The values of the functions associated with the connections are called weights.
The whole game of using NNs is in the fact that, in order for the network to yield appropriate outputs for given inputs, the weights must be set to suitable values.
The way this is obtained allows a further distinction among modes of operation.

Neural Networks: types
- Feedforward: Single Layer Perceptron, MLP, ADALINE (Adaptive Linear Neuron), RBF.
- Self-Organized: SOM (Kohonen Maps).
- Recurrent: Simple Recurrent Network, Hopfield Network.
- Stochastic: Boltzmann machines, RBM.
- Modular: Committee of Machines, ASNN (Associative Neural Networks), Ensembles.
- Others: Instantaneously Trained, Spiking (SNN), Dynamic, Cascades, Neuro-Fuzzy, PPS, GTM.

Multi-Layer Perceptron
The MLP is one of the most used supervised models: it consists of multiple layers of computational units, usually interconnected in a feedforward way.
Each neuron in one layer has direct connections to all the neurons of the subsequent layer.

Learning Process
Back-propagation:
- the output values are compared with the target to compute the value of some predefined error function;
- the error is then fed back through the network;
- using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function.
After repeating this process for a sufficiently large number of training cycles, the network will usually converge.
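The three back-propagation steps above can be sketched for a tiny MLP trained on the XOR problem (NumPy; the network size, learning rate, epoch count and random seed are all illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(0, 1, (2, 4))   # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1))   # hidden -> output weights
b2 = np.zeros(1)
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1 + b1)          # hidden layer activations
    return h, sigmoid(h @ W2 + b2)    # output layer

for epoch in range(5000):
    h, y = forward(X)
    err = y - t                              # compare output with target
    d2 = err * y * (1 - y)                   # output-layer delta
    d1 = (d2 @ W2.T) * h * (1 - h)           # error fed back to hidden layer
    W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(0)  # adjust each connection
    W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(0)

_, y = forward(X)
print(np.round(y.ravel(), 2))  # should approach [0, 1, 1, 0]
```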

Hidden Units
The best number of hidden units depends on:
- the number of inputs and outputs;
- the number of training cases;
- the amount of noise in the targets;
- the complexity of the function to be learned;
- the activation function.
Too few hidden units => high training and generalization error, due to underfitting and high statistical bias.
Too many hidden units => low training error but high generalization error, due to overfitting and high variance.
Rules of thumb don't usually work.

Activation and Error Functions

Activation Functions

Results: confusion matrix

Results: completeness and contamination
Exercise: compute completeness and contamination for the previous confusion matrix (test set).
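The two quantities in the exercise can be computed from a confusion matrix as sketched below (rows are assumed to be true classes and columns predicted classes; the 2x2 example matrix is illustrative, not the one from the slides):

```python
import numpy as np

def completeness(cm, k):
    """Fraction of objects of true class k that were classified as k."""
    return cm[k, k] / cm[k, :].sum()

def contamination(cm, k):
    """Fraction of objects classified as k that belong to other classes."""
    return 1.0 - cm[k, k] / cm[:, k].sum()

cm = np.array([[45, 5],    # true class 0: 45 right, 5 misclassified
               [10, 40]])  # true class 1: 40 right, 10 misclassified
print(completeness(cm, 0))   # 45 / 50 = 0.9
print(contamination(cm, 0))  # 10 / 55 ≈ 0.18
```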

Decision Trees
A decision tree is another classification method: it is a set of simple rules, such as "if the sepal length is less than 5.45, classify the specimen as setosa."
Decision trees are also nonparametric, because they do not require any assumptions about the distribution of the variables in each class.
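The kind of rule set a decision tree encodes can be sketched directly; the sepal-length threshold is the one quoted above, while the second rule (a petal-width threshold) is an illustrative assumption, not from the slides:

```python
def classify_iris(sepal_length, petal_width):
    """Toy two-rule decision tree for the Iris classes."""
    if sepal_length < 5.45:
        return "setosa"       # rule from the slide
    elif petal_width < 1.75:  # hypothetical second split
        return "versicolor"
    else:
        return "virginica"

print(classify_iris(5.0, 0.2))  # setosa
print(classify_iris(6.0, 1.4))  # versicolor
```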


Unsupervised Learning
- The model is not provided with the correct results during the training.
- It can be used to cluster the input data into classes on the basis of their statistical properties only.
- Cluster significance and labeling.
- The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes.

Types of Clustering
- HIERARCHICAL: finds successive clusters using previously established clusters;
  - agglomerative (bottom-up): start with each element in a separate cluster and merge them according to a given property;
  - divisive (top-down).
- PARTITIONAL: usually determines all clusters at once.

Distances
Distances determine the similarity between two clusters and the shape of the clusters.
In the case of strings:
- The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. It measures the minimum number of substitutions required to change one string into the other.
- The Levenshtein (edit) distance is a metric for measuring the amount of difference between two sequences. It is defined as the minimum number of edits needed to transform one string into the other.
Examples:
  1001001
  1000100
  HD = 3

  LD(BIOLOGY, BIOLOGIA) = 2
  BIOLOGY -> BIOLOGI (substitution)
  BIOLOGI -> BIOLOGIA (insertion)
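Both distances can be sketched in a few lines; the Levenshtein version below is the standard dynamic-programming formulation:

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

print(hamming("1001001", "1000100"))       # 3, as in the slide
print(levenshtein("BIOLOGY", "BIOLOGIA"))  # 2, as in the slide
```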

Normalization
VAR: the mean of each attribute of the transformed set of data points is reduced to zero by subtracting the mean of each attribute from the values of the attributes and dividing the result by the standard deviation of the attribute.
RANGE (Min-Max Normalization): subtracts the minimum value of an attribute from each value of the attribute and then divides the difference by the range of the attribute. It has the advantage of preserving exactly all relationships in the data, without adding any bias.
SOFTMAX: a way of reducing the influence of extreme values or outliers in the data without removing them from the dataset. It is useful when you have outlier data that you wish to include in the dataset while still preserving the significance of data within a standard deviation of the mean.
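Minimal sketches of the three normalizations above (NumPy assumed; the softmax variant follows the common "softmax scaling" recipe of squashing the z-score through a logistic function):

```python
import numpy as np

def var_norm(x):
    """VAR: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()

def range_norm(x):
    """RANGE (min-max): rescale to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def softmax_norm(x):
    """SOFTMAX: logistic squashing of the z-score; outliers are tamed."""
    return 1.0 / (1.0 + np.exp(-var_norm(x)))

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(range_norm(x))    # the outlier crushes the rest toward 0
print(softmax_norm(x))  # the outlier stays in, but its influence shrinks
```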

K-Means

K-Means: how it works
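The standard k-means loop can be sketched in NumPy: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing (k = 2 and the toy data are illustrative choices):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centroid by Euclidean distance
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update step: centroid = mean of the points assigned to it
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, _ = kmeans(X, 2)
print(labels)  # the two tight pairs end up in the same cluster
```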

K-Means: pros and cons

Learning K
Find a balance between two variables: the number of clusters (K) and the average variance of the clusters; minimize both values.
As the number of clusters increases, the average variance decreases (up to the trivial case of k = n and variance = 0).
Some criteria:
- BIC (Bayesian Information Criterion)
- AIC (Akaike Information Criterion)
- Davies-Bouldin Index
- Confusion Matrix
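The balance described above can be sketched with a BIC-style criterion: the fit term rewards lower average variance, while the penalty grows with K, so the trivial "more clusters" solution loses. The variance values below are illustrative numbers, not from the slides:

```python
import math

def bic_score(n, k, variance):
    """Least-squares BIC approximation: fit term plus complexity penalty."""
    return n * math.log(variance) + k * math.log(n)

n = 100  # hypothetical number of data points
avg_variance = {1: 9.0, 2: 2.5, 3: 1.0, 4: 0.98, 5: 0.97}  # shrinks with k
scores = {k: bic_score(n, k, v) for k, v in avg_variance.items()}
best_k = min(scores, key=scores.get)
print(best_k)  # 3: beyond that, the penalty outweighs the variance gain
```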

Self-Organizing Maps

SOM topology

SOM Prototypes

SOM Training

Competitive and Cooperative Learning

SOM Update Rule

Parameters

DM with SOM

SOM Labeling

Localizing Data

Cluster Structure

Cluster Structure 2

Component Planes

Relative Importance

How accurate is your clustering?

Trajectories

Combining Models

Committee Machines

A priori knowledge

Sleeping Experts
