
Data mining with WEKA

A use case to help you get started

Charalampos Mavroforakis
BU CS105, Fall 2011

Starting WEKA

Open Weka: Start > All Programs > Weka 3.x.x > Weka 3.x


From the "Weka GUI Chooser", pick "Explorer". This is the main WEKA tool that we are going to use.

Opening a dataset

To open a dataset (a .csv file in our case), we click "Open file..." in the Preprocess tab and open the file that contains our data. Remember that in the open dialog you have to set the file type to CSV if your file was saved as such. Let's open SPECT.csv.
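
Why does WEKA treat the SPECT columns as numbers? A rough way to picture it: when every value in a CSV column parses as a number, the column is read as numeric. The plain-Python sketch below illustrates only that rule; it is not WEKA's CSVLoader code, and the header names in SAMPLE (class, F1, F2, F3) are hypothetical stand-ins for the real ones.

```python
import csv
import io

# Hypothetical excerpt in the spirit of SPECT.csv: every value is a 0 or a 1.
SAMPLE = """class,F1,F2,F3
1,0,0,1
1,1,0,0
0,0,1,0
"""


def looks_numeric(text):
    """True if the string parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_types(csv_text):
    """Label each column 'numeric' if all its values parse as numbers, else 'nominal'."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return {
        name: "numeric" if all(looks_numeric(row[j]) for row in data) else "nominal"
        for j, name in enumerate(header)
    }


print(infer_types(SAMPLE))
# {'class': 'numeric', 'F1': 'numeric', 'F2': 'numeric', 'F3': 'numeric'}
```

A column containing words such as yes/no would come out nominal under this rule, which is why 0/1-coded data needs the conversion described next.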

Transforming values to nominal (if needed)

WEKA classified every attribute in our dataset as numeric (the values in SPECT.csv are all 0s and 1s), so we have to transform them to nominal manually. To do so, we use a filter: we navigate to NumericToNominal, which is under Unsupervised > Attribute. Clicking on the filter's name opens its options. The most interesting one here is attributeIndices, which lists all the attributes that you want the filter to be applied on (for example, first-last for all of them). To finish, we click Apply.
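
Conceptually, NumericToNominal turns each selected numeric column into a nominal one whose labels are the distinct values that occur in it. The plain-Python sketch below shows that idea together with a parser for the 1-based range syntax that attributeIndices accepts (e.g. first-last or 1,3-5). It is not WEKA's implementation; the function names are hypothetical and the label ordering is my own choice.

```python
def parse_indices(spec, num_attributes):
    """Turn a 1-based range such as 'first-last' or '1,3-5' into sorted 0-based indices."""
    def to_number(token):
        token = token.strip()
        if token == "first":
            return 1
        if token == "last":
            return num_attributes
        return int(token)

    indices = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            indices.update(range(to_number(start) - 1, to_number(end)))
        else:
            indices.add(to_number(part) - 1)
    return sorted(indices)


def as_label(value):
    """Render 1.0 as '1' so labels look like the values in the CSV."""
    return str(int(value)) if float(value).is_integer() else str(value)


def numeric_to_nominal(rows, indices):
    """Return converted rows plus, per converted column, its sorted list of labels."""
    new_rows = [list(row) for row in rows]
    labels = {}
    for j in indices:
        labels[j] = [as_label(v) for v in sorted({row[j] for row in rows})]
        for row in new_rows:
            row[j] = as_label(row[j])
    return new_rows, labels


rows = [[1.0, 0.0, 2.5], [0.0, 1.0, 3.0]]
converted, labels = numeric_to_nominal(rows, parse_indices("1-2", 3))
print(converted)  # [['1', '0', 2.5], ['0', '1', 3.0]]
print(labels)     # {0: ['0', '1'], 1: ['0', '1']}
```

Only the columns named in the range change; the third column stays numeric, just as attributes left out of attributeIndices are untouched by the filter.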

Splitting the dataset

We have to split the dataset into two parts: 30% for testing and 70% for training. To do that, we first apply Randomize (Unsupervised > Instance) so that the instances are put in a random order.

Then we apply RemovePercentage (Unsupervised > Instance) with percentage set to 30, which removes 30% of the instances, and save the remaining 70% as the training set.

After that, we undo and apply the same filter again, this time setting invertSelection to True. This keeps only the 30% that was removed before, so we save it as the testing set.
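
The two filter runs can be pictured in plain Python. This sketch assumes that RemovePercentage drops the first percentage% of the instances in their current order and that invertSelection keeps exactly that block instead; the function names are hypothetical, and Python's round() may differ from WEKA's rounding when the cut point is fractional.

```python
import random


def randomize(instances, seed=1):
    """Return a shuffled copy: a random permutation of the instances."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def remove_percentage(instances, percentage, invert_selection=False):
    """Drop the first `percentage`% of instances, or keep only them when inverted."""
    cut = round(len(instances) * percentage / 100)
    return instances[:cut] if invert_selection else instances[cut:]


data = randomize(range(100))
training = remove_percentage(data, 30)                        # remaining 70%
testing = remove_percentage(data, 30, invert_selection=True)  # removed 30%
print(len(training), len(testing))  # 70 30
```

Both calls work on the same shuffled list. That is the reason to undo and re-apply the filter rather than randomize again: a second shuffle would let the training and testing sets overlap.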

Training models

From now on we will be using the training dataset. We switch to the "Classify" tab and pick a classifier. Let's start with OneR (under rules), which is the same algorithm we saw in class.
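
OneR ("one rule") picks the single attribute whose value-to-majority-class table makes the fewest mistakes on the training data. Below is a minimal plain-Python sketch for nominal attributes only (WEKA's OneR also bins numeric attributes, which this skips); the function name and the toy rows are made up for illustration.

```python
from collections import Counter, defaultdict


def one_r(rows, class_index):
    """Return (attribute index, value -> class rule, training errors) with the fewest errors."""
    best = None
    for a in range(len(rows[0])):
        if a == class_index:
            continue
        # For each value of attribute a, count how often each class occurs.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[a]][row[class_index]] += 1
        # Predict the majority class for each value; every other instance is an error.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


# Made-up nominal data: (outlook, temperature, class).
rows = [
    ("sunny", "hot", "no"),
    ("sunny", "hot", "no"),
    ("rainy", "cool", "yes"),
    ("rainy", "hot", "yes"),
    ("overcast", "hot", "yes"),
    ("sunny", "hot", "yes"),
]
attribute, rule, errors = one_r(rows, class_index=2)
print(attribute, rule, errors)
# 0 {'sunny': 'no', 'rainy': 'yes', 'overcast': 'yes'} 1
```

Here outlook wins with one training error, against two for temperature, so the whole model is a three-line lookup table.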

We have to specify the attribute that we want to predict (the class) and the testing procedure. We first want to see how good OneR is as a model, so we use cross-validation; only after that do we check what it predicts on the unseen data.
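
Cross-validation splits the training data into k folds, trains on k-1 of them, tests on the remaining one, and rotates until every instance has been tested once. The sketch below pools the predictions across folds, which I believe is how WEKA reports the result; WEKA also stratifies the folds by class, which this sketch does not. The majority-class learner is a hypothetical placeholder, and any train/predict pair (such as OneR) could be passed in instead.

```python
import random
from collections import Counter


def k_folds(n, k, seed=1):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(xs, ys, train, predict, k=10, seed=1):
    """Pooled accuracy: every instance is predicted once, by a model that never saw it."""
    correct = 0
    for fold in k_folds(len(xs), k, seed):
        held_out = set(fold)
        model = train([xs[i] for i in range(len(xs)) if i not in held_out],
                      [ys[i] for i in range(len(ys)) if i not in held_out])
        correct += sum(predict(model, xs[i]) == ys[i] for i in fold)
    return correct / len(xs)


# Hypothetical placeholder learner: always predict the training set's majority class.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model


xs = list(range(10))
ys = ["yes"] * 8 + ["no"] * 2
print(cross_validate(xs, ys, train_majority, predict_majority, k=5))  # 0.8
```

A baseline like this is a useful yardstick: a model such as OneR is only worth keeping if its cross-validated accuracy beats always guessing the majority class.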

In the output, we get the accuracy of our model as estimated by cross-validation (listed as Correctly Classified Instances) and its confusion matrix.
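
A confusion matrix counts, for each actual class, how the instances of that class were predicted. The sketch below follows WEKA's layout, with rows for actual classes and columns for predicted classes (the columns WEKA labels "<-- classified as"); accuracy is the diagonal divided by the total. The data is made up.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts instances of class labels[i] that were predicted as labels[j]."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[position[a]][position[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


actual = ["1", "1", "0", "0", "1"]
predicted = ["1", "0", "0", "1", "1"]
m = confusion_matrix(actual, predicted, labels=["0", "1"])
print(m)            # [[1, 1], [1, 2]]
print(accuracy(m))  # 0.6
```

The off-diagonal cells show which kind of mistake the model makes, which a single accuracy number hides.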

To check how well we do on the unseen data, we select "Supplied test set", open the testing dataset that we created, and specify which attribute is the class. We run the algorithm again and compare the confusion matrix and the accuracy with the cross-validation results.
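
With a supplied test set, the model is built once from the training file and then scored on a file it has never seen. The sketch below uses a hypothetical memorizing learner (a lookup table with a majority-class fallback, not a WEKA classifier) to show why accuracy on seen data can be misleadingly high compared with accuracy on held-out data. All data is made up.

```python
from collections import Counter, defaultdict


def train_lookup(xs, ys):
    """Hypothetical memorizing learner: majority label per exact attribute vector,
    with the overall majority label as a fallback for unseen vectors."""
    table = defaultdict(Counter)
    for x, y in zip(xs, ys):
        table[tuple(x)][y] += 1
    fallback = Counter(ys).most_common(1)[0][0]
    return {key: c.most_common(1)[0][0] for key, c in table.items()}, fallback


def predict_lookup(model, x):
    table, fallback = model
    return table.get(tuple(x), fallback)


def accuracy_on(model, xs, ys):
    return sum(predict_lookup(model, x) == y for x, y in zip(xs, ys)) / len(ys)


train_x = [(0, 0), (0, 1), (1, 0), (1, 1)]
train_y = ["a", "b", "b", "b"]
test_x = [(0, 0), (1, 1), (2, 2)]
test_y = ["b", "b", "a"]

model = train_lookup(train_x, train_y)       # trained once, on the training file only
print(accuracy_on(model, train_x, train_y))  # 1.0
print(accuracy_on(model, test_x, test_y))    # 0.3333333333333333
```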

Association learning

If all of our attributes are nominal (if they are not, we can discretize them in the Preprocess tab), we can also do association learning. To do that, we switch to the Associate tab and choose the Apriori algorithm. You can experiment with its parameters if you want.
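
Apriori finds itemsets (here, sets of attribute=value pairs) that occur in at least a minimum fraction of the instances, growing them one item at a time and discarding any candidate with an infrequent subset. The sketch below is a textbook version with a fixed support threshold; WEKA's Apriori, as I understand it, instead starts at upperBoundMinSupport and lowers the support by delta until it finds numRules rules or reaches lowerBoundMinSupport. The transactions are made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    singletons = {frozenset([item]) for t in transactions for item in t}
    current = {s for s in singletons if support(s) >= min_support}
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join: combine frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune: a candidate can only be frequent if all its subsets one item smaller are.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, size - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Each nominal row becomes a set of "attribute=value" items (made-up data).
transactions = [
    {"a=1", "b=1", "class=yes"},
    {"a=1", "b=0", "class=yes"},
    {"a=0", "b=1", "class=no"},
    {"a=1", "b=1", "class=yes"},
]
result = apriori(transactions, 0.5)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```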

We could set car to True (so that it produces rules that predict the class attribute) and set classIndex to the index of the attribute that will be considered the class. minMetric sets the confidence threshold (confidence is the default metricType), and numRules limits the number of rules that will be created. The result is a set of rules that predict the class, together with their confidence.
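
The three settings can be mimicked in plain Python: keep only rules whose right-hand side is a class item (the car idea), drop those below a confidence threshold (minMetric), and return the best few (numRules). This sketch enumerates left-hand sides by brute force rather than with Apriori's search, and its ranking (confidence, then support, then alphabetical) is my own choice; the function name, parameters, and data are hypothetical.

```python
from itertools import combinations


def class_rules(transactions, class_prefix, min_support, min_metric, num_rules, max_len=2):
    """Rules 'lhs -> class item' with confidence >= min_metric, best num_rules first.

    Returns tuples (confidence, support, sorted lhs items, class item)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    all_items = {item for t in transactions for item in t}
    class_items = sorted(i for i in all_items if i.startswith(class_prefix))
    other_items = sorted(i for i in all_items if not i.startswith(class_prefix))

    rules = []
    for size in range(1, max_len + 1):
        for lhs in map(frozenset, combinations(other_items, size)):
            lhs_count = count(lhs)
            if lhs_count == 0:
                continue
            for c in class_items:
                both = count(lhs | {c})
                confidence = both / lhs_count
                if both / n >= min_support and confidence >= min_metric:
                    rules.append((confidence, both / n, sorted(lhs), c))
    # Highest confidence first, then highest support; remaining ties alphabetical.
    rules.sort(key=lambda r: (-r[0], -r[1], r[2], r[3]))
    return rules[:num_rules]


transactions = [
    {"a=1", "b=1", "class=yes"},
    {"a=1", "b=0", "class=yes"},
    {"a=0", "b=1", "class=no"},
    {"a=1", "b=1", "class=yes"},
]
rules = class_rules(transactions, "class=", min_support=0.25, min_metric=0.9, num_rules=3)
for conf, sup, lhs, cls in rules:
    print(" & ".join(lhs), "->", cls, f"confidence={conf:.2f}")
```

Lowering min_metric lets weaker rules through: at 0.6, for example, b=1 -> class=yes (confidence 2/3) would also qualify.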
