
http://Ka.rsten-Winkler.de


The Unfortunately Incomplete WUM Tutorial
This small tutorial should enable you to start the Web Utilization Miner WUM, to create a
new demo mining base, to import the first demo log file that comes with this distribution,
to create the visitors' sessions contained in this log file, to build the aggregated log and to
execute your first MINT query with the MINT query processor. It covers the basic
techniques that you should know about before mining your own log files with WUM.
Advanced techniques in using WUM are covered by the second part of the tutorial. It is
strongly recommended to work your way through both parts of the tutorial before starting
your own mining session.
It is assumed that you have successfully installed the Web Utilization Miner on your system and
modified all necessary configuration files. If you have not installed WUM yet, please refer to the
Installation Guide that is part of this User Documentation and continue with the installation
of this mining software. The demo version of WUM is supposed to be pure Java. Therefore
it should run without difficulties on all existing Java Virtual Machines supporting Java 1.2.2
or higher. Please note that this Web Utilization Miner is a beta version intended for use in
research and education. The WUM team would really appreciate receiving all kinds of bug
reports and feature suggestions for the future development of this software. Simply drop
us an e-mail. Good luck in exploring WUM: The Web Utilization Miner.
Alternatively, you may be interested in reading what others write about WUM:
Felix Schendel. Web-Usage-Mining: Analyse vorhandener Technologien und
kombinierter Einsatz für kennzahl- und effizienzorientierte Analyse von Server-Logfiles.
Projektdokumentation, Fachbereich Wirtschaft, Hochschule Wismar.
Wismar, Germany, January 2004. In German.
How to Start WUM
UNIX and Linux: Open a new X-Terminal and make sure that your current working
directory is the bin/ subdirectory of the WUM_HOME directory. In the given example, the
environment variable JAVA_HOME is set to /usr/local/jdk1.2.2 and WUM_HOME is set to
/users/kwinkler/WUM.v60. The miner can be started as a background process by executing the shell
script wumgui.
Karsten Winkler http://ka.rsten-winkler.de/hypknowsys/wum/wumTutorial.html
1 de 24 30/07/2014 10:28
Windows 95/98/NT: Open the Windows Explorer by right-clicking the Start icon of your
task bar and selecting Explorer, open the home directory of WUM by browsing the tree
view of your file system and finally double-click the icon corresponding to the file
startwum.pif. Using Linux and the K-Desktop Environment, the main frame of the Web
Utilization Miner may look like this. The main window of WUM can be resized or moved on
your desktop without difficulties.
How to Create a New Mining Base
Each mining project requires a mining base within WUM. A mining base contains
descriptive information as well as an Object Store PSE Pro database and various other files
created by the miner during the mining process. In order to create a new mining base for
this tutorial, please open the File menu and select Create Mining Base.
There are five text fields for the parameters of the new mining base. Each mining base
must have a unique name that may include blank spaces and numbers. The corresponding
web server URL can optionally be stored for future use.
Each mining base must have its own directory to store the database and other related files.
It is recommended to create a subdirectory in the directory data for each new mining base
before starting the miner. The mining base of this tutorial will be stored in the existing
directory data/demoWebSite. Click on the button (Directory) ... to open a file dialog of your
operating system. In order to select the necessary directory websites/demoWebSite, please
select the directory and click OK. Alternatively, the name of an existing directory can be
entered in the corresponding text field.
After selecting or entering the home directory of the new mining base, the current dialog
should - more or less - look like this:
Additionally, the local directory containing the log files of your Web server must be
specified. The demo log file AccessLog.txt is stored in the same directory as the database.
Therefore, click on the button (Log Files:) ... to open the file dialog of your operating
system. Open the directory data/demoWebSite and finally click OK to select the log file
directory.
After checking the entered parameters, please click the button OK in order to create the
new mining base for this tutorial. Clicking Cancel would abort the creation of a new mining
base. In this case, the focus would be returned to the main window of WUM.
After successfully creating a new mining base, the title of the main window contains the
name of the new mining base in brackets. The new mining base is now open and can be
used for further operations. There can be only one open mining base at a time. The Object
Store PSE Pro database consists of three files WUM.MiningBase.* that are stored in the same
directory. Please do not edit, modify or delete these files.
Please note that the underlying Object Store PSE Pro is a single user database only. The
DBMS of Object Store PSE Pro uses a locking mechanism to ensure that each mining base
is accessed by exactly one user at a time. The database of an open mining base is locked
by creating a subdirectory WUM.MiningBase.odx in its home directory.
If the previous mining session ended abnormally, the lock directory can be deleted by
WUM in order to start the miner. Before unlocking a database by force, make sure that
there is no other user working with the corresponding mining base.
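The locking convention described above can be checked from outside WUM before opening a mining base. The following is a minimal Python sketch, not part of WUM itself; the function name is hypothetical, and only the directory name WUM.MiningBase.odx is taken from the text.

```python
import os

# Name of the lock subdirectory as given in the tutorial text.
LOCK_DIR_NAME = "WUM.MiningBase.odx"


def is_mining_base_locked(mining_base_dir):
    """Return True if the lock subdirectory exists in the mining base directory.

    This only inspects the file system; it does not talk to the database.
    """
    return os.path.isdir(os.path.join(mining_base_dir, LOCK_DIR_NAME))


if __name__ == "__main__":
    # data/demoWebSite is the tutorial's mining base directory.
    print(is_mining_base_locked("data/demoWebSite"))
```

A lock directory left behind by an abnormally ended session looks the same as one held by an active user, so this check cannot replace asking whether somebody else is working with the mining base.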
How to Import a Log File
After creating a new or opening an already existing mining base, HTTP server log files with
increasing time stamps can subsequently be imported into the mining base. The import
module performs basic data cleaning operations on each log file line and updates the
database with data of new visitors and Web pages. In order to import the small demo log
file into the tutorial mining base, please open the File menu and select Import Log File.
The user interface of the import module is depicted in the next picture. There are a few
parameters that must be specified by the user before a log file can be imported. Apart
from simply entering the log file name and its format, all parameters concerning the data
cleaning process should be considered very carefully.
The text field Filename contains the default directory of HTTP server log files. By clicking
the button (Filename) ..., you can specify the log file to be imported using the file dialog of
your operating system. After choosing the correct file and clicking OK, the complete log file
name will be shown in the text field.
WUM currently supports four widespread log file formats. There is an example log file line
of each file format in the following table. The example log file AccessLog.txt corresponds to
the common log file format. Therefore, please check the Common Log File radio button.
The following table contains an example log file line for each log file format supported by
WUM:

Common:
picasso.wiwi.hu-berlin.de - - [10/Dec/1999:23:06:31 +0200] "GET /index.html HTTP/1.0" 200 3540

Extended:
picasso.wiwi.hu-berlin.de - - [10/Dec/1999:23:06:31 +0200] "GET /index.html HTTP/1.0" 200 3540 "http://www.berlin.de/" "Mozilla/3.01 (Win95; I)"

Cookie:
picasso.wiwi.hu-berlin.de - - [10/Dec/1999:23:06:31 +0200] "GET /index.html HTTP/1.0" 200 3540 "http://www.berlin.de/" "Mozilla/3.01 (Win95; I)" "VisitorID=10001; SessionID=20001"

MS-IIS:
picasso.wiwi.hu-berlin.de, -, 10.12.99, 23:06:31, W3SVC2, WWW, 100.100.100.100, 547, 444, 0, 200, 0, GET, /index.html, -,
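To make the structure of the common log file format more concrete, here is a minimal Python sketch that splits the example line into its fields. It is an illustration only and says nothing about how the import module of WUM is implemented; the pattern, the function name and the field names are assumptions chosen for this example.

```python
import re
from datetime import datetime

# Assumed field layout: host, ident, user, [time stamp], "request", status, size.
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_common_log_line(line):
    """Return the fields of one log file line as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Time stamps look like 10/Dec/1999:23:06:31 +0200.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


example = ('picasso.wiwi.hu-berlin.de - - [10/Dec/1999:23:06:31 +0200] '
           '"GET /index.html HTTP/1.0" 200 3540')
record = parse_common_log_line(example)
print(record["host"], record["request"], record["status"], record["size"])
```

Because the pattern is not anchored at the end of the line, it also matches the leading fields of the extended and cookie examples, which start with the same fields.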
In order to reduce the number of web pages within the WUM database, HTTP requests can
be truncated by cutting off all characters starting at the first occurrence of '#' (HTML
anchors) or '?' (CGI parameters). Examples: If the option Truncate Requests: HTML
Anchors is enabled, the requests "GET /contact.html#address" and "GET
/contact.html#email" will both be shortened to "GET /contact.html" and will therefore be
treated as requests concerning the same web page. If the option Truncate Requests: CGI
Parameter is enabled, the requests "POST /cgi-bin/download.cgi?userid=123&version=a"
and "POST /cgi-bin/download.cgi?userid=456&version=b" will both be shortened to "POST
/cgi-bin/download.cgi".
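The effect of the two truncation options can be reproduced with a few lines of Python. This is a sketch of the behaviour described above, not the code used by WUM; the function and parameter names are hypothetical, and the request strings consist of method and URL only, as in the examples of the text.

```python
def truncate_request(request, html_anchors=True, cgi_parameters=True):
    """Cut a request string at the first '#' and/or '?' character."""
    cut = len(request)
    if html_anchors and "#" in request:
        cut = min(cut, request.index("#"))
    if cgi_parameters and "?" in request:
        cut = min(cut, request.index("?"))
    return request[:cut]


print(truncate_request("GET /contact.html#address"))
print(truncate_request("GET /contact.html#email"))
print(truncate_request("POST /cgi-bin/download.cgi?userid=123&version=a"))
```

After truncation, both contact requests are identical strings, so they would be counted as requests for the same web page.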
The WUM distribution contains a very small log file AccessLog.txt that is to be used in this
tutorial. [The tutorial is hopefully to be continued at some point in time. Do you want to
help?]
Please keep in mind that the import module of WUM performs only basic substring
operations on each log file line. Depending on the user's individual mining goals,
preprocessing the raw log file with the help of user specific Perl scripts etc. can be
extremely useful.
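As one possible example of such user specific preprocessing, the following Python sketch drops requests for embedded resources such as images before the log file is imported. Whether this is appropriate depends entirely on the mining goals; the suffix list and the function names are hypothetical and not part of WUM.

```python
# Hypothetical list of suffixes that a user might not want to mine.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")


def keep_line(line):
    """Return True unless the requested URL ends with an ignored suffix."""
    try:
        request = line.split('"')[1]  # e.g. GET /index.html HTTP/1.0
        url = request.split()[1]
    except IndexError:
        return True  # leave lines that cannot be split to the import module
    path = url.split("?", 1)[0].split("#", 1)[0].lower()
    return not path.endswith(IGNORED_SUFFIXES)


def filter_log(lines):
    """Keep only the log file lines that pass keep_line."""
    return [line for line in lines if keep_line(line)]


lines = [
    'host - - [10/Dec/1999:23:06:31 +0200] "GET /index.html HTTP/1.0" 200 3540',
    'host - - [10/Dec/1999:23:06:32 +0200] "GET /logo.gif HTTP/1.0" 200 512',
]
for line in filter_log(lines):
    print(line)
```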
How to Analyze a Log File
How to Visualize the Contents
The generated HTML report can be found here.
Image of Complete Aggregated Log
How to Execute MINT Queries
How to Exit from WUM
Remarks and an Additional Example
WUM accepts as input a template, i.e. an ordered list of variables and wildcards, and a
conjunction of constraints on the statistics of those variables. It finds all sequences which,
taken together, build a pattern (actually a directed acyclic graph) that satisfies the template
and the constraints.
Example: We are interested in an event x that occurs after y with probability at least 95%.
This event y should appear in at least 100 of our sequences. x need not occur
immediately after y, but it should not be more than 5 events away from y. This
specification produces the template y [0;5] x where x and y are variables. The wildcard
stands for any number of events, and the interval [0;5] constrains it to between zero and
5 events. The constraints on x and y result in two restrictions:
y.support >= 100
and ( x.support / y.support ) > 0.95
To find the sequences satisfying this template and constraints, issue the following MINT
query:
select t
from node as x y, template y [0;5] x as t
where y.support >= 100
and ( x.support / y.support ) > 0.95
You can use this query in the demo. But you have to reduce the support of y, because
there is no event that appears in more than 8 sequences. This was just an example. For
the formal definitions and the description of the miner at work, please refer to the
publications about the Web Utilization Miner WUM.
When issuing a MINT query, WUM finds all acceptable bindings for the template variables.
A binding is a list of events, i.e. of values, bound to the variables. A binding is acceptable if
the events comprising it appear in sequences which:
- conform to the template's structure
- taken together constitute a group, the statistics of which satisfy the query
  constraints
In the above example, a URL "Y.html" in the dataset could be bound to variable y. A URL
"X.html" could then be bound to x, only if there exists a sequence where X.html appears
within 6 positions after Y.html. For the binding to be acceptable, there should be at least
100 sequences containing Y.html and at least 95% of them should contain X.html in at most 6
positions after Y.html. Those sequences "contribute" the binding (Y.html, X.html).
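The acceptance check for a single binding can be sketched in Python as follows. This is a simplified illustration that works on plain lists of events, whereas WUM evaluates queries on the aggregated log; the function names and the toy sequences are hypothetical.

```python
def binding_statistics(sequences, y, x, max_gap=5):
    """Count the sequences containing y, and those where x follows y closely.

    x must appear at most max_gap + 1 positions after some occurrence of y,
    which corresponds to a wildcard [0;max_gap] between y and x.
    """
    y_support = 0
    x_support = 0
    for seq in sequences:
        positions = [i for i, event in enumerate(seq) if event == y]
        if not positions:
            continue
        y_support += 1
        if any(x in seq[i + 1 : i + 2 + max_gap] for i in positions):
            x_support += 1
    return y_support, x_support


def is_acceptable(y_support, x_support, min_support=100, min_confidence=0.95):
    """Apply the two constraints of the example query to the counted values."""
    if y_support == 0 or y_support < min_support:
        return False
    return x_support / y_support > min_confidence


sequences = [
    ["Y.html", "X.html"],
    ["Y.html", "a", "b", "X.html"],
    ["Y.html", "a", "b", "c", "d", "e", "f", "X.html"],  # X.html is too far away
    ["a", "X.html"],                                     # no Y.html at all
]
y_support, x_support = binding_statistics(sequences, "Y.html", "X.html")
print(y_support, x_support)
print(is_acceptable(y_support, x_support, min_support=2, min_confidence=0.5))
```

For a small data set such as the demo log file, the thresholds have to be lowered as in the last line, because no event reaches a support of 100.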
WUM discovers all acceptable bindings for the query and builds a "navigation pattern" for
each binding. A navigation pattern is a directed acyclic graph comprised of the sequences
contributing the binding: the sequences have been merged at common prefix and at each
event of the binding.
The visualization tool of WUM can display a navigation pattern in two ways:
- The "template tree" consists only of the events comprising the binding. The events
  are annotated with the number of contributing sequences.
  This format gives an overview of the events that satisfy our query, without
  information on the surrounding events.
- An "aggregate tree" is a set of subsequences merged on common prefix. For two
  consecutive events in the binding, the aggregate tree shows the fragments of the
  contributing sequences between those two events.
  WUM cannot yet display graphs. So, a navigation pattern is split into aggregate
  trees, one per event in the binding. This event is then the root of the aggregate
  tree.
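Merging sequences on their common prefix, as described for the aggregate tree, can be sketched in Python with a nested dictionary. This illustrates the idea only and is not the data structure used by WUM; the node layout and the function names are assumptions.

```python
def build_aggregate_tree(sequences):
    """Merge sequences on their common prefixes.

    Each node is a dict {"count": int, "children": {event: node}}, where count
    is the number of sequences that pass through the node.
    """
    root = {"count": 0, "children": {}}
    for seq in sequences:
        root["count"] += 1
        node = root
        for event in seq:
            node = node["children"].setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def format_tree(node, label="root", depth=0):
    """Return the tree as indented text lines, one node per line."""
    lines = ["  " * depth + f"{label} ({node['count']})"]
    for event, child in node["children"].items():
        lines.extend(format_tree(child, event, depth + 1))
    return lines


tree = build_aggregate_tree([["a", "b", "c"], ["a", "b", "d"], ["a", "e"]])
print("\n".join(format_tree(tree)))
```

The three toy sequences share the prefix a, and two of them share a b, so these events appear only once in the tree, annotated with the number of sequences passing through them.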

December 3, 2004