OBJECTIVE
To study the performance of the Genetic algorithm for association rule mining by varying the estimation parameters.
Data Mining
Extraction of interesting information or patterns from data in large databases is known as data mining.
ASSOCIATION ANALYSIS
Association analysis is the discovery of what are commonly called association rules.
Association Rules
Tid   Items bought
10    Milk, Nuts, Sugar
20    Milk, Coffee, Sugar
30    Milk, Sugar, Eggs
40    Nuts, Eggs, Bread
50    Nuts, Coffee, Sugar, Eggs, Bread

Find all the rules X → Y with minimum support and confidence.
Support, s: the probability that a transaction contains X ∪ Y.
Confidence, c: the conditional probability that a transaction having X also contains Y.
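The two measures can be checked by hand on the five example transactions. The following is a minimal sketch in Python; the function names `support` and `confidence` are illustrative and do not come from the study, which was carried out in MATLAB.

```python
# Transactions from the example table (Tid -> items bought).
transactions = {
    10: {"Milk", "Nuts", "Sugar"},
    20: {"Milk", "Coffee", "Sugar"},
    30: {"Milk", "Sugar", "Eggs"},
    40: {"Nuts", "Eggs", "Bread"},
    50: {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},
}


def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in db.values() if itemset <= items)
    return hits / len(db)


def confidence(x, y, db):
    """Estimate P(Y | X) as support(X ∪ Y) / support(X)."""
    support_x = support(x, db)
    if support_x == 0:
        return 0.0  # X never occurs, so the rule has no evidence
    return support(set(x) | set(y), db) / support_x


s = support({"Milk", "Sugar"}, transactions)
c = confidence({"Milk"}, {"Sugar"}, transactions)
print(f"Milk -> Sugar: support={s:.2f}, confidence={c:.2f}")
# prints "Milk -> Sugar: support=0.60, confidence=1.00"
```

Milk and Sugar appear together in transactions 10, 20 and 30, so the support is 3/5; every transaction with Milk also has Sugar, so the confidence is 1. The reverse rule Sugar → Milk has the same support but a lower confidence, because Sugar also appears in transaction 50 without Milk.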
Genetic Algorithm
A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology. Genetic algorithms use biologically inspired techniques such as genetic inheritance, natural selection, mutation, and sexual reproduction (recombination, or crossover).
Methodology
1. [Start] Generate random population of n chromosomes.
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the new population is complete.
   A. [Selection] Select two parent chromosomes from a population according to their fitness.
   B. [Crossover] With a crossover probability alter the parents to form a new offspring.
   C. [Mutation] With a mutation probability mutate new offspring at each locus.
   D. [Accepting] Place new offspring in a new population.
4. [Replace] Use newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop, and return the best solution in current population.
6. [Loop] Go to step 2.
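The steps above can be sketched as a short Python loop. This is an illustration under stated assumptions, not the implementation used in the study: chromosomes are bit lists, selection is fitness-proportionate, crossover is single-point, and the end condition is "best fitness unchanged for `patience` generations". The names `run_ga`, `patience`, `max_gens` and `seed` are hypothetical.

```python
import random


def run_ga(fitness, n_bits, pop_size, pc, pm, max_gens=200, patience=10, seed=0):
    """Generic bit-string GA following the [Start] ... [Loop] outline.

    `fitness` must return a non-negative number (required by the assumed
    fitness-proportionate selection). Returns the best chromosome seen,
    its fitness, and the number of generations evaluated.
    """
    rng = random.Random(seed)
    # [Start] Generate a random population of pop_size chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best, best_fit, stale, gens = None, float("-inf"), 0, 0

    while gens < max_gens:
        gens += 1
        # [Fitness] Evaluate every chromosome in the current population.
        fits = [fitness(ch) for ch in pop]
        top = max(range(pop_size), key=lambda i: fits[i])
        if fits[top] > best_fit:
            best, best_fit, stale = pop[top][:], fits[top], 0
        else:
            stale += 1
        # [Test] Stop once the best fitness has stayed constant long enough.
        if stale >= patience:
            break

        # [New population] Build the next generation.
        weights = [f + 1e-9 for f in fits]  # stays valid if all fitness is 0
        new_pop = []
        while len(new_pop) < pop_size:
            # [Selection] Two parents, chosen in proportion to fitness.
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            # [Crossover] Single-point crossover with probability pc.
            if n_bits > 1 and rng.random() < pc:
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                # [Mutation] Flip each locus with probability pm.
                for i in range(n_bits):
                    if rng.random() < pm:
                        child[i] ^= 1
                # [Accepting] Place the offspring in the new population.
                if len(new_pop) < pop_size:
                    new_pop.append(child)
        # [Replace] The new population is used for the next pass ([Loop]).
        pop = new_pop

    return best, best_fit, gens


# Toy fitness (count of 1-bits), used only to exercise the loop.
best, best_fit, gens = run_ga(sum, n_bits=16, pop_size=30, pc=0.5, pm=0.01)
print(best, best_fit, gens)
```

Setting `pm=0` gives a run without mutation. The toy fitness would be replaced by a rule-quality measure when the chromosomes encode association rules.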
Experimental Study
Encoding: Binary
Population Size: Set to three different values
Random
Rs + Rc = 1 (Rs ≥ 0, Rc ≥ 0), and Suppmin and Confmin are the respective values of minimum support and minimum confidence.
Crossover Probability: Fixed (tested with 3 values)
Mutation: No mutation
Datasets: Lenses, Iris, Haberman from the UCI Machine Learning Repository
Termination Condition: Fitness becomes constant
Software: MATLAB R2008a
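The constraint Rs + Rc = 1 with non-negative Rs and Rc suggests a fitness that weights support against confidence. The sketch below assumes one commonly used weighted form, Rs * supp / Suppmin + Rc * conf / Confmin. This formula is an assumption for illustration and is not confirmed by the slides; the function name `rule_fitness` and its default weights are hypothetical.

```python
def rule_fitness(supp, conf, supp_min, conf_min, rs=0.5, rc=0.5):
    """Assumed weighted fitness: rs * supp / supp_min + rc * conf / conf_min.

    rs and rc play the role of Rs and Rc (non-negative, summing to 1);
    supp_min and conf_min play the role of Suppmin and Confmin.
    """
    if rs < 0 or rc < 0 or abs(rs + rc - 1.0) > 1e-9:
        raise ValueError("rs and rc must be non-negative and sum to 1")
    if supp_min <= 0 or conf_min <= 0:
        raise ValueError("supp_min and conf_min must be positive")
    return rs * supp / supp_min + rc * conf / conf_min


# A rule with support 0.6 and confidence 1.0, scored against example
# thresholds of 0.2 (support) and 0.9 (confidence).
print(rule_fitness(0.6, 1.0, supp_min=0.2, conf_min=0.9))
```

Under this assumed form, a rule that sits exactly on both thresholds scores 1.0, and shifting weight from `rs` to `rc` makes the search favour confident rules over frequent ones.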
Results Analysis
Comparison based on variation in population size.

Population size:  No. of Instances        No. of Instances * 1.25   No. of Instances * 1.5
Dataset           Accuracy   No. of Gen.  Accuracy   No. of Gen.    Accuracy   No. of Gen.
Lenses            75         7            82         12             95         17
Haberman          71         114          68         88             64         70
Iris              77         88           87         53             82         45
[Figure: Accuracy for the Lenses, Haberman and Iris datasets]
Dataset    Accuracy  No. of Gen.  Accuracy  No. of Gen.  Accuracy  No. of Gen.  Accuracy  No. of Gen.
Lenses     22        20           49        11           70        21           95        18
Haberman   45        68           58        83           71        90           62        75
Iris       40        28           59        37           78        48           87        55

[Figure: Accuracy chart]
Crossover

           Pc = 0.25                       Pc = 0.5                        Pc = 0.75
Dataset    Accuracy %  No. of Generations  Accuracy %  No. of Generations  Accuracy %  No. of Generations
Lenses     95          8                   95          16                  95          13
Haberman   69          77                  71          83                  70          80
Iris       84          45                  86          51                  87          55
[Figure: Crossover vs. Accuracy, with Pc = 0.25, Pc = 0.5 and Pc = 0.75 for the Lenses, Haberman and Iris datasets]
[Figure: Accuracy for the Lenses, Haberman and Iris datasets]
Dataset    No. of Instances  No. of Attributes  Population Size  Minimum Support  Minimum Confidence  Pc    Accuracy %
Lenses     24                4                  36               0.2              0.9                 0.25  95
Haberman   306               3                  306              0.9              0.2                 0.5   71
Iris       150               -                  225              0.2              0.9                 0.75  87
Inferences
The values of minimum support, minimum confidence and population size determine the accuracy of the system more than the other GA parameters do. The crossover rate affects the convergence rate rather than the accuracy of the system. The optimum values of the GA parameters vary from dataset to dataset.
Conclusion
The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results. The size of the dataset and the relationships between its attributes contribute to the choice of parameters. The efficiency of the methodology could be further explored on more datasets with varying attribute sizes.
Thank You