
Association Rule Mining Using Genetic Algorithm: The role of Estimation Parameters

Presented by K. Indira, Research Scholar, Dept. of CSE, Pondicherry Engineering College.

Authors: K. Indira & Dr. S. Kanmani

OBJECTIVE

To study the performance of the genetic algorithm for association rule mining by varying its estimation parameters.

Data Mining
Data mining is the extraction of interesting information or patterns from data in large databases.

ASSOCIATION ANALYSIS
Association analysis is the discovery of what are commonly called association rules.

It studies the frequency of items occurring together in transactional databases. Association rule mining provides valuable information in assessing significant correlations.

Association Rules

Tid    Items bought
10     Milk, Nuts, Sugar
20     Milk, Coffee, Sugar
30     Milk, Sugar, Eggs
40     Nuts, Eggs, Bread
50     Nuts, Coffee, Sugar, Eggs, Bread

[Venn diagram: customer buys milk / customer buys sugar / customer buys both]

Find all rules X -> Y with minimum support and confidence:
Support, s    : probability that a transaction contains X U Y
Confidence, c : conditional probability that a transaction containing X also contains Y
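For the toy transactions above, support and confidence can be computed directly. A minimal sketch (the helper names are illustrative, not from the deck):

```python
# Toy transaction table from the slide.
transactions = [
    {"Milk", "Nuts", "Sugar"},                      # Tid 10
    {"Milk", "Coffee", "Sugar"},                    # Tid 20
    {"Milk", "Sugar", "Eggs"},                      # Tid 30
    {"Nuts", "Eggs", "Bread"},                      # Tid 40
    {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},   # Tid 50
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y):
    """P(Y | X) = support(X U Y) / support(X)."""
    return support(x | y) / support(x)

# Rule {Milk} -> {Sugar}: X U Y appears in Tids 10, 20 and 30.
print(support({"Milk", "Sugar"}))       # 0.6
print(confidence({"Milk"}, {"Sugar"}))  # 1.0
```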

Genetic Algorithm

A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology. Genetic algorithms use biologically inspired techniques such as genetic inheritance, natural selection, mutation, and sexual reproduction (recombination, or crossover).

Methodology
1. [Start] Generate a random population of n chromosomes.
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the new population is complete.
   A. [Selection] Select two parent chromosomes from the population according to their fitness.
   B. [Crossover] With a crossover probability, alter the parents to form new offspring.
   C. [Mutation] With a mutation probability, mutate the new offspring at each locus.
   D. [Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop, and return the best solution in the current population.
6. [Loop] Go to step 2.
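The steps above can be sketched as a minimal bit-string GA. All parameter defaults and the OneMax demo fitness here are illustrative, not the values used in the study:

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, pc=0.75, pm=0.01, max_gen=100):
    """Minimal GA following the bracketed steps above (illustrative defaults)."""
    # [Start] random population of bit-string chromosomes
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gen):
        # [Selection] fitness-proportionate (roulette-wheel) choice of parents
        weights = [fitness(c) + 1e-9 for c in pop]   # epsilon avoids all-zero weights
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=weights, k=2)
            # [Crossover] with probability pc, single-point crossover
            if random.random() < pc:
                point = random.randrange(1, n_bits)
                c1, c2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
            else:
                c1, c2 = p1[:], p2[:]
            # [Mutation] flip each locus with probability pm
            for child in (c1, c2):
                for i in range(n_bits):
                    if random.random() < pm:
                        child[i] ^= 1
                new_pop.append(child)        # [Accepting]
        pop = new_pop[:pop_size]             # [Replace]
        best = max(pop + [best], key=fitness)
    return best                              # [Test] best solution found

# Usage: maximise the number of 1-bits (OneMax)
best = genetic_algorithm(sum, n_bits=16)
print(sum(best))  # close to 16
```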

Experimental Study
Encoding          : Binary
Population Size   : Set to three different values
Selection Method  : Random
Fitness Function  : Weighted combination of a rule's support and confidence, with weights Rs + Rc = 1 (Rs >= 0, Rc >= 0), where Suppmin and Confmin are the respective values of minimum support and minimum confidence.
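The slide states only the weight constraint Rs + Rc = 1; a common concrete form normalises a rule's support and confidence by the Suppmin and Confmin thresholds. A minimal sketch under that assumption (the exact combination and the example values are assumptions, not taken from the deck):

```python
def rule_fitness(supp, conf, supp_min, conf_min, rs=0.5, rc=0.5):
    """Weighted, threshold-normalised fitness sketch.
    Constraint from the slide: rs + rc = 1, rs >= 0, rc >= 0.
    The exact combination below is an assumption."""
    assert abs(rs + rc - 1.0) < 1e-9 and rs >= 0 and rc >= 0
    return rs * (supp / supp_min) + rc * (conf / conf_min)

# Example: rule with support 0.6 and confidence 1.0, thresholds 0.2 and 0.9.
print(rule_fitness(0.6, 1.0, supp_min=0.2, conf_min=0.9))
```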

Experimental Study (contd.)

Crossover Probability : Fixed (tested with 3 values)
Mutation Probability  : No mutation
Dataset               : Lenses, Iris, Haberman from the UCI Irvine repository
Termination Condition : Fitness becomes constant
Software              : MATLAB R2008a

Flow chart of the GA


Results Analysis
Comparison based on variation in population size
(population size set to N, 1.25N and 1.5N, where N = number of instances)

            Pop = N            Pop = 1.25N        Pop = 1.5N
            Acc %   Gens       Acc %   Gens       Acc %   Gens
Lenses        75       7         82      12         95      17
Haberman      71     114         68      88         64      70
Iris          77      88         87      53         82      45

Population Size Vs Accuracy

[Figure: accuracy of Lenses, Haberman and Iris at population sizes N, 1.25N and 1.5N]

Comparison based on variation in minimum support and minimum confidence

            Sup 0.4       Sup 0.9       Sup 0.9       Sup 0.2
            Con 0.4       Con 0.9       Con 0.2       Con 0.9
            Acc %  Gens   Acc %  Gens   Acc %  Gens   Acc %  Gens
Lenses        22     20     49     11     70     21     95     18
Haberman      45     68     58     83     71     90     62     75
Iris          40     28     59     37     78     48     87     55

Minimum Support and Confidence Vs Accuracy

[Figure: accuracy of Lenses, Haberman and Iris for the four (support, confidence) settings (0.4, 0.4), (0.9, 0.9), (0.9, 0.2) and (0.2, 0.9)]

Comparison based on variation in crossover probability

            Pc = 0.25     Pc = 0.5      Pc = 0.75
            Acc %  Gens   Acc %  Gens   Acc %  Gens
Lenses        95      8     95     16     95     13
Haberman      69     77     71     83     70     80
Iris          84     45     86     51     87     55

Crossover Vs Accuracy

[Figure: accuracy of Lenses, Haberman and Iris at Pc = 0.25, 0.5 and 0.75]

Crossover Vs Convergence Rate

[Figure: convergence rate for Lenses, Haberman and Iris at Pc = 0.25, 0.5 and 0.75]

Comparison of the optimum parameter values for the maximum accuracy achieved

Dataset    Instances  Attributes  Pop. size  Min. support  Min. confidence  Crossover rate  Accuracy %
Lenses        24          4          36          0.2            0.9              0.25            95
Haberman     306          3         306          0.9            0.2              0.5             71
Iris         150          4         225          0.2            0.9              0.75            87

Inferences
The values of minimum support, minimum confidence and population size influence the accuracy of the system more than the other GA parameters. The crossover rate affects the convergence rate rather than the accuracy. The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results.

Conclusion
The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results. The size of the dataset and the relationships between its attributes contribute to the setting of the parameters. The efficiency of the methodology could be further explored on more datasets with varying attribute sizes.

References
Cattral, R., Oppacher, F., Deugo, D.: Rule Acquisition with a Genetic Algorithm. In: Proceedings of the 1999 Congress on Evolutionary Computation, CEC 99, 1999.
Saggar, M., Agrawal, A.K., Lad, A.: Optimization of Association Rule Mining. In: IEEE International Conference on Systems, Man and Cybernetics, Vol. 4, pp. 3725-3729, 2004.
Zhou Jun, Li Shu-you, Mei Hong-yan, Liu Hai-xia: A Method for Finding Implicating Rules Based on the Genetic Algorithm. In: Third International Conference on Natural Computation, Vol. 3, pp. 400-405, 2007.
Genxiang Zhang, Haishan Chen: Immune Optimization Based Genetic Algorithm for Incremental Association Rules Mining. In: International Conference on Artificial Intelligence and Computational Intelligence, AICI '09, Vol. 4, pp. 341-345, 2009.

References (contd.)

Gonzales, E., Mabu, S., Taboada, K., Shimada, K., Hirasawa, K.: Mining Multi-class Datasets using Genetic Relation Algorithm for Rule Reduction. In: IEEE Congress on Evolutionary Computation, CEC '09, pp. 3249-3255, 2009.
Xian-Jun Shi, Hong Lei: Genetic Algorithm-Based Approach for Classification Rule Discovery. In: International Conference on Information Management, Innovation Management and Industrial Engineering, ICIII '08, Vol. 1, pp. 175-178, 2008.
Haiying Ma, Xin Li: Application of Data Mining in Preventing Credit Card Fraud. In: International Conference on Management and Service Science, MASS '09, pp. 1-6, 2009.
Hong Guo, Ya Zhou: An Algorithm for Mining Association Rules Based on Improved Genetic Algorithm and its Application. In: 3rd International Conference on Genetic and Evolutionary Computing, WGEC '09, pp. 117-120, 2009.

References (contd.)

Hua Tang, Jun Lu: Hybrid Algorithm Combined Genetic Algorithm with Information Entropy for Data Mining. In: 2nd IEEE Conference on Industrial Electronics and Applications, pp. 753-757, 2007.
Wenxiang Dou, Jinglu Hu, Hirasawa, K., Gengfeng Wu: Quick Response Data Mining Model using Genetic Algorithm. In: SICE Annual Conference, pp. 1214-1219, 2008.

Thank You
