
Association Rule Mining Using Genetic Algorithm: The role of Estimation Parameters

Presented by K. Indira, Research Scholar, Dept. of CSE, Pondicherry Engineering College.

Authors: K. Indira & Dr. S. Kanmani

OBJECTIVE

To study the performance of the genetic algorithm for association rule mining by varying its estimation parameters.

Data Mining
Data mining is the extraction of interesting information or patterns from data in large databases.

ASSOCIATION ANALYSIS
Association analysis is the discovery of what are commonly called association rules.

It studies the frequency of items occurring together in transactional databases. Association rule mining provides valuable information in assessing significant correlations.

Association Rules

Tid    Items bought
10     Milk, Nuts, Sugar
20     Milk, Coffee, Sugar
30     Milk, Sugar, Eggs
40     Nuts, Eggs, Bread
50     Nuts, Coffee, Sugar, Eggs, Bread

[Venn diagram: customer buys milk / customer buys sugar / customer buys both]

Find all rules X -> Y with minimum support and confidence:
Support, s    : probability that a transaction contains X U Y
Confidence, c : conditional probability that a transaction containing X also contains Y
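For the toy transactions above, support and confidence can be computed directly. A minimal sketch (the helper names are illustrative, not from the deck):

```python
# Toy transaction table from the slide.
transactions = [
    {"Milk", "Nuts", "Sugar"},                      # Tid 10
    {"Milk", "Coffee", "Sugar"},                    # Tid 20
    {"Milk", "Sugar", "Eggs"},                      # Tid 30
    {"Nuts", "Eggs", "Bread"},                      # Tid 40
    {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},   # Tid 50
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y):
    """P(Y | X) = support(X U Y) / support(X)."""
    return support(x | y) / support(x)

# Rule {Milk} -> {Sugar}: X U Y appears in Tids 10, 20 and 30.
print(support({"Milk", "Sugar"}))       # 0.6
print(confidence({"Milk"}, {"Sugar"}))  # 1.0
```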

Genetic Algorithm

A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology. Genetic algorithms use biologically inspired techniques such as genetic inheritance, natural selection, mutation, and sexual reproduction (recombination, or crossover).

Methodology
1. [Start] Generate a random population of n chromosomes.
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the new population is complete.
   A. [Selection] Select two parent chromosomes from the population according to their fitness.
   B. [Crossover] With a crossover probability, alter the parents to form new offspring.
   C. [Mutation] With a mutation probability, mutate the new offspring at each locus.
   D. [Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop, and return the best solution in the current population.
6. [Loop] Go to step 2.
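The steps above can be sketched as a minimal bit-string GA. All parameter defaults and the OneMax demo fitness here are illustrative, not the values used in the study:

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, pc=0.75, pm=0.01, max_gen=100):
    """Minimal GA following the bracketed steps above (illustrative defaults)."""
    # [Start] random population of bit-string chromosomes
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gen):
        # [Selection] fitness-proportionate (roulette-wheel) choice of parents
        weights = [fitness(c) + 1e-9 for c in pop]   # epsilon avoids all-zero weights
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = random.choices(pop, weights=weights, k=2)
            # [Crossover] with probability pc, single-point crossover
            if random.random() < pc:
                point = random.randrange(1, n_bits)
                c1, c2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
            else:
                c1, c2 = p1[:], p2[:]
            # [Mutation] flip each locus with probability pm
            for child in (c1, c2):
                for i in range(n_bits):
                    if random.random() < pm:
                        child[i] ^= 1
                new_pop.append(child)        # [Accepting]
        pop = new_pop[:pop_size]             # [Replace]
        best = max(pop + [best], key=fitness)
    return best                              # [Test] best solution found

# Usage: maximise the number of 1-bits (OneMax)
best = genetic_algorithm(sum, n_bits=16)
print(sum(best))  # close to 16
```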

Experimental Study
Encoding          : Binary
Population Size   : Set to three different values
Selection Method  : Random
Fitness Function  : Weighted combination of a rule's support and confidence, with weights Rs + Rc = 1 (Rs >= 0, Rc >= 0), where Suppmin and Confmin are the respective values of minimum support and minimum confidence.
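The slide states only the weight constraint Rs + Rc = 1; a common concrete form normalises a rule's support and confidence by the Suppmin and Confmin thresholds. A minimal sketch under that assumption (the exact combination and the example values are assumptions, not taken from the deck):

```python
def rule_fitness(supp, conf, supp_min, conf_min, rs=0.5, rc=0.5):
    """Weighted, threshold-normalised fitness sketch.
    Constraint from the slide: rs + rc = 1, rs >= 0, rc >= 0.
    The exact combination below is an assumption."""
    assert abs(rs + rc - 1.0) < 1e-9 and rs >= 0 and rc >= 0
    return rs * (supp / supp_min) + rc * (conf / conf_min)

# Example: rule with support 0.6 and confidence 1.0, thresholds 0.2 and 0.9.
print(rule_fitness(0.6, 1.0, supp_min=0.2, conf_min=0.9))
```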

Experimental Study (contd.)

Crossover Probability : Fixed (tested with 3 values)
Mutation Probability  : No mutation
Dataset               : Lenses, Iris, Haberman from the UCI Irvine repository
Termination Condition : Fitness becomes constant
Software              : MATLAB R2008a

Flow chart of the GA


Results Analysis
Comparison based on variation in population size
(population size set to N, 1.25N and 1.5N, where N = number of instances)

            Pop = N            Pop = 1.25N        Pop = 1.5N
            Acc %   Gens       Acc %   Gens       Acc %   Gens
Lenses        75       7         82      12         95      17
Haberman      71     114         68      88         64      70
Iris          77      88         87      53         82      45

Population Size Vs Accuracy

[Figure: accuracy of Lenses, Haberman and Iris at population sizes N, 1.25N and 1.5N]

Comparison based on variation in minimum support and minimum confidence

            Sup 0.4       Sup 0.9       Sup 0.9       Sup 0.2
            Con 0.4       Con 0.9       Con 0.2       Con 0.9
            Acc %  Gens   Acc %  Gens   Acc %  Gens   Acc %  Gens
Lenses        22     20     49     11     70     21     95     18
Haberman      45     68     58     83     71     90     62     75
Iris          40     28     59     37     78     48     87     55

Minimum Support and Confidence Vs Accuracy

[Figure: accuracy of Lenses, Haberman and Iris for the four (support, confidence) settings (0.4, 0.4), (0.9, 0.9), (0.9, 0.2) and (0.2, 0.9)]

Comparison based on variation in crossover probability

            Pc = 0.25     Pc = 0.5      Pc = 0.75
            Acc %  Gens   Acc %  Gens   Acc %  Gens
Lenses        95      8     95     16     95     13
Haberman      69     77     71     83     70     80
Iris          84     45     86     51     87     55

Crossover Vs Accuracy

[Figure: accuracy of Lenses, Haberman and Iris at Pc = 0.25, 0.5 and 0.75]

Crossover Vs Convergence Rate

[Figure: convergence rate for Lenses, Haberman and Iris at Pc = 0.25, 0.5 and 0.75]

Comparison of the optimum parameter values for the maximum accuracy achieved

Dataset    Instances  Attributes  Pop. size  Min. support  Min. confidence  Crossover rate  Accuracy %
Lenses        24          4          36          0.2            0.9              0.25            95
Haberman     306          3         306          0.9            0.2              0.5             71
Iris         150          4         225          0.2            0.9              0.75            87

Inferences
The values of minimum support, minimum confidence and population size influence the accuracy of the system more than the other GA parameters. The crossover rate affects the convergence rate rather than the accuracy. The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results.

Conclusion
The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results. The size of the dataset and the relationships between its attributes contribute to the setting of the parameters. The efficiency of the methodology could be further explored on more datasets with varying attribute sizes.

References
Cattral, R., Oppacher, F., Deugo, D.: Rule Acquisition with a Genetic Algorithm. In: Proceedings of the 1999 Congress on Evolutionary Computation, CEC 99, 1999.
Saggar, M., Agrawal, A.K., Lad, A.: Optimization of Association Rule Mining. In: IEEE International Conference on Systems, Man and Cybernetics, Vol. 4, pp. 3725-3729, 2004.
Zhou Jun, Li Shu-you, Mei Hong-yan, Liu Hai-xia: A Method for Finding Implicating Rules Based on the Genetic Algorithm. In: Third International Conference on Natural Computation, Vol. 3, pp. 400-405, 2007.
Genxiang Zhang, Haishan Chen: Immune Optimization Based Genetic Algorithm for Incremental Association Rules Mining. In: International Conference on Artificial Intelligence and Computational Intelligence, AICI '09, Vol. 4, pp. 341-345, 2009.

References (contd.)

Gonzales, E., Mabu, S., Taboada, K., Shimada, K., Hirasawa, K.: Mining Multi-class Datasets using Genetic Relation Algorithm for Rule Reduction. In: IEEE Congress on Evolutionary Computation, CEC '09, pp. 3249-3255, 2009.
Xian-Jun Shi, Hong Lei: Genetic Algorithm-Based Approach for Classification Rule Discovery. In: International Conference on Information Management, Innovation Management and Industrial Engineering, ICIII '08, Vol. 1, pp. 175-178, 2008.
Haiying Ma, Xin Li: Application of Data Mining in Preventing Credit Card Fraud. In: International Conference on Management and Service Science, MASS '09, pp. 1-6, 2009.
Hong Guo, Ya Zhou: An Algorithm for Mining Association Rules Based on Improved Genetic Algorithm and its Application. In: 3rd International Conference on Genetic and Evolutionary Computing, WGEC '09, pp. 117-120, 2009.

References (contd.)

Hua Tang, Jun Lu: Hybrid Algorithm Combined Genetic Algorithm with Information Entropy for Data Mining. In: 2nd IEEE Conference on Industrial Electronics and Applications, pp. 753-757, 2007.
Wenxiang Dou, Jinglu Hu, Hirasawa, K., Gengfeng Wu: Quick Response Data Mining Model using Genetic Algorithm. In: SICE Annual Conference, pp. 1214-1219, 2008.

Thank You
