
The Use of Evolutionary Algorithms in Data Mining

Ayush Joshi MScISE
Jordan Wallwork BScAICS
Khulood AlYahya MScISE
Sultanah AlOtaibi MScACS

Abstract

With the huge amount of data being generated in the world every day, at a rate far higher
than it can be analyzed by human comprehension alone, data mining has become an
extremely important task for extracting as much useful information from this data as
possible. Standard data mining techniques are satisfactory to a certain extent, but they
are constrained by certain limitations, and it is in these cases that evolutionary approaches
are both more capable and more efficient. In this paper we present the application of
nature-inspired evolutionary techniques to data mining, augmented with human interaction,
to handle situations in which concept definitions are abstract and hard to define, and hence
not quantifiable in an absolute sense. Finally, we propose some ideas for future
applications of these techniques.

Keywords: data mining, knowledge discovery, evolutionary algorithms, interactive
evolutionary algorithms, genetic algorithms, genetic programming, co-evolutionary
algorithms, rule discovery, classification, clustering, data mining tasks, data mining
algorithms

Table of Contents

1 Introduction
2 Overview of Data Mining and Knowledge Discovery
  2.1 Data Mining Pre-processing
  2.2 Data Mining Tasks
    2.2.1 Models and Patterns
  2.3 Conventional Techniques of Data Mining
3 Evolutionary Algorithms and Data Mining
  3.1 Genetic Algorithms
  3.2 Genetic Programming
  3.3 Co-evolutionary Algorithms
  3.4 Representation and Encoding
    3.4.1 Rules Representation
    3.4.2 Fuzzy Logic Based Rules Representation
  3.5 Genetic Operators
    3.5.1 Crossover
    3.5.2 Mutation
    3.5.3 Fuzzy Logic Operators
  3.6 Fitness Evaluation
    3.6.1 Objective Fitness Evaluation
    3.6.2 Subjective Fitness Evaluation (Interactive Evolutionary Algorithms)
  3.7 Selection and Replacement
  3.8 Integrating Conventional Techniques with Evolutionary Algorithms
4 Applications of Data Mining Using IEA
  4.1 Extracting Knowledge from a Text Database
  4.2 Extracting Marketing Rules from User Data
  4.3 Fraud Detection Using Data Mining and IEA Techniques
  4.4 Some current work being done
5 Conclusion and Future Work
6 References

1 Introduction

In recent years, the massive growth in the amount of stored data has increased the demand
for effective data mining methods to discover the hidden knowledge and patterns in these
data sets. Data mining means to “mine”, or extract, relevant information from any available
data of concern to the user. Data mining is not a new technique: it has been practised for
centuries on problems like regression analysis and knowledge discovery from records of
various types.

As computers invaded almost all conceivable fields of human knowledge and occupation,
their advantages were advocated everywhere, but it was soon observed that, with the
increasing amounts of data that could be generated, stored and analysed, there was a need
for some way to sift through it all and extract the important information. In the early days
a human, or a group of humans, would analyse the data by going through it manually and
applying statistical techniques, but the curve of data generation was far steeper than what
could realistically be processed by hand. This led to the emergence of the field of data
mining, which essentially set out to define and formalize standard techniques for extracting
knowledge from large data warehouses. As data mining evolved it was observed that the
data at hand was almost never perfect or suitable to be fed directly to data mining engines,
and needed several steps of pre-processing before it could be put through “mining”.
Generally these inconsistencies would lie in the data format, the level of noise, or in
incorrect, unnecessary or redundant data. The pre-processing steps would clean, integrate,
discretize and select the most relevant attributes before any mining was performed.

A whole new area called intelligent data analysis has emerged, which applies efficient
techniques for mining large data sets, keeping in mind that the knowledge obtained must be
useful, while also remembering that the time available for mining is constrained and the
user requires results as soon as possible. Some of the methods used to mine data include
support vector machines, decision trees, nearest neighbour analysis, Bayesian
classification, and latent semantic analysis.

Given the problems associated with conventional data mining techniques, clever new ways to
overcome them were needed, and the application of AI techniques to the field resulted in a
very powerful hybrid. Evolutionary optimization techniques provided a useful and novel
solution to these issues, and once data mining was enhanced with evolutionary computation
(EC), many of the previously mentioned problems were no longer major obstacles.

Some applications of evolutionary algorithms in data mining which involve human
interaction are presented in this paper. When dealing with concepts that are abstract and
hard to define, or cases where there is a large or variable number of parameters, we still
do not have reliable methods for finding solutions. In cases where we are unable to
quantify what we want to measure, for instance ‘beauty’ in images or ‘pleasantness’ in
music, we almost always require a human to drive the solutions through their choices. In
these situations we use a combination of evolutionary computation and data mining, but with
a human interacting with the engine to steer the computation towards the solutions or
answers they are looking for.

This paper begins by describing some relevant concepts in data mining and general
evolutionary algorithms. In the later sections we discuss some of the areas where these
techniques are implemented, and lastly we give a few ideas of where they may be applied in
the future.

2 Overview of Data mining and Knowledge Discovery

Knowledge discovery and data mining as defined by Fayyad et al. (1996) is “the process of
identifying valid, novel, useful, and understandable patterns in data”. Data mining has
emerged particularly in situations where analysing the data manually or by using simple
queries is either impossible or very complicated (Cantú-Paz & Kamath, 2001). Data mining is
a multi-disciplinary field that incorporates knowledge from many disciplines, mainly from
machine learning, artificial intelligence, statistics, signal and image processing, mathematical
optimization, and pattern recognition (ibid.).

Knowledge discovery and data mining consist of three main steps to convert a collection of
raw data to valuable knowledge. These three steps are data pre-processing, knowledge
extraction, and data post-processing (Freitas, 2003). The discovered knowledge should be
accurate, comprehensible, relevant and interesting for the end user in order to consider the
data mining process as successful (Cantú-Paz & Kamath, 2001).

This section gives an overview of data mining pre-processing, data mining tasks, and the
conventional techniques for data mining.

2.1 Data Mining Pre-processing

The purpose of data mining pre-processing is to eliminate outliers, inconsistency and
incompleteness in the data, in order to obtain accurate results (Freitas, 2003). These
pre-processing steps are listed below (a code sketch follows the list):

• Data cleaning: prepares the data for the following processes by removing irrelevant
data and as much noise as possible. It is done to guarantee the accuracy and the
validity of the data.

• Data integration: removes redundant and inconsistent data from data that is
collected from different sources.

• Discretization: converts continuous attribute values to discrete values, e.g. for
the attribute Age we can set a minimum value of 21 and a maximum value of 60 and map
the continuous values onto this discrete range.

• Attribute selection: selects the data relevant to the analysis process from all the
data sets.

• Data mining: after all the previous steps, data mining algorithms or techniques can
be applied to the data in order to extract the desired knowledge.
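As a concrete illustration, the sketch below shows how a few of these pre-processing steps
might look in Python using the pandas library. The column names, value ranges and bin
boundaries are hypothetical, chosen only to mirror the Age example above; this is a minimal
sketch, not a prescription.

import pandas as pd

# Hypothetical raw data; in practice this would come from a data warehouse.
df = pd.DataFrame({
    "Age":    [25, 34, None, 58, 34, 130],            # a missing value and an outlier
    "Income": [30000, 52000, 41000, 75000, 52000, 60000],
    "Name":   ["a", "b", "c", "d", "b", "e"],          # irrelevant to the analysis
})

# Data cleaning: drop incomplete records and implausible (noisy) values.
df = df.dropna()
df = df[df["Age"].between(21, 60)]

# Data integration: remove redundant records, e.g. duplicates merged in
# from two different sources.
df = df.drop_duplicates()

# Discretization: convert the continuous Age attribute to discrete ranges.
df["AgeGroup"] = pd.cut(df["Age"], bins=[21, 40, 60], labels=["21-40", "41-60"])

# Attribute selection: keep only the attributes relevant to the analysis.
df = df[["AgeGroup", "Income"]]

print(df)   # cleaned, discretized data, ready for a mining algorithm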

2.2 Data Mining Tasks

It is very important to define the data mining task that the algorithm should address
before designing it for application to a particular problem. There are several data mining
tasks, and each of them has a specific purpose in terms of the knowledge to be discovered
(Freitas, 2002).

2.2.1 Models and Patterns

In data mining, the term model “is a high level description of the data set” (Hand, 2001).
A model can be either descriptive or predictive. As the names imply, a descriptive model is
an unsupervised model that aims to describe the data, while a predictive model is a
supervised model that aims to predict values from the data.

Patterns are used to define the important and interesting features of the data. An unusual
combination of purchased items in a supermarket is an example of a pattern. Models are used
to describe the whole data set, while patterns are used to highlight particular aspects of
the data.

2.2.1.1 Predictive Models

According to Kambert (2001), data analysis can generally take either a classification or a
prediction form. Regression analysis is an example of a prediction task, namely numeric
prediction. The difference between classification and regression is that the target value
(response variable) is a quantitative value in regression modeling, while it is a
qualitative or categorical value in classification modeling.

Classification Task

Some terms need to be introduced in order to describe classification tasks. The data sets
that classification techniques or algorithms are applied to are composed of a number of
instances (objects). Each instance has a number of attributes, which have discrete values.
The records in database tables, for example, represent the instances, and the fields
represent the attributes. In other words, each row represents an object and the columns
describe this object in terms of its attributes. The classification task is to extract the
hidden knowledge, in the form of patterns, from some attribute values in order to predict
the value of a particular field or attribute. This target value is known as the class
(Dzeroski & Lavrac, 2001).

The inputs for the classification algorithm are data instances, and the outputs are the
patterns that are used to predict the class that each instance belongs to. Here is an
example of a classification rule (Freitas, 2002):

IF (a_given_set_of_conditions_is_satisfied_by_an_instance)     <- antecedent part
THEN (predict_a_certain_class_for_that_instance)               <- consequent part

The data in the classification task is divided into two “mutually exclusive” data sets,
the training dataset and the testing dataset (Freitas, 2003). The training dataset is
used to build the classification model and the test dataset is used to evaluate the
predictive performance of the model. Overfitting occurs when the model is overtrained on
the training dataset and is simply “memorizing” it, which results in poor predictive
performance on the testing dataset. By contrast, underfitting occurs when the model is
undertrained and has not learned well from the training data. In underfitting situations,
the model consists of a number of rules that each cover too many training instances (ibid.).

Regression Task

In general, regression modeling is very similar to classification modeling, except that the
target value in regression modeling is a continuous or ordered value. In statistics,
regression analysis models the relation of a response value (the output or target value of
the predictor) to specified predictor values (the input variables of the predictor).
Regression analysis can be simple linear or multiple regression: the former approximates
the relation of a single response variable to a single continuous predictor value, the
latter to multiple continuous predictor values (Larose, 2006).

2.2.1.2 Descriptive model

Clustering Task

Clustering simply means grouping: placing data instances into different groups or clusters
such that instances from the same cluster are similar to one another and easily
distinguished from the instances that belong to other clusters (Zaki et al., 2010).

Association Analysis Task

Association analysis refers to the process of extracting association rules from a data
set that describe some interesting relations hidden in this data set. For further
illustration, imagine the market basket transactions example, where we have two
items A and B and the following rule is extracted from the data: {A} -> {B}. This rule
suggests that there is a strong relation between item A and item B in terms of the
frequency of their occurrence together (Tan et al., 2006). This means if there is an
item A in the basket then there is a high probability that item B will be in the basket
as well.
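To make these notions concrete, the following sketch computes the standard support and
confidence of a candidate rule {A} -> {B} over a toy list of market basket transactions;
the items and numbers are illustrative only.

# Hypothetical market basket transactions.
transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "D"}, {"B", "C"}, {"A", "B", "D"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Estimated probability of the consequent given the antecedent.
    return support(antecedent | consequent) / support(antecedent)

# Rule {A} -> {B}: how often does B appear in baskets containing A?
print(support({"A", "B"}))         # 0.6
print(confidence({"A"}, {"B"}))    # 0.75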

2.3 Conventional Techniques of Data Mining

Several tools and techniques are available for data mining and knowledge discovery. These
techniques have been developed from two main fields: statistics and machine learning.
Multivariate analysis, logistic regression, linear discriminant analysis, ID3, k-nearest neighbor,
Bayesian classifiers, principal component analysis, and support vector machines are
examples of these techniques. These techniques are designed to discover accurate and
comprehensible rules, but most of them are not designed to discover interesting rules
(Freitas, 2003).

Statistics and machine learning techniques are considered to be the most used techniques
for data mining but these techniques have some drawbacks. The models or rules discovered
using these techniques are not always optimal. This is due to their sensitivity to the noise in
the data set, which may cause them to overfit the data (Vafaie & Jong, 1994). They also tend
to generate models with a larger number of features than really necessary, which increases
the computational cost of the model (ibid.). Another drawback is that they typically assume
a priori knowledge about the data set, which is not available in most cases. Statistical
methods have the further problem that they assume linearity of the models and particular
distributions of the data (Terano & Ishino, 1996).

3 Evolutionary Algorithms and Data Mining

Evolutionary algorithms have several features that make them attractive for the data mining
process (Freitas, 2003; Vafaie & Jong, 1994). They are a domain-independent technique,
which makes them ideal for applications where domain knowledge is difficult to provide.
They have the ability to explore large search spaces, consistently finding good solutions.
In addition, they are relatively insensitive to noise, and can manage attribute interaction
better than conventional data mining techniques.

Therefore, in recent years, much work has been done to develop new techniques for data
mining using evolutionary algorithms. These attempts have used evolutionary algorithms for
different data mining tasks such as feature extraction, feature selection, classification,
and clustering (Cantú-Paz & Kamath, 2001). The main role of evolutionary algorithms in
most of these approaches is optimization. They are used to improve the robustness and
accuracy of some of the traditional data mining techniques.

Different types of evolutionary algorithms have been developed over the years, such as
genetic algorithms, genetic programming, evolution strategies, evolutionary programming,
differential evolution, cultural evolution algorithms and co-evolutionary algorithms
(Engelbrecht, 2007). Among these, genetic algorithms, genetic programming and
co-evolutionary algorithms are used in data mining. Genetic algorithms are
used for data preprocessing and for post processing the discovered knowledge, while
genetic programming is used for rule discovery and data preprocessing (Freitas, 2003).

This section will give a general overview of genetic algorithms, genetic programming, and
co-evolutionary algorithms, followed by an overview of different representation schemes,
genetic operators, and fitness evaluation for the purpose of data mining. Finally a brief
discussion of integrating conventional data mining techniques with evolutionary algorithms
is given.

3.1 Genetic Algorithms

Genetic algorithms are those that “have been originally proposed as a general model of
adaptive processes, but by far the largest application of the techniques is in the domain of
optimization” (Back et al., 1997). They consist of a population of individual solutions that are
acted upon by a series of genetic operators in order to generate new, and hopefully better,
solutions to a particular problem, and are inspired by natural evolution.
The term ‘genetic algorithm’ was coined in the early 70s by John Holland, who had been
working with systems that generate populations of potential solutions using natural
methods since the early 60s. In his paper “Outline for a Logical Theory of Adaptive Systems”,
Holland (1962) describes a system where a “generation tree” of populations is generated. By
applying a number of solutions (the population) to a number of problems (the environment),
the solutions that are able to successfully solve the problems are given reward/activation
scores, which enable solutions to be compared with one another, and the best of these are
used in the generation of the next branch of the generation tree.
Virtually all modern evolutionary systems have the same general stages (a minimal code sketch follows the list):
• A random population of solutions is generated, to be used as the initial population
• The solutions within the population are evaluated to determine their ‘fitness’
• Solution pairs are first selected, based on their fitness, and then are combined to
create offspring, which are added to the next generation of the population
• Other genetic operators, such as mutation, are also applied to offspring
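As a minimal sketch of these stages, the loop below evolves bitstrings against a
deliberately simple stand-in fitness function that just counts 1-bits; a real system would
plug in a problem-specific measure such as the rule-quality criteria of section 3.6. The
parameter values are arbitrary.

import random

GENES, POP_SIZE, GENERATIONS, MUTATION_RATE = 16, 20, 50, 0.01

def fitness(chromosome):
    # Stand-in objective: number of 1-bits (OneMax).
    return sum(chromosome)

def crossover(a, b):
    point = random.randint(1, GENES - 1)    # one-point crossover (section 3.5.1)
    return a[:point] + b[point:]

def mutate(chromosome):
    return [g ^ 1 if random.random() < MUTATION_RATE else g for g in chromosome]

# 1. A random initial population of solutions.
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    # 2. Evaluate fitness; the fitter half survives to breed ('absolute'
    #    selection; section 3.7 discusses gentler alternatives).
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]
    # 3-4. Select pairs, combine them, and mutate the offspring.
    offspring = [mutate(crossover(*random.sample(parents, 2)))
                 for _ in range(POP_SIZE - len(parents))]
    population = parents + offspring

print(max(fitness(c) for c in population))   # best solution found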

3.2 Genetic Programming

Genetic programming is a specific application of genetic algorithms, used to evolve
computer programs. The paradigm was named and developed by John Koza in the early 90s, who
initially used genetic programming to evolve LISP programs (Koza, 1992). Whilst the general
premise of genetic programming is the same as that of the basic genetic algorithm –
selecting the fittest members of a given population and then crossing and mutating them –
the representation of the solutions is radically different, which results in the need for
alternative crossover methods. Instead of representing solutions as chromosomes, which are
fixed-length sets where each gene has a specific meaning and a limited value, genetic
programming represents programs as variable-length structures, which can grow indefinitely.
This means that crossover cannot simply occur randomly anywhere in the program, as this
would probably simply break it, so more careful crossover algorithms need to be developed.

Figure 1: Examples of evolved LISP programs. Fitness calculated by number of outputs closer than 20% to
correct output. (Mitchell, 1998)

Koza realized that not only could genetic algorithms be used to evolve programs, but also
other complex structures, such as equations and rule sets. This is particularly useful when

looking at genetically evolving data mining techniques, since we can use the principles of
genetic programming in the evolution of rule constructs.

In data mining, genetic programming is considered a more open-ended search technique that
can produce many different combinations of attributes. Hence it is very useful for
classification and prediction tasks (Freitas, 2003).

3.3 Co-evolutionary Algorithms

In co-evolutionary algorithms, two populations are evolved together, with the fitness
function involving the relationship with individuals of the other population. The
individuals of the two populations evolve either by competing against each other or by
cooperating with each other (Engelbrecht, 2007). The competitive approach is used to
“obtain exclusivity on a limited resource” (Tan et al., 2005), while a cooperative approach
is used to “gain access to some hard to attain resource” (ibid.). In the competitive
approach, the fitness of an individual in one population is based on direct competition
with the fitness of individuals in the other population. In the cooperative approach, on
the other hand, the fitness of an individual in one population is based on how well it
cooperates with the individuals in the other population. Co-evolutionary approaches,
particularly the cooperative approach, can address some of the problems of evolutionary
algorithms with a single population, such as poor performance and convergence to local
optima when dealing with problems that have complex solutions (Tan et al., 2005).

Several attempts have been made to apply co-evolutionary algorithms to the field of data
mining. One of them is the distributed evolutionary classifier for knowledge discovery in
data mining proposed by Tan et al. (2005). In their approach, they use a cooperative
evolutionary algorithm to evolve two populations: each individual of the first population
represents a single rule, while each individual of the second population represents a set
of rules. They validated their approach using six datasets. Their classifier performed
better than the C4.5 classifier (a well-known algorithm for generating decision trees). The
proposed co-evolutionary approach reduces the computation time by sharing the workload
among multiple computers. It also achieved a smaller number of rules in the rule set
compared with other classification techniques, which increases the comprehensibility of the
classification model. Moreover, it is more robust to noise in the data and has robust
prediction accuracy.

Another approach that applies co-evolutionary algorithms to data mining is the co-
evolutionary system for discovering fuzzy classification rules developed by Mendes et al
(2001). They used two evolutionary algorithms in their system: a genetic programming
algorithm and an evolutionary algorithm to co-evolve two populations. The genetic
programming algorithm evolves a population of fuzzy rule sets and the evolutionary
algorithm evolves a population of membership function definitions. The advantage of using
the co-evolutionary process is the discovery of fuzzy rule sets and the membership function
definitions that are more adjusted to each other.

3.4 Representation and Encoding

The traditional method of encoding the genetic rules, which perhaps resembles most closely
the way evolution occurs in nature, is to use a direct representation scheme to encode the
population data as a series of bitstrings – a binary string representative of the genes
which build each chromosome in a population. An 8-bit binary string would be representative
of a population whose genetic data consisted of 8 Boolean values, where each bit had some
specific meaning. For example, a system looking to design a new car could use its first bit
to represent whether the car has two or four doors, the second to represent whether it has
3 or 4 wheels, the third to represent whether the car has a spoiler, etc. In this system, a
population member with a value 011xxxxx would represent a car with two doors, four wheels
and a spoiler. The key issue with this kind of representation, however, is that it defines
a very specific search space, with a set number of genetic ‘parameters’ and a very
restricted set of values that each of these parameters can take.
A simple way to make the genetic algorithm considerably more powerful is to alter the
representation so that, rather than being encoded as a set of Boolean values, each gene
stores a number, such as an integer or a floating point value. This means that the search
space defined by the system described before could be hugely expanded, allowing for a far
greater number of genetic possibilities in its population. For example, the second bit was
representative of whether the car had three or four wheels; by using an integer rather than
a Boolean we broaden the system so that it can represent a car with any number of wheels.
Expanding the representation in this way comes with an additional memory overhead – a gene
encoded with a 1-byte integer is 8 times the size of a binary gene – however, the number of
possible values leaps to 256, so the memory increase is a small cost to pay for a
significant improvement to the representation.

Of course, not all genes need to be encoded in the same way; a chromosome can be
constructed by any combination of data types that best fit the space being represented. For
example, it would not make sense to represent whether a car has a spoiler or not with an
integer, as there are only two possibilities, so a Boolean would be sufficient.
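A chromosome mixing data types in this way might be sketched as below; the car attributes
are the hypothetical ones from the example above.

from dataclasses import dataclass

@dataclass
class CarChromosome:
    two_doors: bool     # a Boolean gene suffices where only two options exist
    has_spoiler: bool
    num_wheels: int     # an integer gene widens the search space usefully

# The 011xxxxx individual from the bitstring example, in the richer encoding:
member = CarChromosome(two_doors=True, has_spoiler=True, num_wheels=4)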

3.4.1 Rules Representation

Classification is the most common application of evolutionary algorithms in data mining.
There are many techniques for performing the classification task. Rule-based techniques are
preferred over other classification techniques because rules are more comprehensible
(Freitas, 2003). There are two approaches to representing individuals when using
evolutionary algorithms for rule discovery: the Michigan and the Pittsburgh approach
(ibid.). In the Michigan approach each individual represents a single rule, whereas in the
Pittsburgh approach each individual represents a set of rules. The Pittsburgh approach is
more suitable for classification tasks because the quality of the rule set is evaluated as
a whole, rather than the quality of a single rule. On the other hand, the Michigan approach
is more suitable for other kinds of data mining tasks, such as finding a small set of
high-quality prediction rules, because each rule is evaluated independently of the other
rules (ibid.).

3.4.2 Fuzzy Logic Based Rules Representation

Fuzzy logic based rules are not only more readable by humans, but are also easier to evolve
than classic rule types. Fuzzy rules use unary functions to classify variables – for
example, IS_LOW(Age) could be equivalent to Age < 27. The advantage of using this fuzzy
representation is that it allows us to classify variables without needing to know
explicitly the range of values that could realistically be expected, as they are simply
classified into three distinct groups – low, medium and high – defined by the normalized
values of all the data contained in the database. An example of a fuzzy classification
function is depicted in Figure 2, where the x-axis denotes the normalized data values, and
the y-axis denotes the degree of membership in the fuzzy groups.

Figure 2: Fuzzy membership classification (Walter, 2000)
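The LOW, MEDIUM and HIGH groups of Figure 2 are commonly implemented as overlapping
triangular membership functions over the normalized data; the breakpoints in the sketch
below are assumptions for illustration, not values taken from the cited work.

def triangular(x, left, peak, right):
    # Degree of membership in a triangular fuzzy set, in [0, 1].
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy groups over values normalized into [0, 1].
def LOW(x):    return triangular(x, -0.01, 0.0, 0.5)
def MEDIUM(x): return triangular(x, 0.0, 0.5, 1.0)
def HIGH(x):   return triangular(x, 0.5, 1.0, 1.01)

print(LOW(0.2), MEDIUM(0.2), HIGH(0.2))   # 0.6 0.4 0.0 - partial memberships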

For our rule set, therefore, we will use just two binary operators, AND and OR, four unary
operators, NOT, LOW, MEDIUM and HIGH, and an integer value to represent each data variable.
We will assign a numerical value to each of these, so for instance values 0-5 could
represent the operators, and values of 6 and above the variables. We will use 1-byte binary
representations of these, which gives us up to 250 possible variables; if we require more
variables, we can simply choose a larger representation (i.e. 2 bytes gives us 65530
variables).
If we consider a sample rule, LOW(Age) AND NOT(HIGH(Height) OR HIGH(Weight)), we can see
how the binarized form of the rule, 0000 0011 0110 0010 0001 0101 1000 0101 0111, can be
parsed and understood (in this example, we have used a 4-bit representation).

Figure 3
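Reading that encoding token by token makes the parse explicit. The decoder below assumes
prefix ordering with AND=0, OR=1, NOT=2, LOW=3, MEDIUM=4, HIGH=5 and variables numbered
from 6 (Age=6, Weight=7, Height=8); this numbering is inferred from the worked example
rather than stated in the original.

# (name, arity) for operator tokens 0-5; tokens 6+ are data variables.
OPS = {0: ("AND", 2), 1: ("OR", 2), 2: ("NOT", 1),
       3: ("LOW", 1), 4: ("MEDIUM", 1), 5: ("HIGH", 1)}
VARS = {6: "Age", 7: "Weight", 8: "Height"}

def decode(tokens):
    # Recursively decode a prefix-ordered rule into readable text.
    def parse(i):
        t = tokens[i]
        if t in VARS:                  # leaf: a data variable
            return VARS[t], i + 1
        name, arity = OPS[t]
        args, j = [], i + 1
        for _ in range(arity):         # decode each operand subtree
            arg, j = parse(j)
            args.append(arg)
        if arity == 2:
            return "(%s %s %s)" % (args[0], name, args[1]), j
        return "%s(%s)" % (name, args[0]), j
    return parse(0)[0]

# The 4-bit groups 0000 0011 0110 0010 0001 0101 1000 0101 0111 as integers:
print(decode([0, 3, 6, 2, 1, 5, 8, 5, 7]))
# (LOW(Age) AND NOT((HIGH(Height) OR HIGH(Weight))))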

3.5 Genetic Operators

New population members can be evolved by applying a number of genetic operators in
order to combine existing chromosomes. There are two operators which mimic natural
methods of recombining genetic material: crossover, which merges two sets of
chromosomes in much the same way as sexual reproduction does between animals, and
mutation, where genes alter randomly, which happens in real life after organisms
reproduce.

3.5.1 Crossover

A common method for crossover is called one-point crossover (Rawlins, 1991). Bitstring
representations will be discussed for simplicity here, but whatever the data type used, the
methods do not vary. In one-point crossover, the two parent chromosomes are split in the
same place, and half of one set of genes is combined with the other half of the other. The
crossover point tends to be randomized each time a pair of chromosomes reproduces. As an

example, consider the two parent chromosomes 01100101 and 10011100. Since there are
eight genes in the chromosome, the crossover point can be anywhere between bits 1-2 and
7-8. Say the crossover point is between bits 3-4, the two halves of each parent will be
011|00101 and 100|11100. Depending on which way the parents are combined, the
offspring will be either 011|11100 or 100|00101.

Generalizing / Specializing Crossover

The purpose of this type of crossover is to generalize a rule when it is overfitting, or to
specialize the rule when it is underfitting the data (Freitas, 2003). If binary encoding is
used, the generalization crossover is done using logical OR and the specialization
crossover is done using logical AND.

Below is an example of generalizing and specializing crossover, where the symbol “|” marks
the crossover points; a code sketch follows the table (ibid.).

Parents            Offspring (generalizing      Offspring (specializing
                   crossover, OR)               crossover, AND)
0 1 | 0 1 | 1 1    0 1 | 1 1 | 1 1              0 1 | 0 0 | 1 1
0 0 | 1 0 | 1 0    0 0 | 1 1 | 1 0              0 0 | 0 0 | 1 0
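A sketch of this operator over bitstring rules, applying OR (generalizing) or AND
(specializing) only between the two crossover points, exactly as in the table:

def gen_spec_crossover(p1, p2, start, end, generalize=True):
    # Generalizing (OR) or specializing (AND) crossover between two points.
    op = (lambda a, b: a | b) if generalize else (lambda a, b: a & b)
    mixed = [op(a, b) for a, b in zip(p1[start:end], p2[start:end])]
    # Each offspring keeps its own genes outside the crossover region.
    return p1[:start] + mixed + p1[end:], p2[:start] + mixed + p2[end:]

p1, p2 = [0, 1, 0, 1, 1, 1], [0, 0, 1, 0, 1, 0]
print(gen_spec_crossover(p1, p2, 2, 4, generalize=True))
# ([0, 1, 1, 1, 1, 1], [0, 0, 1, 1, 1, 0]) - matches the OR column above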

3.5.2 Mutation

Mutation is a fairly simple operator, where bits are flipped to alter a chromosome’s genetic
makeup. The mutation rate affects how often these mutations occur – a system with a high
mutation rate will result in lots of mutated offspring. Mutation is necessary as it provides
renewable variety: it allows the system to explore solutions that may not be available by
recombination alone.

Generalizing / Specializing Mutation

Different mutation operators can be used to generalize or specialize a rule. A simple
generalizing mutation can be performed by deleting one of the conditions in the rule's
antecedent part; conversely, adding a condition to the rule's antecedent is a specializing
mutation (Freitas, 2003). Another generalizing/specializing mutation operator works by
subtracting or adding a randomly generated value to “attribute-value conditions” (ibid.).
For example, if the condition is (years_of_experience > 20), then subtracting a randomly
generated value from this condition (e.g. years_of_experience > 10) is a generalizing
mutation, while adding a randomly generated value (e.g. years_of_experience > 25) is a
specializing mutation.
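These mutations can be sketched over a rule held as a list of attribute-value conditions;
the rule structure, attribute names and step sizes below are illustrative assumptions.

import random

# A rule antecedent as a list of (attribute, operator, threshold) conditions.
rule = [("years_of_experience", ">", 20), ("age", ">", 30)]

def generalize(rule):
    # Delete a condition, or subtract a random value from a '>' threshold.
    if len(rule) > 1 and random.random() < 0.5:
        rule = list(rule)
        rule.pop(random.randrange(len(rule)))
        return rule
    i = random.randrange(len(rule))
    attr, op, thr = rule[i]
    return rule[:i] + [(attr, op, thr - random.randint(1, 10))] + rule[i + 1:]

def specialize(rule):
    # Add a random value to a '>' threshold (adding a whole new condition
    # would be the other specializing mutation described above).
    i = random.randrange(len(rule))
    attr, op, thr = rule[i]
    return rule[:i] + [(attr, op, thr + random.randint(1, 10))] + rule[i + 1:]

print(generalize(rule))    # e.g. [('age', '>', 30)]
print(specialize(rule))    # e.g. [('years_of_experience', '>', 27), ...]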

3.5.3 Fuzzy logic Operators

With fuzzy logic representation, we need to come up with new ways to cross and mutate the
individuals in our population, ensuring that the rules are still valid within the structure of the
grammar.

Mutation

Mutation is a simple enough process: we can interchange the binary functions AND and OR,
leaving a syntactically correct rule; we can add or remove NOT before any of the operators,
whether unary or binary; and we can substitute any of the fuzzy classification functions
LOW, MEDIUM or HIGH with one another.

Crossover

Crossover, however, is a more difficult problem. One-point crossover is not a suitable
method here, since it can result in syntactically incorrect rules. Figure 4 shows the
result of crossing the rule shown in Figure 3 with another rule, NOT( (MEDIUM(Age) OR
LOW(Age)) OR (LOW(Height)) ), at a random point.

Rule 1: 0000 0011 0110 0010 0001 0101 | 1000 0101 0111
Rule 2: 0010 0001 0001 0100 0110 | 0011 0110 0011 1000
Rule 3: 0000 0011 0110 0010 0001 0101 | 0011 0110 0011 1000

Figure 4

As you can see, crossing at this point has cut a binary operator in half, resulting in a
rule that cannot be parsed. For this reason, it is important to look more carefully at the
crossover points. Crossover may not occur after a fuzzy classification function; it may,
however, occur at any point where an AND, OR, or NOT branches. In addition to this method
of merging rules, other systems have used a method of occasionally simply combining whole
rules using ‘AND’ or ‘OR’. This can be a useful technique if used infrequently, so we can
use this combination method 10% of the time, and use the merging method for the rest
(Walter, 2000).
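One syntax-aware crossover over the prefix token encoding used earlier can be sketched as
follows: complete subtrees rooted at operator tokens are swapped, so offspring always
remain parseable. The token arities follow the mapping inferred in section 3.4.2.

import random

ARITY = {0: 2, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1}    # AND, OR, NOT, LOW, MEDIUM, HIGH

def subtree_end(tokens, i):
    # Index one past the end of the subtree rooted at position i.
    pending = 1
    while pending:
        pending += ARITY.get(tokens[i], 0) - 1   # variables (6+) have arity 0
        i += 1
    return i

def tree_crossover(r1, r2):
    # Swap complete subtrees rooted at randomly chosen operator positions.
    i = random.choice([k for k, t in enumerate(r1) if t in ARITY])
    j = random.choice([k for k, t in enumerate(r2) if t in ARITY])
    i_end, j_end = subtree_end(r1, i), subtree_end(r2, j)
    return (r1[:i] + r2[j:j_end] + r1[i_end:],
            r2[:j] + r1[i:i_end] + r2[j_end:])

rule1 = [0, 3, 6, 2, 1, 5, 8, 5, 7]   # LOW(Age) AND NOT(HIGH(Height) OR HIGH(Weight))
rule2 = [2, 1, 1, 4, 6, 3, 6, 3, 8]   # NOT((MEDIUM(Age) OR LOW(Age)) OR LOW(Height))
print(tree_crossover(rule1, rule2))   # two syntactically valid offspring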

3.6 Fitness Evaluation

Each member of the population in an evolutionary system will have a fitness level, which is
defined by how effective the solution is deemed to be at solving a particular problem. The
general aim of any genetic algorithm is to adapt the parameters of its population in order to
evolve solutions with maximal fitness. Fitness functions can be extremely complicated, as
they require some method for quantitatively and qualitatively evaluating solutions where
often the knowledge of what makes a good solution is not known.

In data mining, the fitness function is used to evaluate the fitness of the prediction rules. As
mentioned earlier in this paper, prediction accuracy, comprehensibility and interestingness
represent the quality criteria of the discovered rules and can be used to measure their
fitness. Two main types of fitness functions are used in data mining to evaluate the
fitness of an individual: objective and subjective fitness evaluation.

A major issue with using evolutionary algorithms in data mining, which needs to be
considered when designing the fitness function, is the interestingness of the discovered
rules: evolutionary algorithms are powerful techniques that can perform global search and
generate huge numbers of rules, but these rules can be trivial and uninteresting.

3.6.1 Objective Fitness Evaluation

This section discusses the objective, or quantitative, approaches to evaluating the fitness
of discovered rules (Freitas, 2003). Different approaches have been proposed to design
effective objective fitness functions. These approaches are organized below according to
the quality criteria of discovered rules mentioned earlier. The examples used to illustrate
these approaches are represented using the Michigan scheme, which means each individual
represents a single rule.

Prediction Accuracy Criteria

One of the approaches to measuring the prediction accuracy of a rule is to use the
confidence factor CF (Freitas, 2003). Suppose that the rule to be evaluated is as follows:

IF A THEN B.

Then the confidence factor can be calculated as:

CF = |A&B| / |A|

Where |A| represents the number of instances in the data that satisfy all the conditions in
the antecedent part A of the rule, and |A&B| represents the number of instances in the data
that satisfy all the conditions in A and are classified as class B (Freitas, 2003).

Here is an example of how to calculate CF: if |A| = 100 and |A&B| = 60, then CF will be
60%, which gives an insight into how accurate the rule is. The higher the rule accuracy on
the training set, the more likely it is that the rule will be selected. This is a very
simple approach to defining prediction accuracy, but one obvious drawback of such an
approach is that it is likely to overfit the data, which would result in poor prediction
performance on the testing data set.

Another approach, mentioned in (Freitas, 2003), is to use a confusion matrix. This is a 2 x
2 matrix used to describe the “predictive performance” of the rule.

Recall the previous rule example:


IF A THEN B.

The confusion matrix of this rule is:

                   Actual class
Predicted class    B          Not B
B                  TP         FP
Not B              FN         TN

Where:

“TP = True Positives = Number of examples satisfying A and B
FP = False Positives = Number of examples satisfying A but not B
FN = False Negatives = Number of examples not satisfying A but satisfying B
TN = True Negatives = Number of examples not satisfying A or B” (Freitas, 2003).

Using this matrix, CF can be computed as follows:

CF = TP / (TP + FP)

One important advantage of this approach is that it introduces a measurement of the rule
completeness, Comp:

Comp = TP / (TP + FN)

Now the rule fitness can be calculated as follows:

Fitness = CF * Comp
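Putting these pieces together, the fitness of a Michigan-style individual (a single rule)
can be sketched as below, given predicate functions for the antecedent and the class; the
example data and predicates are hypothetical.

def rule_fitness(instances, antecedent, consequent):
    # Fitness = CF * Comp, computed from the rule's confusion matrix.
    tp = sum(antecedent(x) and consequent(x) for x in instances)
    fp = sum(antecedent(x) and not consequent(x) for x in instances)
    fn = sum(not antecedent(x) and consequent(x) for x in instances)
    cf = tp / (tp + fp) if tp + fp else 0.0      # confidence factor
    comp = tp / (tp + fn) if tp + fn else 0.0    # completeness
    return cf * comp

# Hypothetical instances and rule: IF years > 20 THEN class = "senior".
data = [{"years": 25, "class": "senior"}, {"years": 30, "class": "senior"},
        {"years": 22, "class": "junior"}, {"years": 10, "class": "senior"}]
print(rule_fitness(data,
                   lambda x: x["years"] > 20,
                   lambda x: x["class"] == "senior"))   # (2/3) * (2/3) = 0.444...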

Comprehensibility Criteria

The fitness function can be extended to cover the comprehensibility criteria as follows
(Freitas, 2003):

Fitness = w1 * (CF * Comp) + w2 * Simp

Where w1 and w2 are user-defined weights, and Simp refers to the simplicity measurement of
a rule. One obvious way to measure simplicity is to count the number of conditions in the
rule: the smaller the number of conditions, the simpler the rule.

For data mining approaches that use genetic programming, the simplicity of a rule can be
measured by counting the number of nodes. A possible method of measuring rule simplicity,
mentioned in (Freitas, 2003), is to define a maximum number of nodes in a tree (individual)
and then calculate the simplicity as follows:

Simp = (MaxNodes - 0.5 * NumNodes - 0.5) / (MaxNodes - 1)

Interestingness Criteria

Noda et al. (1999) have proposed a fitness function composed of two parts: the first part
measures the degree of interestingness and the second part measures the predictive
accuracy. The degree of interestingness part itself consists of two parts. Users are
expected to set the weights of the degree-of-interestingness and predictive-accuracy parts.

Another measurement of rule interestingness has been introduced by Piatetsky-Shapiro, cited
in (Gebhardt, 1991), called the PS measure:

PS = |A&B| - (|A| * |B|) / N

According to Piatetsky-Shapiro, as summarized in (Gebhardt, 1991), there are three
principles for rule interestingness (RI) measures (a short code sketch follows the list):

• RI = 0 if |A&B| = |A| * |B| / N, i.e. when the antecedent and the consequent of the rule
are statistically independent.

• RI monotonically increases with |A&B| when the other parameters, namely |A| and |B|, are
fixed. In this case the CF and Comp factors also increase, which means a more interesting
rule.

• RI monotonically decreases with |A| or |B| when the other parameters, namely |A&B|, are
fixed. In this case the CF and Comp factors also decrease, which means a less interesting
rule.
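A compact sketch of the PS measure and its first principle, using the counting conventions
above (|X| = number of instances satisfying X, N = total number of instances):

def ps_measure(n_ab, n_a, n_b, n):
    # Piatetsky-Shapiro rule interestingness: |A&B| - |A||B|/N.
    return n_ab - (n_a * n_b) / n

# Statistically independent antecedent and consequent give RI = 0:
print(ps_measure(n_ab=20, n_a=40, n_b=50, n=100))   # 20 - 40*50/100 = 0.0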

3.6.2 Subjective Fitness Evaluation (Interactive Evolutionary Algorithms)

Writing an appropriate objective fitness function can be a very hard task. This is
particularly true in situations where domain knowledge or a priori knowledge is not
available, which makes it difficult to decide what counts as interesting knowledge. In such
cases, subjective fitness evaluation can be very useful. Subjective fitness evaluation is
done by human experts.

In data mining, domain experts evaluate the fitness of the discovered rules according to
how interesting they are. Rules can be interesting if they are unexpected and actionable
for the user (Liu et al., 1997). In many domains, however, knowledge about the domain data
can vary from one user to another. A user's prior knowledge of the domain can be either a
general impression (GI), when the user has feelings about the domain, or reasonably precise
knowledge (RPK), when the user has definite ideas. Generally, discovered rules are
evaluated and ranked against these two types of concept (ibid.).

A major problem with data mining is that the discovered models do not necessarily contain
important or interesting rules. They sometimes include trivial rules or, even worse,
counterintuitive rules (Pazzani, 2002). Some previous attempts to address this problem
interact with the domain experts to evaluate the models and to find what is interesting and
important. The found model is then adjusted according to the feedback from the domain
experts (for example by adding or removing variables) until an acceptable model is found
(ibid.). Subjective fitness evaluation and interactive evolutionary algorithms accelerate
this process, and probably generate more interesting rules, by involving the domain expert
in the search process to bias the search toward models that are more novel and
comprehensible.

Figure 5 (Pazzani, 2002)

Using subjective fitness evaluation in data mining offers many opportunities for future
research. For example, Pazzani's idea (2002), illustrated in Figure 5, could be
accomplished through the use of subjective fitness evaluation. In it, he describes how the
different fields of artificial intelligence, statistics, databases and cognitive psychology
should be combined to improve the performance of the multi-disciplinary field of data
mining. Interactive evolutionary algorithms can allow the use of cognitive psychology in
developing tools and techniques for data mining and knowledge discovery by involving the
human cognitive process in the search for interesting patterns and the discovery of new
knowledge from data sets.

3.7 Selection and Replacement

Selection is the process of choosing which individuals in the population to use for
reproduction, and replacement is the process of selecting which individuals in the
population will go through to the next generation. Whilst selection and replacement can use
the same methods almost interchangeably, they do not both need to be implemented the
same way in a particular genetic system: a system may use the ‘roulette wheel’ method for
selection, and the ‘absolute’ method for replacement. There are a number of different
methods for making these selections:

Absolute
The n fittest individuals in the population are chosen for breeding, or the n least fit
individuals are replaced. Whilst this seems like a good strategy, it can result in losing
individuals that, while being less fit, hold genetic material that could be useful in

evolving even better strategies. It is important to keep a mix of genetic material
within the population to stop the solutions converging prematurely at local optima.

Random
The opposite of absolute selection is random selection. Here, no regard is given at all
to the fitness – all individuals are selected with uniform probability. Whilst this
method does preserve variety, it also means that it can take a very long time to find
a good solution, and if it is used as a replacement strategy, good solutions have as
much chance of being overlooked as bad ones, meaning that good solutions may
never develop.

Roulette Wheel
By looking at the previous two methods, we can see that whilst it is important to
focus on the individuals with higher fitness levels, we also need to ensure that we do
not throw away potentially useful solutions. The roulette wheel method addresses
this by picking randomly, but in proportion to the fitness of the individuals, so that
very fit individuals have a higher chance of being selected for breeding, and less fit
individuals have a higher chance of being replaced.
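A sketch of roulette wheel selection, picking individuals with probability proportional to
fitness (here via Python's built-in weighted sampling), with the wheel inverted for
replacement:

import random

def roulette_select(population, fitnesses, k=2):
    # Higher fitness -> proportionally higher chance of being selected.
    return random.choices(population, weights=fitnesses, k=k)

def roulette_replace_index(fitnesses):
    # Invert the wheel so the least fit are the most likely to be replaced.
    total = sum(fitnesses)
    inverted = [total - f for f in fitnesses]
    return random.choices(range(len(fitnesses)), weights=inverted, k=1)[0]

pop = ["rule_a", "rule_b", "rule_c"]
print(roulette_select(pop, [0.9, 0.5, 0.1]))     # 'rule_a' is most likely
print(roulette_replace_index([0.9, 0.5, 0.1]))   # index 2 is most likely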

3.8 Integrating Conventional Techniques with Evolutionary Algorithms

Several hybrid approaches have been proposed that integrate evolutionary algorithms with
one of the conventional techniques to tackle some of the problems with the conventional
techniques such as minimizing the number of selected features and selecting more
interesting features. One of the successful attempts to integrate evolutionary algorithms
with data mining is the approach developed by Terano and Ishino (1996). Their approach
integrates an evolutionary algorithm with one of the machine learning data mining
techniques, namely an inductive learning technique that generates decision trees. They used
the inductive learning algorithm to find rules from the data, and then used an interactive
evolutionary algorithm to refine these rules. Their work will be discussed in greater depth
in the following
section.

4 Applications of Data Mining Using IEA

In this section we examine some areas where data mining with interactive evolutionary
algorithm (IEA) techniques has been successfully applied.

The first approach detailed is very general, in the sense that it can be used to classify
any text-based data and hence is not limited to any specific discipline. The approach
requires textual data in the form of reports, which can be plain text files corresponding
to the database from which the knowledge needs to be extracted.

4.1 Extracting Knowledge from a Text Database

This technique, proposed by Sakurai et al. (2001), details a means to extract knowledge
from any database with the help of domain-dependent dictionaries. The particular
application in the paper deals with text mining from daily business reports generated by
some institution, and

classification of the reports based on some knowledge dictionaries. In their experiment,
two kinds of knowledge dictionaries were used: one is called the key concept dictionary,
and the other the concept relation dictionary.

The daily business reports generated from any source are decomposed into words using
lexical analysis, and the words are checked against entries in the key concept dictionary.
All reports are then tagged with particular concepts, according to the words in the report
which represent concepts in the key concept dictionary. Each report is also checked to see
whether its key concepts are assigned in the concept relation dictionary. Reports are then
classified according to the set of concept relations, and reports having the same text
class are put into the same group. This helps the end users, as they can read only those
reports which are put into groups with topics matching their interests; it also gives them
an indication of the trends of topics in the reports.

The key concept dictionary contains concepts having common features, concepts and related
keywords, and expressions and phrases concerned with the target problem. An example of the
key concept dictionary can be seen in the figure below. The concept relation dictionary
contains relations, each of which describes a condition and a result; it is a mapping from
key concepts to classes. Since creating a dictionary manually is time consuming and prone
to errors, the paper describes an automatic way of creating the concept relation dictionary.

Figure 6 (Sakurai et al., 2001)

A relation in the concept relation dictionary is like a rule, and can be acquired by
inductive learning if training examples are available. To do so, words are extracted from
the document by lexical analysis, and these words are checked to see whether they match an
expression in the key concept dictionary. This gives the following mapping: concept classes
are attributes, concepts are values, and the text classes given by the reader are the
result classes we want; together these form a training example. For all attributes which do
not have values, 0 is assigned. An overview of this is depicted in the figure below.

Figure 7 (Sakurai et al., 2001)

For the inductive learning to work we need a fuzzy algorithm, as reports written by humans
do not adhere strictly to the expected descriptions. The method used for the learning is
therefore the IDF algorithm, which is a fuzzy inductive learning algorithm. This algorithm
makes rules from the generated training examples, and the rules it generates have the
genotype of a tree.

The whole process can be seen in Figure 8 below, which shows the inputs and the processes
that go into producing the final outputs from the input dictionaries and data.

Figure 8 (Sakurai et al., 2001)

The algorithm was tested on daily reports for a retail sales business, classifying them
into 3 classes describing a sales opportunity as best, missed or other. The key concept
dictionary was composed of 13 concept classes, each with its own subset of concepts.
Reports which contained contradictory descriptions were regarded as unusable, and training
examples were not generated from them. The results, using 10-fold cross validation, showed
that they were able to generate the concept relation dictionary automatically and obtain
better results than IDF alone on the reports generated for retailing.

4.2 Extracting Marketing Rules from User Data

Since marketing decisions require optimal rules derived from customer data, which can be
very noisy, simulated breeding and inductive learning methods have been tested for creating
such rules; these have been able to generate simple and easy to understand results, in a
form which can be used directly by the marketing agent.

This work was developed by Terano and Ishino (1996). The conventional way to generate
efficient decision-making rules was to use statistical methods, but these prove to be weak
since they assume that the mined data is based on linear models. Multivariate analysis,
which is popularly used, fails to satisfy the need for both quantitative and qualitative
analysis of data. AI techniques, on the other hand, focus on the problem of feature
selection, which is based on machine learning and aims to find the optimal number of
features to describe the target concept. This does not work for the current problem, hence
we cannot apply the well-known standard techniques to choose the appropriate features.

Hence the approach proposed by the authors is to use both simulated breeding and inductive
learning techniques. Inductive learning is used to generate the decision rules from data,
with emphasis on the relationship between product and features, while simulated breeding is
used to obtain the effective features. This work was the first of its kind to specifically
address the problem of clarifying the relationship between the product image and its
features using user questionnaire data.

Simulated breeding is a GA-based technique to evolve offspring. The offspring which are
judged by a human expert to have the desired features are allowed to breed; the judgment is
done interactively. It is used in cases where the fitness function is hard to define.
Inductive learning is used to generate the rules, in the form of a decision tree, as output
for the analysis of features and attribute-value pairs. This specific implementation used
C4.5.

Marketing decisions must be made by analysts who need to devise promotion strategies for
their product according to an abstract image of the product. The things they need to keep
in mind are that the data gathered from users is inherently noisy and based on complicated
models, hence simple rules are needed to explain the characteristics of the products. Also,
the features of the product that realize its image are usually left to the intuition of the
experts, with no clear way to derive them. So, the information needs to be organized in a
clear manner to understand the relationship between the features and the image of the
product.

The algorithm proposed consists of the following steps:

1. Inductive learning to classify data
2. Genetic operators to enhance the flexibility of feature selection
3. Decision tree selection based on human judgment
4. Developing decision trees with a small number of features which fully explain the data

Automation of offspring selection is deliberately avoided in order to promote human
creativity, which incorporates appropriate explanations, and also because the problem needs
subjective judgment, which makes it very hard to define a formal fitness function.

This analysis was carried out on oral care products, with 2300 users filling in the
questionnaire used; the knowledge obtained was validated by a domain expert at the
manufacturing company. The domain expert must know the basic principles of inductive
learning and statistics, and must understand the outputs obtained. Using the decision tree
outputs, she interactively evaluates the quality of the obtained knowledge.

4.3 Fraud Detection Using Data Mining and IEA Techniques

An interesting application of genetic programming and rule-based data mining can be seen in
the work done by Bentley (2000), where a system is designed to analyze data provided by a
bank and discover cases of fraud; in this particular case, in insurance applications.

The motivation for this work stems from a very pressing issue: the increase in fraud in all
forms of financial institutions. For a large bank this is typically hard to handle, as the
small number of fraudulent cases is masked by the large number of genuine applicants, and
hence the fraudulent ones simply slip through. An effective method is needed to find such
cases in huge amounts of data, which is where data mining comes in. The evolutionary
computation techniques are used to generate rules which might be the underlying
explanations for fraud cases.

In this experiment the data was first clustered into 3 segments, which then correspond to
the domains of the rule-generation membership functions. These functions give the “degree
of membership” of the input data in the fuzzy logic sets “LOW”, “MEDIUM” and “HIGH”. A GP
is used for the purpose of evolving rules, with each rule represented as a tree. After a
set of rules has been generated, they are evaluated by an expert system and assigned a
score before they are applied to training data. This is data for which the bank has
certified the number of fraud cases, and it is used to generate rules which accurately
describe fraud cases. The fitness function checks the scores and assigns different fitness
values, with the key objectives of ensuring that as few items as possible are
misclassified, differentiating between the “suspicious” and “unknown” classes (ensuring
that “suspicious” is given more relevance), and finally ensuring that the rules generated
are concise yet understandable.

A single run of the GP generates one rule, which might not classify all suspicious items,
hence it is run several times until all suspicious items are classified, giving more than
one rule. Any rule which misclassifies a number of claims is removed from the final set.

Now comes the role of human interaction in the process. Since the variables in the
evolutionary system are numerous, each can have an effect on the outcome, and there is no
single selection of settings which will classify every data set correctly: cluster size
choices, membership function choices, rule interpreters, fitness functions, GA settings and
so on can all be tweaked. Therefore, to help the human decision maker, four versions of the
system with different settings are run in parallel and all the results generated are
presented. The human has the task of selecting the best results from the four by performing
a series of tasks, to find the most accurate, the most intelligible, and the most accurate
and intelligible rule sets. This rule set is then finally evaluated on the global data and,
if need be, the parameter settings are adjusted by the human to generate better rules.

With this setup and data obtained from a bank, the research team achieved an accuracy of
up to 60%, which is impressive given that the training data containing reported incidents of
fraud was small and spread over a number of years, while the data to be tested covered
only the past couple of months and had an unknown percentage of suspicious items.

4.4 Some Current Work Being Done

This is a brief glimpse of research going on at the TAKAGI laboratory, Kyushu University,
involving interactive evolutionary computation and data mining from different sources.

One line of work constructs an image or music feature space and an impression space.
Neural networks are used to learn the mapping from features to impressions, and to search
for a point in feature space that corresponds to a given point in impression space. The aim
is to retrieve images or music based on human impressions.
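
As a rough illustration of this idea, the sketch below fits a simple mapping from synthetic
image features to impression scores and then searches feature space for a point matching a
target impression; the model and data are invented stand-ins for the laboratory's actual
systems.

# Hedged sketch: learn features -> impressions, then invert by search.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 4-D image features mapped to 2-D impressions
# (say, bright-dark and warm-cold) via an unknown "true" mapping.
features = rng.normal(size=(200, 4))
true_map = rng.normal(size=(4, 2))
impressions = np.tanh(features @ true_map)

# Fit a linear map by least squares (a stand-in for the neural network).
targets = np.arctanh(np.clip(impressions, -0.999, 0.999))
learned_map, *_ = np.linalg.lstsq(features, targets, rcond=None)

def predict(f):
    return np.tanh(f @ learned_map)

# Retrieval: search feature space for the point whose predicted
# impression is closest to the impression the user asked for.
wanted = np.array([0.8, -0.3])
candidates = rng.normal(size=(5000, 4))
errors = np.linalg.norm(predict(candidates) - wanted, axis=1)
print("closest features:", np.round(candidates[np.argmin(errors)], 2))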

Another project investigates diagnostic data from mental illness patients by measuring
psychological dynamic range (e.g., from happy to sad) using IEC, with a therapist acting as
the trained evaluator.

5 Conclusion and Future Work

In this paper we have discussed the use of different evolutionary algorithms in the data
mining and knowledge discovery field. The main motivation for using evolutionary
algorithms in data mining lies in their attractive features, which resolve some of the
drawbacks of conventional data mining techniques and enable the discovery of novel
solutions: their robustness when dealing with noisy data, for example, and their ability to
interpret data without any a priori knowledge. The difficulty of discovering novel and
interesting knowledge is one of the main issues in data mining, and interactive evolutionary
algorithms have been used to address this problem. Interactive evolutionary algorithms
provide a promising research area for data mining and knowledge discovery, and there is a
wide range of applications that use evolutionary algorithms in data mining, a number of
which have been presented in this paper.

Possible Future Applications

In this section we propose some applications where data mining combined with IEA
methods could be fruitfully implemented; we hope their value will become evident in the
near future.

Generating Effective Strategies for Share/Stock Market Data Analysis

For the analysis of stock market data at any given point, the number of variables to be
taken into account can be enormous, and each of these variables can in turn offer many
different choices. An expert needs to decide which subset of these variables to consider
before making any assumptions. One example of such a variable is the selection of trading
rules: constraints or choices that define when to buy, sell or hold a stock. Several rules are
available, such as filter rules, moving averages, support and resistance, and abnormal
return. Another variable, more evident in the share market, is the effect of other stocks on
the stock being considered, including how many other stocks to take into account. Hence it
is almost never true that a single strategy will fit all situations at a given time. We propose
that future implementations will let humans shape the fitness function by selecting the
best rules for a given situation, specifying the number and types of variables to be
considered at any given time, and then test their effectiveness by deploying them.
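
As a toy example of one such trading rule, the sketch below encodes a moving-average
crossover rule whose two window lengths could serve as genes in an evolutionary search;
the parameter choices are our own illustration, not taken from any cited system.

# Hedged sketch: a moving-average crossover rule as an EA individual.
def moving_average(prices, window):
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def crossover_signal(prices, short=5, long=20):
    # "buy" when the short MA is above the long MA, "sell" when below.
    short_ma = moving_average(prices, short)[-1]
    long_ma = moving_average(prices, long)[-1]
    if short_ma > long_ma:
        return "buy"
    if short_ma < long_ma:
        return "sell"
    return "hold"

# An EA individual could simply be the pair (short, long); the human
# analyst would judge the signals produced on historical prices.
prices = [10, 11, 12, 11, 13, 14, 15, 14, 16, 17,
          18, 17, 19, 20, 21, 20, 22, 23, 24, 25]
print(crossover_signal(prices, short=3, long=10))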

Intrusion Detection in Networks or Websites

Another application which we propose might appear soon is intrusion detection in networks
or websites. Taking some ideas from the approach proposed by Li (2004), assume that the
intrusion detection systems (IDS) of the future will use interactive techniques to enhance
their existing genetic-algorithm-based components, which define rules based on network
traffic activity. In the system described in that paper, the GA generates rules that classify
traffic activity as suspicious, normal or other. Almost all suspicious activity is added to the
IDS to prevent further attacks of that kind, and for such a system no human intervention
would be required; however, the technique has the shortcoming that it does not catch novel
attacks. For a network administrator it is precisely these novel and unique attacks that can
cause the most harm, as they lie in the unknown range and go unexamined. We envisage
that in our interactive system the suspicious category would be handled exclusively by the
GA, while for unknown cases a network administrator would analyze the traffic and make
subsequent changes to the GA, and hence to the IDS, so that newer attacks are added to
the system. The human in charge would examine each novel attack and modify the GA,
adjusting the fitness function accordingly, so that rules capable of capturing such attacks in
the future are included.
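
To make the rule representation concrete, the following sketch, written in the spirit of Li
(2004) rather than reproducing that paper's encoding, shows a wildcard rule chromosome
and a fitness measure over labelled traffic; all field names and weights are illustrative
assumptions.

# Hedged sketch: an IDS rule as a chromosome of match-or-wildcard genes,
# scored against labelled traffic records.
import random

FIELDS = ["src", "dst_port", "protocol"]
VALUES = {"src": ["internal", "external", "*"],
          "dst_port": [22, 80, 443, "*"],
          "protocol": ["tcp", "udp", "*"]}

def random_rule():
    return {f: random.choice(VALUES[f]) for f in FIELDS}

def matches(rule, record):
    return all(rule[f] == "*" or rule[f] == record[f] for f in FIELDS)

def fitness(rule, traffic):
    # Reward matching attacks, penalise matching normal traffic.
    hits = sum(matches(rule, r) for r in traffic if r["attack"])
    false_alarms = sum(matches(rule, r) for r in traffic if not r["attack"])
    return hits - 2 * false_alarms

traffic = [
    {"src": "external", "dst_port": 22, "protocol": "tcp", "attack": True},
    {"src": "internal", "dst_port": 80, "protocol": "tcp", "attack": False},
]
population = [random_rule() for _ in range(20)]
best = max(population, key=lambda r: fitness(r, traffic))
print(best, fitness(best, traffic))

The administrator's interactive role would then be to adjust such weights, or add newly
labelled records for novel attacks, so that subsequent GA runs evolve rules covering them.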

6 References

Zaki, M. J., Yu, J. X., Ravindran, B., & Pudi, V. (Eds.). (2010). Advances in Knowledge
Discovery and Data Mining, Part I, Proceedings of the 14th Pacific-Asia Conference
(PAKDD 2010). Springer.

Walter, D., & Mohan, C. K. (2000). ClaDia: A fuzzy classifier system for disease diagnosis.
Proceedings of the Congress on Evolutionary Computation.

Vafaie, H., & Jong, K. D. (1994). Improving a rule induction system using genetic
algorithms. In R. S. Michalski, R. S. Michalski, & G. Tecuci, Machine learning: a
Multistrategy Approach (pp. 453-470). San Francisco, CA: Morgan Kaufmann.

Bäck, T., Hammel, U., & Schwefel, H.-P. (1997). Evolutionary computation: Comments on
the history and current state. IEEE Transactions on Evolutionary Computation, 1(1), 3-17.

Bentley, P. J. (2000). “Evolutionary, my dear Watson”: Investigating committee-based
evolution of fuzzy rules for the detection of suspicious insurance claims. Genetic and
Evolutionary Computation Conference (GECCO-2000). Morgan Kaufmann.

Cantú-Paz, E., & Kamath, C. (2001). On the use of evolutionary algorithms in data mining.
In H. A. Abbass, R. A. Sarker, & C. S. Newton (Eds.), Data Mining: A Heuristic Approach
(pp. 48-71). Idea Group Publishing.

Engelbrecht, A. P. (2007). Computational Intelligence: An Introduction (2nd ed.). Sussex:
Wiley.

Dzeroski, S., & Lavrac, N. (2001). Relational Data Mining. Secaucus, NJ: Springer.

Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in
Knowledge Discovery and Data Mining. Menlo Park, Calif.: The MIT Press.

Freitas, A. A. (2003). A survey of evolutionary algorithms for data mining and knowledge
discovery. In A. Ghosh & S. Tsutsui (Eds.), Advances in Evolutionary Computing: Theory
and Applications (pp. 819-846). New York, NY: Springer-Verlag.

Freitas, A. A. (2002). Data Mining and Knowledge Discovery with Evolutionary Algorithms.
Berlin: Springer-Verlag.

Gebhardt, F. (1991). Choosing among competing generalizations. Knowledge Acquisition,
3(4), 361-380.

Hand, D., Mannila, H., & Smyth, P. (2001). Principles of Data Mining. MIT Press.

Holland, J. (1962). Outline for a logical theory of adaptive systems. Journal of the ACM,
9(3), 297-314.

Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco:
Morgan Kaufmann.

Koza, J. (1992). Genetic Programming: On the Programming of Computers by Means of
Natural Selection. Massachusetts: The MIT Press.

Larose, D. T. (2006). Data Mining: Methods and Models. New York: Wiley-Interscience.

Liu, B., Hsu, W., & Chen, S. (1997). Using general impressions to analyze discovered
classification rules. Knowledge Discovery & Data Mining (KDD-97) (pp. 31-36).

Li, W. (2004). Using Genetic Algorithm for network intrusion detection. United States
Department of Energy Cyber Security Group 2004 Training Conference, (pp. 24-27).
Kansas City, Kansas.

Noda, E., Freitas, A. A., & Lopes, H. S. (1999). Discovering interesting prediction rules
with a genetic algorithm. Congress on Evolutionary Computation (CEC-99) (pp. 1322-1329).
Washington, D.C.

Mendes, R. R., Voznika, F. de B., Freitas, A. A., & Nievola, J. C. (2001). Discovering fuzzy
classification rules with genetic programming and co-evolution. In L. De Raedt & A. Siebes
(Eds.), Principles of Data Mining and Knowledge Discovery (Vol. 2168, pp. 314-325).
Heidelberg, Berlin: Springer-Verlag.

Mitchell, M. (1998). An Introduction to Genetic Algorithms. Massachusetts: MIT Press.

Pazzani, M. J. (2000). Knowledge discovery from data? IEEE Intelligent Systems and their
Applications, 15(2), 10-12.

Sakurai, S., Ichimura, Y., Suyama, A., & Orihara, R. (2001). Acquisition of a knowledge
dictionary for a text mining system using an inductive learning method. IJCAI 2001
Workshop on Text Learning: Beyond Supervision, (pp. 45–52).

Rawlins, G. J. (1991). Foundations of Genetic Algorithms. California: Morgan Kaufmann
Publishers, Inc.

Tan, K. C., Yu, Q., & Lee, T. H. (2005). A distributed evolutionary classifier for knowledge
discovery in data mining. IEEE Transactions on Systems, Man, and Cybernetics, Part C:
Applications and Reviews, 35(2), 131-142.

Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Association analysis: Basic concepts and
algorithms. In Introduction to Data Mining. Pearson Addison Wesley.

Terano, T., & Ishino, Y. (1996). Knowledge acquisition from questionnaire data using
simulated breeding and inductive learning methods. Expert Systems with Applications,
11(4), 507-518.
