
A Study on Classification Techniques in Data Mining

G. Kesavaraj#, Dr. S. Sukumaran*
# Assistant Professor, Department of Computer Applications, Vivekanandha College of Arts and Sciences for Women. kesavaraj2020@gmail.com
* Associate Professor, Department of Computer Science, Erode Arts and Science College, Erode. prof_sukumar@yahoo.co.in

Abstract: Data mining is the process of inferring knowledge from huge volumes of data. Data mining has three major components: clustering or classification, association rules and sequence analysis. By simple definition, classification/clustering analyzes a set of data and generates a set of grouping rules which can be used to classify future data. Data mining extracts information from a data set and transforms it into an understandable structure. It is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns. Data mining involves six common classes of tasks: anomaly detection, association rule learning, clustering, classification, regression and summarization. Classification is a major technique in data mining, widely used in various fields; it is a data mining (machine learning) technique used to predict group membership for data instances. In this paper, we present the basic classification techniques and several major kinds of classification method, including decision tree induction, Bayesian networks and the k-nearest neighbor classifier. The goal of this study is to provide a comprehensive review of the different classification techniques in data mining.

Keywords- classification algorithms, Decision tree, KNN, Support vector machine, Data Mining.

I. INTRODUCTION

Classification is used to classify each item in a set of data into one of a predefined set of classes or groups. In the classification data analysis task, a model or classifier is constructed to predict categorical labels (the class label attributes). Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks. A classification task begins with a data set in which the class assignments are known. For example, a classification model that predicts credit risk could be developed based on observed data for many loan applicants over a period of time. In addition to the historical credit rating, the data might track employment history, home ownership or rental, years of residence, number and type of investments, and so on. Credit rating would be the target, the other attributes would be the predictors, and the data for each customer would constitute a case.

Classifications are discrete and do not imply order. Continuous, floating-point values would indicate a numerical, rather than a categorical, target; a predictive model with a numerical target uses a regression algorithm, not a classification algorithm. The simplest type of classification problem is binary classification, in which the target attribute has only two possible values: for example, high credit rating or low credit rating. Multiclass targets have more than two values: for example, low, medium, high, or unknown credit rating. In the model build (training) process, a classification algorithm finds relationships between the values of the predictors and the values of the target. Different classification algorithms use different techniques for finding these relationships, which are summarized in a model that can then be applied to a different data set in which the class assignments are unknown. Classification has many applications in customer segmentation, business modeling, marketing, credit analysis, and biomedical and drug response modeling. Data classification is a two-step process, shown in Figure 1.

Figure 1: Illustrating the classification task.


Step 1: A classifier is built describing a predetermined set of data classes or concepts. Because the class label of each training instance is provided, this step is also known as supervised learning.
Step 2: The model is used for classification. First, the predictive accuracy of the classifier is estimated on labeled test data; if the accuracy is considered acceptable, the classifier is used to label new data whose class assignments are unknown.
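To make the two steps concrete, the following minimal sketch (hypothetical Python with scikit-learn; the toy applicant data and attribute choices are our own, not taken from any study cited here) builds a classifier on labeled cases, estimates its accuracy, and then applies it to a new, unlabeled case:

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Placeholder applicant data: [age, income]; labels are known credit risks.
    X = [[25, 40000], [47, 95000], [33, 62000], [51, 30000],
         [29, 41000], [45, 99000], [36, 61000], [55, 28000]]
    y = ["high", "low", "low", "high", "high", "low", "low", "high"]

    # Step 1: build the classifier from data whose class labels are known.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Step 2: estimate predictive accuracy, then classify a new, unlabeled case.
    print("estimated accuracy:", accuracy_score(y_test, model.predict(X_test)))
    print("new applicant classified as:", model.predict([[40, 52000]]))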
The commonly used methods for data mining classification tasks can be grouped as follows: 1. decision tree induction methods, 2. rule-based methods, 3. memory based learning, 4. neural networks, 5. Bayesian networks, 6. support vector machines.

II. DECISION TREE INDUCTION

Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. This is illustrated in Figure 2. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

Decision tree induction algorithms function recursively. First, an attribute must be selected as the root node. In order to create the most efficient (i.e., smallest) tree, the root node must effectively split the data. Each split attempts to pare down a set of instances (the actual data) until they all have the same classification. The best split is the one that provides what is termed the most information gain.

Fig 2: Decision tree induction

The basic algorithm for decision tree induction is a greedy algorithm that constructs decision trees in a top-down, recursive, divide-and-conquer manner. The tree induction algorithm can be summarized as follows:
Step 1: The algorithm operates over a set of training instances, C.
Step 2: If all instances in C are in class P, create a node P and stop; otherwise select a feature or attribute F and create a decision node.
Step 3: Partition the training instances in C into subsets according to the values of F.
Step 4: Apply the algorithm recursively to each of the subsets of C.
These algorithms usually employ a greedy strategy that grows a decision tree by making a series of locally optimal decisions about which attribute to use for partitioning the data. Hunt's algorithm, ID3, C4.5, CART and SPRINT, for example, are all greedy decision tree induction algorithms.

A. Hunt's Algorithm

Hunt's algorithm grows a decision tree recursively by partitioning the training data set into smaller, purer subsets. The algorithm constructs the tree in two steps.
Step 1, the terminating step of the recursion, checks whether every record in a node belongs to the same class. If so, the node is labeled as a leaf node whose classification is the class of all the records within it.
Step 2: If a node is not pure, an attribute test condition is selected or created to partition the data into purer data sets, and a child node is created for each subset.
The algorithm recurses until all leaf nodes are found. A common means of deciding which attribute test condition should be used is the notion of the "best split": the test condition chosen is the one that results in the purest subsets, where purity is higher when a set contains records of a single class. Three common formulas for calculating impurity are entropy, the Gini index, and classification error.
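A minimal sketch of Hunt's recursive scheme might look as follows (illustrative Python, assuming categorical attributes and a naive attribute choice rather than a true best-split search; this is not code from the paper):

    from collections import Counter

    def hunt(records, attributes):
        labels = [label for _, label in records]
        if len(set(labels)) == 1 or not attributes:
            # Step 1 (terminating step): pure node, or no tests left;
            # label the leaf with the (majority) class of its records.
            return Counter(labels).most_common(1)[0][0]
        # Step 2: impure node; pick an attribute test condition and
        # partition the records into purer subsets, one child per value.
        attr = attributes[0]   # naive choice; real code searches for the "best split"
        partitions = {}
        for row, label in records:
            partitions.setdefault(row[attr], []).append((row, label))
        return {attr: {value: hunt(subset, attributes[1:])
                       for value, subset in partitions.items()}}

    data = [({"gives_birth": "no",  "can_fly": "yes"}, "bird"),
            ({"gives_birth": "yes", "can_fly": "no"},  "mammal"),
            ({"gives_birth": "no",  "can_fly": "no"},  "reptile")]
    print(hunt(data, ["gives_birth", "can_fly"]))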
B. ID3 (Iterative Dichotomiser 3) Algorithm

ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan that is used to generate a decision tree; it is the precursor to the C4.5 algorithm. The ID3 algorithm can be summarized as follows:
1. Take all unused attributes and compute their entropy with respect to the test samples.
2. Choose the attribute for which entropy is minimum (or, equivalently, information gain is maximum).
3. Make a node containing that attribute.
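The entropy and information-gain computations that drive ID3's attribute choice can be sketched as below (illustrative Python over a made-up toy data set; the helper names are our own):

    import math
    from collections import Counter

    def entropy(labels):
        # H = -sum p_i * log2(p_i) over the class proportions p_i.
        total = len(labels)
        return -sum((c / total) * math.log2(c / total)
                    for c in Counter(labels).values())

    def information_gain(records, attr):
        # Gain = entropy of the whole set minus the weighted entropy
        # of the subsets produced by splitting on attr.
        labels = [label for _, label in records]
        subsets = {}
        for row, label in records:
            subsets.setdefault(row[attr], []).append(label)
        remainder = sum(len(s) / len(records) * entropy(s)
                        for s in subsets.values())
        return entropy(labels) - remainder

    data = [({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes"),
            ({"outlook": "sunny"}, "no"), ({"outlook": "rain"}, "yes")]
    # ID3 picks the attribute with maximum gain (minimum entropy).
    print(information_gain(data, "outlook"))   # 1.0: a perfectly informative split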
C. C4.5 Algorithm

C4.5 is an algorithm used to generate a decision tree, also developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier.


In pseudocode, the general algorithm for building decision trees is:
1. Check for base cases.
2. For each attribute a:
3. Find the normalized information gain from splitting on a.
4. Let a_best be the attribute with the highest normalized information gain.
5. Create a decision node that splits on a_best.
6. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the node.
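The "normalized information gain" in step 3 is C4.5's gain ratio: the information gain divided by the entropy of the split itself (the split information). A possible sketch, reusing the entropy and information_gain helpers from the ID3 example above:

    def gain_ratio(records, attr):
        # Split information: the entropy of the partition induced by attr.
        split_info = entropy([row[attr] for row, _ in records])
        if split_info == 0:          # attribute takes a single value: no real split
            return 0.0
        return information_gain(records, attr) / split_info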
D. RndTree (Random Forest)

A random forest grows many classification trees. To classify a new object from an input vector, put the input vector down each of the trees in the forest. Each tree gives a classification, and we say the tree "votes" for that class; the forest chooses the classification having the most votes over all the trees in the forest. Each tree is grown as follows: if the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data; this sample is the training set for growing the tree. If there are M input variables, a number m << M is specified such that at each node m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest grows.
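This procedure maps directly onto standard implementations; the sketch below (hypothetical Python with scikit-learn, using placeholder data) grows a forest in which each tree sees a bootstrap sample of the N cases and considers a random subset of m features at each node:

    from sklearn.ensemble import RandomForestClassifier

    X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2],
         [5.0, 3.4], [6.1, 2.8]]                  # N = 6 training cases (placeholder)
    y = [0, 0, 1, 1, 0, 1]

    forest = RandomForestClassifier(
        n_estimators=100,       # grow many classification trees
        bootstrap=True,         # each tree trains on N cases drawn with replacement
        max_features="sqrt",    # m << M variables tried at each node; m held constant
        random_state=0,
    ).fit(X, y)

    # Each tree "votes"; the forest returns the class with the most votes.
    print(forest.predict([[6.0, 3.0]]))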
III. RULE-BASED METHOD

Rule-based classifiers classify data by using a collection of "if ... then ..." rules. The rule antecedent, or condition, is an expression made of attribute conjunctions; the rule consequent is a positive or negative classification. To build a rule-based classifier we can follow a direct method and extract the rules directly from the data. The advantage of rule-based classifiers is that they are extremely expressive, since they are symbolic and operate on the attributes of the data without any transformation. Rule-based classifiers, and by extension decision trees, are easy to interpret, easy to generate, and can classify new instances efficiently.
Records are classified by using a collection of "if ... then ..." rules of the form
Rule: (Condition) → y
where Condition is a conjunction of attribute tests and y is the class label; the LHS is the rule antecedent or condition, and the RHS is the rule consequent.
Examples of classification rules:
(Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
(Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
A rule-based classifier (example):
r1: (give birth = no) ∧ (can fly = yes) → birds
r2: (give birth = no) ∧ (live in water = yes) → fishes
r3: (give birth = yes) ∧ (blood type = warm) → mammals
r4: (give birth = no) ∧ (can fly = no) → reptiles
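Rules such as r1-r4 can be applied mechanically: test each antecedent in turn and return the consequent of the first rule that fires. A minimal sketch (illustrative Python; the default class for uncovered cases is our own assumption):

    rules = [
        (lambda c: c["gives_birth"] == "no"  and c["can_fly"] == "yes",        "birds"),    # r1
        (lambda c: c["gives_birth"] == "no"  and c["lives_in_water"] == "yes", "fishes"),   # r2
        (lambda c: c["gives_birth"] == "yes" and c["blood_type"] == "warm",    "mammals"),  # r3
        (lambda c: c["gives_birth"] == "no"  and c["can_fly"] == "no",         "reptiles"), # r4
    ]

    def classify(case, default="unknown"):
        # Return the consequent (RHS) of the first rule whose antecedent (LHS) fires.
        for condition, consequent in rules:
            if condition(case):
                return consequent
        return default

    grizzly = {"gives_birth": "yes", "can_fly": "no",
               "lives_in_water": "no", "blood_type": "warm"}
    print(classify(grizzly))   # -> "mammals" (r3 is the first rule that fires)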
IV. MEMORY BASED LEARNING

Memory-based learning is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. It is called instance-based because it constructs hypotheses directly from the training instances themselves. This means that the hypothesis complexity can grow with the data: in the worst case, a hypothesis is a list of n training items, and the computational complexity of classifying a single new instance is O(n). One advantage memory-based learning has over other methods of machine learning is its ability to adapt its model to previously unseen data. A simple example of an instance-based learning algorithm is the k-nearest neighbor algorithm.

A. k-Nearest Neighbor and Memory-Based Reasoning (MBR)

When trying to solve new problems, people often look at solutions to similar problems that they have previously solved. k-nearest neighbor (k-NN) is a classification technique that uses a version of this same method. It decides in which class to place a new case by examining some number (the "k" in k-nearest neighbor) of the most similar cases or neighbors. It counts the number of cases for each class, and assigns the new case to the class to which most of its neighbors belong.

Fig 3: k-nearest neighbor. N is a new case. It would be assigned to class X because the seven X's within the ellipse outnumber the two Y's.

k-NN models are very easy to understand when there are few predictor variables. They are also useful for building models that involve non-standard data types, such as text. The only requirement for being able to include a data type is the existence of an appropriate metric.
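A from-scratch sketch of this decision rule is shown below (illustrative Python; Euclidean distance is one assumed choice of metric, and the training pairs are placeholder data):

    import math
    from collections import Counter

    def knn_classify(train, new_point, k=3):
        # Rank the stored cases by distance to the new case (the memory lookup) ...
        neighbors = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
        # ... then assign the class held by most of the k nearest neighbors.
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    train = [((1.0, 1.0), "X"), ((1.2, 0.8), "X"), ((0.9, 1.1), "X"),
             ((5.0, 5.0), "Y"), ((5.2, 4.9), "Y")]
    print(knn_classify(train, (1.1, 1.0), k=3))   # -> "X"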


V. NEURAL NETWORKS

In practical terms, neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data. Using neural networks as a tool, data warehousing firms harvest information from datasets in the process known as data mining. The difference between these data warehouses and an ordinary database is that there is actual manipulation and cross-fertilization of the data, helping users make more informed decisions.

A. Feed Forward Neural Network

One of the simplest feed forward neural networks (FFNN), such as the one in Figure 4, consists of three layers: an input layer, a hidden layer and an output layer. In each layer there are one or more processing elements (PEs). PEs are meant to simulate the neurons in the brain, which is why they are often referred to as neurons or nodes. A PE receives inputs from either the outside world or the previous layer. The connections between the PEs in each layer have a weight (parameter) associated with them, and this weight is adjusted during training. Information only travels in the forward direction through the network; there are no feedback loops.

Fig 4: A feed forward neural network (FFNN).
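A single forward pass through such a network reduces to a chain of weighted sums and activations; the sketch below (hypothetical Python with NumPy; the layer sizes and the sigmoid activation are assumptions, not details taken from the paper) illustrates the forward-only flow:

    import numpy as np

    rng = np.random.default_rng(0)
    W_hidden = rng.normal(size=(3, 4))   # weights on input -> hidden connections
    W_output = rng.normal(size=(4, 2))   # weights on hidden -> output connections

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        # Information travels forward only: input -> hidden -> output,
        # with no feedback loops. Training would adjust the two weight matrices.
        hidden = sigmoid(x @ W_hidden)
        return sigmoid(hidden @ W_output)

    print(forward(np.array([0.5, -1.0, 2.0])))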
VI. BAYESIAN NETWORK

A Bayesian network is a directed acyclic graph (DAG) whose nodes represent random variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters or hypotheses. Edges represent conditional dependencies; nodes which are not connected represent variables which are conditionally independent of each other. Each node is associated with a probability function that takes as input a particular set of values for the node's parent variables and gives the probability of the variable represented by the node. For example, if the n parents are Boolean variables, then the probability function could be represented by a table of 2^n entries, one entry for each possible combination of its parents being true or false. Similar ideas may be applied to undirected, and possibly cyclic, graphs; such networks are called Markov networks.

Bayesian approaches are a fundamentally important data mining technique. Given the probability distribution, the Bayes classifier can provably achieve the optimal result. The Bayesian method is based on probability theory: Bayes' rule is applied to calculate the posterior from the prior and the likelihood, because the latter two are generally easier to calculate from a probability model. An example of a Bayes net over five attributes is shown in Figure 5.

Fig 5: A Bayes net for 5 attributes.
A general model can be followed for building a Bayes net:
Step 1: Choose a set of relevant variables.
Step 2: Choose an ordering for the variables.
Step 3: Assume the variables are X1, X2, ..., Xn (where X1 is the first and Xi is the i-th).
Step 4: For i = 1 to n:
Step 5: Add the Xi vertex to the network.
Step 6: Set Parents(Xi) to be a minimal subset of {X1, ..., Xi-1} such that Xi is conditionally independent of all other members of {X1, ..., Xi-1} given Parents(Xi).
Step 7: Define the probability table of P(Xi = k | assignments of Parents(Xi)).
There are many choices of how to select the relevant variables, as well as how to estimate the conditional probabilities. If the network is viewed as a connection of Bayes classifiers, the probability estimation can be done by applying some PDF, such as a Gaussian. In some cases the design of the network can be rather complicated, but there are efficient ways of obtaining relevant variables from the dataset attributes: assuming the incoming signal to be stochastic gives a convenient way of extracting signal attributes, and likelihood weighting is another way of obtaining attributes.
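For discrete variables, the probability table of Step 7 can be estimated by simple counting of observed frequencies; the sketch below (illustrative Python over made-up binary observations, with an assumed parent set) shows the idea:

    from collections import Counter
    from itertools import product

    # Made-up binary observations of (rain, sprinkler, wet_grass), where we
    # assume Parents(wet_grass) = {rain, sprinkler}.
    observations = [(1, 0, 1), (1, 1, 1), (0, 1, 1), (0, 0, 0),
                    (1, 0, 1), (0, 1, 0), (0, 0, 0), (1, 1, 1)]

    parent_counts = Counter((r, s) for r, s, _ in observations)
    wet_counts = Counter((r, s) for r, s, w in observations if w == 1)

    # P(wet_grass=1 | rain, sprinkler) estimated from relative frequencies.
    for parents in product([0, 1], repeat=2):
        if parent_counts[parents]:
            p = wet_counts[parents] / parent_counts[parents]
            print("P(wet_grass=1 | rain=%d, sprinkler=%d) = %.2f" % (*parents, p))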


VII. SUPPORT VECTOR MACHINES

Support vector machines (SVMs, also called support vector networks) are supervised learning models with associated learning algorithms that analyze data and recognize patterns. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. SVM models have a similar functional form to neural networks and radial basis functions, both popular data mining techniques. However, neither of these algorithms has the well-founded theoretical approach to regularization that forms the basis of the SVM, and the quality of generalization and ease of training of the SVM are far beyond the capacities of these more traditional methods. The SVM is a training algorithm for learning classification and regression rules from data; for example, it can be used to learn polynomial, radial basis function (RBF) and multi-layer perceptron (MLP) classifiers.

Fig 6: SVM.
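A minimal usage sketch of an SVM as a binary classifier (hypothetical Python with scikit-learn and placeholder data; the RBF kernel is one of the kernel choices mentioned above):

    from sklearn.svm import SVC

    X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]   # placeholder inputs
    y = [0, 0, 1, 1]                                        # two possible classes

    clf = SVC(kernel="rbf").fit(X, y)   # "poly" and "sigmoid" kernels give the
                                        # polynomial and MLP-style classifiers
    print(clf.predict([[0.1, 0.0], [1.0, 0.9]]))   # -> [0 1]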

Koliastasis and Despotis [6] observe that in the classification task, data characteristics tend to discriminate better among the algorithms, whereas in the regression task things tend to be more similar with respect to the error rate. On the basis of the analysis of their experimental results, some conclusions can be drawn for the classification task: the deeper the C5.0 tree, the higher the relative error of the NN to the error of the C5.0; the lower the number of nominal variables, the higher the relative error of DA to the error of the C5.0 tree; and the higher the number of classes, the higher the relative error of LR to the error of the C5.0 tree.

Anyanwu and Shiva [7] show experimentally that the SPRINT and C4.5 algorithms have good classification accuracy compared to the other algorithms used in their study. The variation of data set class size, number of attributes and volume of data records is used to determine which of the ID3 and CART algorithms has better classification accuracy.
P. Nancy and R. Geetha Ramani [8] present a comparative study of various data mining classification algorithms on the dataset "social side of the internet" for two subsets: the features for subset 1 were selected by feature ranking, and those for subset 2 by ReliefF filtering. The features selected by the feature reduction techniques were chosen as input attributes, with the necessary class variables as the target attribute; the various classification algorithms were executed for all selected features one by one, and their error rates are tabulated below. Their conclusion was that RndTree performed well on the chosen dataset. Table 1 compares the various classification algorithms by error rate.

Table 1. Comparison of classification algorithms by error rate

  Classification algorithm    Error rate (Facebook)    Error rate (Twitter)
  C4.5                        0.0860                   0.1042
  C-RT                        0.1798                   0.1976
  CS-CRT                      0.1798                   0.1976
  CS-MC4                      0.1059                   0.1107
  ID3                         0.1650                   0.2084
  KNN                         0.1871                   0.2097
  RndTree                     0.004                    0.004

Aman Kumar Sharma and Suruchi Sahni [10], in their study of classification algorithms, compared four decision tree algorithms; Table 2 gives their classification accuracy results.

Table 2. Classification accuracy

  Algorithm     Correctly classified instances    Incorrectly classified instances
  ID3           4100 (89.1111%)                   433 (9.41%)
  J48           4268 (92.7624%)                   333 (7.23%)
  ADTree        4183 (90.915%)                    418 (9.08%)
  SimpleCART    4262 (92.63%)                     339 (7.36%)

  Total number of instances: 4601


It is evident from Table 2 that J48 has the highest classification accuracy (92.7624%), with 4268 instances classified correctly and 333 classified incorrectly. The second highest classification accuracy, 92.63%, is for the SimpleCART algorithm, with 4262 instances classified correctly. Moreover, the ADTree showed a classification accuracy of 90.915%. The ID3 algorithm gives the lowest classification accuracy of the four algorithms, 89.1111%; ID3 was also unable to classify 68 instances. So J48 outperforms SimpleCART, ADTree and ID3 in terms of classification accuracy.
P. Bhargavi and S. Jyothi [9] compared the effectiveness of the classification algorithms: genetic algorithm, fuzzy classification and fuzzy clustering.
Table 3. Average accuracy rates of supervised and unsupervised algorithms

  Method                     Average accuracy rate
  Genetic algorithm rules    46.67%
  Fuzzy classification       99%
  Fuzzy clustering           90.90%
The achieved performances were compared and analyzed on the collected supervised and unsupervised soil data, as shown in Table 3. GATree and fuzzy classification rules were used for supervised learning; classification based on fuzzy rules gives much better performance than GATree. For unsupervised learning, the Fuzzy C-Means algorithm was used for classifying the soil data. This helps one to classify soil texture based on soil properties effectively, which influences fertility, drainage, water holding capacity, aeration, tillage, and bearing strength of soils.
Bendi Venkata Ramana, M. Surendra Prasad Babu and N. B. Venkateswarlu [4] considered popular classification algorithms and evaluated their classification performance in terms of accuracy, precision, sensitivity and specificity in classifying a liver patients dataset. Accuracy, precision, sensitivity and specificity were better for the AP liver dataset compared to the UCLA liver dataset with all the selected algorithms.
Table 4: Performance of classification algorithms with all features of the UCLA liver dataset

  Classification algorithm   Accuracy   Precision   Sensitivity   Specificity
  NBC                        56.5       48.9        77.93         41
  C4.5                       68.6       65.8        53.1          80
  Back propagation           71.5       69.7        57.24         82
  K-NN                       62.8       55.7        56.51         67.5
  SVM                        58.2       1           68.9          1

This can be attributed to the larger number of useful attributes, such as total bilirubin, direct bilirubin, indirect bilirubin, albumin, gender, age and total proteins, available in the AP liver dataset compared to the UCLA dataset. The attributes common to the AP liver data and the Taiwan data, namely age, sex, SGOT, SGPT, ALP, total bilirubin, direct bilirubin, total proteins and albumin, are crucial in deciding liver status. With the selected dataset, KNN, back propagation and SVM give better results with the entire feature set combinations.


VIII. CONCLUSION

The goal of classification algorithms is to generate more certain, precise and accurate system results. Numerous methods have been suggested for the creation of ensembles of classifiers. Classification methods are typically strong in modeling interactions, and several of the classification methods produce a set of interacting logic rules that best predict the phenotype. However, a straightforward application of classification methods to large numbers of markers carries a potential risk of picking up randomly associated markers. Still, it is difficult to recommend any one technique as superior to the others, since the choice depends on the dataset: there is no single classification algorithm that is best for all kinds of datasets, and classification algorithms are specific to their problem domains.
REFERENCES

[1] C. Sundar, M. Chitradevi and G. Geetharamani, "Classification of Cardiotocogram Data using Neural Network based Machine Learning Technique", International Journal of Computer Applications (0975-8887), Volume 47, No. 14, June 2012.
[2] T. Smitha and V. Sundaram, "Comparative Study of Data Mining Algorithms for High Dimensional Data Analysis", International Journal of Advances in Engineering & Technology (IJAET), ISSN: 2231-1963, September 2012.
[3] K. Priya, R. Geetha Ramani and Shomona Gracia Jacob, "Data Mining Techniques for Automatic Recognition of Carnatic Raga Swaram Notes", International Journal of Computer Applications (0975-8887), Volume 52, No. 10, August 2012.
[4] Bendi Venkata Ramana, M. Surendra Prasad Babu and N. B. Venkateswarlu, "A Critical Study of Selected Classification Algorithms for Liver Disease Diagnosis", International Journal of Database Management Systems (IJDMS), Vol. 3, No. 2, May 2011.
[5] Andrew Secker, Matthew N. Davies, Alex A. Freitas, Jon Timmis, Miguel Mendao and Darren R. Flower, "An Experimental Comparison of Classification Algorithms for the Hierarchical Prediction of Protein Function", [http://www.cs.kent.ac.uk/projects/biasprofs].
[6] C. Koliastasis and D. K. Despotis, "Rules for Comparing Predictive Data Mining Algorithms by Error Rate", OPSEARCH, Vol. 41, No. 3, 2004, Operational Research Society of India.
[7] Matthew N. Anyanwu and Sajjan G. Shiva, "Comparative Analysis of Serial Decision Tree Classification Algorithms", International Journal of Computer Science and Security (IJCSS), Volume 3, Issue 3, page 230.
[8] P. Nancy, R. Geetha Ramani and Shomona Gracia Jacob, "A Comparison on Performance of Data Mining Algorithms in Classification of Social Network Data", International Journal of Computer Applications (0975-8887), Volume 32, No. 8, October 2011.
[9] P. Bhargavi and S. Jyothi, "Soil Classification Using Data Mining Techniques: A Comparative Study", International Journal of Engineering Trends and Technology, July-August 2011.
[10] Aman Kumar Sharma and Suruchi Sahni, "Comparative Study of Classification Algorithms for Spam Email Data Analysis", International Journal on Computer Science and Engineering (IJCSE), ISSN: 0975-3397, Vol. 3, No. 5, May 2011, pp. 1890-1895.
[11] Thair Nu Phyu, "Survey of Classification Techniques in Data Mining", Proceedings of the International MultiConference of Engineers and Computer Scientists 2009, Vol. I, IMECS 2009, March 18-20, 2009, Hong Kong.
[12] Muskan Kukreja, Stephen Albert Johnston and Phillip Stafford, "Comparative study of classification algorithms for immunosignaturing data", BMC Bioinformatics 2012, 13:139, http://www.biomedcentral.com/1471-2105/13/139.
[13] Mahendra Tiwari, Manu Bhai Jha and OmPrakash Yadav, "Performance analysis of Data Mining algorithms in Weka", IOSR Journal of Computer Engineering (IOSRJCE), ISSN: 2278-0661, ISBN: 2278-8727, Volume 6, Issue 3 (Sep-Oct 2012), pp. 32-41, www.iosrjournals.org.
[14] N. Sivaram and K. Ramar, "Applicability of Clustering and Classification Algorithms for Recruitment Data Mining", International Journal of Computer Applications (0975-8887), Volume 4, No. 5, July 2010.
[15] Anurag Upadhayay, Suneet Shukla and Sudsanshu Kumar, "Empirical Comparison by data mining Classification algorithms (C 4.5 & C 5.0) for thyroid cancer data set", International Journal of Computer Science & Communication Networks, Vol. 3(1), 64-68, ISSN: 2249-5789.
[16] Eric Bauer and Ron Kohavi, "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants", Data Mining and Visualization, Silicon Graphics Inc., 2011 N. Shoreline Blvd, Mountain View, CA 94043.
[17] Narendran Rajasekar Callura and Sin Wee Lee, "Data Mining - Classification Algorithm - Evaluation", MSc Internet Systems Engineering, University of East London, May 8, 2009.
[18] M. Venkatadri and Lokanatha C. Reddy, "A Comparative Study on Decision Tree Classification Algorithms in Data Mining", International Journal of Computer Applications in Engineering, Technology and Sciences (IJ-CA-ETS), ISSN: 0974-3596, April-September 2010, Volume 2, Issue 2, page 24.
[19] Neslihan Dogan and Zuhal Tanrikulu, "A Comparative Framework for Evaluating Classification Algorithms", Proceedings of the World Congress on Engineering 2010, Vol. I, WCE 2010, June 30 - July 2, 2010, London, U.K., ISBN: 978-988-17012-9-9.
[20] M. Kukreja, S. A. Johnston and P. Stafford, "Comparative study of classification algorithms for immunosignaturing data", BMC Bioinformatics, 2012 Jun 21; 13:139, doi: 10.1186/1471-2105-13-139.
[21] Seongwook Youn and Dennis McLeod, "A Comparative Study for Email Classification".
[22] Hetal Bhavsar and Amit Ganatra, "A Comparative Study of Training Algorithms for Supervised Machine Learning", International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume 2, Issue 4, September 2012.
[23] R. D. King, R. Henery and C. Feng, "A Comparative Study of Classification Algorithms: Statistical, Machine Learning and Neural Network".
[24] Karthikeyani, Parvin Begum, K. Tajudin and Shahina Begam, "Comparative of Data Mining Classification Algorithm (CDMCA) in Diabetes Disease Prediction", International Journal of Computer Applications, 60(12):26-31, December 2012. Published by the Foundation of Computer Science.
[25] G. Th. Papadopoulos, K. Chandramouli, V. Mezaris, I. Kompatsiaris, E. Izquierdo and M. G. Strintzis, "A Comparative Study of Classification Techniques for Knowledge-Assisted Image Analysis", 978-0-7695-3130-4/08, 2008 IEEE.

AUTHORS PROFILE

Dr. S. Sukumaran is working as an Associate Professor in the Department of Computer Science and Mathematics, Erode Arts and Science College, Erode, India. He has more than 20 years of teaching and research experience. His areas of specialization include data mining, evolutionary algorithms and network security. He has over 20 publications in international conferences and journals to his credit.

Mr. G. Kesavaraj completed his M.Sc. (Computer Science) at Sri Vasavi College, Erode, and his M.C.A. and M.Phil. (Computer Science) at Bharathiar and Periyar University respectively. He has more than 10 years of teaching experience. Presently he is pursuing his Ph.D. in Computer Science at Manonmaniam Sundaranar University, Tirunelveli. His areas of interest include data mining, software engineering and operating systems. He has attended and presented papers at the national and international level.
