Fig. 1 illustrates the concepts in terms of a simple example of data about a group
of college friends. The data
table includes the following columns: name of the student; colour of hair; height
and weight; a record of whether the
student has been using lotion when exposed to sun; a record of whether the student
gets sunburned when on the beach; a
record about the proximity of the living locations of the students; transaction
reference numbers; and student address.
The double dotted line outlines the portion of the data table which will be
considered in the predictive modelling
approach. As the "Name" column contains unique identifiers, it will be ignored, and
the data mining task will be to
develop a model of the student from this college with respect to the attributes
"hair", "height", "weight", "lotion" and
"on the beach". In an unsupervised approach, the students will be clustered into
groups and the analyst ends up with the
description of the different groups. In this case, however, the analyst is interested in
predicting whether a new student will get
sunburned when visiting the beach. The attribute "on the beach" is selected as the
"output" (or "target") and the
attributes "hair" to "lotion" form the input vector. Given the values for the
attributes "hair" to "lotion" for a new
student, the resultant classifier should be able to predict whether the student
will get sunburned or not. The key measure
of the quality of the model is the accuracy of predictions, rather than the theory
that may explain the phenomena
through the relations between the values of the attributes.
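
The set-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not name a learning algorithm, so an ID3-style decision tree is used here as one possible classifier; the records, the attribute values and the identifiers (`TRAIN`, `TEST`, `build_tree`, `predict`, `accuracy`) are invented for the example and are not the data shown in Fig. 1. The "name" column is kept in each record but left out of the input vector, and the model is judged by its accuracy on records it has not seen.

```python
import math
from collections import Counter

INPUTS = ["hair", "height", "weight", "lotion"]  # the input vector
TARGET = "on_the_beach"                          # the output (target)

def row(name, hair, height, weight, lotion, outcome):
    return {"name": name, "hair": hair, "height": height,
            "weight": weight, "lotion": lotion, TARGET: outcome}

# Hypothetical records, invented for illustration (NOT the data in Fig. 1).
TRAIN = [
    row("S1", "blonde", "average", "light", "no", "sunburned"),
    row("S2", "blonde", "tall", "average", "yes", "none"),
    row("S3", "brown", "short", "average", "yes", "none"),
    row("S4", "blonde", "short", "average", "no", "sunburned"),
    row("S5", "red", "average", "heavy", "no", "sunburned"),
    row("S6", "brown", "tall", "heavy", "no", "none"),
    row("S7", "brown", "average", "heavy", "no", "none"),
    row("S8", "blonde", "short", "light", "yes", "none"),
]
# Hypothetical held-out records used only to measure predictive accuracy.
TEST = [
    row("T1", "brown", "short", "light", "no", "none"),
    row("T2", "blonde", "average", "heavy", "yes", "none"),
    row("T3", "blonde", "tall", "light", "no", "sunburned"),
    row("T4", "red", "short", "average", "yes", "none"),
]

def entropy(rows):
    """Shannon entropy (in bits) of the target values in rows."""
    counts = Counter(r[TARGET] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def partition(rows, attr):
    """Group rows by their value of attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups

def build_tree(rows, attrs):
    labels = Counter(r[TARGET] for r in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not attrs:
        return majority  # leaf: a single predicted class

    # Lowest weighted entropy after the split = highest information gain.
    def remainder(attr):
        return sum(len(g) / len(rows) * entropy(g)
                   for g in partition(rows, attr).values())

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    branches = {v: build_tree(g, rest) for v, g in partition(rows, best).items()}
    return {"attr": best, "branches": branches, "default": majority}

def predict(tree, student):
    # Walk down the tree; unseen attribute values fall back to the majority class.
    while isinstance(tree, dict):
        tree = tree["branches"].get(student[tree["attr"]], tree["default"])
    return tree

def accuracy(tree, rows):
    """Fraction of rows whose predicted class equals the recorded one."""
    return sum(predict(tree, r) == r[TARGET] for r in rows) / len(rows)

tree = build_tree(TRAIN, INPUTS)  # "name" is never passed in, so it is ignored
new_student = {"hair": "blonde", "height": "tall", "weight": "light", "lotion": "no"}
prediction = predict(tree, new_student)
test_accuracy = accuracy(tree, TEST)
print(prediction, test_accuracy)  # prints: sunburned 0.75
```

On these invented records the tree splits first on hair and then, for blonde students, on lotion. The held-out accuracy of 0.75 is the figure by which such a model would be judged, whether or not the learned rules explain why sunburn occurs.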
