1 illustrates the concepts in terms of a simple example of data about a group
of college friends. The data table includes the following columns: name of the student; colour of hair; height and weight; a record of whether the student has been using lotion when exposed to sun; a record of whether the student gets sunburned when on the beach; a record about the proximity of the living locations of the students; transaction reference numbers; and student address. The double dotted line outlines the portion of the data table that will be considered in the predictive modelling approach. As the "Name" column contains unique identifiers, it will be ignored, and the data mining task will be to develop a model of the students from this college with respect to the attributes "hair", "height", "weight", "lotion" and "on the beach". In an unsupervised approach, the students would be clustered into groups and the analyst would end up with a description of the different groups. In this case, however, the analyst is interested in predicting whether a new student will get sunburned when visiting the beach, which is a supervised task. The attribute "on the beach" is selected as the "output" (or "target") and the attributes "hair" to "lotion" form the input vector. Given the values of the attributes "hair" to "lotion" for a new student, the resulting classifier should be able to predict whether the student will get sunburned or not. The key measure of the quality of the model is the accuracy of its predictions, rather than any theory that may explain the phenomena through the relations between the values of the attributes.
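
The supervised setting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not prescribe a learning algorithm, so an ID3-style decision tree is used here as one possible choice; the rows in `TRAIN` and `HELD_OUT` are invented for the example and are not the data from the table; height and weight are assumed to be recorded as categories; and all names (`build_tree`, `predict`, `accuracy`, `on_the_beach`) are hypothetical.

```python
from collections import Counter
from math import log2

INPUTS = ["hair", "height", "weight", "lotion"]   # the input vector
TARGET = "on_the_beach"                           # the output (target) attribute

def row(hair, height, weight, lotion, outcome):
    return {"hair": hair, "height": height, "weight": weight,
            "lotion": lotion, TARGET: outcome}

# Invented example rows (NOT the data from the table in the text).
TRAIN = [
    row("blonde", "average", "light",   "no",  "sunburned"),
    row("blonde", "tall",    "average", "yes", "none"),
    row("brown",  "short",   "average", "yes", "none"),
    row("blonde", "short",   "average", "no",  "sunburned"),
    row("red",    "average", "heavy",   "no",  "sunburned"),
    row("brown",  "tall",    "heavy",   "no",  "none"),
    row("brown",  "average", "heavy",   "no",  "none"),
    row("blonde", "short",   "light",   "yes", "none"),
]
HELD_OUT = [
    row("blonde", "average", "heavy", "no",  "sunburned"),
    row("brown",  "short",   "light", "yes", "none"),
]

def entropy(rows):
    """Shannon entropy of the target attribute over the given rows."""
    counts = Counter(r[TARGET] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def split(rows, attr):
    """Group rows by their value of one input attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups

def build_tree(rows, attrs):
    """Return a leaf label, or a tuple (attribute, branches, majority label)."""
    labels = Counter(r[TARGET] for r in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not attrs:
        return majority

    def remainder(attr):
        # Weighted entropy left after splitting on attr (lower is better).
        return sum(len(g) / len(rows) * entropy(g)
                   for g in split(rows, attr).values())

    best = min(attrs, key=remainder)
    rest = [a for a in attrs if a != best]
    branches = {v: build_tree(g, rest) for v, g in split(rows, best).items()}
    return (best, branches, majority)

def predict(tree, student):
    """Walk the tree; fall back to the majority label for unseen values."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if student[attr] not in branches:
            return majority
        tree = branches[student[attr]]
    return tree

def accuracy(tree, rows):
    """Fraction of rows whose target value is predicted correctly."""
    return sum(predict(tree, r) == r[TARGET] for r in rows) / len(rows)

tree = build_tree(TRAIN, INPUTS)
new_student = {"hair": "blonde", "height": "tall", "weight": "heavy", "lotion": "no"}
print(predict(tree, new_student))
print(accuracy(tree, HELD_OUT))
```

Note that the model is judged by `accuracy` on rows it has not seen during training, in line with the point that predictive quality, not explanatory theory, is the key measure. Two held-out rows are far too few for a meaningful estimate and serve only to show the mechanics.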