Amir R Razavi
Department of Biomedical Engineering, Division of Medical Informatics
Linköpings universitet, Linköping, Sweden
• Introduction
• From data to knowledge
– Data pre-processing - (Paper I)
– Data mining - (Paper II)
– Validating predictive models - (Paper III)
• Discussion
• Future work
• Data Mining
– “…the process of discovering meaningful new
correlations, patterns, and trends by sifting through
large amounts of data…” (Gartner Group)
– “…the analysis of observational data sets to find
unsuspected relationships and to summarize data in
novel ways…” (Hand et al.)
– “…is an interdisciplinary field bringing together
techniques from machine learning, pattern recognition,
statistics, databases, and visualization…” (Cabena et
al.)
– …
• Data reduction:
– Obtain a reduced representation of the dataset
that is much smaller in volume yet produces
the same, or almost the same, analytical results.
• Why do it?
– The dataset may be very large
– Processing the full dataset may take too long
• Dimension reduction
– Removes unimportant attributes: Canonical
Correlation Analysis (CCA)
• Data Compression
• Reducing the number of instances
• Discretization and concept hierarchy
generation
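As an illustration of the discretization step listed above, here is a minimal sketch of equal-width binning in plain Python. The function name `equal_width_bins` and the `ages` values are hypothetical and are not taken from the papers; the slides do not say which discretization method was used.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [23, 35, 41, 58, 62, 79]  # hypothetical attribute values
bins = equal_width_bins(ages, 3)
print(bins)
```

Each continuous value is replaced by one of a few interval labels, which reduces the number of distinct values a learner such as a decision tree has to consider.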
• Decision tree induction (DTI)
– Pros
• Reasonable training time
• Fast application
• Easy to interpret
• Easy to implement
• Can handle a large number of features
– Cons
• Cannot handle complicated relationships between
features
• Validation methods:
– Testing on an independent dataset.
– Cross-validation:
• Divide the whole dataset by random sampling into n
folds (partitions) and test n times.
– In each round, one partition is used as the test
set and the remaining partitions form the training set.
• Leave-one-out cross-validation
–…
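The cross-validation procedure described above can be sketched as follows. This is a minimal, standard-library illustration; `k_fold_indices` and its parameters are hypothetical names, and the random seed is an assumption made only so the example is repeatable.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random sampling into folds
    # Fold i takes every n_folds-th shuffled index, so fold sizes
    # differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        # Every fold except fold i goes into the training set.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(10, 5))
for train_idx, test_idx in splits:
    print(sorted(test_idx), "tested against", len(train_idx), "training cases")
```

Setting `n_folds` equal to `n_samples` gives leave-one-out cross-validation: each round tests on a single case and trains on all the others.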
• 3699 patients
• A decision tree was trained with all patients except
for 100 cases and tested with those 100 cases.
• Two domain experts were asked to give their
opinion about the probability of recurrence of a
certain outcome for these 100 patients.
• ROC curves and the area under each ROC curve
(AUC) were computed for the predictions and
compared.
[Figure: ROC curves for DTI_J48, Oncologist_1 and Oncologist_2. Vertical axis: Sensitivity; horizontal axis: 100-Specificity.]
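To make the ROC comparison above concrete, here is a minimal sketch of how ROC points and the AUC can be computed from predicted probabilities and observed outcomes. It is not the procedure used in the study; the function names and the six-case example data are hypothetical. Rates are expressed as fractions (0 to 1) rather than the percentages on the plot axes.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per score threshold.

    labels: 1 for a case with the outcome, 0 otherwise.
    Assumes at least one positive and one negative case.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score; lowering the threshold admits more cases.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last case in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predictions for six cases (not study data).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
print(round(area, 3))
```

Running the same two functions on the decision tree's probabilities and on each expert's estimates for the same test cases would give one curve and one AUC per predictor, which can then be compared.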
amira@imt.liu.se
Dept of Biomedical Engineering, Medical Informatics
Linköpings universitet, Linköping, Sweden
http://www.imt.liu.se