Omar M. Osama
Abstract
This report presents a study of various classification methods applied
to the South African heart disease problem. It is found that the neural
network method performs best, with an error in the range between 15.5%
and 16.5%.
1 Introduction
This report presents three different classifiers that estimate the response of
the South African heart disease problem. In this problem there are nine
features: systolic blood pressure, cumulative tobacco, low density lipoprotein
cholesterol, adiposity, family history of heart disease (Present, Absent),
type-A behavior, obesity, current alcohol consumption, and age at onset. The
response is coronary heart disease. The classifiers used are Linear
Discriminant Analysis, Logistic Regression, and Neural Network.
We assign $x$ to $C_1$ if $\ell(C_2) > \ell(C_1)$, and to $C_2$ if
$\ell(C_2) < \ell(C_1)$, where $\ell(C_k)$ is the expected loss of assigning
$x$ to class $C_k$. This makes a lot of sense, since

$$\ell(C_2) = L_{12}\,P(C_1 \mid X = x) + L_{22}\,P(C_2 \mid X = x)$$

$$\ell(C_1) = L_{11}\,P(C_1 \mid X = x) + L_{21}\,P(C_2 \mid X = x)$$

$L_{12}$ is the loss when you assign to $C_2$ but it is $C_1$, and $L_{21}$
is the loss when you assign to $C_1$ but it is $C_2$; $L_{11}$ and $L_{22}$
correspond to right decisions. So

$$L_{12}\,P(C_1 \mid X = x) + L_{22}\,P(C_2 \mid X = x)
\;\overset{C_1}{\underset{C_2}{\gtrless}}\;
L_{11}\,P(C_1 \mid X = x) + L_{21}\,P(C_2 \mid X = x)
\qquad [L_{ii} = 0 \text{ usually}]$$
After calculating (taking $L_{ii} = 0$):

$$\frac{P(C_1 \mid X = x)}{P(C_2 \mid X = x)}
\;\overset{C_1}{\underset{C_2}{\gtrless}}\; \frac{L_{21}}{L_{12}}$$

$$\frac{P(X \mid C_1)}{P(X \mid C_2)}
\;\overset{C_1}{\underset{C_2}{\gtrless}}\; \frac{\pi_2 L_{21}}{\pi_1 L_{12}}$$

$$\frac{f_1(X)}{f_2(X)}
\;\overset{C_1}{\underset{C_2}{\gtrless}}\; \frac{\pi_2 L_{21}}{\pi_1 L_{12}}$$

$$h(X) = \frac{f_1(X)}{f_2(X)}
\;\overset{C_1}{\underset{C_2}{\gtrless}}\; th,
\qquad th = \frac{\pi_2 L_{21}}{\pi_1 L_{12}} \tag{1}$$
So if $h(X) > th$ we assign to $C_1$, and if $h(X) < th$ we assign to $C_2$,
which makes a lot of sense as we assign according to the greater
prior-weighted density.
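As a minimal sketch of the thresholding rule in (1), assume two one-dimensional Gaussian class densities $f_1$ and $f_2$; all of the numbers below (means, variances, priors, losses) are illustrative and not taken from the heart disease data:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def classify(x, pi1=0.6, pi2=0.4, L12=1.0, L21=1.0):
    """Assign x to C1 if f1(x)/f2(x) exceeds th = (pi2*L21)/(pi1*L12)."""
    f1 = gaussian_pdf(x, mu=0.0, sigma=1.0)   # f1(X): density under C1 (made up)
    f2 = gaussian_pdf(x, mu=3.0, sigma=1.0)   # f2(X): density under C2 (made up)
    th = (pi2 * L21) / (pi1 * L12)            # threshold from priors and losses
    return "C1" if f1 / f2 > th else "C2"

print(classify(0.5))  # near the C1 mean -> C1
print(classify(2.8))  # near the C2 mean -> C2
```

With equal losses the threshold reduces to the ratio of priors, so the rule simply picks the class with the larger prior-weighted density.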
Now let's begin with LDA.
2.2 How does it work?
$$\ell(\beta) = \sum_{i=1}^{N} \left\{ y_i\,\beta^{T} x_i
- \log\!\left(1 + e^{\beta^{T} x_i}\right) \right\}$$
We divide the first derivative (the score) by the second derivative to find
$\beta_{\text{new}}$, then do it again and so on (the Newton-Raphson method).
The algorithm tunes $\beta$ to drive the score toward zero; when the score is
close to zero, $\beta_{\text{new}} \approx \beta_{\text{old}}$. When that
happens, $\beta$ satisfies the model, which makes a lot of sense.
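The Newton-Raphson update for this log-likelihood can be sketched as follows. This is a generic illustration on synthetic data, not the report's actual fit; the sample size, seed, and tolerance are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic design matrix: intercept column plus one feature.
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_beta = np.array([-0.5, 2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
    score = X.T @ (y - p)                 # first derivative of l(beta)
    if np.abs(score).max() < 1e-8:        # score near zero: beta_new ~ beta_old
        break
    W = np.diag(p * (1 - p))              # from the (negative) second derivative
    # Newton step: "divide the first derivative by the second".
    beta = beta + np.linalg.solve(X.T @ W @ X, score)

print(beta)  # estimated coefficients
```

Each pass solves one linear system; iteration stops once the score is effectively zero, matching the stopping idea described above.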
$$Y_k = \sigma^{(2)}\!\left( w_{0k}^{(2)} + \sum_{m=1}^{M} w_{mk}^{(2)}\,
\sigma^{(1)}\!\left( w_{m0}^{(1)} + W_m X \right) \right)$$

$$Y_k = \sigma^{(2)}\!\left( w_{0k}^{(2)} + \sum_{m=1}^{M} w_{mk}^{(2)} Z_m \right)$$
The main idea is to project the data on the weight directions, which makes
us look at the data from different sides, because the data could be more
understandable from those sides.

$\sigma^{(1)}$ is the sigmoid function, which adds flexibility to the model. If we
used a linear function instead of the sigmoid function, the large values on the
x-axis would be theoretically unacceptable.

$\sigma^{(2)}$ is the identity function or the soft-max function, for regression or
classification respectively.
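A minimal forward pass under these definitions might look like the following sketch; the layer sizes and random weights are illustrative only (the nine inputs echo the nine features, but none of the values come from the report):

```python
import numpy as np

def sigmoid(a):
    """sigma1: the hidden-layer nonlinearity."""
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    """sigma2 for classification: outputs sum to 1."""
    e = np.exp(a - a.max())
    return e / e.sum()

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 9))   # hidden weights W_m: M = 4 units, 9 features
b1 = rng.normal(size=4)        # w_m0: hidden biases
W2 = rng.normal(size=(2, 4))   # output weights w_mk: 2 classes
b2 = rng.normal(size=2)        # w_0k: output biases

x = rng.normal(size=9)         # one input with nine features
Z = sigmoid(W1 @ x + b1)       # Z_m = sigma1(w_m0 + W_m x)
Y = softmax(W2 @ Z + b2)       # Y_k = sigma2(w_0k + sum_m w_mk Z_m)
print(Y)
```

Swapping `softmax` for the identity on the output layer gives the regression variant of the same network.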
We train on some data with some weights $[w_1\, w_2 \ldots w_I]$ and test on
other data using the $w_i$ that minimize the error, hoping that the data we
trained on are close to the population. The question now is how to find
that $w$.
$$E(w) = \sum_{i=1}^{N} \left( y(x_i, w) - y_i \right)^2$$

$$E(w) = \sum_{i=1}^{N} \left( \sigma^{(2)}\!\left( w_{0k}^{(2)}
+ \sum_{m=1}^{M} w_{mk}^{(2)}\,
\sigma^{(1)}\!\left( w_{m0}^{(1)} + W_m x_i \right) \right) - y_i \right)^2$$
The tansig function is used as $\sigma^{(2)}$ instead of the soft-max, which works
in a very strange manner. Training stops when the performance gradient falls
below $10^{-4}$.
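A toy version of such a training loop, assuming plain gradient descent on $E(w)$ (the report's actual training algorithm is not specified here), with `tanh` standing in for MATLAB's tansig and a made-up dataset:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))                     # 50 samples, 3 features (made up)
y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))      # synthetic targets in (-1, 1)

W1 = 0.5 * rng.normal(size=(3, 5)); b1 = np.zeros(5)   # hidden layer, M = 5
w2 = 0.5 * rng.normal(size=5); b2 = 0.0                # scalar output
lr = 0.001

def forward():
    Z = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))     # sigma1: sigmoid hidden units
    return Z, np.tanh(Z @ w2 + b2)               # sigma2: tansig-style output

initial_error = float(((forward()[1] - y) ** 2).sum())

for step in range(20000):
    Z, yhat = forward()
    err = yhat - y
    d_out = 2 * err * (1 - yhat ** 2)            # backprop E(w) through tanh
    g_w2, g_b2 = Z.T @ d_out, d_out.sum()
    d_hid = np.outer(d_out, w2) * Z * (1 - Z)    # backprop through sigmoid
    g_W1, g_b1 = X.T @ d_hid, d_hid.sum(axis=0)
    if max(np.abs(g_W1).max(), np.abs(g_w2).max()) < 1e-4:
        break                                    # gradient fell below 1e-4
    W1 -= lr * g_W1; b1 -= lr * g_b1
    w2 -= lr * g_w2; b2 -= lr * g_b2

final_error = float(((forward()[1] - y) ** 2).sum())
print(initial_error, final_error)
```

The `break` condition mirrors the stopping rule quoted above: once every gradient component is below $10^{-4}$, further updates barely change the weights.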
4.3.1 Before Regularization
Figure 1: Best error before regularization.

The best error is in the range between 16.5% and 17% after many tries.
Figure 2:

The best error is in the range between 15.5% and 16.5% after many tries.