Ying Wu
Electrical Engineering and Computer Science
Northwestern University
Evanston, IL 60208
http://www.eecs.northwestern.edu/~yingwu
Connectionism
in the 1940s
in the 1950s
in the 1960s
again in the 1980s
Expert systems
again in the 1990s
Where to go?
Outline
Neuron Model
Multi-Layer Perceptron
Radial Basis Function Networks
Neuron: the Basic Unit
[Figure: a single neuron with inputs $x_1, x_2, \ldots, x_d$ and weights $w_1, w_2, \ldots, w_d$ feeding a summing unit.]
Input $\mathbf{x} = [1, x_1, \ldots, x_d]^T \in \mathbb{R}^{d+1}$
Net activation
$$\mathrm{net} = \sum_{i=0}^{d} w_i x_i = \mathbf{w}^T \mathbf{x}$$
We can use the sign function
$$f(x) = \mathrm{sgn}(x) = \begin{cases} 1 & x \geq 0 \\ -1 & x < 0 \end{cases}$$
with $f'(x) = 1 - f^2(x)$ for its smooth (tanh-like) counterpart.
Or the sigmoid
$$f(x) = \frac{1}{1 + e^{-x}}, \qquad f(x) \in (0, 1),$$
and its derivative $f'(x) = f(x)(1 - f(x))$.
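As a concrete illustration, here is a minimal Python/NumPy sketch of this neuron (the function and variable names are my own, not from the slides): it augments the input with a constant 1, computes $\mathrm{net} = \mathbf{w}^T \mathbf{x}$, and applies either activation.

```python
import numpy as np

def neuron(x, w, activation="sigmoid"):
    """Augment x with a bias input 1, compute net = w^T x, apply f."""
    x_aug = np.concatenate(([1.0], x))    # x = [1, x_1, ..., x_d]^T in R^{d+1}
    net = w @ x_aug                       # net = sum_{i=0}^{d} w_i x_i
    if activation == "sign":
        return 1.0 if net >= 0 else -1.0  # f(net) = sgn(net)
    return 1.0 / (1.0 + np.exp(-net))     # f(net) = 1 / (1 + e^{-net})

# Example with d = 2 inputs; w[0] is the bias weight w_0
w = np.array([0.5, -1.0, 2.0])
x = np.array([0.3, 0.8])
print(neuron(x, w))           # sigmoid output in (0, 1)
print(neuron(x, w, "sign"))   # thresholded output in {-1, +1}
```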
[Figure: a single-layer network with input layer $x_1, x_2, \ldots, x_d$ and output layer $z_1, \ldots, z_c$.]
Desired output $\mathbf{t} = [t_1, \ldots, t_c]^T \in \mathbb{R}^c$
Actual output $z_i = \mathbf{w}_i^T \mathbf{x}$, $i = 1, \ldots, c$
Learning (Widrow-Hoff)
$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \eta (t_i - z_i)\mathbf{x} = \mathbf{w}_i(t) + \eta (t_i - \mathbf{w}_i^T \mathbf{x})\mathbf{x}$$
where $\eta$ is the learning rate.
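A minimal sketch of the Widrow-Hoff rule as a training loop; the names, `eta`, the epoch count, and the synthetic data are illustrative choices, not from the slides:

```python
import numpy as np

def widrow_hoff(X, t, eta=0.1, epochs=50):
    """LMS: w(t+1) = w(t) + eta * (t_i - w^T x) x, one sample at a time."""
    n, d = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])   # augment inputs with a bias 1
    w = np.zeros(d + 1)
    for _ in range(epochs):
        for x, ti in zip(Xa, t):
            z = w @ x                      # actual output z = w^T x
            w = w + eta * (ti - z) * x     # correct in proportion to the error
    return w

# Example: recover a linear target t = 0.5 + 2 x_1 - x_2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
t = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1]
print(widrow_hoff(X, t))   # approaches [0.5, 2.0, -1.0]
```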
[Figure: a multi-layer perceptron with input layer $x_1, x_2, \ldots, x_d$, a hidden layer, and an output layer.]
Hidden layer: $y_j = f(\mathbf{w}_j^T \mathbf{x})$, $j = 1, \ldots, n_H$
Output layer: $z_k = f(\mathbf{w}_k^T \mathbf{y})$, $k = 1, \ldots, c$
Putting the two layers together:
$$z_k = f\left( \sum_{j=1}^{n_H} w_{kj} \, f\left( \sum_{i=1}^{d} w_{ji} x_i + w_{j0} \right) + w_{k0} \right)$$
Larger $n_H$ results in overfitting; smaller $n_H$ leads to underfitting.
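The two-layer computation can be sketched directly in NumPy; the weight shapes and names are assumptions for illustration, with column 0 of each matrix holding the bias weights $w_{j0}$ and $w_{k0}$:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W_hidden, W_out):
    """y_j = f(w_j^T x), then z_k = f(w_k^T y); column 0 of each matrix is the bias."""
    x_aug = np.concatenate(([1.0], x))   # [1, x_1, ..., x_d]
    y = sigmoid(W_hidden @ x_aug)        # hidden layer, shape (n_H,)
    y_aug = np.concatenate(([1.0], y))   # prepend the bias unit
    z = sigmoid(W_out @ y_aug)           # output layer, shape (c,)
    return y, z

# Example: d = 3 inputs, n_H = 4 hidden units, c = 2 outputs
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.5, size=(4, 4))   # n_H x (d + 1)
W_out = rng.normal(scale=0.5, size=(2, 5))      # c x (n_H + 1)
y, z = mlp_forward(np.array([0.2, -0.7, 1.0]), W_hidden, W_out)
print(y, z)
```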
Training the Network
In a general form, gradient descent updates the weights as
$$\mathbf{w}(k+1) = \mathbf{w}(k) - \eta \frac{\partial J}{\partial \mathbf{w}}$$
where $J(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2$ is the training error.
$w_{kj}$ is the weight between output node $k$ and hidden node $j$:
$$\frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial \mathrm{net}_k} \frac{\partial \mathrm{net}_k}{\partial w_{kj}}$$
Define the sensitivity of node $i$ as $\delta_i = -\frac{\partial J}{\partial \mathrm{net}_i}$. For an output node,
$$\delta_k = -\frac{\partial J}{\partial \mathrm{net}_k} = -\frac{\partial J}{\partial z_k} \frac{\partial z_k}{\partial \mathrm{net}_k} = (t_k - z_k) f'(\mathrm{net}_k)$$
As $\mathrm{net}_k = \sum_{j=1}^{n_H} w_{kj} y_j$, it is clear that
$$\frac{\partial \mathrm{net}_k}{\partial w_{kj}} = y_j$$
So we have
$$\Delta w_{kj} = \eta \delta_k y_j = \eta (t_k - z_k) f'(\mathrm{net}_k) y_j$$
As before, for a weight $w_{ji}$ between input node $i$ and hidden node $j$,
$$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j} \frac{\partial y_j}{\partial \mathrm{net}_j} \frac{\partial \mathrm{net}_j}{\partial w_{ji}}$$
The last two factors are easy; the first one expands as
$$\frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j} \left[ \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 \right] = -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial \mathrm{net}_k} \frac{\partial \mathrm{net}_k}{\partial y_j}$$
$$= -\sum_{k=1}^{c} (t_k - z_k) f'(\mathrm{net}_k) w_{kj} = -\sum_{k=1}^{c} \delta_k w_{kj}$$
So the sensitivity of a hidden node is
$$\delta_j = -\frac{\partial J}{\partial \mathrm{net}_j} = -\frac{\partial J}{\partial y_j} \frac{\partial y_j}{\partial \mathrm{net}_j} = f'(\mathrm{net}_j) \sum_{k=1}^{c} w_{kj} \delta_k$$
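To make the sensitivities concrete, here is a sketch of one backward pass for sigmoid units, where $f'(\mathrm{net}) = f(\mathrm{net})(1 - f(\mathrm{net}))$, i.e. $y(1-y)$ and $z(1-z)$. It composes with the `mlp_forward` sketch above; all names are hypothetical:

```python
import numpy as np

def backward(x, t, y, z, W_out):
    """One backward pass for sigmoid units: f'(net) = f(net)(1 - f(net)).
    Returns the update directions; scale by eta to get Delta w."""
    delta_k = (t - z) * z * (1.0 - z)                   # delta_k = (t_k - z_k) f'(net_k)
    # delta_j = f'(net_j) * sum_k w_kj delta_k  (skip the bias column of W_out)
    delta_j = y * (1.0 - y) * (W_out[:, 1:].T @ delta_k)
    dW_out = np.outer(delta_k, np.concatenate(([1.0], y)))    # Delta w_kj / eta = delta_k y_j
    dW_hidden = np.outer(delta_j, np.concatenate(([1.0], x))) # Delta w_ji / eta = delta_j x_i
    return dW_out, dW_hidden
```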
Why is it Called Back Propagation?
[Figure: the output sensitivities $\delta_1, \delta_2, \ldots, \delta_c$ propagate backward through the weights $w_{kj}$ to hidden node $j$.]
The sensitivity $\delta_i$ reflects the error information at node $i$. The sensitivity $\delta_j$ of a hidden node $j$ combines two sources of information: the local derivative $f'(\mathrm{net}_j)$ and the output sensitivities $\{\delta_k\}$ propagated backward through the weights $w_{kj}$:
$$\delta_j = f'(\mathrm{net}_j) \sum_{k=1}^{c} w_{kj} \delta_k, \qquad \Delta w_{ji} = \eta \delta_j x_i$$
Algorithm: Back-propagation (BP)
Algorithm 1: Stochastic Back-propagation
Init: $n_H$, $\mathbf{w}$, stopping criterion $\theta$, learning rate $\eta$, $k = 0$
Do $k \leftarrow k + 1$
    $\mathbf{x}^k \leftarrow$ randomly picked training sample
    forward: compute $\mathbf{y}$ and then $\mathbf{z}$
    backward: compute $\{\delta_k\}$ and then $\{\delta_j\}$
    $w_{kj} \leftarrow w_{kj} + \eta \delta_k y_j$
    $w_{ji} \leftarrow w_{ji} + \eta \delta_j x_i$
Until $J(\mathbf{w}) < \theta$
Return $\mathbf{w}$
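A runnable sketch of Algorithm 1 with sigmoid units; the XOR data, `n_H`, `eta`, and `theta` are arbitrary illustrative choices, not from the slides:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_all(X, Wh, Wo):
    """Forward pass for a whole dataset, used to evaluate J(w)."""
    Xa = np.hstack([np.ones((len(X), 1)), X])
    Ya = np.hstack([np.ones((len(X), 1)), sigmoid(Xa @ Wh.T)])
    return sigmoid(Ya @ Wo.T)

def train_bp(X, T, n_H=5, eta=0.5, theta=1e-2, max_iter=200_000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    c = T.shape[1]
    Wh = rng.normal(scale=0.5, size=(n_H, d + 1))    # hidden weights w_ji (with bias)
    Wo = rng.normal(scale=0.5, size=(c, n_H + 1))    # output weights w_kj (with bias)
    for _ in range(max_iter):
        i = rng.integers(n)                          # randomly pick x^k
        x = np.concatenate(([1.0], X[i]))
        y = sigmoid(Wh @ x)                          # forward: y, then z
        ya = np.concatenate(([1.0], y))
        z = sigmoid(Wo @ ya)
        dk = (T[i] - z) * z * (1 - z)                # backward: {delta_k} ...
        dj = y * (1 - y) * (Wo[:, 1:].T @ dk)        # ... then {delta_j}
        Wo += eta * np.outer(dk, ya)                 # w_kj <- w_kj + eta delta_k y_j
        Wh += eta * np.outer(dj, x)                  # w_ji <- w_ji + eta delta_j x_i
        if 0.5 * np.sum((T - forward_all(X, Wh, Wo)) ** 2) < theta:
            break                                    # until J(w) < theta
    return Wh, Wo

# Example: learn XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
Wh, Wo = train_bp(X, T)
print(forward_all(X, Wh, Wo).round(2))
```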
For a target coding with $t_k = 1$ if $\mathbf{x} \in \omega_k$ and $t_k = 0$ otherwise, the squared error for output $k$ splits over the two cases:
$$\sum_{\mathbf{x}} [g_k(\mathbf{x}; \mathbf{w}) - t_k]^2 = \sum_{\mathbf{x} \in \omega_k} [g_k(\mathbf{x}; \mathbf{w}) - 1]^2 + \sum_{\mathbf{x} \notin \omega_k} [g_k(\mathbf{x}; \mathbf{w}) - 0]^2$$
Softmax activation
$$z_k = \frac{e^{\mathrm{net}_k}}{\sum_{m=1}^{c} e^{\mathrm{net}_m}}$$
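A sketch of the softmax in NumPy; the max-subtraction is a standard numerical-stability trick, not something stated on the slide:

```python
import numpy as np

def softmax(net):
    """z_k = exp(net_k) / sum_m exp(net_m)."""
    e = np.exp(net - np.max(net))   # shifting by a constant leaves z unchanged
    return e / e.sum()

z = softmax(np.array([2.0, 1.0, -1.0]))
print(z, z.sum())   # each z_k in (0, 1), and they sum to 1
```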
Practice: Number of Hidden Units
[Figure: a network whose hidden units are kernels $K$ with widths $\sigma^2$, connected to the outputs by weights $w_{kj}$.]
The output
$$z_k(\mathbf{x}) = \sum_{j=0}^{n_H} w_{kj} K(\mathbf{x}, \mathbf{x}_j)$$
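A sketch of this output, assuming a Gaussian kernel $K(\mathbf{x}, \mathbf{x}_j) = \exp(-\|\mathbf{x} - \mathbf{x}_j\|^2 / 2\sigma^2)$ and treating the $j = 0$ term as a bias; both are assumptions, since the slide leaves $K$ generic:

```python
import numpy as np

def rbf_output(x, centers, weights, sigma2=1.0):
    """z_k(x) = w_k0 + sum_{j=1}^{n_H} w_kj K(x, x_j), Gaussian K (assumed)."""
    K = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * sigma2))
    return weights[:, 0] + weights[:, 1:] @ K   # weights[:, 0] is the j = 0 bias

# Example: n_H = 3 centers in R^2, c = 2 outputs
centers = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])
weights = np.random.default_rng(0).normal(size=(2, 4))   # c x (n_H + 1)
print(rbf_output(np.array([0.2, 0.3]), centers, weights))
```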
Interpretation
The weights W