
The Simple Perceptron

Artificial Neural Network

Information processing architecture loosely modelled on the brain
Consists of a large number of interconnected processing units (neurons)
These units work in parallel to accomplish a global task
Generally used to model relationships between inputs and outputs, or to find patterns in data

Artificial Neural Network

3 types of layers: input, hidden, and output

Single Processing Unit

Activation Functions

Function which takes the total input and produces an output for the node, given some threshold.
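A minimal sketch (not from the slides) of such a threshold activation in Python, using the ±1 output coding that appears later in the deck:

```python
# Threshold (step) activation: +1 if the node's total input exceeds
# the threshold, otherwise -1.
def step_activation(total_input: float, threshold: float = 0.0) -> int:
    return 1 if total_input > threshold else -1

print(step_activation(0.7))   # +1
print(step_activation(-0.3))  # -1
```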

Network Structure

Two main network structures:

1. Feed-Forward Network
2. Recurrent Network


Learning Paradigms

Supervised Learning:
Given training data consisting of pairs of inputs/outputs, find a function which correctly matches them.

Unsupervised Learning:
Given a data set, the network finds patterns and categorizes the data into groups.

Reinforcement Learning:
No data set is given; the agent interacts with the environment, calculating the cost of its actions.

Simple Perceptron

The perceptron is a single-layer feed-forward neural network.

Simple Perceptron

Simplest output function: a threshold applied to the weighted sum of the inputs
Used to classify patterns said to be linearly separable

Linearly Separable
The bias is proportional to the offset of the plane from the origin
The weights determine the slope of the line
The weight vector is perpendicular to the plane
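A small numeric check of this geometry, with example values of w and b assumed for illustration: two points on the boundary w·x + b = 0 differ by a vector orthogonal to w, and the boundary lies at distance |b|/‖w‖ from the origin.

```python
import numpy as np

w = np.array([1.0, 0.5])   # example weights (assumed for illustration)
b = -1.0                   # example bias (assumed for illustration)

def on_boundary(x1):
    # a point (x1, x2) satisfying w.x + b = 0
    return np.array([x1, -(b + w[0] * x1) / w[1]])

p, q = on_boundary(0.0), on_boundary(2.0)

print(np.dot(w, q - p))            # ~0: w is perpendicular to the boundary
print(abs(b) / np.linalg.norm(w))  # offset of the boundary from the origin
```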

Perceptron Learning Algorithm

We want to train the perceptron to classify inputs correctly
Accomplished by adjusting the connecting weights and the bias
Can only properly handle linearly separable sets

Perceptron Learning Algorithm

We have a training set, which is a set of input vectors used to train the perceptron
During training both the weights w_i and the bias b are modified
For convenience, let w_0 = b and x_0 = 1
Let η, the learning rate, be a small positive number (small steps lessen the possibility of destroying correct classifications)
Initialise the w_i to some values
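A short sketch of this bookkeeping trick (example numbers assumed): absorbing the bias into the weight vector as w_0, paired with a constant input x_0 = 1, leaves the weighted sum unchanged.

```python
import numpy as np

b = 0.5                       # bias (example value)
w = np.array([1.0, -0.5])     # weights (example values)
x = np.array([2.0, 3.0])      # an input vector

w_aug = np.concatenate(([b], w))     # w_0 = b
x_aug = np.concatenate(([1.0], x))   # x_0 = 1

print(np.dot(w, x) + b)        # weighted sum plus bias
print(np.dot(w_aug, x_aug))    # the same value with augmented vectors
```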

Perceptron Learning Algorithm


Desired output:

d(n) = +1 if x(n) ∈ set A
d(n) = -1 if x(n) ∈ set B

1. Select a random sample from the training set as input
2. If the classification is correct, do nothing
3. If the classification is incorrect, modify the weight vector w using
   w_i = w_i + η·d(n)·x_i(n)

Repeat this procedure until the entire training set is classified correctly
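A minimal Python sketch of this procedure, assuming augmented inputs (x_0 = 1, so w_0 plays the role of the bias) and a linearly separable training set so that the loop terminates:

```python
import numpy as np

def train_perceptron(X, d, eta=0.2, seed=0):
    """X: (n_samples, n_features) inputs; d: desired outputs in {+1, -1}."""
    rng = np.random.default_rng(seed)
    X_aug = np.hstack([np.ones((len(X), 1)), X])   # prepend x_0 = 1
    w = np.zeros(X_aug.shape[1])                   # initialise the weights

    def output(x):                                 # threshold output function
        return 1 if x @ w > 0 else -1

    # repeat until the entire training set is classified correctly
    while any(output(x) != t for x, t in zip(X_aug, d)):
        n = rng.integers(len(X_aug))               # 1. select a random sample
        if output(X_aug[n]) != d[n]:               # 2./3. update only if wrong
            w = w + eta * d[n] * X_aug[n]
    return w
```

Run on the six samples of the learning example that follows, this returns a separating weight vector; the exact values depend on the random order in which samples are drawn.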

Learning Example
Initial Values:

η = 0.2
w = [w_0, w_1, w_2]ᵀ = [0, 1, 0.5]ᵀ

The initial decision boundary is
0 = w_0 + w_1·x_1 + w_2·x_2 = 0 + 1·x_1 + 0.5·x_2
i.e. x_2 = -2·x_1
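A quick illustrative check that these initial weights give the boundary x_2 = -2·x_1:

```python
# Points on the initial boundary 0 = w_0 + w_1*x1 + w_2*x2 with w = [0, 1, 0.5]
# satisfy x2 = -2*x1.
w0, w1, w2 = 0.0, 1.0, 0.5
for x1 in (-2.0, 0.0, 1.5):
    x2 = -(w0 + w1 * x1) / w2      # solve the boundary equation for x2
    print(x1, x2, x2 == -2 * x1)   # True in each case
```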

Learning Example
η = 0.2
w = [0, 1, 0.5]ᵀ

x_1 = 1, x_2 = 1
wᵀx > 0
Correct classification, no action

Learning Example
η = 0.2
w = [0, 1, 0.5]ᵀ

x_1 = 2, x_2 = -2
wᵀx > 0 but the sample belongs to set B (d = -1): incorrect classification, so update
w_0 = w_0 - 0.2·1
w_1 = w_1 - 0.2·2
w_2 = w_2 - 0.2·(-2)

Learning Example
η = 0.2
w = [-0.2, 0.6, 0.9]ᵀ  (after the update)

x_1 = 2, x_2 = -2
w_0 = w_0 - 0.2·1
w_1 = w_1 - 0.2·2
w_2 = w_2 - 0.2·(-2)

Learning Example
η = 0.2
w = [-0.2, 0.6, 0.9]ᵀ

x_1 = -1, x_2 = -1.5
wᵀx < 0
Correct classification, no action

Learning Example
η = 0.2
w = [-0.2, 0.6, 0.9]ᵀ

x_1 = -2, x_2 = -1
wᵀx < 0
Correct classification, no action

Learning Example
η = 0.2
w = [-0.2, 0.6, 0.9]ᵀ

x_1 = -2, x_2 = 1
wᵀx < 0 but the sample belongs to set A (d = +1): incorrect classification, so update
w_0 = w_0 + 0.2·1
w_1 = w_1 + 0.2·(-2)
w_2 = w_2 + 0.2·1

Learning Example
η = 0.2
w = [0, 0.2, 1.1]ᵀ  (after the update)

x_1 = -2, x_2 = 1
w_0 = w_0 + 0.2·1
w_1 = w_1 + 0.2·(-2)
w_2 = w_2 + 0.2·1

Learning Example
η = 0.2
w = [0, 0.2, 1.1]ᵀ

x_1 = 1.5, x_2 = -0.5
wᵀx < 0 but the sample belongs to set A (d = +1): incorrect classification, so update
w_0 = w_0 + 0.2·1
w_1 = w_1 + 0.2·1.5
w_2 = w_2 + 0.2·(-0.5)

Learning Example
η = 0.2
w = [0.2, 0.5, 1]ᵀ  (after the update)

x_1 = 1.5, x_2 = -0.5
w_0 = w_0 + 0.2·1
w_1 = w_1 + 0.2·1.5
w_2 = w_2 + 0.2·(-0.5)
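The whole sequence can be replayed in a few lines of Python; the ±1 class labels below are inferred from the sign of each update on the slides. It reproduces the weight values shown above and confirms that the final w = [0.2, 0.5, 1]ᵀ classifies every training sample correctly, so training stops.

```python
import numpy as np

eta = 0.2
w = np.array([0.0, 1.0, 0.5])                  # [w_0, w_1, w_2], initial values
samples = [                                     # ((x_1, x_2), desired output d)
    (( 1.0,  1.0), +1),
    (( 2.0, -2.0), -1),
    ((-1.0, -1.5), -1),
    ((-2.0, -1.0), -1),
    ((-2.0,  1.0), +1),
    (( 1.5, -0.5), +1),
]

for (x1, x2), d in samples:
    x = np.array([1.0, x1, x2])                 # augmented input, x_0 = 1
    if (1 if w @ x > 0 else -1) != d:           # misclassified: apply the rule
        w = w + eta * d * x
    print(w)                                     # matches the slide values step by step

# every training sample is now classified correctly by the final weights
print(all((1 if w @ np.array([1.0, x1, x2]) > 0 else -1) == d
          for (x1, x2), d in samples))           # True
```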

Perceptron Convergence Theorem


The theorem states that for any data set which is linearly separable, the perceptron learning rule is guaranteed to find a solution in a finite number of iterations.

Idea behind the proof: find upper and lower bounds on the length of the weight vector to show that the number of iterations is finite.

Perceptron Convergence Theorem


Let's assume that the input variables come from two linearly separable classes C1 & C2.

Let T1 & T2 be subsets of training vectors which belong to the classes C1 & C2 respectively.
Then T1 ∪ T2 is the complete training set.

Perceptron Convergence Theorem


As we have seen, the learning algorithm's purpose is to find a weight vector w such that
w·x > 0 for every x ∈ C1
w·x ≤ 0 for every x ∈ C2
(x is an input vector)

If the kth member of the training set, x(k), is correctly classified by the weight vector w(k) computed at the kth iteration of the algorithm, then we do not adjust the weight vector.
However, if it is incorrectly classified, we use the modifier
w(k+1) = w(k) + η·d(k)·x(k)

Perceptron Convergence Theorem


So we get
w(k+1) = w(k) - η·x(k)   if w(k)·x(k) > 0 and x(k) ∈ C2
w(k+1) = w(k) + η·x(k)   if w(k)·x(k) ≤ 0 and x(k) ∈ C1

We can set η = 1, since any other η > 0 just scales the vectors.
We can also set the initial condition w(0) = 0, as any non-zero value will still converge, it will just decrease or increase the number of iterations.

Perceptron Convergence Theorem


Suppose that w(k)·x(k) < 0 for k = 1, 2, ..., where x(k) ∈ T1, so with an incorrect classification we get
w(k+1) = w(k) + x(k),   x(k) ∈ C1

By expanding iteratively, we get
w(k+1) = x(k) + w(k)
       = x(k) + x(k-1) + w(k-1)
       ...
       = x(k) + ... + x(1) + w(0)

Perceptron Convergence Theorem


As we assume linear separability, there exists a solution w* for which w*·x(k) > 0 for all x(1), ..., x(k) ∈ T1. Multiply both sides by the solution w* to get
w*·w(k+1) = w*·x(1) + ... + w*·x(k)

These terms are all > 0, hence each is ≥ α, where
α = min_{x(k) ∈ T1} w*·x(k)

Thus we get
w*·w(k+1) ≥ k·α

Perceptron Convergence Theorem


Now we make use of the Cauchy-Schwarz inequality, which states that for any two vectors A, B
‖A‖² ‖B‖² ≥ (A·B)²

Applying this we get
‖w*‖² ‖w(k+1)‖² ≥ (w*·w(k+1))²

From the previous slide we know
w*·w(k+1) ≥ k·α

Thus, it follows that
‖w(k+1)‖² ≥ k²·α² / ‖w*‖²

Perceptron Convergence Theorem


We continue the proof by going down another route.
w(j+1) = w(j) + x(j)   for j = 1, ..., k with x(j) ∈ T1

We square the Euclidean norm on both sides:
‖w(j+1)‖² = ‖w(j) + x(j)‖²
          = ‖w(j)‖² + ‖x(j)‖² + 2·w(j)·x(j)

Since x(j) was incorrectly classified, w(j)·x(j) < 0, thus we get
‖w(j+1)‖² ≤ ‖w(j)‖² + ‖x(j)‖²

Perceptron Convergence Theorem


Summing both sides for all j:
‖w(k+1)‖² - ‖w(k)‖² ≤ ‖x(k)‖²
‖w(k)‖² - ‖w(k-1)‖² ≤ ‖x(k-1)‖²
...
‖w(1)‖² - ‖w(0)‖² ≤ ‖x(1)‖²

We get
‖w(k+1)‖² ≤ Σ_{j=1}^{k} ‖x(j)‖² ≤ k·β
where β = max_{x(j) ∈ T1} ‖x(j)‖²

Perceptron Convergence Theorem


But now we have a conflict between the two bounds for sufficiently large values of k:
‖w(k+1)‖² ≥ k²·α² / ‖w*‖²   (grows like k²)
‖w(k+1)‖² ≤ k·β              (grows like k)

So we can state that k cannot be larger than some value k_max for which the two bounds are both satisfied:
k_max²·α² / ‖w*‖² = k_max·β
k_max = β·‖w*‖² / α²
Perceptron Convergence Theorem


Thus it is proved that, for η = 1 and w(0) = 0, given that a solution vector w* exists, the perceptron learning rule will terminate after at most k_max iterations.
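As a rough illustration (not part of the original proof), the bound can be evaluated on the worked example from earlier, taking that example's final weights as a candidate solution w* and folding the set-B samples into T1 by negating them so that w*·x > 0 for every pattern; the resulting k_max is finite, though loose.

```python
import numpy as np

w_star = np.array([0.2, 0.5, 1.0])             # final weights from the example
set_A = np.array([[1.0,  1.0,  1.0],           # augmented patterns, x_0 = 1
                  [1.0, -2.0,  1.0],
                  [1.0,  1.5, -0.5]])
set_B = np.array([[1.0,  2.0, -2.0],
                  [1.0, -1.0, -1.5],
                  [1.0, -2.0, -1.0]])
T1 = np.vstack([set_A, -set_B])                # negate set-B patterns

alpha = np.min(T1 @ w_star)                    # smallest margin, > 0
beta = np.max(np.sum(T1**2, axis=1))           # largest squared input norm
k_max = beta * np.dot(w_star, w_star) / alpha**2
print(alpha, beta, k_max)                      # 0.2, 9.0, ~290: finite but loose
```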

The End
