
NEURAL NETWORKS
Vedat Tavşanoğlu

What Is a Neural Network?

Work on artificial neural networks, commonly referred to as "neural networks," has been motivated right from its inception by the recognition that the brain computes in an entirely different way from the conventional digital computer.

The struggle to understand the brain owes much to the pioneering work of Ramon y Cajal (1911), who introduced the idea of neurons as structural constituents of the brain.

Typically, neurons are five to six orders of magnitude slower than silicon logic gates; events in a silicon chip happen in the nanosecond (10⁻⁹ s) range, whereas neural events happen in the millisecond (10⁻³ s) range.

However, the brain makes up for the relatively slow rate of operation of a neuron by having a truly staggering number of neurons (nerve cells) with massive interconnections between them.

It is estimated that there must be on the order of 10 billion neurons in the human cortex, and 60 trillion synapses or connections (Shepherd and Koch, 1990). The net result is that the brain is an enormously efficient structure. Specifically, the energetic efficiency of the brain is approximately 10⁻¹⁶ joules (J) per operation per second.

The corresponding value for the best computers in use today is about 10⁻⁶ joules per operation per second (Faggin, 1991).
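
Taking the two figures quoted above at face value, the gap works out to

$$ \frac{10^{-6}\ \text{J per operation per second}}{10^{-16}\ \text{J per operation per second}} = 10^{10}, $$

that is, roughly ten orders of magnitude in the brain's favor.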

The brain is a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability of organizing neurons so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today.

Consider, for example, human vision, which is an information-processing task (Churchland and Sejnowski, 1992; Levine, 1985; Marr, 1982).

It is the function of the visual system to provide a representation of the environment around us and, more important, to supply the information we need to interact with the environment.

The brain routinely accomplishes perceptual recognition tasks (e.g., recognizing a familiar face embedded in an unfamiliar scene) in something of the order of 100-200 ms, whereas tasks of much lesser complexity will take hours on conventional computers.

For another example, consider the sonar of a bat. Sonar is an active echo-location system.

In addition to providing information about how far away a target (e.g., a flying insect) is, a bat sonar conveys information about the relative velocity of the target, the size of the target, the size of various features of the target, and the azimuth and elevation of the target (Suga, 1990a, b).

The complex neural computations needed to extract all this information from the target echo occur within a brain the size of a plum. Indeed, an echo-locating bat can pursue and capture its target with a facility and success rate that would be the envy of a radar or sonar engineer.

How, then, does a human brain or the brain of a bat do it?

At birth, a brain has great structure and the ability to build up its own rules through what we usually refer to as "experience."

Indeed, experience is built up over the years, with the most dramatic development (i.e., hard-wiring) of the human brain taking place in the first two years from birth; but the development continues well beyond that stage.

During this early stage of development, about 1 million synapses are formed per second.

Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse, which operates as follows:

A presynaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then acts on a postsynaptic process.

Thus a synapse converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal (Shepherd and Koch, 1990).

In electrical terminology, such an element is said to be a nonreciprocal two-port device.

In traditional descriptions of neural organization, it is assumed that a synapse is a simple connection that can impose excitation or inhibition, but not both, on the receptive neuron.

A developing neuron is synonymous with a plastic brain: Plasticity ([Latin plasticus, from Greek plastikos, from plastos, molded, from plassein, to mold; see pelə-2 in Indo-European roots.]) permits the developing nervous system to adapt to its surrounding environment (Churchland and Sejnowski, 1992; Eggermont, 1990). In an adult brain, plasticity may be accounted for by two mechanisms: the creation of new synaptic connections between neurons, and the modification of existing synapses.

Axons, the transmission lines, and dendrites, the receptive zones, constitute two types of cell filaments that are distinguished on morphological grounds; an axon has a smoother surface, fewer branches, and greater length, whereas a dendrite (so called because of its resemblance to a tree) has an irregular surface and more branches (Freeman, 1975).

Neurons come in a wide variety of shapes and sizes in different parts of the brain. The figure illustrates the shape of a pyramidal cell, which is one of the most common types of cortical neurons.

Like many other types of neurons, it receives most of its inputs through dendritic spines.

The pyramidal cell can receive 10,000 or more synaptic contacts and it can project onto thousands of target cells.

Just as plasticity appears to be essential to the functioning of neurons as information-processing units in the human brain, so it is with neural networks made up of artificial neurons.

In its most general form, a neural network is a machine that is designed to model the way in which the brain performs a particular task or function of interest; the network is usually implemented using electronic components or simulated in software on a digital computer.

In most cases the interest is confined largely to an important class of neural networks that perform useful computations through a process of learning.

To achieve good performance, neural networks employ a massive interconnection of simple computing cells referred to as "neurons" or "processing units." We may thus offer the following definition of a neural network viewed as an adaptive machine:

A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.

The procedure used to perform the learning process is called a learning algorithm, the function of which is to modify the synaptic weights of the network in an orderly fashion so as to attain a desired design objective.

The modification of synaptic weights provides the traditional method for the design of neural networks. Such an approach is the closest to linear adaptive filter theory, which is already well established and successfully applied in such diverse fields as communications, control, radar, sonar, seismology, and biomedical engineering (Haykin, 1991; Widrow and Stearns, 1985).

However, it is also possible for a neural network to modify its own topology, which is motivated by the fact that neurons in the human brain can die and that new synaptic connections can grow.

Neural networks are also referred to in the literature as neurocomputers, connectionist networks, parallel distributed processors, etc.

Benefits of Neural Networks

From the above discussion, it is apparent that a neural network derives its computing power through:
1. its massively parallel distributed structure, and
2. its ability to learn and therefore generalize; generalization refers to the neural network producing reasonable outputs for inputs not encountered during training (learning).

How does the following example help you to generalize?

confer: L. conferre - con-, together, ferre, to bring
v.t. to give, to bestow (to place or put by), to talk or consult together
defer: L. differre - dis-, asunder (adv. apart, into parts, separately), ferre, to bear, to carry
v.t. to put off to another time, to delay
defer: L. deferre - de-, down, ferre, to bear
v.i. to yield (to the wishes or opinions of another, or to authority), v.t. to submit or to lay before somebody
differ: L. differre - dif- (for dis-), apart, ferre, to bear
v.i. to be unlike, distinct or various
infer: L. inferre - in-, into, ferre, to bring
v.t. to bring on, to derive as a conclusion
prefer: L. praeferre - prae-, in front of, ferre, to bear
v.t. to set in front, to put forward, offer, submit, present, for acceptance or consideration, to promote

convene: L. convenire - con-, together, venire, to come
v.i. to come together, v.t. to call together
convent: v.t. to convene
convention: the act of convening; an assembly, esp. of special delegates for some common object; an agreement (Geneva Convention)
invent: L. invenire, inventum - in-, upon, venire, to come
v.t. to find, to devise or contrive
prevent: L. praevenire - prae-, in front of, venire, to come
v.t. to precede; to be, go, act earlier than; to preclude, to stop, keep, or hinder effectually; to keep from coming to pass

Synonym: 1432 (but rare before 18c.), from L. synonymum, from Gk. synonymon "word having the same sense as another," noun use of neut. of synonymos "having the same name as, synonymous," from syn- "together, same" + onyma, Aeolic dialectal form of onoma "name" (see name). Synonymous is attested from 1610.

Antonym: 1870, created to serve as opposite of synonym, from Gk. anti- "equal to, instead of, opposite" (see anti-) + -onym "name" (see name).

Anonymous: 1601, from Gk. anonymos "without a name," from an- "without" + onyma, Aeolic dialectal form of onoma "name" (see name).

These two information-processing capabilities, i.e.,
(1) massively parallel distributed structure, and
(2) the ability to generalize,
make it possible for neural networks to solve complex (large-scale) problems that are currently intractable. In practice, however, neural networks cannot provide the solution working by themselves alone. Rather, they need to be integrated into a consistent system engineering approach.

Specifically, a complex problem of interest is decomposed into a number of relatively simple tasks, and neural networks are assigned a subset of the tasks, e.g.,
1. pattern recognition,
2. associative memory,
3. control, etc.,
that match their inherent capabilities. It is important to recognize, however, that we have a long way to go (if ever) before we can build a computer architecture that mimics the human brain.

Properties and Capabilities of Neural Networks

1. Nonlinearity
A neuron is basically a nonlinear device. Consequently, a neural network, made up of an interconnection of neurons, is itself nonlinear. Moreover, the nonlinearity is of a special kind in the sense that it is distributed throughout the network. Nonlinearity is a highly important property, particularly if the underlying physical mechanism responsible for the generation of an input signal (e.g., speech signal) is inherently nonlinear.

2. Input-Output Mapping
A popular paradigm of learning called supervised learning involves the modification of the synaptic weights of a neural network by applying a set of labeled training samples or task examples. Each example consists of a unique input signal and the corresponding desired response.

The network is presented an example picked at random from the set, and the synaptic weights (free parameters) of the network are modified so as to minimize the difference between the desired response and the actual response of the network.

The training of the network is repeated for many examples in the set until the network reaches a steady state, where there are no further significant changes in the synaptic weights.

The previously applied training examples may be reapplied during the training session, but in a different order.

Thus the network learns from the examples by constructing an input-output mapping for the problem at hand.
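
To make this error-driven adjustment concrete, the following is a minimal sketch of a single linear neuron trained with the least-mean-squares (delta) rule. The linear neuron, the learning rate, and the toy data are assumptions made only for illustration; they are not the specific algorithms covered later in these lectures.

```python
import numpy as np

# Minimal sketch of supervised learning for a single linear neuron.
# The synaptic weights (free parameters) are nudged after each randomly
# picked labeled example so as to reduce the difference between the
# desired response d and the actual response y.

rng = np.random.default_rng(0)

# Toy training set: inputs X and desired responses d (assumed for illustration).
X = rng.normal(size=(100, 3))
true_w = np.array([0.5, -1.0, 2.0])
d = X @ true_w + 0.01 * rng.normal(size=100)

w = np.zeros(3)    # synaptic weights, initially zero
eta = 0.05         # learning rate (step size)

for epoch in range(50):                  # reapply the examples many times...
    for i in rng.permutation(len(X)):    # ...but in a different order each pass
        y = w @ X[i]                     # actual response of the neuron
        e = d[i] - y                     # error: desired minus actual response
        w += eta * e * X[i]              # adjust weights to shrink the error

print(w)  # settles near true_w once the weights reach a steady state
```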

Such an approach brings to mind the study of nonparametric statistical inference, which is a branch of statistics dealing with model-free estimation, or, from a biological viewpoint, tabula rasa learning (Geman et al., 1992).

(tabula rasa: a smoothed or blank tablet; a mind not yet influenced by outside impressions and experiences) ([Medieval Latin tabula rāsa : Latin tabula, tablet + Latin rāsa, feminine of rāsus, erased.])

Consider, for example, a pattern classification task, where the requirement is to assign an input signal representing a physical object or event to one of several prespecified categories (classes).

In a nonparametric approach to this problem, the requirement is to "estimate" arbitrary decision boundaries in the input signal space for the pattern-classification task using a set of examples, and to do so without invoking a probabilistic distribution model.
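
To give one concrete (and deliberately non-neural) picture of what estimating decision boundaries directly from examples means, the sketch below uses the classical nearest-neighbour rule; the toy two-class data are an assumption for illustration only.

```python
import numpy as np

# A classical nonparametric classifier: 1-nearest-neighbour.
# The decision boundary is "estimated" directly from the stored examples;
# no probabilistic distribution model is assumed for the classes.

def nearest_neighbour_predict(X_train, y_train, x):
    # Assign x the class label of the closest training example.
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

# Toy two-class training set (assumed for illustration).
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(-1.0, 0.5, size=(20, 2)),
                     rng.normal(+1.0, 0.5, size=(20, 2))])
y_train = np.array([0] * 20 + [1] * 20)

print(nearest_neighbour_predict(X_train, y_train, np.array([0.8, 1.2])))  # most likely 1
```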

A similar point of view is implicit in the supervised learning paradigm, which suggests a close analogy between the input-output mapping performed by a neural network and nonparametric statistical inference.

paradigm: 1. Grammar. a. a set of forms all of which contain a particular element, esp. the set of all inflected forms based on a single stem or theme. b. a display in fixed arrangement of such a set, as boy, boy's, boys, boys'. 2. an example serving as a model; pattern. [Origin: 1475-85; < LL paradīgma < Gk parádeigma pattern (verbid of paradeiknýnai to show side by side), equiv. to para- para-1 + deik-, base of deiknýnai to show (see deictic) + -ma n. suffix]

analogy: 1550, from L. analogia, from Gk. analogia "proportion," from ana- "upon, according to" + logos "ratio," also "word, speech, reckoning." A mathematical term used in a wider sense by Plato.

3. Adaptivity
Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can be easily retrained to deal with minor changes in the operating environmental conditions.

Moreover, when it is operating in a nonstationary environment (i.e., one whose statistics change with time), a neural network can be designed to change its synaptic weights in real time. The natural architecture of a neural network for pattern classification, signal processing, and control applications, coupled with the adaptive capability of the network, makes it an ideal tool for use in adaptive pattern classification, adaptive signal processing, and adaptive control.

As a general rule, it may be said that the more adaptive we make a system in a properly designed fashion, assuming the adaptive system is stable, the more robust its performance will likely be when the system is required to operate in a nonstationary environment.

It should be emphasized, however, that adaptivity does not always lead to robustness; indeed, it may do the very opposite. For example, an adaptive system with short time constants may change rapidly and therefore tend to respond to spurious disturbances, causing a drastic degradation in system performance.

To realize the full benefits of adaptivity, the principal time constants of the system should be long enough for the system to ignore spurious (L. spurius, false) disturbances and yet short enough to respond to meaningful changes in the environment; the problem described here is referred to as the stability-plasticity dilemma (Grossberg, 1988). Adaptivity (or in situ [(L.) in the original situation] training, as it is sometimes referred to) is an open research topic.
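
As a rough numerical illustration of this time-constant trade-off (a sketch of my own, not an algorithm from the text), consider a first-order adaptive estimator whose step size alpha sets its effective time constant of roughly 1/alpha samples; the step-plus-noise signal is an assumption for the example.

```python
import numpy as np

# First-order adaptive estimator: m[n] = m[n-1] + alpha * (x[n] - m[n-1]).
# The step size alpha plays the role of the (inverse) time constant, so it
# controls the stability-plasticity trade-off described above.

def adapt(x, alpha):
    m = np.zeros_like(x)
    for n in range(1, len(x)):
        m[n] = m[n - 1] + alpha * (x[n] - m[n - 1])
    return m

rng = np.random.default_rng(2)
# A meaningful change (a step at n = 200) buried in spurious noise.
x = np.where(np.arange(400) < 200, 0.0, 1.0) + 0.3 * rng.normal(size=400)

fast = adapt(x, alpha=0.5)   # short time constant: tracks the step quickly
                             # but also follows the spurious disturbances
slow = adapt(x, alpha=0.02)  # long time constant: ignores the noise but is
                             # slow to respond to the meaningful change
```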

4. Evidential Response
In the context of pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This latter information may be used to reject ambiguous patterns, should they arise, and thereby improve the classification performance of the network.
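
One common way of exposing such confidence information (an assumption for illustration, not the only possible mechanism) is to read the network's output scores through a softmax and refuse to classify whenever the largest confidence falls below a threshold:

```python
import numpy as np

# Softmax outputs interpreted as class confidences, with a reject option
# for ambiguous patterns.

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify_with_reject(scores, threshold=0.7):
    p = softmax(scores)
    k = int(np.argmax(p))
    if p[k] < threshold:       # ambiguous pattern: refuse to decide
        return None, float(p[k])
    return k, float(p[k])

print(classify_with_reject(np.array([2.5, 0.3, 0.1])))  # confident: (0, ~0.83)
print(classify_with_reject(np.array([1.0, 0.9, 0.8])))  # ambiguous: (None, ~0.37)
```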

5. Contextual Information
(L. contextus, contexere - con-, texere, textum, to weave)
Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network. Consequently, contextual information is dealt with naturally by a neural network.

6. Fault Tolerance
A neural network, implemented in hardware form, has the potential to be inherently fault tolerant in the sense that its performance is degraded gracefully under adverse operating conditions (Bolt, 1992). For example, if a neuron or its connecting links are damaged, recall of a stored pattern is impaired in quality. However, owing to the distributed nature of information in the network, the damage has to be extensive before the overall response of the network is degraded seriously. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure.

7. VLSI Implementability
The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks. This same feature makes a neural network ideally suited for implementation using very-large-scale-integrated (VLSI) technology. The particular virtue of VLSI is that it provides a means of capturing truly complex behavior in a highly hierarchical fashion (Mead and Conway, 1980), which makes it possible to use a neural network as a tool for real-time applications involving pattern recognition, signal processing, and control.

8. Uniformity of Analysis and Design
Basically, neural networks enjoy universality as information processors. We say this in the sense that the same notation is used in all the domains involving the application of neural networks. This feature manifests itself in different ways:
- Neurons, in one form or another, represent an ingredient common to all neural networks.
- This commonality makes it possible to share theories and learning algorithms in different applications of neural networks.
- Modular networks can be built through a seamless integration of modules.

analysis: [Medieval Latin, from Greek analusis, a dissolving, from analūein, to undo : ana-, throughout; see ana- + lūein, to loosen; see leu- in Indo-European roots.]

analysis: 1581, "resolution of anything complex into simple elements" (opposite of synthesis), from M.L. analysis, from Gk. analysis "a breaking up," from analyein "unloose," from ana- "up, throughout" + lysis "a loosening" (see lose). Psychological sense is from 1890. Phrase in the final (or last) analysis (1844) translates Fr. en dernière analyse.

Design: 1548, from L. designare "mark out, devise," from de- "out" + signare "to mark," from signum "a mark, sign." Originally in Eng. with the meaning now attached to designate (1646, from L. designatus, pp. of designare); many modern uses of design are metaphoric extensions. Designer (adj.) in the fashion sense of "prestigious" is first recorded 1966; designer drug is from 1983. Designing "scheming" is from 1671. Designated hitter introduced in American League baseball in 1973, soon giving wide figurative extension to designated.

9. Neurobiological Analogy
The design of a neural network is motivated by analogy with the brain, which is a living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful. Neurobiologists look to (artificial) neural networks as a research tool for the interpretation of neurobiological phenomena.

For example, neural networks have been used to provide insight on the development of premotor circuits in the oculomotor system (responsible for eye movements) and the manner in which they process signals (Robinson, 1992). (premotor: relating to, or being, the area of the cortex of the frontal lobe lying immediately in front of the motor area of the precentral gyrus; gyrus: any of the prominent, rounded, elevated convolutions on the surfaces of the cerebral hemispheres [Latin gȳrus, circle; see gyre]. oculomotor: 1. of or relating to movements of the eyeball: an oculomotor muscle; 2. of or relating to the oculomotor nerve [Latin oculus, eye; see okw- in Indo-European roots + motor].) On the other hand, engineers look to neurobiology for new ideas to solve problems more complex than those based on conventional hard-wired design techniques.

Here, for example, we may mention the development of a model sonar receiver based on the bat (Simmons et al., 1992). The bat-inspired model consists of three stages:
(1) a front end that mimics the inner ear of the bat in order to encode waveforms;
(2) a subsystem of delay lines that computes echo delays;
(3) a subsystem that computes the spectrum of echoes, which is in turn used to estimate the time separation of echoes from multiple target glints.

The motivation is to develop a new sonar receiver that is superior to one designed by conventional methods. The neurobiological analogy is also useful in another important way: It provides a hope and belief (and, to a certain extent, an existence proof) that physical understanding of neurobiological structures could indeed influence the art of electronics and thus VLSI (Andreou, 1992).

With inspiration from neurobiological analogy in mind, it seems appropriate that we take a brief look at the structural levels of organization in the brain.

1.2 Structural Levels of Organization in the Brain

The human nervous system may be viewed as a three-stage system (Arbib, 1987).

[Figure: Block-diagram representation of the nervous system]

Central to the system is the brain, represented by the neural (nerve) net in this figure, which continually receives information, perceives it, and makes appropriate decisions. Two sets of arrows are shown in this figure:
1. Those pointing from left to right indicate the forward transmission of information-bearing signals through the system.
2. The arrows pointing from right to left signify the presence of feedback in the system.

The receptors in the figure convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (brain). The effectors, on the other hand, convert electrical impulses generated by the neural net into discernible responses as system outputs. In the brain there are both small-scale and large-scale anatomical organizations, and different functions take place at lower and higher levels.

This figure shows a hierarchy of interwoven levels of organization that has emerged from the extensive work done on the analysis of local regions in the brain (Churchland and Sejnowski, 1992; Shepherd and Koch, 1990).

Proceeding upward from synapses, which represent the most fundamental level and depend on molecules and ions for their action, we have neural microcircuits, dendritic trees, and then neurons.

A neural microcircuit refers to an assembly of synapses organized into patterns of connectivity so as to produce a functional operation of interest. A neural microcircuit may be likened to a silicon chip made up of an assembly of transistors.

The smallest size of microcircuits is measured in micrometers (μm), and their fastest speed of operation is measured in milliseconds. The neural microcircuits are grouped to form dendritic subunits within the dendritic trees of individual neurons. The whole neuron, about 100 μm in size, contains several dendritic subunits.

At the next level of complexity, we have local circuits (about 1 mm in size) made up of neurons with similar or different properties; these neural assemblies perform operations characteristic of a localized region in the brain.

This is followed by interregional circuits made up of pathways, columns, and topographic maps, which involve multiple regions located in different parts of the brain.

Topographic maps are organized to respond to incoming sensory information. These maps are often arranged in sheets, as in the superior colliculus, where the visual, auditory, and somatosensory maps are stacked in adjacent layers in such a way that stimuli from corresponding points in space lie above each other. Finally, the topographic maps and other interregional circuits mediate specific types of behavior in the central nervous system.

It is important to recognize that the structural levels of organization described herein are a unique characteristic of the brain. They are nowhere to be found in a digital computer, and we are nowhere close to realizing them with artificial neural networks. Nevertheless, we are inching our way toward a hierarchy of computational levels similar to that described in the last figure.

The artificial neurons we use to build our neural networks are truly primitive in comparison to those found in the brain. The neural networks we are presently able to design are just as primitive compared to the local circuits and the interregional circuits in the brain.

What is really satisfying, however, is the remarkable progress that we have made on so many fronts during the past 20 years. With the neurobiological analogy as the source of inspiration, and the wealth of theoretical and technological tools that we are bringing together, it is certain that in another 10 years our understanding of artificial neural networks will be much more sophisticated than it is today.

Our primary interest here is confined to the study of artificial neural networks from an engineering perspective, to which we refer simply as neural networks. We begin the study by describing the models of (artificial) neurons that form the basis of the neural networks considered in these lectures.

Models of a Neuron

A neuron is an information-processing unit that is fundamental to the operation of a neural network. The figure on the next slide shows the model for a neuron.
Nonlinear model of a neuron
Models of a Neuron

1. A set of synapses or connecting links, each of which is characterized by a weight or strength of its own. Specifically, a signal x_j at the input of synapse j connected to neuron k is multiplied by the synaptic weight w_kj. It is important to make a note of the manner in which the subscripts of the synaptic weight w_kj are written.
Models of a Neuron

The first subscript refers to the neuron in question and the second subscript refers to the input end of the synapse to which the weight refers; the reverse of this notation is also used in the literature. The weight w_kj is positive if the associated synapse is excitatory; it is negative if the synapse is inhibitory (Middle English inhibiten, to forbid, from Latin inhibēre, inhibit-, to restrain, forbid: in-, in; see in-2 + habēre, to hold; see ghabh- in Indo-European roots).
Models of a Neuron
2. An adder for summing the input signals, weighted by the respective synapses of the neuron; the operations described here constitute a linear combiner.
3. An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to in the literature as a squashing function, in that it squashes (limits) the permissible amplitude range of the output signal to some finite value. Typically, the normalized amplitude range of the output of a neuron is written as the closed unit interval [0,1] or alternatively [-1,1].
Models of a Neuron

4. The model of a neuron also includes an externally applied threshold θ_k that has the effect of lowering the net input of the activation function. On the other hand, the net input of the activation function may be increased by employing a bias term rather than a threshold; the bias is the negative of the threshold.
Models of a Neuron

In mathematical terms, we may describe neuron k by writing the following pair of equations:

u_k = Σ_{j=1}^{p} w_kj x_j

y_k = φ(u_k - θ_k)

where the x_j are the input signals; the w_kj are the synaptic weights of neuron k; u_k is the linear combiner output; θ_k is the threshold; φ(.) is the activation function; and y_k is the output signal of the neuron.
Mathematical Model of a Neuron
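As a concrete illustration of this pair of equations, the short sketch below (Python/NumPy, not part of the original lecture; the input values, weights, threshold, and the logistic choice of φ are arbitrary) computes u_k and y_k for a single neuron:

```python
import numpy as np

def neuron_output(x, w, theta, phi):
    """Compute y_k = phi(u_k - theta_k), with u_k = sum_j w_kj * x_j."""
    u = np.dot(w, x)          # linear combiner output u_k
    return phi(u - theta)     # activation applied to u_k - theta_k

# Illustrative values only.
x = np.array([0.5, -1.0, 2.0])      # input signals x_1 .. x_p
w = np.array([0.4, 0.1, -0.3])      # synaptic weights w_k1 .. w_kp
theta = 0.2                          # externally applied threshold
logistic = lambda v: 1.0 / (1.0 + np.exp(-v))

print(neuron_output(x, w, theta, logistic))
```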
Models of a Neuron
Block-Diagram Representation of a Neuron

u_k = Σ_{j=1}^{p} w_kj x_j,      y_k = φ(u_k - θ_k)
Models of a Neuron

The use of the threshold θ_k has the effect of applying an affine transformation to the output u_k of the linear combiner in the model of the figure, as shown by

v_k = u_k - θ_k
Models of a Neuron

In particular, depending on whether the threshold θ_k is positive or negative, the relationship between the effective internal activity level (activation potential) v_k of neuron k and the linear combiner output u_k is modified in the manner illustrated in the figure. Note that as a result of this affine transformation, the graph of v_k versus u_k no longer passes through the origin.
The threshold θ_k is an external parameter of artificial neuron k. We may account for its presence as in the above equation. Equivalently, we may formulate the combination of the two equations as follows:

v_k = Σ_{j=0}^{p} w_kj x_j,      y_k = φ(v_k)

Models of a Neuron

Here we have added a new synapse, whose input is x_0 = -1 and whose weight is w_k0 = θ_k.
We may therefore reformulate the model of neuron k as in the figure, where the effect of the threshold is represented by doing two things:
(1) adding a new input signal fixed at -1, and
(2) adding a new synaptic weight equal to the threshold θ_k.
Alternatively, we may model the neuron as in the following slide:
Models of a Neuron

where the combination of the fixed input x_0 = +1 and the weight w_k0 = b_k accounts for the bias b_k.

Although the models of the two figures are different in appearance, they are mathematically equivalent.
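A small numerical check of this equivalence (a sketch only; the numbers and the tanh activation are arbitrary illustrative choices) compares the original threshold model, the threshold-as-weight model with x_0 = -1, and the bias model with x_0 = +1:

```python
import numpy as np

phi = np.tanh                          # any activation function will do here
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.1, -0.3])
theta = 0.2                            # threshold; the bias is b = -theta

# Original formulation: y = phi(u - theta)
y1 = phi(np.dot(w, x) - theta)

# Threshold as a weight on a fixed input x_0 = -1, with w_0 = theta
y2 = phi(np.dot(np.append(w, theta), np.append(x, -1.0)))

# Bias as a weight on a fixed input x_0 = +1, with w_0 = b = -theta
y3 = phi(np.dot(np.append(w, -theta), np.append(x, +1.0)))

print(y1, y2, y3)   # all three values coincide
```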
Models of a Neuron
Signal-Flow Graph Representation of a Neuron

u_k = Σ_{j=1}^{p} w_kj x_j,      y_k = φ(u_k - θ_k)
Models of a Neuron
Signal-Flow Graph Representation of a Neuron

Two different types of links may be distinguished:
(a) Synaptic links, defined by a linear input-output relation. Specifically, the node signal x_j is multiplied by the synaptic weight w_kj to produce the node signal v_k.
(b) Activation links, defined in general by a nonlinear input-output relation. This form of relationship is the nonlinear activation function, given as φ(.).
Models of a Neuron
The Activation Function

The activation function, denoted by φ(.), defines the output y of a neuron in terms of the activity level at its input v:

y = φ(v)
Models of a Neuron

We may identify three basic types of activation functions:
1. Threshold Function
2. Piecewise-linear Function
3. Sigmoid Function
Models of a Neuron

1. Threshold (hard limiter or binary activation) Function (leading to the discrete perceptron)

(a) Unipolar: φ(v) = 1/2 + (1/2) sgn(v)

(b) Bipolar: φ(v) = sgn(v)
Models of a Neuron

2. Piecewise-linear Function

(a) Unipolar: φ(v) = (1/2)(|v + 1/2| - |v - 1/2|) + 1/2
(equivalently, φ(v) = 0 for v ≤ -1/2, φ(v) = v + 1/2 for -1/2 < v < 1/2, and φ(v) = 1 for v ≥ 1/2)

(b) Bipolar: φ(v) = (1/2)(|v + 1| - |v - 1|)
(written in an alternative notation as y_ij(t) = f(x_ij(t)) = (1/2)(|x_ij(t) + 1| - |x_ij(t) - 1|))
Models of a Neuron

3. Sigmoid Function

(a) Unipolar: φ(v) = 1 / (1 + e^(-av)),  a > 0

(b) Bipolar: φ(v) = 2 / (1 + e^(-av)) - 1 = (1 - e^(-av)) / (1 + e^(-av)),  a > 0
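The three basic activation-function families above can be coded directly from the formulas on the preceding slides (a sketch; the slope parameter a defaults to 1):

```python
import numpy as np

def threshold_unipolar(v):            # 1/2 + (1/2) sgn(v)
    return 0.5 + 0.5 * np.sign(v)

def threshold_bipolar(v):             # sgn(v)
    return np.sign(v)

def piecewise_unipolar(v):            # saturates at 0 and 1, slope 1 around v = 0
    return np.clip(v + 0.5, 0.0, 1.0)

def piecewise_bipolar(v):             # 0.5*(|v+1| - |v-1|), saturates at -1 and 1
    return np.clip(v, -1.0, 1.0)

def sigmoid_unipolar(v, a=1.0):       # 1 / (1 + exp(-a v)), a > 0
    return 1.0 / (1.0 + np.exp(-a * v))

def sigmoid_bipolar(v, a=1.0):        # 2 / (1 + exp(-a v)) - 1
    return 2.0 / (1.0 + np.exp(-a * v)) - 1.0

v = np.linspace(-2, 2, 5)
print(sigmoid_bipolar(v, a=2.0))
```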
Models of Artificial Neural Networks
DEFINITION OF Neural Network
(Jacek M. Zurada, ARTIFICIAL NEURAL SYSTEMS, 1992, West Publishing Company)

A neural network is an interconnection of neurons such that neuron outputs are connected, through weights, to all other neurons including themselves; both lag-free and delay connections are allowed.
Models of Artificial Neural Networks
Neural Networks Viewed as Directed Graphs

1. Block-Diagram Representation (BDR)
2. Signal-Flow Graph Representation (SFGR)

These are obtained when the BDR and SFGR for the neurons are used.
Models of Artificial Neural Networks
An alternative definition of Neural Network
(Simon Haykin, NEURAL NETWORKS, 1994, Macmillan College Publishing Company)

A neural network is a directed graph (SFG) consisting of nodes with interconnecting synaptic and activation links, and which is characterized by four properties:

1. Each neuron is represented by a set of linear synaptic links, an externally applied threshold, and a nonlinear activation link. The threshold is represented by a synaptic link with an input signal fixed at a value of -1.
Models of Artificial Neural Networks
2. The synaptic links of a neuron weight their respective input signals.
3. The weighted sum of the input signals defines the total internal activity level of the neuron in question.
4. The activation link squashes the internal activity level of the neuron to produce an output that represents the output of the neuron.
Network Architectures

In general, we may identify four different classes of network architectures:

1. Single-Layer Feedforward Networks
2. Multilayer Feedforward Networks
3. Recurrent Networks
4. Lattice Structures
1. Single-Layer Feedforward Networks

A layered neural network is a network of neurons organized in the form of layers. In the simplest form of a layered network, we just have an input layer of source nodes that projects onto an output layer of neurons (computation nodes), but not vice versa.
Network Architectures

In other words, this network is strictly of a feedforward type. It is illustrated on the following slide for the case of four nodes in both the input and output layers. Such a network is called a single-layer network, with the designation "single layer" referring to the output layer of computation nodes (neurons). In other words, we do not count the input layer of source nodes, because no computation is performed there.
Network Architectures
2. Multilayer Feedforward Networks

The second class of a feedforward neural network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are correspondingly called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output.
Network Architectures

By adding one or more hidden layers, the network is enabled to extract higher-order statistics, for (in a rather loose sense) the network acquires a global perspective despite its local connectivity, by virtue of:
- the extra set of synaptic connections
- the extra dimension of neural interactions.

The ability of hidden neurons to extract higher-order statistics is particularly valuable when the size of the input layer is large.
Network Architectures

The source nodes in the input layer of the network supply respective elements of the activation pattern (input vector), which constitute the input signals applied to the neurons (computation nodes) in the second layer (i.e., the first hidden layer). The output signals of the second layer are used as inputs to the third layer, and so on for the rest of the network.
Network Architectures

Typically, the neurons in each layer of the network have as their inputs the output signals of the preceding layer only.

The set of output signals of the neurons in the output (final) layer of the network constitutes the overall response of the network to the activation pattern supplied by the source nodes in the input (first) layer.
Network Architectures

This graph illustrates the layout of a multilayer feedforward neural network for the case of a single hidden layer. For brevity this network is referred to as a 10-4-2 network in that it has 10 source nodes, 4 hidden neurons, and 2 output neurons.
Network Architectures

As another example, a feedforward network with p source nodes, h1 neurons in the first hidden layer, h2 neurons in the second hidden layer, and q neurons in the output layer, say, is referred to as a p-h1-h2-q network.
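The p-h1-h2-q notation maps directly onto a layered forward pass. Below is a minimal sketch (not from the lecture; the layer sizes, random weights, zero thresholds, and logistic activation are arbitrary assumptions):

```python
import numpy as np

def forward(x, weights, thetas, phi):
    """Propagate an input through successive fully connected layers."""
    y = x
    for W, theta in zip(weights, thetas):
        y = phi(W @ y - theta)        # each layer computes phi(W y - theta)
    return y

rng = np.random.default_rng(0)
p, h1, h2, q = 10, 4, 3, 2            # a p-h1-h2-q network (sizes are arbitrary)
sizes = [p, h1, h2, q]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
thetas = [np.zeros(m) for m in sizes[1:]]
phi = lambda v: 1.0 / (1.0 + np.exp(-v))

x = rng.normal(size=p)                # activation pattern from the source nodes
print(forward(x, weights, thetas, phi))   # overall response of the output layer
```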
Network Architectures

The neural network of this figure is said to be fully connected in the sense that every node in each layer of the network is connected to every other node in the adjacent forward layer.

If, however, some of the communication links (synaptic connections) are missing from the network, we say that the network is partially connected. A form of partially connected multilayer feedforward network of particular interest is a locally connected network. An example of such a network with a single hidden layer is presented on the next slide. Each neuron in the hidden layer is connected to a local (partial) set of source nodes that lies in the immediate neighborhood.
Network Architectures
Partially connected feedforward neural network

Such a set of localized nodes feeding a neuron is said to constitute the receptive field of the neuron. Likewise, each neuron in the output layer is connected to a local set of hidden neurons.
Network Architectures

3. Recurrent (Feedback or Dynamical) Networks

A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop. For example, a recurrent network may consist of a single layer of neurons with each neuron feeding its output signal back to the inputs of all the other neurons, as illustrated in the architectural graph of the figure on the right. In the structure depicted in this figure there are no self-feedback loops in the network; self-feedback refers to a situation where the output of a neuron is fed back to its own input.
Recurrent network with no self-feedback loops and no hidden neurons
Network Architectures

The recurrent network illustrated on the previous slide also has no hidden neurons. Here we illustrate another class of recurrent networks with hidden neurons. The feedback connections shown originate from the hidden neurons as well as the output neurons.
Recurrent network with hidden neurons
The presence of feedback loops, be it as in the recurrent structure with or without hidden neurons, has a profound impact on the learning capability of the network, and on its performance. Moreover, the feedback loops involve the use of particular branches composed of unit-delay elements (denoted by z^-1), which result in a nonlinear dynamical behavior by virtue of the nonlinear nature of the neurons.
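A minimal sketch of such a feedback structure (purely illustrative; the weight matrix and the tanh activation are arbitrary, and the unit-delay z^-1 branches are represented simply by reusing the previous output vector):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                  # single layer of n neurons, no hidden neurons
W = rng.normal(scale=0.5, size=(n, n))
np.fill_diagonal(W, 0.0)               # no self-feedback loops

y = rng.normal(size=n)                 # initial outputs held in the unit delays
for t in range(10):
    y = np.tanh(W @ y)                 # previous outputs fed back through z^-1 branches
    print(t, np.round(y, 3))
```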
Network Architectures
4. Lattice (Multicategory Perceptron) Structures

A lattice consists of a one-dimensional, two-dimensional, or higher-dimensional array of neurons with a corresponding set of source nodes that supply the input signals to the array; the dimension of the lattice refers to the number of dimensions of the space in which the graph lies.

A lattice network is really a feedforward network with the output neurons arranged in rows and columns.
One-dimensional lattice of 3 neurons
Two-dimensional lattice of 3-by-3 neurons
The Perceptron

The perceptron is the simplest form of a neural network used for the classification of a special type of patterns said to be linearly separable (i.e., patterns that lie on opposite sides of a hyperplane).

Basically, it consists of a single neuron with adjustable synaptic weights and threshold, as shown in the figures.
The Perceptron

The algorithm used to adjust the free parameters of this neural network first appeared in a learning procedure developed by Rosenblatt (1958, 1962) for his perceptron brain model. Indeed, Rosenblatt proved that if the patterns (vectors) used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes. The proof of convergence of the algorithm is known as the perceptron convergence theorem.
The Perceptron

The single-layer perceptron depicted has a single neuron. Such a perceptron is limited to performing pattern classification with only two classes.

By expanding the output (computation) layer of the perceptron to include more than one neuron, we may correspondingly perform classification with more than two classes. However, the classes would have to be linearly separable for the perceptron to work properly.
From this model we find that the linear combiner output (i.e., the hard limiter input) is

v = Σ_{j=1}^{p} w_j x_j - θ

The purpose of the perceptron is to classify the set of externally applied stimuli x_1, x_2, ..., x_p into one of two classes, C_1 or C_2, say. The decision rule for the classification is to assign the point represented by the inputs x_1, x_2, ..., x_p to class C_1 if the perceptron output y is +1, and to class C_2 if it is -1.
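In code, the decision rule reduces to a sign test on the hard-limiter input (a sketch; the weights and threshold are arbitrary):

```python
import numpy as np

def perceptron_class(x, w, theta):
    """Assign x to class C1 if y = sgn(w.x - theta) = +1, otherwise to C2."""
    v = np.dot(w, x) - theta           # linear combiner output (hard limiter input)
    y = 1 if v >= 0 else -1            # hard limiter output (v = 0 assigned to +1 here)
    return "C1" if y == +1 else "C2"

w = np.array([1.0, -2.0])              # illustrative weights
theta = 0.5                            # illustrative threshold
print(perceptron_class(np.array([2.0, 0.0]), w, theta))
```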
The Perceptron
To develop insight into the behavior of a pattern classifier, it is customary to plot a map of the decision regions in the p-dimensional signal space spanned by the p input variables x_1, x_2, ..., x_p. In the case of an elementary perceptron, there are two decision regions separated by a hyperplane defined by

Σ_{j=1}^{p} w_j x_j - θ = 0
This is illustrated here for the case of two input variables x_1 and x_2, for which the decision boundary takes the form of a straight line called the decision line. A point (x_1, x_2) that lies above the decision line is assigned to class C_1, and a point (x_1, x_2) that lies below the decision line is assigned to class C_2. Note also that the effect of the threshold θ is merely to shift the decision line away from the origin. The synaptic weights w_1, w_2, ..., w_p of the perceptron can be fixed or adapted on an iteration-by-iteration basis. For the adaptation, we may use an error-correction rule known as the perceptron convergence algorithm.
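The perceptron convergence algorithm itself is developed later; as a preview, a common error-correction form of the weight update (a sketch under that assumption, with an arbitrary learning rate and illustrative data, and not necessarily the exact formulation used later in these lectures) looks like this:

```python
import numpy as np

def train_perceptron(X, d, eta=0.5, epochs=20):
    """Error-correction training on augmented patterns [x_1, ..., x_p, -1]."""
    Xa = np.hstack([X, -np.ones((len(X), 1))])   # fixed input -1 carries the threshold
    w = np.zeros(Xa.shape[1])                    # augmented weights [w_1, ..., w_p, theta]
    for _ in range(epochs):
        for x, target in zip(Xa, d):
            y = 1 if np.dot(w, x) >= 0 else -1   # hard limiter output
            w += eta * (target - y) * x          # nonzero only when x is misclassified
    return w

# Two linearly separable classes (labels +1 for C1, -1 for C2); the data is illustrative.
X = np.array([[2.0, 2.0], [1.5, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([+1, +1, -1, -1])
print(train_perceptron(X, d))
```

The update changes w only when a pattern is misclassified, which is why convergence can be guaranteed for linearly separable classes.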
The Perceptron
We find it more convenient to work with the modified signal-flow graph given here. In this second model, which is equivalent to that of the previous figure, the threshold θ is treated as a synaptic weight connected to a fixed input equal to -1. We may thus define the (p + 1)-by-1 (augmented) input vector and the corresponding (augmented) weight vector as:

x = [x_1  x_2  ...  x_p  -1]^t
w = [w_1  w_2  ...  w_p  θ]^t
The Perceptron
Pattern Space

Any pattern can be represented by a point in n-dimensional Euclidean space E^n called the pattern space. Points in that space corresponding to members of the pattern set are n-tuple vectors x.
The Perceptron
Example 1: Consider the six patterns in two-dimensional pattern space shown in the following figure:
(2,0), (1.5,-1), (1,-2), (-1,-2), (-0.5,-1), (0,0)
The Perceptron
Design a perceptron such that these are classified according to their membership in sets as follows:

{[2  0]^t, [1.5  -1]^t, [1  -2]^t}: class 1
{[0  0]^t, [-0.5  -1]^t, [-1  -2]^t}: class 2
The Perceptron
One possible decision line is given by x_2 = 2x_1 - 2, which is drawn in the following figure.
Figure: the six patterns (2,0), (1.5,-1), (1,-2), (-1,-2), (-0.5,-1), (0,0) in the x_1-x_2 plane, together with the decision line x_2 = 2x_1 - 2.
The Perceptron
One decision surface for this line is obtained as:

x_3 = -2x_1 + x_2 + 2

x_3 = 0: -2x_1 + x_2 + 2 = 0 gives the points on the decision line
x_3 > 0: -2x_1 + x_2 + 2 > 0 gives the part of the surface above the decision line
x_3 < 0: -2x_1 + x_2 + 2 < 0 gives the part of the surface below the decision line

Such a pattern classification can be performed by the following (discrete) perceptron (dichotomizer):
(dichotomize: to divide or separate into two parts; dicha: in two; tomia: to cut)
The Perceptron
Figure: a discrete perceptron with inputs x_1 (weight -2) and x_2 (weight 1), a fixed input -1 (weight -2), and a sgn(v) hard limiter, producing the output

y = sgn(-2x_1 + x_2 + 2)
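A quick check of this dichotomizer on the six patterns of Example 1 (a short sketch that simply evaluates y = sgn(-2x_1 + x_2 + 2)):

```python
import numpy as np

def dichotomizer(x1, x2):
    """Discrete perceptron of Example 1: y = sgn(-2*x1 + x2 + 2)."""
    v = -2.0 * x1 + 1.0 * x2 + (-1.0) * (-2.0)   # weights -2, 1 and fixed input -1 with weight -2
    return int(np.sign(v))

class1 = [(2, 0), (1.5, -1), (1, -2)]
class2 = [(0, 0), (-0.5, -1), (-1, -2)]
for p in class1 + class2:
    print(p, dichotomizer(*p))
# The class-1 patterns fall on one side of the decision line (output -1 here),
# and the class-2 patterns on the other side (output +1).
```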
Single-Layer Feedforward Neural Network

Example 2: Assume that a set of eight points, P_0, P_1, ..., P_7, in three-dimensional space is available. The set consists of all vertices of a three-dimensional cube as follows:
{P_0(-1,-1,-1), P_1(-1,-1,1), P_2(-1,1,-1), P_3(-1,1,1), P_4(1,-1,-1), P_5(1,-1,1), P_6(1,1,-1), P_7(1,1,1)}

Elements of this set need to be classified into two categories. The first category is defined as containing points with two or more positive ones; the second category contains all the remaining points that do not belong to the first category.
Single-Layer Feedforward Neural Network
Classification of points P_3, P_5, P_6, and P_7 can be based on the summation of coordinate values for each point evaluated for category membership. Notice that for each point P_i(x_1, x_2, x_3), where i = 0, ..., 7, the membership in the category can be established by the following calculation:

If sgn(x_1 + x_2 + x_3) = +1, then category 1
If sgn(x_1 + x_2 + x_3) = -1, then category 2
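A quick check of this rule over all eight vertices (a sketch that evaluates sgn(x_1 + x_2 + x_3) for each point):

```python
import numpy as np

vertices = [(-1, -1, -1), (-1, -1, 1), (-1, 1, -1), (-1, 1, 1),
            (1, -1, -1), (1, -1, 1), (1, 1, -1), (1, 1, 1)]   # P_0 .. P_7

for i, p in enumerate(vertices):
    s = int(np.sign(sum(p)))                  # sgn(x1 + x2 + x3)
    category = 1 if s == +1 else 2
    print(f"P{i}{p}: category {category}")
# P3, P5, P6 and P7 (two or more +1 coordinates) end up in category 1.
```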
Single-Layer Feedforward Neural Network

The neural network given below implements the above expression:
Single-Layer Feedforward Neural Network

The network above performs the three-dimensional Cartesian space partitioning as illustrated below:
Single-Layer Feedforward Neural Network
Discriminant Functions

In Example 1,

x_3 = -2x_1 + x_2 + 2

can be viewed as a Discriminant Function. We may also write

g(x_1, x_2) = -2x_1 + x_2 + 2

or

g(x) = -2x_1 + x_2 + 2,   where x = [x_1  x_2]^t
On the other hand,

g(x_1, x_2) = -2x_1 + x_2 + 2

can also be viewed as the equation of a plane in 3-D Euclidean space, and

g(x_1, x_2) = 0:  -2x_1 + x_2 + 2 = 0

is the intersection line of the above plane with the xy-plane.
Single-Layer Feedforward Neural Network
Obviously:

g(x) = 0:  -2x_1 + x_2 + 2 = 0 gives the points on the decision line
g(x) > 0:  -2x_1 + x_2 + 2 > 0 gives the points on the plane above the decision line
g(x) < 0:  -2x_1 + x_2 + 2 < 0 gives the points on the plane below the decision line
Single-Layer Feedforward Neural Network
Since on the decision line we have g(x_1, x_2) = 0, we can write

dg(x_1, x_2) = (∂g(x_1, x_2)/∂x_1) dx_1 + (∂g(x_1, x_2)/∂x_2) dx_2 = 0

where dx_1 and dx_2 are the increments given to x_1 and x_2 on the decision line.
Single-Layer Feedforward Neural Network
Now defining

∇g(x_1, x_2) = [∂g(x_1, x_2)/∂x_1   ∂g(x_1, x_2)/∂x_2]^t   and   dr = [dx_1   dx_2]^t

∇g and dr are known to be the gradient vector (or normal vector) and the tangent vector, respectively.
Single-Layer Feedforward Neural Network
The gradient vector points toward the positive side of the decision line. However, there are two normal vectors, one pointing toward the positive side, q_1, and the other toward the negative side, q_2 = -q_1.

For the above example the gradient and normal vectors are given by:

∇g(x_1, x_2) = [∂g/∂x_1   ∂g/∂x_2]^t = [-2   1]^t = q_1,      q_2 = -q_1 = [2   -1]^t
Single-Layer Feedforward Neural Network
In fact q_2 is obtained from -g(x_1, x_2) = 0.

Note that q_1 and q_2 are the projections on the x-y plane of the normal vectors of two intersecting planes whose intersection line is given by g(x_1, x_2) = 0.
Single-Layer Feedforward Neural Network
Although q_1 and q_2 are unique, there are infinitely many plane pairs whose intersection line is given by g(x_1, x_2) = 0.

Plane pairs can be built by appropriately augmenting the 2-D normal vectors q_1 and q_2 to 3-D normal vectors, which will be the normal vectors of the two intersecting planes.
Single-Layer Feedforward Neural Network
The 2-D normal vectors are plane vectors given in the x-y plane:

q_1 = [-2   1]^t,      q_2 = [2   -1]^t

These can be augmented to 3-D by adding a third component, say 2, yielding

n_1 = [-2   1   2]^t,      n_2 = [2   -1   2]^t
Single-Layer Feedforward Neural Network
The details of building the augmented vectors are shown below:
Figure: the decision line g in the x_1-x_2 plane, the 2-D normal vectors q_1 and q_2, and the augmented 3-D normal vectors n_1 and n_2 obtained by adding the third component 2.
Single-Layer Feedforward Neural Network
Note that q_1 and q_2 are the normal vectors of the plane that is perpendicular to the x-y plane and intersects the x-y plane at the decision line. On the other hand, the vectors n_1 and n_2 are the normal vectors of the planes obtained by rotating the above perpendicular plane around the decision line by the angles α and β, respectively.
Single-Layer Feedforward Neural Network
We can now determine the equations for these planes by using the normal vector-point form of the plane equation, given as:

n^t (x - x_0) = 0

where:
- n is the normal vector of the plane,
- x is the vector connecting any point on the plane to the origin,
- x_0 is the vector connecting a fixed point on the plane to the origin.
Single-Layer Feedforward Neural Network
This means that x - x_0 represents the vector connecting all possible points x on the plane to the fixed point x_0 on the same plane. That is, x - x_0 is a vector that lies on the plane.

Now let us find the plane equations for the two normal vectors found above.
Single-Layer Feedforward Neural Network
Let x_0 be the point (1, 0, 0) on the decision line. We can write:

For n_1 = [-2  1  2]^t:   [-2  1  2] ([x_1  x_2  g_1]^t - [1  0  0]^t) = 0,   which gives   g_1 = x_1 - (1/2)x_2 - 1

For n_2 = [2  -1  2]^t:   [2  -1  2] ([x_1  x_2  g_2]^t - [1  0  0]^t) = 0,   which gives   g_2 = -x_1 + (1/2)x_2 + 1
Single-Layer Feedforward Neural Network

Because of the way g_1(x) and g_2(x) are built, we can state the following:

g_1(x) - g_2(x) > 0 on the positive side of the decision line
g_2(x) - g_1(x) > 0 on the negative side of the decision line
Single-Layer Feedforward Neural Network
Figure: the planes g_1 and g_2 with their normal vectors n_1 and n_2, intersecting along the decision line g in the x_1-x_2 plane.
Now we can compute g_1(x) and g_2(x) for the selected patterns in Example 1.

                     Class 1                                  Class 2
Pattern:    (2,0)      (1.5,-1)    (1,-2)         (0,0)      (-0.5,-1)   (-1,-2)
Result:     g_1-g_2>0  g_1-g_2>0   g_1-g_2>0      g_2-g_1>0  g_2-g_1>0   g_2-g_1>0
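The table can be reproduced directly from g_1 and g_2 as derived above (a short sketch):

```python
def g1(x1, x2):                 # plane with normal n1 = [-2, 1, 2]^t
    return x1 - 0.5 * x2 - 1.0

def g2(x1, x2):                 # plane with normal n2 = [2, -1, 2]^t
    return -x1 + 0.5 * x2 + 1.0

class1 = [(2, 0), (1.5, -1), (1, -2)]
class2 = [(0, 0), (-0.5, -1), (-1, -2)]
for p in class1:
    print(p, "g1 - g2 =", g1(*p) - g2(*p))    # positive for class-1 patterns
for p in class2:
    print(p, "g2 - g1 =", g2(*p) - g1(*p))    # positive for class-2 patterns
```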
Single-Layer Feedforward Neural Network
Henceforth, such g_i(x) functions will be called Discriminant Functions.

We can conclude that:

g_1(x) > g_2(x) for the patterns in Class 1
g_2(x) > g_1(x) for the patterns in Class 2
Single-Layer Feedforward Neural Network
Minimum Distance Classification

The classification of two clusters is carried out in such a way that the boundary of these two clusters is drawn as a line perpendicular to, and passing through the midpoint of, the line connecting the center points of the two clusters. Therefore the boundary line is the perpendicular bisector of the connecting line.
Single-Layer Feedforward Neural Network
Figure: cluster centers P_i (at x_i) and P_j (at x_j), the midpoint P_0 = (x_i + x_j)/2, the vector x_i - x_j, and the perpendicular-bisector boundary with its positive and negative sides.
Single-Layer Feedforward Neural Network
Now we will derive the equation of the boundary line. Let the vectors x and x_0 represent any point on this line and the point P_0, respectively. Then the following must hold:

(x_i - x_j)^t (x - x_0) = 0

which can be written in the form

(x_i - x_j)^t (x - (1/2)(x_i + x_j)) = 0

and

(x_i - x_j)^t x - (1/2)(x_i - x_j)^t (x_i + x_j) = 0

or

(x_i - x_j)^t x - (1/2)(||x_i||^2 - ||x_j||^2) = 0
Single-Layer Feedforward Neural Network
Now defining

g_ij(x) = (x_i - x_j)^t x - (1/2)(||x_i||^2 - ||x_j||^2)

We have already seen that the boundary (decision) line can be taken as the intersection of two planes g_i and g_j.
Single-Layer Feedforward Neural Network
Therefore

g_ij(x) = g_i(x) - g_j(x)

where we have called the g_i(x) discriminant functions and shown that they are associated with plane equations.
Single-Layer Feedforward Neural Network

Now using the two equations above we obtain

g_i(x) - g_j(x) = (x_i - x_j)^t x - (1/2)(||x_i||^2 - ||x_j||^2)

which can be used to make the following identification:

g_i(x) = x_i^t x - (1/2)||x_i||^2
g_j(x) = x_j^t x - (1/2)||x_j||^2
Single-Layer Feedforward Neural Network
g_i(x) can also be expressed as:

g_i(x) = w_i^t x + w_{i,n+1}

Therefore we can make the identification:

w_i = x_i,      w_{i,n+1} = -(1/2)||x_i||^2
Single-Layer Feedforward Neural Network
An alternative approach toward the construction of discriminant functions may be taken as follows:

Let us assume that a minimum-distance classification is required to classify patterns into R categories. Each of the classes is represented by its center point P_i, i = 1, 2, ..., R. The Euclidean distance between an input pattern x and the point P_i is given by the norm of the vector x - x_i as:

||x - x_i|| = [(x - x_i)^t (x - x_i)]^(1/2)
Single-Layer Feedforward Neural Network
A minimum-distance classifier computes the distance from a pattern of unknown classification to each of the center points P_i. Then the category number of the point that yields the minimum distance is assigned to the unknown pattern.

Squaring the above equation yields

||x - x_i||^2 = x^t x - 2 x_i^t x + x_i^t x_i = x^t x - 2 (x_i^t x - (1/2) x_i^t x_i)
Single-Layer Feedforward Neural Network
Since x^t x is independent of i, this term is constant with respect to the categories. Therefore, in order to minimize the distance ||x - x_i|| we need to maximize

g_i(x) = x_i^t x - (1/2) x_i^t x_i

which is called a discriminant function.
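A minimal sketch of a minimum-distance classifier built from these discriminant functions (for concreteness, the prototype points are the three points of Example 3 below; any class centers could be used):

```python
import numpy as np

def make_discriminants(prototypes):
    """Build g_i(x) = x_i^t x - 0.5 * x_i^t x_i for each class prototype x_i."""
    return [lambda x, xi=np.asarray(xi, float): xi @ x - 0.5 * xi @ xi
            for xi in prototypes]

def classify(x, discriminants):
    values = [g(np.asarray(x, float)) for g in discriminants]
    return int(np.argmax(values)) + 1      # class of the largest discriminant

prototypes = [[10, 2], [2, -5], [-5, 5]]   # class center points
gs = make_discriminants(prototypes)
print(classify([8, 1], gs))                # assigns the class of the nearest prototype
```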
Single-Layer Feedforward Neural Network

It is also assumed that the index of each point (pattern) corresponds to its class number.

Example 3: A linear minimum-distance classifier will be designed for the three points given as:

x_1 = [10  2]^t,      x_2 = [2  -5]^t,      x_3 = [-5  5]^t

The three points and the connecting lines constitute a triangle, which is shown on the next slide:
Single-Layer Feedforward Neural Network
Figure: the triangle with vertices P_1(10,2), P_2(2,-5), and P_3(-5,5) in the x_1-x_2 plane.
Single-Layer Feedforward Neural Network
Now let us draw the circle passing through all three vertices of the triangle, the circumcircle. We can conclude that each boundary is a perpendicular bisector of the triangle. A perpendicular bisector of a triangle is a straight line passing through the midpoint of a side and being perpendicular to it, i.e., forming a right angle with it. The three perpendicular bisectors meet at a single point, the triangle's circumcenter; this point is the center of the circumcircle.
Single-Layer Feedforward Neural Network
Figure: the triangle P_1(10,2), P_2(2,-5), P_3(-5,5) with its circumcircle and the perpendicular-bisector decision boundaries.
Single-Layer Feedforward Neural Network
Now using

g_ij(x) = (x_i - x_j)^t x - (1/2)(||x_i||^2 - ||x_j||^2)

and

x_1 = [10  2]^t,      x_2 = [2  -5]^t,      x_3 = [-5  5]^t

we obtain
Single-Layer Feedforward Neural Network
g_12(x) = (x_1 - x_2)^t x - (1/2)(||x_1||^2 - ||x_2||^2)
        = [10-2   2+5] [x_1  x_2]^t - (1/2)[(100 + 4) - (4 + 25)]
        = 8x_1 + 7x_2 - 37.5
Single-Layer Feedforward Neural Network
g_13(x) = (x_1 - x_3)^t x - (1/2)(||x_1||^2 - ||x_3||^2)
        = [10+5   2-5] [x_1  x_2]^t - (1/2)[(100 + 4) - (25 + 25)]
        = 15x_1 - 3x_2 - 27
Single-Layer Feedforward Neural Network
g_23(x) = (x_2 - x_3)^t x - (1/2)(||x_2||^2 - ||x_3||^2)
        = [2+5   -5-5] [x_1  x_2]^t - (1/2)[(4 + 25) - (25 + 25)]
        = 7x_1 - 10x_2 + 10.5
Single-Layer Feedforward Neural Network
Now using

w_i = x_i,      w_{i,n+1} = -(1/2)||x_i||^2

we obtain

w_1 = [10  2  -52]^t;      w_2 = [2  -5  -14.5]^t;      w_3 = [-5  5  -25]^t
Single-Layer Feedforward Neural Network
and using

g_i(x) = w_i^t x + w_{i,n+1}

we obtain

g_1(x) = 10x_1 + 2x_2 - 52
g_2(x) = 2x_1 - 5x_2 - 14.5
g_3(x) = -5x_1 + 5x_2 - 25
Single-Layer Feedforward Neural Network
A block diagram producing the three discriminant functions is shown below:
Figure: the inputs x_1, x_2 and a fixed input -1 feed three linear combiners with weights (10, 2, 52), (2, -5, 14.5), and (-5, 5, 25), producing the outputs 10x_1 + 2x_2 - 52, 2x_1 - 5x_2 - 14.5, and -5x_1 + 5x_2 - 25.
Single-Layer Feedforward Neural Network
The discriminant values for the three patterns P_1(10,2), P_2(2,-5) and P_3(-5,5) are shown in the table below:

Input / Discriminant             Class 1: [10 2]^t    Class 2: [2 -5]^t    Class 3: [-5 5]^t
g_1(x) = 10x_1 + 2x_2 - 52              52                  -42                  -92
g_2(x) = 2x_1 - 5x_2 - 14.5             -4.5                 14.5                -49.5
g_3(x) = -5x_1 + 5x_2 - 25              -65                 -60                   25
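The table above, and the maximum-selector decision, can be reproduced as follows (a sketch using the weight vectors w_1, w_2, w_3 computed earlier):

```python
import numpy as np

W = np.array([[10.0, 2.0, -52.0],     # w_1: g_1(x) = 10 x1 + 2 x2 - 52
              [2.0, -5.0, -14.5],     # w_2: g_2(x) =  2 x1 - 5 x2 - 14.5
              [-5.0, 5.0, -25.0]])    # w_3: g_3(x) = -5 x1 + 5 x2 - 25

def discriminants(x1, x2):
    return W @ np.array([x1, x2, 1.0])     # augmented input [x1, x2, 1]

for point in [(10, 2), (2, -5), (-5, 5)]:
    g = discriminants(*point)
    print(point, np.round(g, 1), "-> class", int(np.argmax(g)) + 1)   # maximum selector
```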
Single-Layer Feedforward Neural Network
As required by the definition of the discriminant function, the responses on the diagonal are the largest in each column. It will be shown later that the same is true for any three points P_1, P_2, P_3 taken from the three decision regions H_1, H_2, H_3, provided that the decision regions are determined as shown above. Therefore using a maximum selector at the output will provide the required function from the network.
Single-Layer Feedforward Neural Network
Using the same network with TLUs (bipolar activation functions) will result in the outputs given in the table below:

Input                              Class 1: [10 2]^t    Class 2: [2 -5]^t    Class 3: [-5 5]^t
sgn(g_1(x) = 5x_1 + 3x_2 - 5)             1                   -1                   -1
sgn(g_2(x) = -x_2 - 2)                   -1                    1                   -1
sgn(g_3(x) = -9x_1 + x_2)                -1                   -1                    1
Single-Layer Feedforward Neural Network
However, as the next example will demonstrate, this is not true for any three points P_1, P_2, P_3 taken from the three decision regions H_1, H_2, H_3.

The diagonal entries = 1
The off-diagonal entries = -1
Single-Layer Feedforward Neural Network
The response of the same network to the patterns Q_1(5,0), Q_2(0,1) and Q_3(-4,0) is shown in the table below:

Input / Discriminant             Class 1: [5 0]^t    Class 2: [0 1]^t    Class 3: [-4 0]^t
g_1(x) = 10x_1 + 2x_2 - 52             -2                 -50                 -92
g_2(x) = 2x_1 - 5x_2 - 14.5            -4.5               -19.5               -22.5
g_3(x) = -5x_1 + 5x_2 - 25             -50                -20                  -5
Single-Layer Feedforward Neural Network
The responses on the diagonal are still the largest
The responses on the diagonal are still the largest
in each column. However, using the same network
in each column. However, using the same network
with TLUs (bipolar activation functions) will result
with TLUs (bipolar activation functions) will result
in the outputs given in the table on the next slide:
in the outputs given in the table on the next slide:
Single
Single
-
-
Layer Feedforward Neural
Layer Feedforward Neural
Network
Network
Input
Input
Discriminant
Discriminant
Class 1
Class 1
[5 0]
[5 0]
t t
Class 2
Class 2
[0 1]
[0 1]
t t
Class 3
Class 3
[
[
-
-
4 0]
4 0]
t t
sgn(g
sgn(g
1 1
(x)=10x
(x)=10x
1 1
+2x
+2x
2 2
-
-
52)
52)
-
-
1
1
-
-
1
1
-
-
1
1
sgn(g
sgn(g
2 2
(x)= 2x
(x)= 2x
1 1
-
-
5x
5x
2 2
-
-
14.5)
14.5)
-
-
1
1
-
-
1
1
-
-
1
1
sgn(g
sgn(g
3 3
(x)=
(x)=
-
-
5x
5x
1 1
+5x
+5x
2 2
-
-
25)
25)
-
-
1
1
-
-
1
1
-
-
1
1
Single-Layer Feedforward Neural Network

It is therefore impossible to use TLUs once the decision lines are calculated using the minimum-distance classification procedure. The only way out is using a maximum selector. The explanation of the responses on the diagonal being the largest in each column will now be made in detail.
Single-Layer Feedforward Neural Network

The discriminant functions determine the plane equations

g1 = 10x1 + 2x2 - 52
g2 = 2x1 - 5x2 - 14.5
g3 = -5x1 + 5x2 - 25

These planes are shown on the next slide. It is easily seen that:

For any point in H1: g1(x) > g2(x) and g1(x) > g3(x)
For any point in H2: g2(x) > g1(x) and g2(x) > g3(x)
For any point in H3: g3(x) > g1(x) and g3(x) > g2(x)
Single-Layer Feedforward Neural Network

[Figure: 3D plot of the three planes gi over the x1-x2 plane.]

The decision regions H1, H2, H3 are projections of the planes g1, g2 and g3, respectively, on the x1-x2 plane, and the decision lines are the projections of the intersection lines of the planes gi on the x1-x2 plane, which are shown on the next slide.

[Figure: the decision lines g12(x) = 0, g13(x) = 0 and g23(x) = 0 in the x1-x2 plane, meeting at the point P123(2.337, 2.686); the regions H1 (g1(x) > g2(x), g1(x) > g3(x)), H2 (g2(x) > g1(x), g2(x) > g3(x)) and H3 (g3(x) > g1(x), g3(x) > g2(x)) are labeled, and the patterns P1(10,2), P2(2,-5) and P3(-5,5) are marked.]
Single-Layer Feedforward Neural Network

A MATLAB plot of the projections of the intersection lines of the planes gi is shown on the next slide.

[Figure: MATLAB plot of the projections of the intersection lines in the x1-x2 plane, with both axes running from -30 to 30.]

The projections of the intersection lines of the planes gi on the x1-x2 plane are shown to be given by the following line equations:

g12(x) = 8x1 + 7x2 - 37.5 = 0
g13(x) = 15x1 - 3x2 - 27 = 0
g23(x) = 7x1 - 10x2 + 10.5 = 0

The previous slide shows the segments that can be seen from the top.
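A plot of these projections can be reproduced with a few lines of MATLAB; the sketch below is our own (line colors and axis ranges are arbitrary choices), not the original program behind the slide.

% Plot the pairwise decision lines g12 = 0, g13 = 0, g23 = 0 in the x1-x2 plane,
% together with the three prototype patterns.
x1 = -30:0.1:30;
plot(x1, (37.5 - 8*x1)/7, 'b', ...      % g12: 8*x1 + 7*x2 - 37.5 = 0
     x1, (15*x1 - 27)/3,  'r', ...      % g13: 15*x1 - 3*x2 - 27  = 0
     x1, (7*x1 + 10.5)/10, 'g');        % g23: 7*x1 - 10*x2 + 10.5 = 0
hold on;
plot([10 2 -5], [2 -5 5], 'ko');        % P1, P2, P3
axis([-30 30 -30 30]); grid on;
xlabel('x_1'); ylabel('x_2');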
Single-Layer Feedforward Neural Network

The continuation of the line g12 = 0 remains underneath the plane g3.
The continuation of the line g23 = 0 remains underneath the plane g1.
The continuation of the line g13 = 0 remains underneath the plane g2.

A classifier using a maximum selector is shown on the next slide. The maximum selector selects the maximum discriminant and responds with the number of the discriminant having the largest value.
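In MATLAB terms, the maximum selector can be sketched as a small function that returns the index of the largest discriminant; the function name and layout below are illustrative assumptions, not part of the original design.

% A minimal maximum-selector classifier: it returns the index of the largest
% discriminant for an input point [x1; x2].
function i = maxSelectorClassify(x)
    W = [10   2   52;
          2  -5   14.5;
         -5   5   25];
    g = W * [x; -1];      % three discriminant values
    [~, i] = max(g);      % respond with the number of the largest one
end

For example, maxSelectorClassify([10; 2]) returns 1 and maxSelectorClassify([-5; 5]) returns 3.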
Single-Layer Feedforward Neural Network

Classifier using the maximum selector

[Figure: the inputs x1, x2 and the fixed input -1 feed three linear units with weights (10, 2, 52), (2, -5, 14.5) and (-5, 5, 25); their outputs g1(x), g2(x) and g3(x) go to a maximum selector whose response is i = 1, 2, or 3.]

The classifier can be redrawn as follows:

[Figure: the same classifier redrawn with separate input lines x1, x2, -1 to each of the three units, again followed by the maximum selector.]
Single-Layer Feedforward Neural Network

In the above we have designed a classifier based on the minimum-distance classification for known clusters and derived the network with three perceptrons from the discriminant functions, which were interpreted as plane equations. Instead, let us now consider the network on the next slide, which is obtained as a result of training a network with three perceptrons using the same input patterns P1(10,2), P2(2,-5) and P3(-5,5) as in the previous network.

[Figure: the trained network — the inputs x1, x2 and the fixed input -1 feed TLU#1 with weights (5, 3, 5), TLU#2 with weights (0, -1, 2) and TLU#3 with weights (-9, 1, 0), so the units compute sgn(5x1 + 3x2 - 5), sgn(-x2 - 2) and sgn(-9x1 + x2).]
Single-Layer Feedforward Neural Network

In fact gi(x) = 0 defines the intersection of the gi planes with the x1-x2 plane. Therefore the TLU divides each gi plane into two regions: (1) the upper half-plane, which is above the x1-x2 plane, and (2) the lower half-plane, which is below the x1-x2 plane.

The decision lines are obtained by setting gi(x) = 0:

5x1 + 3x2 - 5 = 0
-x2 - 2 = 0
-9x1 + x2 = 0

They are shown on the next slide. The shaded areas are indecision regions, which will become clear in the following discussion.
Single-Layer Feedforward Neural Network

[Figure: the decision lines 5x1 + 3x2 - 5 = 0, -x2 - 2 = 0 (x2 = -2) and -9x1 + x2 = 0 in the x1-x2 plane, with the patterns P1(10,2), P2(2,-5), P3(-5,5) and Q1(0,9), Q2(4,-4), Q3(-1,-3) marked; each point is annotated with its three discriminant values and their signs, and the shaded areas are the indecision regions.]

The discriminant values g1(x), g2(x), g3(x) for the same three patterns P1(10,2), P2(2,-5) and P3(-5,5) are shown in the table below:
Discriminant          | Class 1: [10 2]^t | Class 2: [2 -5]^t | Class 3: [-5 5]^t
g1(x) = 5x1 + 3x2 - 5 |        51         |       -10         |       -15
g2(x) = -x2 - 2       |        -4         |         3         |        -7
g3(x) = -9x1 + x2     |       -88         |       -23         |        50

Single-Layer Feedforward Neural Network

The outputs of the network with three discrete perceptrons are shown in the table below:

Input                           | Class 1: [10 2]^t | Class 2: [2 -5]^t | Class 3: [-5 5]^t
sgn(g1(x)) = sgn(5x1 + 3x2 - 5) |         1         |        -1         |        -1
sgn(g2(x)) = sgn(-x2 - 2)       |        -1         |         1         |        -1
sgn(g3(x)) = sgn(-9x1 + x2)     |        -1         |        -1         |         1
Single-Layer Feedforward Neural Network

The table on the previous slide shows that the new discriminant functions

g1(x) = 5x1 + 3x2 - 5
g2(x) = -x2 - 2
g3(x) = -9x1 + x2

classify the patterns P1(10,2), P2(2,-5) and P3(-5,5) in the same way as the discriminant functions

g1(x) = 10x1 + 2x2 - 52
g2(x) = 2x1 - 5x2 - 14.5
g3(x) = -5x1 + 5x2 - 25
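This agreement can be verified directly; the short sketch below is our own (the matrix names Wmin and Wtrain are hypothetical), computing the winning index for both sets of discriminant functions.

% Both discriminant sets give the same winner on P1, P2, P3.
Wmin   = [10  2  52;   2 -5  14.5;  -5  5  25];   % minimum-distance design
Wtrain = [ 5  3   5;   0 -1   2;    -9  1   0];   % trained perceptron weights
X = [10  2 -5;  2 -5  5;  -1 -1 -1];              % augmented P1, P2, P3 as columns
[~, winMin]   = max(Wmin   * X);
[~, winTrain] = max(Wtrain * X);
disp([winMin; winTrain]);                         % both rows: 1 2 3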
Single-Layer Feedforward Neural Network

Conclusion: the network obtained through the perceptron learning algorithm and the network obtained using the minimum-distance classification procedure have classified the three points P1(10,2), P2(2,-5) and P3(-5,5) in exactly the same way, i.e.,

P1(10,2)  -> class 1
P2(2,-5)  -> class 2
P3(-5,5)  -> class 3

Now consider the patterns Q1(0,9), Q2(4,-4) and Q3(-1,-3), which fall into the shaded areas. The discriminant values for these patterns are shown in the table on the next slide.
Single-Layer Feedforward Neural Network

Discriminant          | [0 9]^t | [4 -4]^t | [-1 -3]^t
g1(x) = 5x1 + 3x2 - 5 |   22    |    3     |   -19
g2(x) = -x2 - 2       |  -11    |    2     |     1
g3(x) = -9x1 + x2     |    9    |  -40     |     6

Since

g1(0,9) > g3(0,9) and g1(0,9) > g2(0,9),
g1(4,-4) > g2(4,-4) and g1(4,-4) > g3(4,-4),
g3(-1,-3) > g2(-1,-3) and g3(-1,-3) > g1(-1,-3),

if we use a maximum selector instead of the three TLUs, the network can decide that Q1(0,9) and Q2(4,-4) belong to class 1 and Q3(-1,-3) belongs to class 3.
Single-Layer Feedforward Neural Network

On the other hand, if we use TLUs we would obtain the outputs in the following table:

Discriminant                    | [0 9]^t | [4 -4]^t | [-1 -3]^t
sgn(g1(x)) = sgn(5x1 + 3x2 - 5) |    1    |    1     |   -1
sgn(g2(x)) = sgn(-x2 - 2)       |   -1    |    1     |    1
sgn(g3(x)) = sgn(-9x1 + x2)     |    1    |   -1     |    1

In order to make a classification we should have a column with one 1 and two -1s. Therefore, according to the table obtained, none of the three patterns Q1(0,9), Q2(4,-4) and Q3(-1,-3) can be classified into any class. According to the network with TLUs, the shaded areas will therefore be called indecision regions.
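The contrast between the two output schemes on the Q patterns can be reproduced with the following MATLAB sketch (our own transcription; the variable names are illustrative).

% On the Q patterns the sign (TLU) outputs do not form a valid one-of-three
% code, while the maximum selector still produces a decision.
W = [ 5  3  5;
      0 -1  2;
     -9  1  0];
Q = [0 4 -1; 9 -4 -3; -1 -1 -1];   % augmented Q1(0,9), Q2(4,-4), Q3(-1,-3)
G = W * Q;
disp(sign(G));                     % more than one +1 per column: indecision
[~, winner] = max(G);
disp(winner);                      % expected: 1 1 3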
Single-Layer Feedforward Neural Network

Now let us consider the planes defined by

g1 = 5x1 + 3x2 - 5
g2 = -x2 - 2
g3 = -9x1 + x2

which are plotted on the next slide.

[Figure: 3D plot of the three planes gi over the x1-x2 plane.]

The projections of the intersection lines of the planes gi(x) on the x1-x2 plane are given by

g12: 5x1 + 4x2 - 3 = 0
g23: 9x1 - 2x2 - 2 = 0
g13: 14x1 + 2x2 - 5 = 0

The segments that can be seen from the top are plotted on the next slide.

[Figure: MATLAB plot of the visible segments of the intersection lines in the x1-x2 plane.]

The continuation of the line g12 = 0 remains underneath the plane g3.
The continuation of the line g23 = 0 remains underneath the plane g1.
The continuation of the line g13 = 0 remains underneath the plane g2.
Single-Layer Feedforward Neural Network

[Figure: general single-layer feedforward network — the input nodes x1, ..., xj, ..., xn feed K neurons; the weight from input j to neuron k is wkj, and neuron k forms the activation vk and produces the output yk.]

v1 = w11 x1 + w12 x2 + ... + w1j xj + ... + w1J xJ
v2 = w21 x1 + w22 x2 + ... + w2j xj + ... + w2J xJ
...
vk = wk1 x1 + wk2 x2 + ... + wkj xj + ... + wkJ xJ
...
vK = wK1 x1 + wK2 x2 + ... + wKj xj + ... + wKJ xJ

y1 = f(v1),  y2 = f(v2),  ...,  yk = f(vk),  ...,  yK = f(vK)
Single-Layer Feedforward Neural Network

In matrix form, with

v = [v1 v2 ... vK]^t,  x = [x1 x2 ... xJ]^t,  W = [w11 w12 ... w1J; w21 w22 ... w2J; ... ; wK1 wK2 ... wKJ],

we have

v = Wx

and

y = [y1 y2 ... yK]^t = [f(v1) f(v2) ... f(vK)]^t = Γ(v)

where Γ is the diagonal nonlinear operator

Γ = diag[ f(.), f(.), ..., f(.) ]

Hence

y = Γ[Wx]
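In MATLAB the whole single-layer map y = Γ[Wx] is essentially a one-liner once W is given; the snippet below is a minimal sketch with arbitrary example sizes and f = sign.

% The single-layer feedforward map y = Gamma[Wx], with the activation f
% applied componentwise (here f = sign as an example).
W = randn(3, 4);            % K = 3 neurons, J = 4 inputs (example sizes)
x = randn(4, 1);            % one input vector
v = W * x;                  % linear part, v = Wx
y = sign(v);                % y = Gamma[Wx]: f(.) applied to each component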
Single-Layer Feedforward Neural Network

For the trained network above, with x = [x1 x2 -1]^t,

[v1]   [ 5  3  5] [x1]   [ 5x1 + 3x2 - 5 ]
[v2] = [ 0 -1  2] [x2] = [ -x2 - 2       ]
[v3]   [-9  1  0] [-1]   [ -9x1 + x2     ]

and

y1 = sgn(5x1 + 3x2 - 5)
y2 = sgn(-x2 - 2)
y3 = sgn(-9x1 + x2)
Two-Layer Feedforward Neural Network

Example 1: Design a neural network such that the network maps the shaded region of the x1, x2 plane into y = 1 and maps its complement into y = -1, where y is the output of the neural network. In summary, the network will provide the mapping of the entire x1, x2 plane into one of the two points ±1 on the real number axis.

Solution: The inputs to the neural network will be x1, x2 and the threshold value -1. Thus the input vector is given as

x = [x1 x2 -1]^t

The boundaries of the shaded region are given by the equations:

x1 - 1 = 0
x1 - 2 = 0
x2 = 0
x2 - 3 = 0

The shaded region satisfies the inequalities:

x1 > 1,  x1 < 2,  x2 > 0,  x2 < 3

or

x1 - 1 > 0,  -x1 + 2 > 0,  x2 > 0,  -x2 + 3 > 0

These inequalities may be implemented using four neurons:
Two-Layer Feedforward Neural Network

[v1]   [ 1  0  1] [x1]
[v2] = [-1  0 -2] [x2]
[v3]   [ 0  1  0] [-1]
[v4]   [ 0 -1 -3]

The equations for the first layer are obtained as

y = [sgn(x1 - 1)  sgn(-x1 + 2)  sgn(x2)  sgn(-x2 + 3)]^t

where the binary (threshold or hard limiter) activation function, i.e., the discrete perceptron, is used.
Two-Layer Feedforward Neural Network

Let us discuss the mapping performed by the first layer. Note that each of the neurons 1 through 4 divides the x1, x2 plane into two half-planes. The half-planes where the neurons' responses are positive (+1) have been marked with arrows pointing toward the positive-response half-plane.

The response of the second layer can be easily obtained as

y = sgn(y1 + y2 + y3 + y4 - 3.5)

[Figure: the resultant neural network.]
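A direct MATLAB transcription of this design might look as follows (the function name is ours; the weights are exactly the ones derived above).

% Two-layer network for the shaded region: the first layer tests the four
% half-plane conditions, the second layer AND-combines them via the -3.5 bias.
function y = shadedRegionNet(x1, x2)
    W1 = [ 1  0  1;        % x1 - 1 > 0
          -1  0 -2;        % -x1 + 2 > 0
           0  1  0;        % x2 > 0
           0 -1 -3];       % -x2 + 3 > 0
    y1 = sign(W1 * [x1; x2; -1]);   % first-layer TLU outputs (+/-1)
    y  = sign(sum(y1) - 3.5);       % second layer: +1 only if all four are +1
end

For instance, shadedRegionNet(1.5, 1) returns 1, while shadedRegionNet(5, 5) returns -1.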
The Perceptron Training Algorithm

For the development of the perceptron learning algorithm for a single-layer perceptron, we find it more convenient to work with the modified signal-flow graph model given here. In this second model, which is equivalent to that of the previous figure, the threshold θ is treated as a synaptic weight connected to a fixed input equal to -1. We may thus define the (n+1)-by-1 input vector and the corresponding weight vector as:

x = [x1 x2 ... xn -1]^t
w = [w1 w2 ... wn θ]^t

These vectors are respectively called the augmented input vector and the augmented weight vector.

For fixed n, the equation w^t x = 0, plotted in the space with coordinates x1, x2, ..., xn, defines a hyperplane as the decision surface between two different classes of inputs.

Suppose then that the input variables of the single-layer perceptron originate from two linearly separable classes that fall on the opposite sides of some hyperplane. Let X1 be the subset of training vectors x1(1), x1(2), ... that belong to class C1, and let X2 be the subset of training vectors x2(1), x2(2), ... that belong to class C2. The union of X1 and X2 is the complete training set X.
The Perceptron Training Algorithm

Given the sets of vectors X1 and X2 to train the classifier, the training process involves the adjustment of the weight vector w in such a way that the two classes C1 and C2 are separable. These two classes are said to be linearly separable if a realizable setting of the weight vector w exists.

Conversely, if the two classes C1 and C2 are known to be linearly separable, then there exists a weight vector w such that we may state:

w^t x > 0   for every input vector x belonging to class C1
w^t x ≤ 0   for every input vector x belonging to class C2

Given the subsets of training vectors X1 and X2, the training problem for the elementary perceptron is then to find a weight vector w such that the two inequalities above are satisfied. However, until this is achieved, in the intermediate steps we will have

w^t x > 0   for some input vectors x belonging to class C2
w^t x ≤ 0   for some input vectors x belonging to class C1

In the former case w^t x will therefore be reduced until w^t x ≤ 0 is achieved, and in the latter case w^t x will be increased until w^t x > 0 is reached.

Here we will begin to examine neural network classifiers that derive their weights during the learning cycle.
The Perceptron Training Algorithm

The sample pattern vectors x1, x2, ..., xp, called the training sequence, are presented to the machine along with the correct response. The response is provided by the teacher and specifies the classification information for each input vector. The classifier modifies its parameters by means of iterative, supervised learning.

The network learns from experience by comparing the targeted correct response with the actual response. The classifier structure is usually adjusted after each incorrect response based on the error value generated.
The Perceptron Training Algorithm

Let us now look again at the dichotomizer introduced and defined earlier. We will develop a supervised training procedure for this two-class linear classifier. Assuming that the desired response is provided, the error signal is computed. The error information can be used to adapt the weights of the discrete perceptron.

First we examine the geometrical conditions in the augmented weight space. This will make it possible to devise a meaningful training procedure for the dichotomizer under consideration. The decision surface equation in the (n+1)-dimensional augmented pattern space is

w^t x = 0

When the above equation is considered in the pattern space, then it is written for fixed weights w(1), w(2), ..., w(k). Therefore the variables of the function f(w^t(i)x) are x1, x2, ..., xn+1, the components of the pattern vector.
The Perceptron Training Algorithm

[Figure: in the pattern space with axes x1, x2, the weight vector w(i) is drawn normal to the decision line f(w^t(i)x) = 0.]

The normal vector w(i) (weight vector) points toward the side of the pattern space for which w^t(i)x > 0, called the positive side.

When the above equation is considered in the weight space, then it is written for fixed patterns x(1), x(2), ..., x(p). Therefore the variables of the function f(w^t x(i)) are w1, w2, ..., wn+1, the components of the weight vector.

[Figure: in the weight space with axes w1, w2, the pattern vector x(i) is drawn normal to the decision line f(w^t x(i)) = 0.]

The normal vector x(i) (pattern vector) points toward the side of the weight space for which w^t x(i) > 0, called the positive side.
The Perceptron Training Algorithm

Regarding w^t x(i) as a function f(w1, w2, ..., wn+1) of the weights,

∇_w f = [∂f/∂w1  ∂f/∂w2  ...  ∂f/∂wn+1]^t = x(i)

In further discussion it will be understood that the normal vector will always point toward the side of the space for which w^t x > 0, called the positive side, or semispace, of the hyperplane.

[Figure: decision hyperplanes in the augmented weight space for a five-pattern set from two classes.]
The Perceptron Training Algorithm

Note that the vectors x(i) point toward the positive side of the decision hyperplanes w^t x(i) = 0. By labeling each decision boundary in the augmented weight space with an arrow pointing into the positive half-plane, we can easily find a region in the weight space that satisfies the linearly separable classification.

To find the solution for the weights, we will look for the intersection of the positive decision regions due to the prototypes of class 1 and of the negative decision regions due to the prototypes of class 2.

Inspection of the figure reveals that the intersection of the sets of weights yielding all five correct classifications of the depicted patterns is in the shaded region of the second quadrant, as shown in the figure above.

Let us now attempt to arrive iteratively at the weight vector w located in the shaded weight solution area. To accomplish this, the weights need to be adjusted from the initial value located anywhere in the weight space. This assumption is due to our ignorance of the weight solution region as well as of the weight initialization. The adjustment discussed, or network training, is based on an error-correction scheme.
The Perceptron Training Algorithm

At this point we will introduce the Perceptron Learning (Training) Rule (Algorithm). The perceptron learning rule is of central importance for supervised learning of neural networks. The weights are initialized at any values in this method.

A neuron is considered to be an adaptive element. Its weights are modifiable depending on the input signal it receives, its output value, and the associated teacher (supervisor) response.

The weight vector is changed according to the following:

w(i+1) = w(i) + Δw(i)
Δw(i) = c r(w(i), x(i), d(i)) x(i)

where d(i) is the teacher's (supervisor's) signal, r is the learning signal, and c is a positive number called the learning constant; the direction of the correction depends on the sign of r.
The Perceptron Training Algorithm

Here we have used

∇_w(w^t x(i)) = [∂(w^t x(i))/∂w1  ∂(w^t x(i))/∂w2  ...  ∂(w^t x(i))/∂wn+1]^t = x(i)

This reveals that the change in the weight vector is in the direction of steepest ascent (or descent) of w^t x(i).
The Perceptron Training Algorithm

Perceptron Learning (Training) Rule (Algorithm): in this case the learning signal is defined as

r(i) = d(i) - y(i)

where d(i) is the desired output signal and y(i) is the actual output signal for the input pattern x(i), given by

y(i) = sgn(w^t(i) x(i))

The weight adjustment is then

Δw(i) = c [d(i) - sgn(w^t(i) x(i))] x(i)
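A single application of this adjustment looks as follows in MATLAB; the numerical values are taken from the example that follows, and c = 0.5 is an assumption consistent with the corrections used there.

% One step of the perceptron training rule: w <- w + c*(d - sgn(w'*x))*x.
c = 0.5;
w = [-2.5; 1.75];                 % current weight vector
x = [1; 1];                       % augmented input pattern
d = 1;                            % teacher's signal
y = sign(w' * x);                 % actual output, sgn(w'x)
w = w + c * (d - y) * x;          % w(i+1) = w(i) + c*(d(i) - y(i))*x(i)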
The Perceptron Training Algorithm

d = 1, i.e., class 1 is input:
1) y = sgn(w^t x) = -1, i.e., the input is misclassified: r = d - y = 1 - (-1) = 2; the correction is in the direction of steepest ascent and is given as Δw(i) = 2c x(i).
2) y = sgn(w^t x) = 1, i.e., the input is correctly classified: r = d - y = 1 - 1 = 0; no correction.

d = -1, i.e., class 2 is input:
1) y = sgn(w^t x) = -1, i.e., the input is correctly classified: r = d - y = -1 - (-1) = 0; no correction.
2) y = sgn(w^t x) = 1, i.e., the input is misclassified: r = d - y = -1 - 1 = -2; the correction is in the direction of steepest descent and is given as Δw(i) = -2c x(i).
The Perceptron Training Algorithm

EXAMPLE: The trained classifier should provide the following classification of four patterns x with known class membership d:

x(1) = 1,    x(3) = 3:   d1 = d3 = 1  (class C1)
x(2) = -0.5, x(4) = -2:  d2 = d4 = -1 (class C2)

The augmented input vectors are given as:

x(1) = [1 1]^t,  x(2) = [-0.5 1]^t,  x(3) = [3 1]^t,  x(4) = [-2 1]^t
The Perceptron Training Algorithm

Let us choose an arbitrary augmented weight vector of

w(1) = [-2.5 1.75]^t

With x(1) being the input, we obtain

w^t(1) x(1) = [-2.5 1.75][1 1]^t = -0.75 < 0

and, using the binary activation function (discrete perceptron),

sgn(w^t(1) x(1)) = -1

Hence x(1) is classified as being in class C2. However, this is not true. Therefore a correction has to be made.

The question to be asked at this point is: how do we make this correction? The answer depends on which training algorithm is used. Since sgn{w^t(1)x(1)} = -1, one thing is certain, however: the correction should be made in such a way that w^t x increases. In order to achieve this we must first find out whether there is a direction in which the decrease or, for that matter, the increase takes place. To show this, let us consider the surface given by:
The Perceptron Training Algorithm

z = f(w1, w2, ..., wn+1)

We can write

df(w1, w2, ..., wn+1) = (∂f/∂w1) dw1 + (∂f/∂w2) dw2 + ... + (∂f/∂wn+1) dwn+1

Let us now restrict ourselves to the case of 3 dimensions, namely z, w1, w2, or more succinctly z, x, y. Now consider the surface

z = f(x, y)

If the level curves are interpreted as contour lines of the landscape, i.e., of the surface, then along these curves

z = f(x, y) = constant

and consequently we obtain

dz = df(x, y) = 0

hence

df(x, y) = (∂f(x, y)/∂x) dx + (∂f(x, y)/∂y) dy = 0

where dx and dy are the increments given to x and y on the level curve.

Now defining

∇f(x, y) = [∂f(x, y)/∂x  ∂f(x, y)/∂y]^t   and   dr = [dx  dy]^t

where ∇f and dr are known to be the gradient vector and the tangent vector, respectively, we can write

df(x, y) = ∇f^t dr = 0

This means that the gradient vector and the tangent vector are orthogonal vectors. Moreover, it can be shown that the gradient vector points in the direction of steepest ascent of the function f(x, y). Furthermore, the gradient is the rate of climb in the direction of steepest ascent.
The Perceptron Training Algorithm

Now consider the surface

z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2

The following MATLAB program plots this surface:
The Perceptron Training Algorithm

close all
clear all
% Evaluate f(x,y) = (x-50)^2 + (y-50)^2 - 32^2 (note 32^2 = 1024) on a 100x100 grid
for x = 1:1:100
    for y = 1:1:100
        f(x,y) = (x-50).^2 + (y-50).^2 - 1024;
    end
end
mesh(f); title('f(x,y)=(x-50)^2+(y-50)^2-1024');        % surface plot
figure, imshow(f, [], 'notruesize'); colormap(jet);     % image view ('notruesize' is legacy imshow syntax)
colorbar; title('f(x,y)=(x-50)^2+(y-50)^2-1024');
The Perceptron Training Algorithm

[Figure: mesh plot and color-mapped image of the surface z = f(x,y) = (x-50)^2 + (y-50)^2 - 32^2.]

The level curves are obtained from

z = f(x, y) = (x - 50)^2 + (y - 50)^2 - 32^2 = Ci

where the Ci are constants, e.g.

C1 = 4096,  C2 = 9216,  C3 = 16384
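These level curves can be drawn with MATLAB's contour function; the following short sketch is our own addition, not part of the original program.

% Level curves of z = (x-50)^2 + (y-50)^2 - 32^2 at the constants Ci above.
[x, y] = meshgrid(0:1:100, 0:1:100);
z = (x - 50).^2 + (y - 50).^2 - 32^2;
contour(x, y, z, [4096 9216 16384]);   % C1, C2, C3
axis equal; xlabel('x'); ylabel('y');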
The Perceptron Training Algorithm

∇f(x, y) = [∂f(x, y)/∂x  ∂f(x, y)/∂y]^t = [2(x - 50)  2(y - 50)]^t   and   dr = [dx  dy]^t

Considering the four quadrants of the circle,

Q.1: 2(x - 50) > 0, 2(y - 50) > 0
Q.2: 2(x - 50) < 0, 2(y - 50) > 0
Q.3: 2(x - 50) < 0, 2(y - 50) < 0
Q.4: 2(x - 50) > 0, 2(y - 50) < 0

the gradient vector points in the directions given below:

[Figure: in each quadrant Q.1-Q.4 of the level-curve circles, the gradient arrow points radially outward from the center (50, 50).]

The fact that the gradient vector is orthogonal to the tangent vector proves that it is in the direction of steepest ascent or steepest descent. The directions found for the example show that the gradient vector points in the direction of ascent of the function f(x, y). Combining the two facts, we can conclude that it points in the direction of steepest ascent.
The Perceptron Training Algorithm

In the weight space the following straight lines represent the decision lines, one for each of the augmented patterns x(1) = [1 1]^t, x(2) = [-0.5 1]^t, x(3) = [3 1]^t and x(4) = [-2 1]^t:

w^t x(1) = w1 + w2 = 0,        i.e.  w2 = -w1
w^t x(2) = -0.5 w1 + w2 = 0,   i.e.  w2 = 0.5 w1
w^t x(3) = 3 w1 + w2 = 0,      i.e.  w2 = -3 w1
w^t x(4) = -2 w1 + w2 = 0,     i.e.  w2 = 2 w1

[Figure: the four decision lines 1-4 in the (w1, w2) weight space together with the initial weight vector.]

The corresponding gradient vectors are computed as follows:

for x(1):  w^t x(1) = w1 + w2,        ∇_w(w^t x(1)) = [∂(w^t x(1))/∂w1  ∂(w^t x(1))/∂w2]^t = x(1) = [1 1]^t
for x(2):  w^t x(2) = -0.5 w1 + w2,   ∇_w(w^t x(2)) = x(2) = [-0.5 1]^t
for x(3):  w^t x(3) = 3 w1 + w2,      ∇_w(w^t x(3)) = x(3) = [3 1]^t
for x(4):  w^t x(4) = -2 w1 + w2,     ∇_w(w^t x(4)) = x(4) = [-2 1]^t
The Perceptron Training Algorithm

[Figure: decision lines and gradient vectors in the weight space; each line w^t x(i) = 0 separates the half-planes w^t x(i) > 0 and w^t x(i) < 0, and the initial weight vector w(1) = [-2.5 1.75]^t is marked.]

Now we can concentrate on the particular training (or learning) algorithm (or rule). This is a supervised learning algorithm. This means that at each step the correction is made according to the directive given by the supervisor, as shown in the following figure.

[Figure: weight learning rule — the input x produces the output yi, which is compared with the teacher's signal di; di is provided only in the case of supervised learning.]

Now consider

r = di - sgn(w^t x)

Since di = ±1 and sgn(w^t x) = ±1, r can take on one of the three values +2, -2, 0. In fact,

for di = 1,  sgn(w^t x) = -1:  r = +2
for di = -1, sgn(w^t x) = +1:  r = -2
for di = +1, sgn(w^t x) = +1:  r = 0
for di = -1, sgn(w^t x) = -1:  r = 0

Therefore we can define the correction rule in terms of the correction amount at the nth step as follows:

Δw(n) = η(n) [di(n) - sgn(w^t(n) x(n))] ∇_w(w^t(n) x(n)) = η(n) [di(n) - sgn(w^t(n) x(n))] x(n)
The Perceptron Training Algorithm

In order for the correct classification of the entire training set x(1), x(2), x(3) and x(4), with respective class memberships d(1) = 1, d(2) = -1, d(3) = 1 and d(4) = -1, the following four inequalities must hold:

w^t(N) x(1) > 0
w^t(N) x(2) < 0
w^t(N) x(3) > 0
w^t(N) x(4) < 0

where w(N) is the final weight vector that provides correct classification for the entire training set. This means that after N - 1 training steps the weight vector w(N) ends up in the solution area, which is the shaded area in the following figure.

[Figure: weight space with the four decision lines w^t x(i) = 0, the initial weight vector w(1) = [-2.5 1.75]^t, and the shaded solution area in which all four inequalities are satisfied.]

The training has so far been shown in the weight space. This is achieved using the decision lines defined by x(1), x(2), x(3) and x(4). However, the original decision lines determined by the perceptron at each step are defined in the pattern space, as this enables the classification to be easily seen. These decision lines are defined by w(1), w(2), w(3) and w(4). In the following we show the correction steps of the weight vector as well as the corresponding decision surfaces in the pattern space.

In the pattern space, w(1)^t x = 0 determines the decision line defined by the initial weight vector

w(1) = [-2.5 1.75]^t

as

w(1)^t x = [-2.5 1.75][x1 x2]^t = -2.5 x1 + 1.75 x2 = 0,   i.e.  x2 = 1.429 x1
as
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
1 2
(1) 2.5 1.75 0
t
w x x x = + >
(

= =
(

=
(
(
(
(

c
c
c
c
= V + =
75 1
5 2
1
1
1
1
1
1 5 2 1
2
1
2
1
2 1
.
.
) ( w
) ( w
) ( w
x
) x ) ( w (
x
) x ) ( w (
x ) ( w ( x x . x ) ( w
t
t
t t
The corresponding gradient vector is computed
as follows:
which is the initial weight vector. As the
gradient vector lies on the side of
1 2
(1) 2.5 1.75 0
t
w x x x = + =
where

However, x(1) and x(3) have class 1 ,i.e.,


However, x(1) and x(3) have class 1 ,i.e.,
d1= d3=1 and x(2) and x(4) have class 2 ,i.e.,
d1= d3=1 and x(2) and x(4) have class 2 ,i.e.,
d2= d4=
d2= d4=
-
-
1.
1.

This means that x(1), x(2), x(3), x(4) all are


This means that x(1), x(2), x(3), x(4) all are
wrongly classified.
wrongly classified.
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
The Perceptron Training
The Perceptron Training
Algorithm
Algorithm
The Perceptron Training Algorithm

[Figure: weight space and pattern space side by side — the initial weight vector w(1) = [-2.5 1.75]^t and the corresponding initial decision line w(1)^t x = -2.5 x1 + 1.75 x2 = 0 (x2 = 1.429 x1); the weight vector is orthogonal to the corresponding decision line, and the patterns x(1)-x(4) are marked in the pattern space.]

Step 1: Pattern x(1) is input.

[Figure: when x(1) = [1 1]^t is input, the corresponding decision line in the weight space is w^t x(1) = w1 + w2 = 0 (line 1, w2 = -w1), which is orthogonal to the input vector x(1); in the pattern space the current decision line is still the initial one, x2 = 1.429 x1.]
The Perceptron Training Algorithm

Step 1 (Update 1): Pattern x(1) is input.

y(1) = sgn(w(1)^t x(1)) = sgn([-2.5 1.75][1 1]^t) = -1
d(1) - y(1) = 1 - (-1) = 2

so x(1) is misclassified and the correction is

w(2) = w(1) + x(1) = [-2.5 1.75]^t + [1 1]^t = [-1.5 2.75]^t

The updated decision line in the pattern space is

w(2)^t x = -1.5 x1 + 2.75 x2 = 0,   i.e.  x2 = 0.545 x1

[Figure: weight space and pattern space after Update 1 — the updated weight vector w(2) = [-1.5 2.75]^t and the updated decision line (line 2, x2 = 0.545 x1), with the patterns x(1)-x(4) marked.]
The Perceptron Training Algorithm

Step 2 (Update 2): Pattern x(2) is input.

y(2) = sgn(w(2)^t x(2)) = sgn([-1.5 2.75][-0.5 1]^t) = 1
d(2) - y(2) = -1 - 1 = -2

so x(2) is misclassified and the correction is

w(3) = w(2) - x(2) = [-1.5 2.75]^t - [-0.5 1]^t = [-1 1.75]^t

The updated decision line in the pattern space is

w(3)^t x = -x1 + 1.75 x2 = 0,   i.e.  x2 = 0.57 x1

[Figure: weight space and pattern space after Update 2 — the weight vector w(3) and the updated decision line (line 3), with the patterns x(1)-x(4) marked.]
The Perceptron Training Algorithm

Step 3 (Update 3): Pattern x(3) is input.

y(3) = sgn(w(3)^t x(3)) = sgn([-1 1.75][3 1]^t) = -1
d(3) - y(3) = 1 - (-1) = 2

so x(3) is misclassified and the correction is

w(4) = w(3) + x(3) = [-1 1.75]^t + [3 1]^t = [2 2.75]^t

The updated decision line in the pattern space is

w(4)^t x = 2 x1 + 2.75 x2 = 0,   i.e.  x2 = -0.73 x1

[Figure: weight space and pattern space after Update 3 — the weight vector w(4) = [2 2.75]^t and the updated decision line (line 4), with the patterns x(1)-x(4) marked.]
The Perceptron Training Algorithm

Step 4 (Update 4): Pattern x(4) is input.

y(4) = sgn(w(4)^t x(4)) = sgn([2 2.75][-2 1]^t) = -1
d(4) - y(4) = -1 - (-1) = 0

so there is no correction: w(5) = w(4) = [2 2.75]^t, and the decision line w(4)^t x = 2 x1 + 2.75 x2 = 0 remains unchanged.

[Figure: Step 4 (Update 4) — no update; line 4 remains the decision line and w(5) = w(4).]

Step 5 (Update 5): Pattern x(1) is input.

y(5) = sgn(w(5)^t x(1)) = sgn(w(4)^t x(1)) = sgn([2 2.75][1 1]^t) = 1
d(1) - y(5) = 1 - 1 = 0

so there is no correction: w(6) = w(5) = w(4) = [2 2.75]^t, and the decision line remains unchanged.

[Figure: Step 5 (Update 5) — no update; line 4 remains the decision line and w(6) = w(5) = w(4).]
| |
0.5
(6) sgn( (6) (2)) sgn( (6) (2)) sgn( 2 2.75 ) 1
1
t t
y w x w x

(
= = = =
(

Step 6:Pattern x(2) is input
The Perceptron Training Algorithm

Step 6 (Update 6): Pattern x(2) is input

d(2) = -1, so d(2) - y(6) = -2 and the weight vector must be updated:
w(7) = w(6) - x(2) = [2  2.75]^t - [-0.5  1]^t = [2.5  1.75]^t

Weight Space / Pattern Space
Weight vector to be updated:  w(6) = w(5) = w(4) = [2  2.75]^t
Second input vector:          x(2) = [-0.5  1]^t
Decision line to be updated:  w(4)^t x = 2 x1 + 2.75 x2 = 0, i.e. x2 = -0.727 x1

Updated decision line:  w(7)^t x = 2.5 x1 + 1.75 x2 = 0, i.e. x2 = -1.43 x1
The Perceptron Training Algorithm

Step 6 (Update 6): Pattern x(2) is input

(Figure: in weight space the line w^t x(2) = 0 separates the half-planes w^t x(2) > 0 and w^t x(2) < 0; in pattern space line 7, the updated decision line w(7)^t x = 0, separates the half-planes w(7)^t x > 0 and w(7)^t x < 0.)

Step 7 (Update 7): Pattern x(3) is input

(Figure: in weight space the line w^t x(3) = 0 separates the half-planes w^t x(3) > 0 and w^t x(3) < 0; in pattern space the decision line is unchanged.)
The Perceptron Training Algorithm

Step 7: Pattern x(3) is input
y(7) = sgn(w(7)^t x(3)) = sgn([2.5  1.75][3  1]^t) = sgn(9.25) = 1
d(3) = 1, so d(3) - y(7) = 0 and no correction is needed:
w(8) = w(7) = [2.5  1.75]^t

Step 8: Pattern x(4) is input
y(8) = sgn(w(8)^t x(4)) = sgn(w(7)^t x(4)) = sgn(-2.5 · 2 + 1.75) = sgn(-3.25) = -1
d(4) = -1, so d(4) - y(8) = 0 and no correction is needed:
w(9) = w(8) = w(7) = [2.5  1.75]^t

Step 9: Pattern x(1) is input
y(9) = sgn(w(9)^t x(1)) = sgn(w(7)^t x(1)) = sgn(2.5 + 1.75) = sgn(4.25) = 1
d(1) = 1, so d(1) - y(9) = 0 and no correction is needed:
w(10) = w(9) = w(8) = w(7) = [2.5  1.75]^t

Step 10: Pattern x(2) is input
y(10) = sgn(w(10)^t x(2)) = sgn(w(7)^t x(2)) = sgn(-2.5 · 0.5 + 1.75) = sgn(0.5) = 1
d(2) = -1, so d(2) - y(10) = -2 and the weight vector must be updated:
w(11) = w(10) - x(2) = [3  0.75]^t
The Perceptron Training Algorithm

Step 1 (Update 1): Pattern x(1) is input

Initial weight vector: w(1) = [-2.5  1.75]^t
First input vector:    x(1) = [1  1]^t
In weight space the line w^t x(1) = w1 + w2 = 0, i.e. w2 = -w1, separates the half-planes w^t x(1) > 0 and w^t x(1) < 0.
In pattern space the decision line of w(1) is
w(1)^t x = -2.5 x1 + 1.75 x2 = 0, i.e. x2 = 1.429 x1,
with the patterns x(1)-x(4) shown.
The initial weight vector w(1) and the weight vectors w(2)-w(11) obtained during the training algorithm are given below:

w(1) = [-2.5  1.75]^t,  w(2) = [-1.5  2.75]^t,  w(3) = [-1  1.75]^t,  w(4) = [2  2.75]^t,
w(5) = w(4),  w(6) = w(5) = w(4),  w(7) = [2.5  1.75]^t,
w(8) = w(7),  w(9) = w(8) = w(7),  w(10) = w(9) = w(8) = w(7),  w(11) = [3  0.75]^t

As can be seen from these vectors, out of the ten vectors w(2)-w(11) only five are different.

The Perceptron Training Algorithm

These five vectors are given in the MATLAB plot below:

(Figure: MATLAB plot of the five distinct decision lines over the range -8 to 12 on the x1 axis.)
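If the reader wishes to reproduce this result, the following MATLAB sketch (an illustrative script of my own, not taken from the slides; it uses the pattern values and the initial weight w(1) = [-2.5  1.75]^t as reconstructed above) runs the ten discrete perceptron updates and plots the distinct decision lines:

% Discrete perceptron training - reproduces w(1)..w(11) for this example
x = [ 1  -0.5   3  -2 ;          % augmented patterns x(1)..x(4), one per column
      1   1     1   1 ];
d = [ 1  -1     1  -1 ];         % desired outputs
w = [-2.5 ; 1.75];               % initial weight vector w(1)
c = 1;                           % learning constant
W = w;                           % weight history, one column per step
for n = 1:10                     % ten pattern presentations
    p = mod(n-1, 4) + 1;         % patterns are taken cyclically
    y = sign(w.' * x(:,p));      % discrete perceptron output
    w = w + 0.5*c*(d(p) - y)*x(:,p);      % perceptron learning rule
    W = [W, w];                  %#ok<AGROW>
end
disp(W)                          % columns are w(1)..w(11)
Wu = unique(W(:,2:end).', 'rows').';      % should give the five distinct vectors
x1 = -8:0.1:12;
figure; hold on
for k = 1:size(Wu,2)             % each decision line: w1*x1 + w2*x2 = 0
    plot(x1, -Wu(1,k)/Wu(2,k)*x1)
end
plot(x(1,:), x(2,:), 'o'); hold off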
The Perceptron Training Algorithm

Example: The trained classifier is required to provide the classification such that the yellow vertices of the cube have class membership d = 1 and the blue vertices have class membership d = 2.

(Figure: the cube in (x1, x2, x3) space with the vertices (0,0,0), (1,0,0), (0,1,0), (1,1,0), (0,0,1), (1,0,1), (0,1,1) marked, and the separating plane x3 = 0.25 shown.)
The Perceptron Training Algorithm

SUMMARY OF THE SINGLE DISCRETE PERCEPTRON TRAINING ALGORITHM

Given are the P training pairs {x1, d1, x2, d2, ..., xP, dP}, where xi is (N+1) x 1 and di is 1 x 1, for i = 1, 2, ..., P. In the following, n denotes the training step and p denotes the step counter within the training cycle.

Step 1: c > 0 is chosen.

Step 2: Weights are initialized at w at small random values; w is (N+1) x 1. Counters and error are initialized:
n <- 1, p <- 1, E <- 0

Step 3: The training cycle begins here. Input is presented and output is computed:
x <- x_p,  d <- d_p,  y <- sgn(w^t x)

Step 4: Weights are updated:
w <- w + (1/2) c (d - y) x

Step 5: Cycle error is computed:
E <- (1/2)(d - y)^2 + E

Step 6: If p < P then p <- p + 1, n <- n + 1, and go to Step 3; otherwise go to Step 7.

Step 7: The training cycle is completed. For E = 0, terminate the training session. Output weights, n and E. If E > 0 then E <- 0, p <- 1, and enter the new training cycle by going to Step 3.
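As a sketch of the algorithm above (the function name, argument list and the maxCycles guard are illustrative additions, not part of the original summary):

function [w, cycles] = sdpta(X, D, c, w, maxCycles)
% X: (N+1) x P matrix of augmented training patterns (one per column)
% D: 1 x P vector of desired outputs (+1 or -1)
% c: learning constant, w: initial (N+1) x 1 weight vector
P = size(X, 2);
for cycles = 1:maxCycles
    E = 0;                               % cycle error (Steps 2 and 7)
    for p = 1:P                          % Steps 3-6: one pass over the training set
        y = sign(w.' * X(:,p));          % discrete perceptron output
        w = w + 0.5*c*(D(p) - y)*X(:,p); % Step 4: weight update
        E = E + 0.5*(D(p) - y)^2;        % Step 5: accumulate cycle error
    end
    if E == 0                            % Step 7: all patterns classified correctly
        return
    end
end
end

For the example above this can be called as, e.g., [w, k] = sdpta([1 -0.5 3 -2; 1 1 1 1], [1 -1 1 -1], 1, [-2.5; 1.75], 100).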

Here the activation function is a continuous function of the weights instead of the signum.

There are two main objectives of this:
1. To define a continuous function of the weights as the error function, so as to obtain finer control over the weights as well as over the whole training procedure;
2. To enable the computation of the error gradient, in order to be continuously in a position to know the direction in which the error decreases.

Single-Layer Continuous Perceptron

Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

The Delta Training Rule is based on the minimisation of the error function, which is given by

E(n) = (1/2)(d(n) - y(n))^2

where n is a positive integer representing the training step number, i.e. the step number in the minimisation process, d(n) is the desired output signal and

y(n) = f(w^t(n) x(n))

is the actual output.

The error function (error surface) is a function of the weights, E(w1, w2, ..., wp) = E(w), which is minimised using an iterative minimisation method that computes the new values of the weights according to

w(n+1) = w(n) + Δw(n)

where Δw(n) is the increment given to the present weight vector w(n) to obtain the new weight vector w(n+1).
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

Let us now use the steepest descent method for the minimisation of the error function E(w), where it is required that the weight changes be in the negative gradient direction. Therefore we take:

Δw(n) = -η ∇E(w(n))

where ∇E(w(n)) is the gradient vector and η is called the learning constant. Using this in the equation above, we obtain

w(n+1) = w(n) - η ∇E(w(n))

Here E(w(n)) = E(n) is the error surface at the n-th training step. The independent variables for minimisation at each training step are the components w_i of the weight vector.

Therefore the error to be minimised is:

E(n) = (1/2)(d(n) - f(w^t(n) x(n)))^2

The error minimisation requires the computation of the gradient of the error function:

∇E(w) = ∇[(1/2)(d(n) - f(w^t x(n)))^2]   evaluated at w = w(n)

The gradient vector is defined as:

∇E(w) = [∂E/∂w1, ∂E/∂w2, ..., ∂E/∂w_{p+1}]^t
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

Using

∇E(w) = ∇[(1/2)(d(n) - f(w^t x(n)))^2]   evaluated at w = w(n)

and defining

v(w) = w^t x

we obtain

∇E(w) = -(d(n) - f(v(w))) · (df(v(w))/dv) · [∂v(w)/∂w1, ∂v(w)/∂w2, ..., ∂v(w)/∂w_{p+1}]^t   evaluated at w = w(n)

Since

∂v(w)/∂w_i = x_i    and    f(v) = y

we can write

∇E(w)|_{w=w(n)} = -(d(n) - y(n)) (df(v(w))/dv) x(n)

and

∂E(w)/∂w_i |_{w=w(n)} = -(d(n) - y(n)) (df(v(w))/dv) x_i(n)
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

If the bipolar continuous activation function is used, then we have:

f(v) = (1 - e^{-v}) / (1 + e^{-v})

and

df(v)/dv = 2 e^{-v} / (1 + e^{-v})^2

In fact,

df(v)/dv = 2 e^{-v} / (1 + e^{-v})^2 = (1/2) [ 1 - ((1 - e^{-v}) / (1 + e^{-v}))^2 ] = (1/2)(1 - f^2(v))

Using this,

∇E(w)|_{w=w(n)} = -(1/2)(d(n) - y(n))(1 - y^2(n)) x(n)

Conclusion: The delta training rule for the bipolar continuous perceptron is given as:

w(n+1) = w(n) + (1/2) η (d(n) - y(n))(1 - y^2(n)) x(n)
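As a minimal sketch of this rule (the function and variable names are mine, not from the slides):

function w = delta_update(w, x, d, eta)
% One update of the delta rule for a bipolar continuous perceptron
v = w.' * x;                            % net input
y = (1 - exp(-v)) / (1 + exp(-v));      % bipolar continuous activation f(v)
w = w + 0.5*eta*(d - y)*(1 - y^2)*x;    % w(n+1) = w(n) + (1/2)η(d-y)(1-y^2)x
end

For instance, delta_update([-2.5; 1.75], [1; 1], 1, 0.5) applies one such update for the first pattern of the example that follows.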
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

If the unipolar continuous activation function is used, then we have:

f(v) = 1 / (1 + e^{-v})

and

df(v)/dv = e^{-v} / (1 + e^{-v})^2

In fact, we can write

df(v)/dv = e^{-v} / (1 + e^{-v})^2 = [1 / (1 + e^{-v})] · [(1 + e^{-v} - 1) / (1 + e^{-v})] = f(v)(1 - f(v))
Example: We will carry out the same training algorithm as in the previous example, but this time using a continuous bipolar perceptron.

The error at step n is given by:

E(n) = (1/2) [ d(n) - ( 2 / (1 + e^{-v(n)}) - 1 ) ]^2 = (1/2) [ d(n) + 1 - 2 / (1 + e^{-v(n)}) ]^2

Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

For the first pattern x(1) = [1  1]^t, d(1) = 1. The error at step 1 is given by:

E(1) = (1/2) [ d(1) + 1 - 2 / (1 + e^{-(w1 + w2)}) ]^2 = 2 / (1 + e^{(w1 + w2)})^2
For the second pattern x(2) = [-0.5  1]^t, d(2) = -1. The error at step 2 is given by:

E(2) = (1/2) [ d(2) + 1 - 2 / (1 + e^{-(-0.5 w1 + w2)}) ]^2 = 2 / (1 + e^{(0.5 w1 - w2)})^2

Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

For the third pattern x(3) = [3  1]^t, d(3) = 1. The error at step 3 is given by:

E(3) = (1/2) [ d(3) + 1 - 2 / (1 + e^{-(3 w1 + w2)}) ]^2 = 2 / (1 + e^{(3 w1 + w2)})^2

For the fourth pattern x(4) = [-2  1]^t, d(4) = -1. The error at step 4 is given by:

E(4) = (1/2) [ d(4) + 1 - 2 / (1 + e^{-(-2 w1 + w2)}) ]^2 = 2 / (1 + e^{(2 w1 - w2)})^2

Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule
% Error surfaces E1..E4 of the four training patterns (bipolar continuous perceptron)
close all;
clear all;
[w1,w2] = meshgrid(-4:.1:4, -4:.1:4);
Z1 = exp(w1+w2);     E1 = 2./(1+Z1).^2;   % pattern x(1) = [1 1]',   d(1) =  1
Z2 = exp(.5*w1-w2);  E2 = 2./(1+Z2).^2;   % pattern x(2) = [-.5 1]', d(2) = -1
Z3 = exp(3*w1+w2);   E3 = 2./(1+Z3).^2;   % pattern x(3) = [3 1]',   d(3) =  1
Z4 = exp(2*w1-w2);   E4 = 2./(1+Z4).^2;   % pattern x(4) = [-2 1]',  d(4) = -1
figure
subplot(2,2,1),mesh(E1);title('Error surface for xt(1)=[1 1] and y=f(wt*x(1))');
xlabel('w1'),ylabel('w2'),zlabel('E1(w1,w2)');
subplot(2,2,2),mesh(E2);title('Error surface for xt(2)=[-.5 1] and y=f(wt*x(2))');
xlabel('w1'),ylabel('w2'),zlabel('E2(w1,w2)');
subplot(2,2,3),mesh(E3);title('Error surface for xt(3)=[3 1] and y=f(wt*x(3))');
xlabel('w1'),ylabel('w2'),zlabel('E3(w1,w2)');
subplot(2,2,4),mesh(E4);title('Error surface for xt(4)=[-2 1] and y=f(wt*x(4))');
xlabel('w1'),ylabel('w2'),zlabel('E4(w1,w2)');
E = E1+E2+E3+E4;                          % total error surface
figure,mesh(E);title('Total Error E(w1,w2)=E1(w1,w2)+E2(w1,w2)+E3(w1,w2)+E4(w1,w2),MESH');
xlabel('w1'),ylabel('w2'),zlabel('E(w1,w2)');
figure,imshow(E,[]);colormap(jet);title('Total Error E(w1,w2)=E1(w1,w2)+E2(w1,w2)+E3(w1,w2)+E4(w1,w2),IMSHOW');
xlabel('w1'),ylabel('w2')
The error surfaces for the above four cases are shown in the next slide:

Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

(Figure: four mesh plots over the (w1, w2) grid with panel titles "Error surface for xt(1)=[1 1] and y=f(wt*x(1))", "Error surface for xt(2)=[-.5 1] and y=f(wt*x(2))", "Error surface for xt(3)=[3 1] and y=f(wt*x(3))" and "Error surface for xt(4)=[-2 1] and y=f(wt*x(4))", with axes w1, w2 and E1(w1,w2)..E4(w1,w2).)

The total error is defined by:

E(w1, w2) = E1(w1, w2) + E2(w1, w2) + E3(w1, w2) + E4(w1, w2)

The total error surface is shown in the next slide.
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

(Figure: mesh plot of the total error surface E(w1, w2).)

A contour map of the total error is depicted below:

(Figure: contour map of E(w1, w2).)

The classifier training has been simulated for η = 0.5 for four arbitrarily chosen initial weight vectors, including the one used in the discrete perceptron example. The training set is:

x(1) = 1, x(3) = 3, with d1 = d3 = 1: class C1
x(2) = -0.5, x(4) = -2, with d2 = d4 = -1: class C2
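A sketch of how such a simulation can be run in MATLAB follows; the initial weight vector and the cyclic presentation order here are illustrative assumptions, not necessarily those used to produce the original figure:

% Delta-rule training of the continuous bipolar perceptron on the four patterns
X   = [ 1  -0.5  3  -2 ;            % augmented patterns, one per column
        1   1    1   1 ];
D   = [ 1  -1    1  -1 ];
eta = 0.5;
w   = [-2.5; 1.75];                 % illustrative initial weight vector
Wtraj = w;
for n = 1:150                       % 150 training steps, patterns taken cyclically
    p = mod(n-1, 4) + 1;
    v = w.' * X(:,p);
    y = (1 - exp(-v)) / (1 + exp(-v));            % bipolar continuous activation
    w = w + 0.5*eta*(D(p) - y)*(1 - y^2)*X(:,p);  % delta rule
    Wtraj = [Wtraj, w];             %#ok<AGROW>
end
plot(Wtraj(1,1:10:end), Wtraj(2,1:10:end), 'o-')  % every tenth step, as in the figure
xlabel('w1'), ylabel('w2')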
Training Rule for a Single-Layer Continuous Perceptron: The Delta Training Rule

The resulting trajectories of 150 simulated training steps are shown in the following figure (each tenth step is shown).

(Figure: the four weight trajectories superimposed on the contour map of the total error.)

In each case the weights converge during training toward the center of the solution region obtained for the discrete perceptron case given on the next slide, which coincides with the dark blue region in the contour map of the total error depicted before and also shown on the next slide.
SUMMARY OF THE CONTINUOUS PERCEPTRON TRAINING ALGORITHM

Given are the P training pairs {x1, d1, x2, d2, ..., xP, dP}, where xi is (N+1) x 1 and di is 1 x 1, for i = 1, 2, ..., P. In the following, n denotes the training step and p denotes the step counter within the training cycle.

Step 1: η > 0, λ = 1 and Emax > 0 are chosen.

Step 2: Weights are initialized at w at small random values; w is (N+1) x 1. Counters and error are initialized:
n <- 1, p <- 1, E <- 0

Step 3: The training cycle begins here. Input is presented and output is computed:
x <- x_p,  d <- d_p,  y <- f(w^t x)

Step 4: Weights are updated:
w <- w + (1/2) η (d - y)(1 - y^2) x

Step 5: Cycle error is computed:
E <- (1/2)(d - y)^2 + E

Step 6: If p < P then p <- p + 1, n <- n + 1, and go to Step 3; otherwise go to Step 7.

Step 7: The training cycle is completed. For E < Emax, terminate the training session. Output weights, n and E. If E > Emax then E <- 0, p <- 1, and enter the new training cycle by going to Step 3.
Delta Training Rule for Multi-Perceptron Layer

(Figure: a single layer of K neurons. The j-th column of input nodes x1, x2, ..., xj, ..., xJ, together with a fixed input of -1, feeds every neuron through the weights w_kj; neuron k forms the activation v_k and produces the output y_k, for k = 1, 2, ..., K.)

The above can be redrawn as:

v_1 = w_11 x_1 + w_12 x_2 + ... + w_1j x_j + ... + w_1J x_J
v_2 = w_21 x_1 + w_22 x_2 + ... + w_2j x_j + ... + w_2J x_J
...
v_l = w_l1 x_1 + w_l2 x_2 + ... + w_lj x_j + ... + w_lJ x_J
...
v_K = w_K1 x_1 + w_K2 x_2 + ... + w_Kj x_j + ... + w_KJ x_J

y_1 = f(v_1),  y_2 = f(v_2),  ...,  y_l = f(v_l),  ...,  y_K = f(v_K)
Delta Training Rule for Multi-Perceptron Layer

In matrix form,

[v_1]   [w_11  w_12  ...  w_1J] [x_1]
[v_2] = [w_21  w_22  ...  w_2J] [x_2]
[ . ]   [  .     .   ...    . ] [ . ]
[v_K]   [w_K1  w_K2  ...  w_KJ] [x_J]

i.e.  v = W x,  and

[y_1]   [f(v_1)]
[y_2] = [f(v_2)]      i.e.  y = Γ(v)
[ . ]   [   .  ]
[y_K]   [f(v_K)]

where Γ(·) is the diagonal nonlinear operator

        [f(·)   0   ...   0  ]
Γ(·) =  [ 0    f(·) ...   0  ]
        [ .     .   ...   .  ]
        [ 0     0   ...  f(·)]

so that

y = Γ[W x]
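As a small illustrative sketch (the names are mine), the forward pass y = Γ[Wx] of such a layer can be written in MATLAB as:

function y = layer_forward(W, x)
% Forward pass of a single perceptron layer: y = Γ[W x]
% W is K x J, x is J x 1 (the last entry is typically the fixed -1 input)
v = W * x;                              % activations v = W x
y = (1 - exp(-v)) ./ (1 + exp(-v));     % bipolar continuous activation, elementwise
end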
Delta Training Rule for Multi-Perceptron Layer

The desired and actual output vectors at the n-th training step are given as:

d = [d_1(n), d_2(n), ..., d_K(n)]^t        y = [y_1(n), y_2(n), ..., y_K(n)]^t

where n represents the n-th step, which corresponds to a specific input pattern that produces the output error.

The error expression for a single perceptron was given as:

E(n) = (1/2)(d(n) - y(n))^2

which can be generalised to include all squared errors at the outputs k = 1, 2, ..., K:

E(n) = (1/2) Σ_{k=1}^{K} (d_k(n) - y_k(n))^2 = (1/2) ||d(n) - y(n)||^2
Delta Training Rule for Multi-Perceptron Layer

The updated weight value from input j to neuron k at step n is given by:

w_kj(n+1) = w_kj(n) + Δw_kj(n)

According to the delta training rule for the continuous perceptron,

Δw_kj(n) = -η ∂E/∂w_kj evaluated at w_kj = w_kj(n),   for k = 1, 2, ..., K and j = 1, 2, ..., J

Writing E = E(v(w)) and using the chain rule,

∂E/∂w_kj = (∂E/∂v_k)(∂v_k/∂w_kj)

Using

v_k = w_k1 x_1 + w_k2 x_2 + ... + w_kj x_j + ... + w_kJ x_J

we have

∂v_k/∂w_kj = x_j

The error signal term produced by the k-th neuron is defined as:

δ_yk = -∂E/∂v_k

Using this yields

∂E/∂w_kj = -δ_yk x_j,   i.e.   Δw_kj = η δ_yk x_j
Delta Training Rule for Multi-Perceptron Layer

On the other hand we can write:

δ_yk = -∂E/∂v_k = -(∂E/∂y_k)(∂y_k/∂v_k)

Since

E(n) = (1/2) Σ_{k=1}^{K} (d_k(n) - y_k(n))^2

we get

∂E/∂y_k = -(d_k - y_k)

On the other hand, using

∂y_k/∂v_k = ∂f(v_k)/∂v_k

yields

δ_yk = -(∂E/∂y_k)(∂y_k/∂v_k) = (d_k - y_k) ∂f(v_k)/∂v_k

which is used to obtain

Δw_kj(n) = η (d_k - y_k) (∂f(v_k)/∂v_k) x_j

For the bipolar continuous activation function we already know that

∂f(v_k)/∂v_k = (1/2)(1 - f^2(v_k)) = (1/2)(1 - y_k^2)

Hence

Δw_kj(n) = (1/2) η (d_k(n) - y_k(n))(1 - y_k^2(n)) x_j

and

w_kj(n+1) = w_kj(n) + (1/2) η (d_k(n) - y_k(n))(1 - y_k^2(n)) x_j(n)

where

δ_yk(n) = (1/2)(d_k(n) - y_k(n))(1 - y_k^2(n))
Delta Training Rule for Multi-Perceptron Layer

With these error signal terms we can write the update of the whole weight matrix as

[w_11(n+1) ... w_1J(n+1)]   [w_11(n) ... w_1J(n)]       [δ_y1(n)]
[    .     ...     .    ] = [   .    ...    .   ] + η   [   .   ] [x_1(n) ... x_J(n)]
[w_K1(n+1) ... w_KJ(n+1)]   [w_K1(n) ... w_KJ(n)]       [δ_yK(n)]

Now defining

x(n) = [x_1(n), ..., x_J(n)]^t        δ_y(n) = [δ_y1(n), ..., δ_yK(n)]^t

we can write

W(n+1) = W(n) + η δ_y x^t(n)

Generalised Delta Training Rule for Multi-Layer Perceptron

(Figure: a two-layer network. The i-th column of input nodes z_1, z_2, ..., z_I feeds the hidden layer through the weights t_ji; the hidden-layer neurons (the j-th column of nodes) produce the outputs x_1, ..., x_J, which feed the output layer through the weights w_kj; the output-layer neurons (the k-th column of nodes) produce y_1, ..., y_K.)
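Before deriving the hidden-layer update, here is a sketch of the output-layer update W(n+1) = W(n) + η δ_y x^t derived above, as an illustrative MATLAB function (names are mine; the bipolar continuous activation is assumed):

function W = output_layer_update(W, x, d, eta)
% One delta-rule update for a single layer of K bipolar continuous neurons
% W is K x J, x is J x 1, d is K x 1
y      = (1 - exp(-W*x)) ./ (1 + exp(-W*x));   % layer outputs y = Γ[Wx]
deltay = 0.5 * (d - y) .* (1 - y.^2);          % error signal vector δ_y
W      = W + eta * deltay * x.';               % W(n+1) = W(n) + η δ_y x^t
end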
The weight adjustment for the hidden layer according to the gradient descent method will be:

t_ji(n+1) = t_ji(n) + Δt_ji(n)
Δt_ji(n) = -η ∂E/∂t_ji evaluated at t_ji = t_ji(n),   for j = 1, 2, ..., J and i = 1, 2, ..., I

where

∂E/∂t_ji = (∂E/∂u_j)(∂u_j/∂t_ji)

Here

δ_xj = -∂E/∂u_j,   for j = 1, 2, ..., J

is the error signal term of the hidden layer with output x. This term is produced by the j-th neuron of the hidden layer, where j = 1, 2, ..., J.

On the other hand, using

u_j = t_j1 z_1 + t_j2 z_2 + ... + t_jI z_I

we can calculate ∂u_j/∂t_ji as

∂u_j/∂t_ji = z_i

Therefore

∂E/∂t_ji = (∂E/∂u_j)(∂u_j/∂t_ji) = -δ_xj z_i

and

Δt_ji = η δ_xj z_i
Generalised Delta Training Rule for Multi-Layer Perceptron

Since

x_j = f(u_j)

we have

δ_xj = -∂E/∂u_j = -(∂E/∂x_j)(∂x_j/∂u_j)

with

∂E/∂x_j = ∂/∂x_j [ (1/2) Σ_{k=1}^{K} (d_k - f(v_k))^2 ] = -Σ_{k=1}^{K} (d_k - y_k) (∂f(v_k)/∂v_k)(∂v_k/∂x_j)

and

∂x_j/∂u_j = ∂f(u_j)/∂u_j

Now using

v_k = w_k1 x_1 + w_k2 x_2 + ... + w_kj x_j + ... + w_kJ x_J

we have

∂v_k/∂x_j = w_kj

Now using this equality and

δ_yk = (d_k - y_k) ∂f(v_k)/∂v_k
in

∂E/∂x_j = -Σ_{k=1}^{K} (d_k - y_k)(∂f(v_k)/∂v_k)(∂v_k/∂x_j)

we obtain

∂E/∂x_j = -Σ_{k=1}^{K} δ_yk w_kj

Now using this and

δ_xj = -∂E/∂u_j = -(∂E/∂x_j)(∂x_j/∂u_j)

we obtain

δ_xj = (∂f(u_j)/∂u_j) Σ_{k=1}^{K} δ_yk w_kj

Now using

Δt_ji = η δ_xj z_i

we get

t_ji(n+1) = t_ji(n) + η ( Σ_{k=1}^{K} δ_yk w_kj ) (∂f(u_j)/∂u_j) z_i,   for j = 1, 2, ..., J and i = 1, 2, ..., I
Generalised Delta Training Rule for Multi-Layer Perceptron

Now defining the j-th column of the matrix

     [w_11  w_12  ...  w_1J]
W =  [w_21  w_22  ...  w_2J]
     [  .     .   ...    . ]
     [w_K1  w_K2  ...  w_KJ]

as w_j, and using

δ_y = [δ_y1, ..., δ_yK]^t

we can write

Σ_{k=1}^{K} δ_yk w_kj = w_j^t δ_y

In the case of the bipolar activation function we obtain for the hidden layer

f'_xj = ∂f(u_j)/∂u_j = (1/2)(1 - x_j^2)

Now construct a vector whose entries are the above terms for j = 1, 2, ..., J, i.e.,

f'_x = [ (1/2)(1 - x_1^2), (1/2)(1 - x_2^2), ..., (1/2)(1 - x_J^2) ]^t
Generalised Delta Training Rule for Multi-Layer Perceptron

We then have

δ_xj = ( Σ_{k=1}^{K} δ_yk w_kj ) f'_xj = (w_j^t δ_y) f'_xj

and define

z = [z_1, z_2, ..., z_I]^t

Now defining

     [t_11  t_12  ...  t_1I]
T =  [t_21  t_22  ...  t_2I]
     [  .     .   ...    . ]
     [t_J1  t_J2  ...  t_JI]

and

δ_x = [δ_x1, ..., δ_xJ]^t,   with   δ_xj = (w_j^t δ_y) f'_xj

we finally obtain

T(n+1) = T(n) + η δ_x z^t

This updating formula is called the Generalised Delta Rule for adjusting the hidden-layer weights. A similar formula was given for updating the output-layer weights:

W(n+1) = W(n) + η δ_y x^t

where

δ_yk = (d_k - y_k) ∂f(v_k)/∂v_k
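A compact sketch of one combined update of both layers under these formulas (illustrative MATLAB, bipolar activation assumed; augmenting z and x with fixed -1 entries is left to the caller):

function [W, T] = gdr_update(W, T, z, d, eta)
% One generalised-delta-rule update for a two-layer bipolar continuous network
% T is J x I (hidden weights), W is K x J (output weights), z is I x 1, d is K x 1
f  = @(v) (1 - exp(-v)) ./ (1 + exp(-v));   % bipolar continuous activation
x  = f(T * z);                              % hidden-layer outputs
y  = f(W * x);                              % output-layer outputs
dy = 0.5 * (d - y) .* (1 - y.^2);           % output error signals δ_y
dx = 0.5 * (W.' * dy) .* (1 - x.^2);        % hidden error signals δ_xj = (w_j^t δ_y) f'_xj
W  = W + eta * dy * x.';                    % W(n+1) = W(n) + η δ_y x^t
T  = T + eta * dx * z.';                    % T(n+1) = T(n) + η δ_x z^t
end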
Generalised Delta Training Rule for Multi-Layer Perceptron

Here the main difference is in computing the error signals δ_y and δ_x. In fact, the entries of δ_y are given as

δ_yk = (d_k - y_k) ∂f(v_k)/∂v_k

which only contain terms belonging to the output layer. However, this is not the case with δ_x,

δ_xj = (w_j^t δ_y) f'_xj

whose entries are weighted sums of the error signals δ_yk produced by the following layer.

Here we can draw the following conclusion: the Generalised Delta Learning Rule propagates the error back by one layer, and this is true for every layer.
Error Back-Propagation Training Algorithm (EBPTA)

Summary of the Error Back-Propagation Training Algorithm (EBPTA)

Given are P training pairs {z1, d1, z2, d2, ..., zP, dP}, where zi is (I x 1), di is (K x 1), and i = 1, 2, ..., P. Note that the I-th component of each zi is of value -1, since input vectors have been augmented. Size J - 1 of the hidden layer having outputs y is selected. Note that the J-th component of y is of value -1, since hidden-layer outputs have also been augmented; y is (J x 1) and o is (K x 1).

Step 1: η > 0 and Emax are chosen. Weights W and V are initialized at small random values; W is (K x J), V is (J x I).
q <- 1, p <- 1, E <- 0

Step 2: Training step starts here. (See Note 1 at the end of the list.) Input is presented and the layers' outputs are computed (f(net) as in (2.3a) is used):
z <- z_p,  d <- d_p
y_j <- f(v_j^t z), for j = 1, 2, ..., J, where v_j, a column vector, is the j-th row of V
o_k <- f(w_k^t y), for k = 1, 2, ..., K, where w_k, a column vector, is the k-th row of W

Step 3: Error value is computed:
E <- (1/2)(d_k - o_k)^2 + E, for k = 1, 2, ..., K

Step 4: Error signal vectors δ_o and δ_y of both layers are computed. Vector δ_o is (K x 1), δ_y is (J x 1). (See Note 2 at the end of the list.)
The error signal terms of the output layer in this step are
δ_ok = (1/2)(d_k - o_k)(1 - o_k^2), for k = 1, 2, ..., K
The error signal terms of the hidden layer in this step are
δ_yj = (1/2)(1 - y_j^2) Σ_{k=1}^{K} δ_ok w_kj, for j = 1, 2, ..., J

Step 5: Output layer weights are adjusted:
w_kj <- w_kj + η δ_ok y_j, for k = 1, 2, ..., K and j = 1, 2, ..., J

Step 6: Hidden layer weights are adjusted:
v_ji <- v_ji + η δ_yj z_i, for j = 1, 2, ..., J and i = 1, 2, ..., I

Step 7: If p < P then p <- p + 1, q <- q + 1, and go to Step 2; otherwise, go to Step 8.

Step 8: The training cycle is completed. For E < Emax terminate the training session. Output weights W, V, q, and E. If E > Emax, then E <- 0, p <- 1, and initiate the new training cycle by going to Step 2.

NOTE 1: For best results, patterns should be chosen at random from the training set (justification follows in Section 4.5).

NOTE 2: If formula (2.4a) is used in Step 2, then the error signal terms in Step 4 are computed as follows:
δ_ok = (d_k - o_k)(1 - o_k) o_k, for k = 1, 2, ..., K
δ_yj = y_j (1 - y_j) Σ_{k=1}^{K} δ_ok w_kj, for j = 1, 2, ..., J
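The following MATLAB sketch puts the steps above into a training loop (illustrative code, not from the slides; the random initialization scale and the maxSteps guard are my own choices, and the bipolar activation is assumed):

function [W, V, q, E] = ebpta(Z, D, J, eta, Emax, maxSteps)
% Z: I x P augmented input patterns (last row = -1), D: K x P desired outputs
[I, P] = size(Z);  K = size(D, 1);
W = 0.1*randn(K, J);  V = 0.1*randn(J, I);     % Step 1: small random weights
f = @(v) (1 - exp(-v)) ./ (1 + exp(-v));
q = 1;  E = 0;  p = 1;
while q <= maxSteps
    z = Z(:,p);  d = D(:,p);                   % Step 2: present input
    y = f(V*z);  y(J) = -1;                    % hidden outputs, augmented with -1
    o = f(W*y);                                % network outputs
    E = E + 0.5*sum((d - o).^2);               % Step 3: accumulate error
    delta_o = 0.5*(d - o).*(1 - o.^2);         % Step 4: output error signals
    delta_y = 0.5*(1 - y.^2).*(W.'*delta_o);   %         hidden error signals
    W = W + eta*delta_o*y.';                   % Step 5: adjust output weights
    V = V + eta*delta_y*z.';                   % Step 6: adjust hidden weights
    if p < P, p = p + 1;                       % Step 7: next pattern in the cycle
    else                                       % Step 8: end of training cycle
        if E < Emax, return, end
        E = 0;  p = 1;
    end
    q = q + 1;
end
end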

The Hopfield Network

We know that the Hopfield Network is a Recurrent (Feedback or Dynamical) Neural Network.

Let y_i, i = 1, 2, ..., n, be the outputs of the network and let the energy function E satisfy the following:

dE/dt = -Σ_{i=1}^{n} α_i (dy_i/dt)^2 < 0

where α_i > 0, i = 1, 2, ..., n.

The above inequality reveals that the energy decreases with time and becomes zero if and only if

dy_i/dt = 0 for all i,

i.e., y_i(t) = constant for all i, i.e., the outputs y_i(t) reach their stable equilibrium states.

Now let us assume that

α_i = C_i · df^{-1}(y_i)/dy_i

where C_i > 0.
The Hopfield Network

For the bipolar activation function

y = f(x) = (1 - e^{-ax}) / (1 + e^{-ax})

the inverse function is given by:

x = f^{-1}(y) = (1/a) ln((1 + y) / (1 - y))

(Figure: the bipolar activation function and its inverse.)

The derivative of the inverse of the bipolar function is

dx/dy = df^{-1}(y)/dy = (1/a) · 2 / (1 - y^2)

Therefore

dE/dt = -Σ_{i=1}^{n} C_i (df^{-1}(y_i)/dy_i) (dy_i/dt)^2 < 0

We can conclude that

df^{-1}(y_i)/dy_i > 0   for -1 < y_i < 1
The Hopfield Network

Considering

dx_i/dt = d f^{-1}(y_i)/dt = (df^{-1}(y_i)/dy_i)(dy_i/dt)

we obtain

dE/dt = -Σ_{i=1}^{n} C_i (df^{-1}(y_i)/dy_i)(dy_i/dt)(dy_i/dt) = -Σ_{i=1}^{n} C_i (dx_i/dt)(dy_i/dt)

Now defining

x = [x_1, x_2, ..., x_N]^t,   y = [y_1, y_2, ..., y_N]^t,   C = diag(C_i)

yields

dE/dt = -(C dx/dt)^t (dy/dt)

Since

dE/dt = ∇E(y)^t (dy/dt)

we can write

∇E(y) = -C dx/dt

This reveals that the capacitor current vector is parallel to the negative gradient vector.
The Hopfield Network

(Figure: the electrical model of neuron i. The outputs y_1, y_2, ..., y_N of all neurons are fed to the input node x_i through the conductances w_i1, w_i2, ..., w_iN; an external current I_i is injected; C_i and g_i are the capacitance and leakage conductance at the input node; the neuron output is y_i = f(x_i).)

The equation at the input node of neuron i is

C_i dx_i/dt = Σ_{j=1}^{N} w_ij y_j - x_i ( Σ_{j=1}^{N} w_ij + g_i ) + I_i

with

y_i = f(x_i),   x_i = f^{-1}(y_i)

Now define

G_i = Σ_{j=1}^{N} w_ij + g_i,   G = diag(G_i),   C = diag(C_i),   i = 1, 2, ..., N

     [w_11  w_12  ...  w_1N]        [I_1]        [x_1]
W =  [w_21  w_22  ...  w_2N]   I =  [I_2]   x =  [x_2]
     [  .     .   ...    . ]        [ . ]        [ . ]
     [w_N1  w_N2  ...  w_NN]        [I_N]        [x_N]
The Hopfield Network

We obtain

C dx(t)/dt = W y - G x + I

and consequently, since ∇E(y) = -C dx/dt, we obtain

∇E(y) = -(W y - G x + I)

In the case of the bipolar activation function we know that

x_i = f^{-1}(y_i) = (1/a) ln((1 + y_i) / (1 - y_i))

Therefore the state vector is given as:

x = (1/a) [ ln((1 + y_1)/(1 - y_1)),  ln((1 + y_2)/(1 - y_2)),  ...,  ln((1 + y_N)/(1 - y_N)) ]^t
The Hopfield Network

We already know that

dE/dt = -Σ_{i=1}^{N} C_i (dx_i/dt)(dy_i/dt)

therefore, using the state equation for C_i dx_i/dt,

dE/dt = -Σ_{i=1}^{N} ( Σ_{j=1}^{N} w_ij y_j - G_i x_i + I_i ) (dy_i/dt)
      = -Σ_{i=1}^{N} Σ_{j=1}^{N} w_ij y_j (dy_i/dt) + Σ_{i=1}^{N} G_i x_i (dy_i/dt) - Σ_{i=1}^{N} I_i (dy_i/dt)

Now consider:

d(y^t W y)/dt = (dy/dt)^t W y + y^t W (dy/dt)

If W = W^t, then (dy/dt)^t W y = y^t W^t (dy/dt) = y^t W (dy/dt), and therefore

d(y^t W y)/dt = 2 y^t W (dy/dt),   i.e.   y^t W (dy/dt) = (1/2) d(y^t W y)/dt

Now consider the first term of the expression for dE/dt. We can write:

Σ_{i=1}^{N} Σ_{j=1}^{N} w_ij y_j (dy_i/dt) = y^t W (dy/dt) = (1/2) d(y^t W y)/dt

Now consider the second term in the same equation. Since

x_i (dy_i/dt) = f^{-1}(y_i)(dy_i/dt)   and   (d/dy_i) ∫_0^{y_i} f^{-1}(y) dy = f^{-1}(y_i)

we can write

x_i (dy_i/dt) = (d/dt) ∫_0^{y_i} f^{-1}(y) dy

Therefore

dE/dt = -(d/dt) [ (1/2) y^t W y - Σ_{i=1}^{N} G_i ∫_0^{y_i} f^{-1}(y) dy + Σ_{i=1}^{N} I_i y_i ]

and the energy function is

E = -(1/2) y^t W y + Σ_{i=1}^{N} G_i ∫_0^{y_i} f^{-1}(y) dy - Σ_{i=1}^{N} I_i y_i

For the bipolar activation, note also that

dy_i/dx_i = a (1 - y_i^2) / 2

and recall the node equation

C_i dx_i/dt = Σ_{j=1}^{N} w_ij y_j - G_i x_i + I_i
The Hopfield Network

In order to obtain the state equations in terms of the outputs y_i, consider once again

C_i dx_i/dt = Σ_{j=1}^{N} w_ij y_j - G_i x_i + I_i

Using dx_i/dt = (dx_i/dy_i)(dy_i/dt) = (2 / (a(1 - y_i^2))) (dy_i/dt), we obtain

C_i (2 / (a(1 - y_i^2))) dy_i/dt = Σ_{j=1}^{N} w_ij y_j - G_i x_i + I_i

and

dy_i/dt = ( a(1 - y_i^2) / (2 C_i) ) ( Σ_{j=1}^{N} w_ij y_j - G_i x_i + I_i )

In matrix form,

dy/dt = diag( a(1 - y_i^2) / (2 C_i) ) ( W y - G f^{-1}(y) + I )

Example: a two-neuron network. Each input node x_i (i = 1, 2) has capacitance C_i and leakage conductance g_i, and the outputs y_1 and y_2 are fed back to the input nodes through the conductances g_11, g_12, g_21, g_22. The node equations are

C_1 dx_1/dt + g_1 x_1 = g_11 (y_1 - x_1) + g_12 (y_2 - x_1)
C_2 dx_2/dt + g_2 x_2 = g_21 (y_1 - x_2) + g_22 (y_2 - x_2)

which yields

[C_1  0 ] [dx_1/dt]   [g_11  g_12] [y_1]   [g_1 + g_11 + g_12          0        ] [x_1]
[ 0  C_2] [dx_2/dt] = [g_21  g_22] [y_2] - [        0          g_2 + g_21 + g_22] [x_2]

i.e. C dx/dt = W y - G x, with

C = [C_1  0 ],   W = [g_11  g_12],   G = [g_1 + g_11 + g_12          0        ]
    [ 0  C_2]        [g_21  g_22]        [        0          g_2 + g_21 + g_22]

The energy function is then

E(y_1, y_2) = -(1/2) [y_1  y_2] [g_11  g_12] [y_1]  +  Σ_{i=1}^{2} G_i ∫_0^{y_i} f^{-1}(y) dy
                                [g_21  g_22] [y_2]

and its gradient is

∇E(y_1, y_2) = - [g_11  g_12] [y_1]  +  [ G_1 (1/a) ln((1 + y_1)/(1 - y_1)) ]
                 [g_21  g_22] [y_2]     [ G_2 (1/a) ln((1 + y_2)/(1 - y_2)) ]

i.e.

∂E/∂y_1 = -(g_11 y_1 + g_12 y_2) + G_1 (1/a) ln((1 + y_1)/(1 - y_1))
∂E/∂y_2 = -(g_21 y_1 + g_22 y_2) + G_2 (1/a) ln((1 + y_2)/(1 - y_2))

Written out, the energy is

E = -(1/2) { g_11 y_1^2 + g_22 y_2^2 + (g_12 + g_21) y_1 y_2 }
    + G_1 (1/a) ∫_0^{y_1} ln((1 + y)/(1 - y)) dy + G_2 (1/a) ∫_0^{y_2} ln((1 + y)/(1 - y)) dy

Now consider the integral

I = ∫_0^{y_i} ln((1 + y)/(1 - y)) dy = ∫_0^{y_i} ln(1 + y) dy - ∫_0^{y_i} ln(1 - y) dy
The Hopfield Network

Let u = ln(1 - y) and dv = dy, so that du = -dy/(1 - y) and v = y. Integration by parts then gives

∫_0^{y_i} ln(1 - y) dy = [y ln(1 - y)]_0^{y_i} + ∫_0^{y_i} y/(1 - y) dy = -(1 - y_i) ln(1 - y_i) - y_i

and similarly

∫_0^{y_i} ln(1 + y) dy = (1 + y_i) ln(1 + y_i) - y_i

Hence

I = (1 + y_i) ln(1 + y_i) + (1 - y_i) ln(1 - y_i)

and the energy becomes

E = -(1/2) { g_11 y_1^2 + g_22 y_2^2 + (g_12 + g_21) y_1 y_2 }
    + (G_1/a) { (1 + y_1) ln(1 + y_1) + (1 - y_1) ln(1 - y_1) }
    + (G_2/a) { (1 + y_2) ln(1 + y_2) + (1 - y_2) ln(1 - y_2) }

(Figure slides.)

Using

dy/dt = diag( a(1 - y_i^2) / (2 C_i) ) ( W y - G f^{-1}(y) + I )

the state equations are obtained as

dy_1/dt = ( a(1 - y_1^2) / (2 C_1) ) ( g_11 y_1 + g_12 y_2 - G_1 (1/a) ln((1 + y_1)/(1 - y_1)) )
dy_2/dt = ( a(1 - y_2^2) / (2 C_2) ) ( g_22 y_2 + g_21 y_1 - G_2 (1/a) ln((1 + y_2)/(1 - y_2)) )
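As an illustrative sketch (all parameter values below are my own choices, not taken from the slides), the two-neuron dynamics and the energy surface derived above can be explored in MATLAB as follows:

% Simulate the two-neuron continuous Hopfield dynamics and plot the energy surface
a = 5;  C = [1; 1];  g = [0.1; 0.1];
W = [0 1; 1 0];                       % g11 = g22 = 0, g12 = g21 = 1 (illustrative)
G = g + sum(W, 2);                    % G_i = g_i + sum_j g_ij
finv = @(y) (1/a)*log((1+y)./(1-y));  % inverse bipolar activation
y = [0.1; -0.05];                     % initial output state
dt = 0.01;
for k = 1:2000                        % forward-Euler integration of dy/dt
    dy = (a*(1 - y.^2)./(2*C)) .* (W*y - G.*finv(y));
    y  = y + dt*dy;
end
disp(y)                               % approaches a stable equilibrium
[y1, y2] = meshgrid(-0.99:0.01:0.99);
E = -0.5*(W(1,1)*y1.^2 + W(2,2)*y2.^2 + (W(1,2)+W(2,1))*y1.*y2) ...
    + (G(1)/a)*((1+y1).*log(1+y1) + (1-y1).*log(1-y1)) ...
    + (G(2)/a)*((1+y2).*log(1+y2) + (1-y2).*log(1-y2));
figure, mesh(y1, y2, E), xlabel('y1'), ylabel('y2'), zlabel('E(y1,y2)')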
Discrete-Time Hopfield Networks

Consider the state equation of the Gradient-Type Hopfield Network:

C dx(t)/dt = W y - G x + I

We can write

C dx(t)/dt = W y - G f^{-1}(y) + I

As the plot of the inverse bipolar activation function shows, the second term in the above equation is negligible for high-gain neurons. Hence:

C dx(t)/dt = W y + I

Now consider

dx_i/dt = d f^{-1}(y_i)/dt = (df^{-1}(y_i)/dy_i)(dy_i/dt)

Using the same plot we can conclude that

df^{-1}(y_i)/dy_i ≈ 0

for high-gain neurons. Hence

dx(t)/dt = 0

and it follows that

0 = W y + I

Now let us solve this equation using Jacobi's algorithm.
Discrete-Time Hopfield Networks

To this end, decompose W into its lower-triangular, diagonal and upper-triangular parts:

W = L + D + U,   D = diag(w_ii),   W' = L + U

     [ 0    0   ...  0 ]       [w_11   0   ...   0  ]       [ 0   w_12 ...  w_1N]
L =  [w_21  0   ...  0 ]   D = [ 0    w_22 ...   0  ]   U = [ 0    0   ...  w_2N]
     [ .    .   ...  . ]       [ .     .   ...   .  ]       [ .    .   ...   .  ]
     [w_N1 w_N2 ...  0 ]       [ 0     0   ...  w_NN]       [ 0    0   ...   0  ]

With this decomposition, 0 = W y + I becomes

-D y = W' y + I

so that

y = -D^{-1} W' y - D^{-1} I

Now define

W~ = -D^{-1} W',   I~ = -D^{-1} I

to obtain

y = W~ y + I~
Now replace the vector y on the right-hand side by an initial vector y(0). If the vector y on the left-hand side is obtained as y(0), then y(0) is the solution of the system. If not, then call the vector y obtained on the left-hand side y(1), i.e.,

y(1) = W~ y(0) + I~

and in general we can write

y(k+1) = W~ y(k) + I~

Discrete-Time Hopfield Networks
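A minimal MATLAB sketch of this fixed-point (Jacobi) iteration, assuming a matrix W with nonzero diagonal and a bias vector I are given (function and variable names are illustrative):

function y = jacobi_hopfield(W, I, y, iters)
% Solve 0 = W*y + I by the Jacobi iteration y(k+1) = Wt*y(k) + It
D  = diag(diag(W));           % diagonal part of W
Wp = W - D;                   % W' = L + U (off-diagonal part)
Wt = -D \ Wp;                 % W~ = -D^{-1} W'
It = -D \ I;                  % I~ = -D^{-1} I
for k = 1:iters
    y = Wt*y + It;            % one Jacobi step
end
end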
The method will always converge if the matrix W is strictly or irreducibly diagonally dominant. Strict row diagonal dominance means that, for each row, the absolute value of the diagonal term is greater than the sum of the absolute values of the other terms:

|w_ii| > Σ_{j ≠ i} |w_ij|

The Jacobi method sometimes converges even if this condition is not satisfied. It is necessary, however, that the diagonal terms in the matrix are greater (in magnitude) than the other terms.
Discrete-Time Hopfield Networks

Solution by the Gauss-Seidel Method

In Jacobi's method the updating of the unknowns is made after all N unknowns have been moved to the left side of the equation. We will see in the following that this is not necessary, i.e., the updating can be made individually for each unknown, and this updated value can be used in the next equation. This is shown in the following equations:

x_1(n+1) = (1/a_11) [ -a_12 x_2(n) - a_13 x_3(n) - ... - a_1N x_N(n) + b_1 ]
x_2(n+1) = (1/a_22) [ -a_21 x_1(n+1) - a_23 x_3(n) - ... - a_2N x_N(n) + b_2 ]
x_3(n+1) = (1/a_33) [ -a_31 x_1(n+1) - a_32 x_2(n+1) - a_34 x_4(n) - ... - a_3N x_N(n) + b_3 ]
...
x_N(n+1) = (1/a_NN) [ -a_N1 x_1(n+1) - a_N2 x_2(n+1) - ... - a_N,N-1 x_{N-1}(n+1) + b_N ]

In vector-matrix form, we can write:

D x(n+1) = -( L x(n+1) + U x(n) ) + b
(D + L) x(n+1) = -U x(n) + b
x(n+1) = (D + L)^{-1} ( -U x(n) + b )

Discrete-Time Hopfield Networks
This matrix expression is mainly used to analyze the method. When implementing Gauss-Seidel, an explicit entry-by-entry approach is used:

x_i(n+1) = (1/a_ii) [ b_i - Σ_{j < i} a_ij x_j(n+1) - Σ_{j > i} a_ij x_j(n) ]

Discrete-Time Hopfield Networks
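A corresponding MATLAB sketch of the entry-by-entry Gauss-Seidel update for a general system A x = b with nonzero diagonal (illustrative names):

function x = gauss_seidel(A, b, x, iters)
% Gauss-Seidel iteration for A*x = b; x is the initial guess (column vector)
N = length(b);
for n = 1:iters
    for i = 1:N
        s = A(i,1:i-1)*x(1:i-1) + A(i,i+1:N)*x(i+1:N);  % already-updated entries are used for j < i
        x(i) = (b(i) - s) / A(i,i);
    end
end
end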
The Gauss-Seidel method is defined on matrices with non-zero diagonals, but convergence is only guaranteed if the matrix is either:
1. diagonally dominant, or
2. symmetric and positive definite.