P. K. Bora
Department of Electronics & Communication Engineering
INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI
Acknowledgement
I take this opportunity to thank Prof. A. Mahanta, who inspired me to take up the course
Statistical Signal Processing. I am also thankful to my other faculty colleagues in the
ECE department for their constant support. I acknowledge the help of my students,
particularly Mr. Diganta Gogoi and Mr. Gaurav Gupta, for their help in preparing the
handouts. My appreciation goes to Mr. Sanjib Das, who painstakingly edited the final
manuscript and prepared the PowerPoint presentations for the lectures. I acknowledge
the help of Mr. L.N. Sharma and Mr. Nabajyoti Dutta for word-processing a part of the
manuscript. Finally, I acknowledge QIP, IIT Guwahati for the financial support for this
work.
SECTION I
REVIEW OF
RANDOM VARIABLES & RANDOM PROCESS
Table of Contents
CHAPTER 1: REVIEW OF RANDOM VARIABLES
1.1 Introduction
1.2 Discrete and Continuous Random Variables
1.3 Probability Distribution Function
1.4 Probability Density Function
1.5 Joint Random Variables
1.6 Marginal Density Functions
1.7 Conditional Density Function
1.8 Bayes Rule for Mixed Random Variables
1.9 Independent Random Variables
1.10 Moments of Random Variables
1.11 Uncorrelated Random Variables
1.12 Linear Prediction of Y from X
1.13 Vector Space Interpretation of Random Variables
1.14 Linear Independence
1.15 Statistical Independence
1.16 Inner Product
1.17 Schwarz Inequality
1.18 Orthogonal Random Variables
1.19 Orthogonality Principle
1.20 Chebyshev Inequality
1.21 Markov Inequality
1.22 Convergence of a Sequence of Random Variables
1.23 Almost Sure (a.s.) Convergence or Convergence with Probability 1
1.24 Convergence in Mean Square Sense
1.25 Convergence in Probability
1.26 Convergence in Distribution
1.27 Central Limit Theorem
1.28 Jointly Gaussian Random Variables
CHAPTER 2: REVIEW OF RANDOM PROCESS
2.1 Introduction
2.2 How to Describe a Random Process?
2.3 Stationary Random Process
2.4 Spectral Representation of a Random Process
2.5 Crosscorrelation and Cross Power Spectral Density
2.6 White Noise Process
2.7 White Noise Sequence
2.8 Linear Shift Invariant System with Random Inputs
2.9 Spectral Factorization Theorem
2.10 Wold's Decomposition
CHAPTER 3: RANDOM SIGNAL MODELLING
3.1 Introduction
3.2 White Noise Sequence
3.3 Moving Average MA(q) Model
3.4 Autoregressive Model
3.5 ARMA(p,q) Autoregressive Moving Average Model
3.6 General ARMA(p,q) Model Building Steps
3.7 Other Models: Modelling Nonstationary Random Processes
7.13 Recursive Representation of R_YY[n]
7.14 Matrix Inversion Lemma
7.15 RLS Algorithm Steps
7.16 Discussion on RLS
7.16.1 Relation with Wiener Filter
7.16.2 Dependence on the Initial Values
7.16.3 Convergence in Stationary Condition
7.16.4 Tracking Nonstationarity
7.16.5 Computational Complexity
SECTION I
REVIEW OF
RANDOM VARIABLES & RANDOM PROCESS
Notations:
- A random variable X(s) is a function that assigns a real number to each sample point s.
- Random variables are represented by uppercase letters.
- Values of a random variable are denoted by lowercase letters.
- X = x means that x is the value of the random variable X.
Example 1: Consider the experiment of tossing a fair coin twice. The sample space is S =
{HH, HT, TH, TT} and all four outcomes are equally likely. Then we can define a
random variable X as follows (the particular assignment of values is one possible choice):

Sample point    Value X = x    P{X = x}
HH              0              1/4
HT              1              1/4
TH              2              1/4
TT              3              1/4
Example 2: Consider the sample space associated with a single toss of a fair die. The
sample space is given by S = {1, 2, 3, 4, 5, 6}. If we define the random variable X that
associates with each outcome a real number equal to the number on the face of the die, then X
takes values in {1, 2, 3, 4, 5, 6}.
For a discrete random variable, the probability of each value x_i is given by the probability
mass function P(x_i), such that Σ_i P(x_i) = 1.
A random variable may also be of mixed type. In this case the RV takes continuous values,
but at a finite number of points there is a nonzero probability mass.
The probability distribution function F_X(x) of a random variable X is defined as
F_X(x) = P{X ≤ x}. It has the properties:
- F_X(−∞) = 0
- F_X(∞) = 1
- F_X(x) is a non-decreasing function of x
- P{x₁ < X ≤ x} = F_X(x) − F_X(x₁)
Example 3: For the random variable of Example 1, the distribution function is
F_X(x) = 0        for x < 0
       = 1/4      for 0 ≤ x < 1
       = 1/2      for 1 ≤ x < 2
       = 3/4      for 2 ≤ x < 3
       = 1        for x ≥ 3
F_X(x) is a staircase function with a jump of height P{X = x} at each value x taken by X.
f_X(x) = (d/dx) F_X(x) is called the probability density function. It satisfies
∫_{−∞}^{∞} f_X(x) dx = 1
and
P(x₁ < X ≤ x₂) = ∫_{x₁}^{x₂} f_X(x) dx.
Remark: Using the Dirac delta function we can define the density function for a discrete
random variable as well.
Two random variables X and Y defined on the same sample space S are described by their
joint distribution function F_{X,Y}(x,y) = P(X ≤ x, Y ≤ y). The marginal distribution
functions are obtained as F_X(x) = F_{X,Y}(x, +∞). To prove this, note that
(X ≤ x) = (X ≤ x) ∩ (Y ≤ +∞)
so that
F_X(x) = P(X ≤ x) = P(X ≤ x, Y ≤ +∞) = F_{X,Y}(x, +∞).
Similarly F_Y(y) = F_{X,Y}(+∞, y).
Given F_{X,Y}(x,y), −∞ < x < ∞, −∞ < y < ∞, each of F_X(x) and F_Y(y) is
called a marginal distribution function.
The joint density function is
f_{X,Y}(x,y) = ∂²F_{X,Y}(x,y)/∂x∂y, provided it exists,
so that
F_{X,Y}(x,y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(u,v) dv du.
The marginal density functions follow by differentiating the marginal distributions:
f_X(x) = (d/dx) F_X(x) = (d/dx) F_{X,Y}(x, ∞)
       = (d/dx) ∫_{−∞}^{x} ( ∫_{−∞}^{∞} f_{X,Y}(u,y) dy ) du
       = ∫_{−∞}^{∞} f_{X,Y}(x,y) dy
and
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x,y) dx.
The conditional distribution function F_{Y/X}(y/x) cannot be defined directly as
P(Y ≤ y, X = x)/P(X = x),
as both the numerator and the denominator are zero for a continuous random variable X.
The conditional distribution function is defined in the limiting sense as follows:
F_{Y/X}(y/x) = lim_{Δx→0} P(Y ≤ y / x < X ≤ x+Δx)
= lim_{Δx→0} P(Y ≤ y, x < X ≤ x+Δx) / P(x < X ≤ x+Δx)
= lim_{Δx→0} ∫_{−∞}^{y} f_{X,Y}(x,u) Δx du / (f_X(x) Δx)
= ∫_{−∞}^{y} f_{X,Y}(x,u) du / f_X(x)      (1)
The conditional density function is obtained by differentiating with respect to y:
f_{Y/X}(y/X = x) = lim_{Δy→0} [F_{Y/X}(y+Δy / X = x) − F_{Y/X}(y / X = x)] / Δy
= lim_{Δy→0, Δx→0} [F_{Y/X}(y+Δy / x < X ≤ x+Δx) − F_{Y/X}(y / x < X ≤ x+Δx)] / Δy
because (X = x) = lim_{Δx→0} (x < X ≤ x+Δx). The right-hand side in equation (1) is
lim_{Δy→0, Δx→0} P(y < Y ≤ y+Δy / x < X ≤ x+Δx) / Δy
= lim_{Δy→0, Δx→0} P(y < Y ≤ y+Δy, x < X ≤ x+Δx) / [P(x < X ≤ x+Δx) Δy]
= lim_{Δy→0, Δx→0} f_{X,Y}(x,y) Δx Δy / [f_X(x) Δx Δy]
= f_{X,Y}(x,y) / f_X(x)
Therefore
f_{Y/X}(y/x) = f_{X,Y}(x,y) / f_X(x)      (2)
Similarly, we have
f_{X/Y}(x/y) = f_{X,Y}(x,y) / f_Y(y)      (3)
Combining (2) and (3) gives the Bayes rule for density functions:
f_{X/Y}(x/y) = f_{X,Y}(x,y) / f_Y(y)
= f_X(x) f_{Y/X}(y/x) / f_Y(y)
= f_{X,Y}(x,y) / ∫_{−∞}^{∞} f_{X,Y}(x,y) dx
= f_X(x) f_{Y/X}(y/x) / ∫_{−∞}^{∞} f_X(u) f_{Y/X}(y/u) du      (4)
Given the joint density function, we can find the conditional density function.
Example 4:
For random variables X and Y, the joint probability density function is given by
f_{X,Y}(x,y) = (1 + xy)/4, |x| ≤ 1, |y| ≤ 1
             = 0 otherwise
Find the marginal densities f_X(x), f_Y(y) and the conditional density f_{Y/X}(y/x).
Are X and Y independent?
f_X(x) = ∫_{−1}^{1} (1 + xy)/4 dy = 1/2, |x| ≤ 1
Similarly
f_Y(y) = 1/2, |y| ≤ 1
and
f_{Y/X}(y/x) = f_{X,Y}(x,y)/f_X(x) = (1 + xy)/2, |x| ≤ 1, |y| ≤ 1
             = 0 otherwise
Since f_{Y/X}(y/x) ≠ f_Y(y), X and Y are not independent.
For a discrete random variable X and a continuous random variable Y, the Bayes rule takes
the mixed form
P_{X/Y}(x/y) = lim_{Δy→0} P(X = x / y < Y ≤ y+Δy)
= lim_{Δy→0} P(X = x, y < Y ≤ y+Δy) / P(y < Y ≤ y+Δy)
= lim_{Δy→0} P_X(x) f_{Y/X}(y/x) Δy / (f_Y(y) Δy)
= P_X(x) f_{Y/X}(y/x) / f_Y(y)
= P_X(x) f_{Y/X}(y/x) / Σ_x P_X(x) f_{Y/X}(y/x)
Example 5: Consider the channel Y = X + V, where the input is binary,
X = 1 with probability 1/2
  = −1 with probability 1/2
and V is Gaussian noise with mean 0 and variance σ², independent of X. Then
P_{X/Y}(x = 1/y) = P_X(1) f_{Y/X}(y/1) / Σ_x P_X(x) f_{Y/X}(y/x)
= e^{−(y−1)²/2σ²} / ( e^{−(y−1)²/2σ²} + e^{−(y+1)²/2σ²} )
Two random variables X and Y are independent if F_{X,Y}(x,y) = F_X(x) F_Y(y) and,
equivalently, f_{X,Y}(x,y) = ∂²F_{X,Y}(x,y)/∂x∂y = f_X(x) f_Y(y), i.e. the joint density
is the product of the marginal density functions.
It is far easier to estimate the expectation of a RV from data than to estimate its
distribution.
The mean of X is μ_X = EX = ∫ x f_X(x) dx, and the mean of Y = g(X) is given by
EY = Eg(X) = ∫ g(x) f_X(x) dx.
Second moment:
EX² = ∫ x² f_X(x) dx
Variance:
σ_X² = E(X − μ_X)² = ∫ (x − μ_X)² f_X(x) dx
For two random variables, the correlation is
E(XY) = R_{X,Y} = ∫∫ xy f_{X,Y}(x,y) dx dy.
The ratio
ρ_{X,Y} = E(X − μ_X)(Y − μ_Y) / √( E(X − μ_X)² E(Y − μ_Y)² ) = Cov(X,Y)/(σ_X σ_Y)
is called the correlation coefficient. The correlation coefficient measures how similar two
random variables are.
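As a quick numerical illustration of the correlation coefficient (a sketch, not from the original text; the data, the slope 2 and the sample size are arbitrary choices):

```python
import math
import random

def corr_coeff(xs, ys):
    # Sample correlation coefficient rho_{X,Y} = cov(X,Y) / (sigma_X * sigma_Y)
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)

random.seed(0)
x = [random.gauss(0, 1) for _ in range(20000)]
# Y = 2X + noise: strongly (but not perfectly) correlated with X;
# theory gives rho = 2 / sqrt(5) ~ 0.894
y = [2 * xi + random.gauss(0, 1) for xi in x]
rho = corr_coeff(x, y)
```

The estimate converges to 2/√5 as the sample size grows, and it always stays within [−1, 1], as the Schwarz inequality below guarantees.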
Two random variables may be dependent but still uncorrelated. If correlation exists
between two random variables, one may be represented as a linear regression of the
other. We will discuss this point in the next section.
Linear prediction of Y from X: predict Y by the linear estimate Ŷ = aX + b.
Prediction error: Y − Ŷ
Mean square prediction error: E(Y − Ŷ)² = E(Y − aX − b)²
Minimizing with respect to a and b gives the regression line
Ŷ − μ_Y = (σ_{X,Y}/σ_X²)(x − μ_X)
so that Ŷ − μ_Y = ρ_{X,Y} (σ_Y/σ_X)(x − μ_X), where ρ_{X,Y} = σ_{X,Y}/(σ_X σ_Y).
Note that independence ⇒ uncorrelatedness. But uncorrelatedness generally does not imply
independence (except for jointly Gaussian random variables).
Example 6: Let X be zero-mean with a symmetric density and Y = X². Then
Cov(X,Y) = EXY − EX·EY = EX³ − 0 = 0 (since EX = 0 and the density is symmetric).
In fact, for any zero-mean symmetric distribution of X, X and X² are uncorrelated,
although they are clearly dependent.
The inner product between two random variables is defined as <x,y> = E(xy). Among its
properties:
- <x,x> = ‖x‖² ≥ 0, with equality iff x = 0
- <x+y, z> = <x,z> + <y,z>
- <ax, y> = a<x,y>
The norm is ‖x‖ = √<x,x>; for a random variable X, ‖X‖² = EX² = ∫ x² f_X(x) dx.
The set of RVs along with the inner product defined through the joint expectation
operation and the corresponding norm defines a Hilbert space.
Schwarz inequality: |<x,y>| ≤ ‖x‖·‖y‖.
For RVs X and Y the Schwarz inequality reads
E²(XY) ≤ EX² EY²
Proof: For any real a, EZ² = E(Y − aX)² ≥ 0. Nonnegativity of the left-hand side implies
that its minimum over a must also be nonnegative. For the minimum value,
dEZ²/da = 0 ⇒ a = EXY/EX²
so the corresponding minimum is
EY² − 2E²XY/EX² + E²XY/EX² = EY² − E²XY/EX² ≥ 0
⇒ E²XY ≤ EX² EY²
Applying the Schwarz inequality to the centred variables X − μ_X and Y − μ_Y shows that
the correlation coefficient
ρ(X,Y) = Cov(X,Y)/(σ_X σ_Y) = E(X − μ_X)(Y − μ_Y) / √( E(X − μ_X)² E(Y − μ_Y)² )
satisfies |ρ(X,Y)| ≤ 1.
Two random variables X and Y are called orthogonal if <X,Y> = EXY = 0. Since
Cov(X,Y) = EXY − μ_X μ_Y
if EXY = 0 and either mean is zero, then Cov(X,Y) = 0. For zero-mean random variables,
orthogonality ⟺ uncorrelatedness.
Suppose Y is a random variable that is statistically dependent on X. Given a value of Y,
what is the best guess for X? (Estimation problem.)
Let the best estimate be X̂(Y). Then the mean square estimation error E(X − X̂(Y))² is
minimized with respect to X̂(Y), and the corresponding estimation principle is called the
minimum mean square error (MMSE) principle. For finding the minimum, we have
∂/∂X̂ E(X − X̂(Y))² = 0
⇒ ∂/∂X̂ ∫∫ (x − X̂(y))² f_{X,Y}(x,y) dy dx = 0
⇒ ∂/∂X̂ ∫∫ (x − X̂(y))² f_Y(y) f_{X/Y}(x/y) dy dx = 0
⇒ ∂/∂X̂ ∫ f_Y(y) ( ∫ (x − X̂(y))² f_{X/Y}(x/y) dx ) dy = 0
Since f_Y(y) ≥ 0, the minimization is equivalent to
∂/∂X̂ ∫ (x − X̂(y))² f_{X/Y}(x/y) dx = 0
or −2 ∫ (x − X̂(y)) f_{X/Y}(x/y) dx = 0
⇒ X̂(y) ∫ f_{X/Y}(x/y) dx = ∫ x f_{X/Y}(x/y) dx
⇒ X̂(y) = E(X/Y = y)
Thus minimum mean-square error estimation involves the conditional expectation, which is
difficult to obtain numerically.
Let us consider a simpler version of the problem. We assume that X̂(y) = ay and the
estimation problem is to find the optimal value for a. Thus we have the linear minimum
mean-square error criterion, which minimizes E(X − aY)²:
(d/da) E(X − aY)² = 0
E (d/da)(X − aY)² = 0
E(X − aY)Y = 0
EeY = 0
where e = X − aY is the estimation error.
The above result shows that for the linear minimum meansquare error criterion,
estimation error is orthogonal to data. This result helps us in deriving optimal filters to
estimate a random signal buried in noise.
The mean and variance also give some quantitative information about the bounds of a RV.
The following inequalities are extremely useful in many practical problems.
Chebyshev inequality: Suppose X is a random variable with mean μ_X and variance σ_X².
Then for any ε > 0,
P{|X − μ_X| ≥ ε} ≤ σ_X²/ε²
(A typical application: a quality control department rejects an item if |X − μ_X| exceeds a
tolerance ε; the inequality bounds the rejection probability.)
Proof:
σ_X² = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx
≥ ∫_{|x−μ_X|≥ε} (x − μ_X)² f_X(x) dx
≥ ε² ∫_{|x−μ_X|≥ε} f_X(x) dx
= ε² P{|X − μ_X| ≥ ε}
∴ P{|X − μ_X| ≥ ε} ≤ σ_X²/ε²
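The Chebyshev bound can be checked empirically (a sketch, not from the original notes; the standard Gaussian and ε = 2 are arbitrary choices — for a Gaussian the bound is quite loose):

```python
import random

def chebyshev_check(eps=2.0, trials=50000, seed=4):
    # Empirical P(|X - mu| >= eps) for X ~ N(0,1); Chebyshev bounds it by sigma^2/eps^2.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if abs(rng.gauss(0, 1)) >= eps)
    return hits / trials

p = chebyshev_check()   # true tail probability for N(0,1) at eps=2 is about 0.0455
bound = 1.0 / 2.0 ** 2  # sigma^2 / eps^2 = 0.25
```

The empirical probability (about 0.046) is well below the bound 0.25, illustrating that Chebyshev holds for any distribution but is not tight.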
Markov inequality: For a nonnegative random variable X and a > 0,
P{X ≥ a} ≤ E(X)/a
Proof:
E(X) = ∫_{0}^{∞} x f_X(x) dx
≥ ∫_{a}^{∞} x f_X(x) dx
≥ ∫_{a}^{∞} a f_X(x) dx
= a P{X ≥ a}
Hence P{X ≥ a} ≤ E(X)/a.
Result: Applying the Markov inequality to (X − k)² gives
P{(X − k)² ≥ a} ≤ E(X − k)²/a
Consider a sequence of random variables X₁, X₂, .... Suppose we want to estimate the mean
of the random variable on the basis of the observed data by means of the relation
X̄ = (1/N) Σ_{i=1}^{N} X_i
In what sense does X̄ approach the true mean μ_X as N increases? We recall the notions of
convergence of a sequence of random variables.
Almost sure (a.s.) convergence, or convergence with probability 1: {X_n} converges to X
almost surely if
P{s | X_n(s) → X(s)} = 1 as n → ∞.
For the sample mean, the strong law of large numbers states that
(1/n) Σ_{i=1}^{n} X_i → μ_X with probability 1 as n → ∞.
Convergence in mean square (m.s.): {X_n} converges to X in the mean square sense if
E(X_n − X)² → 0 as n → ∞.
Example 7: Let X₁, ..., X_n be iid random variables with mean μ_X and variance σ_X², and
X̄ = (1/n) Σ_{i=1}^{n} X_i. Now,
E(X̄ − μ_X)² = E( (1/n) Σ_{i=1}^{n} (X_i − μ_X) )²
= (1/n²) Σ_{i=1}^{n} E(X_i − μ_X)² + (1/n²) Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} E(X_i − μ_X)(X_j − μ_X)
= n σ_X²/n² + 0 (because of independence)
= σ_X²/n
Hence
lim_{n→∞} E(X̄ − μ_X)² = 0
so the sample mean converges to μ_X in the mean square sense.
Convergence in probability: {X_n} converges to X in probability if, for every ε > 0,
P{|X_n − X| > ε} → 0 as n → ∞.
Applying the Markov inequality to (X_n − X)²,
P{|X_n − X| > ε} = P{(X_n − X)² > ε²} ≤ E(X_n − X)²/ε²
Hence, if E(X_n − X)² → 0, then P{|X_n − X| > ε} → 0 as n → ∞: convergence in mean
square implies convergence in probability.
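The σ_X²/n decay of the sample-mean MSE in Example 7 can be observed numerically (a sketch, not part of the original notes; the values μ = 1, σ = 2 and the trial counts are arbitrary):

```python
import random

def sample_mean_mse(n, trials=4000, mu=1.0, sigma=2.0, seed=1):
    # Monte Carlo estimate of E(Xbar_n - mu)^2, which should equal sigma^2 / n
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        total += (xbar - mu) ** 2
    return total / trials

mse_10 = sample_mean_mse(10)    # theory: sigma^2/n = 4/10
mse_100 = sample_mean_mse(100)  # theory: sigma^2/n = 4/100
```

Increasing n by a factor of 10 reduces the mean square error by the same factor, as the derivation predicts.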
Example 8: Consider the sequence {X_n} with
P{X_n = 1} = 1/n and P{X_n = 0} = 1 − 1/n.
Clearly, for any 0 < ε < 1,
P{|X_n − 0| > ε} = P{X_n = 1} = 1/n → 0 as n → ∞.
Therefore {X_n} converges in probability to X = 0 as n → ∞.
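Example 8 can be simulated directly (a sketch, not from the original text; ε = 0.5 and the trial count are arbitrary):

```python
import random

def prob_exceeds(n, trials=20000, seed=2):
    # X_n = 1 with probability 1/n and 0 otherwise;
    # estimate P(|X_n - 0| > eps) for any 0 < eps < 1, which equals P(X_n = 1) = 1/n.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.random() < 1.0 / n)
    return hits / trials

p10 = prob_exceeds(10)       # theory: 1/10
p1000 = prob_exceeds(1000)   # theory: 1/1000
```

The exceedance probability shrinks like 1/n, which is exactly convergence in probability to 0.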
Two random variables X and Y are jointly Gaussian if their joint density is
f_{X,Y}(x,y) = A e^{ −(1/(2(1−ρ²_{X,Y}))) [ (x−μ_X)²/σ_X² − 2ρ_{X,Y}(x−μ_X)(y−μ_Y)/(σ_X σ_Y) + (y−μ_Y)²/σ_Y² ] }
where
A = 1 / ( 2π σ_X σ_Y √(1 − ρ²_{X,Y}) ).
Properties:
(1) If X and Y are jointly Gaussian, then for any constants a and b the random variable
Z = aX + bY is Gaussian with mean μ_Z = aμ_X + bμ_Y and variance
σ_Z² = a²σ_X² + b²σ_Y² + 2ab σ_X σ_Y ρ_{X,Y}
(2) If two jointly Gaussian RVs are uncorrelated, ρ_{X,Y} = 0, then they are statistically
independent: f_{X,Y}(x,y) = f_X(x) f_Y(y) in this case.
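Property (1) can be verified by Monte Carlo (a sketch, not from the original notes; the parameter values a = 2, b = −1, μ_X = 1, μ_Y = 0.5, σ_X = 1, σ_Y = 2, ρ = 0.6 are arbitrary):

```python
import math
import random

def simulate_linear_combination(a, b, mu_x, mu_y, sx, sy, rho, n=200000, seed=3):
    # Draw correlated jointly Gaussian (X, Y) via two independent standard normals,
    # then return the sample mean and variance of Z = aX + bY.
    rng = random.Random(seed)
    zs = []
    for _ in range(n):
        u = rng.gauss(0, 1)
        v = rng.gauss(0, 1)
        x = mu_x + sx * u
        y = mu_y + sy * (rho * u + math.sqrt(1 - rho * rho) * v)
        zs.append(a * x + b * y)
    m = sum(zs) / n
    var = sum((z - m) ** 2 for z in zs) / n
    return m, var

mean_z, var_z = simulate_linear_combination(2.0, -1.0, 1.0, 0.5, 1.0, 2.0, 0.6)
# theory: mean = 2*1 - 1*0.5 = 1.5
#         variance = 4*1 + 1*4 + 2*(2)*(-1)*1*2*0.6 = 8 - 4.8 = 3.2
```

The sample statistics match μ_Z = aμ_X + bμ_Y and σ_Z² = a²σ_X² + b²σ_Y² + 2abσ_Xσ_Yρ.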
If X and Y are jointly Gaussian random variables, then the optimum nonlinear (MMSE)
estimator of one from the other coincides with the optimal linear estimator.
A random process X(t, s) is a function of both the sample point s ∈ S and the variable t
denoting time. For each fixed s it is a function of time and may be written as X(t, s).
Figure: An ensemble of realizations X(t, s₁), X(t, s₂), X(t, s₃) of a random process, one
waveform for each sample point s₁, s₂, s₃ ∈ S.
Example 1: Consider the random process X(t) observed at time instants t₁, t₂, ..., t_n.
These n jointly distributed random variables define the random vector
X = [X(t₁), X(t₂), ..., X(t_n)]'. The process X(t) is called Gaussian if the random vector
[X(t₁), X(t₂), ..., X(t_n)]' is jointly Gaussian for every n, with the joint density function
given by
f_X(x) = (1/( (2π)^{n/2} √det(C_X) )) e^{ −(1/2)(x − μ_X)' C_X^{−1} (x − μ_X) }
where μ_X = EX and C_X = E(X − μ_X)(X − μ_X)'.
For n = 1,
F_{X(t₁)}(x₁) = F_{X(t₁+t₀)}(x₁) for every t₀ ∈ T.
Let us take t₀ = −t₁:
F_{X(t₁)}(x₁) = F_{X(0)}(x₁)
⇒ EX(t₁) = EX(0) = μ_X(0) = constant.
For n = 2,
F_{X(t₁),X(t₂)}(x₁, x₂) = F_{X(t₁+t₀),X(t₂+t₀)}(x₁, x₂).
Put t₀ = −t₂:
F_{X(t₁),X(t₂)}(x₁, x₂) = F_{X(t₁−t₂),X(0)}(x₁, x₂)
⇒ R_X(t₁, t₂) = R_X(t₁ − t₂).
A random process X(t) is called a wide-sense stationary (WSS) process if
μ_X(t) = constant and
R_X(t₁, t₂) = R_X(t₁ − t₂) is a function of the time lag only.
For a Gaussian random process, WSS implies strict sense stationarity, because this
process is completely described by the mean and the autocorrelation functions.
The autocorrelation function R_X(τ) = EX(t)X(t+τ) is a crucial quantity for a WSS
process. Its properties include:
- R_X(0) = EX²(t) is the mean-square value of the process.
- By the Cauchy-Schwarz inequality,
  R_X²(τ) = (EX(t)X(t+τ))² ≤ EX²(t) EX²(t+τ) = R_X²(0)
  so |R_X(τ)| ≤ R_X(0).
- R_X(τ) is an even function of τ.
- R_X(τ) is positive semidefinite:
  Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j R_X(t_i − t_j) ≥ 0 for any constants a_i.
If X (t ) is periodic (in the mean square sense or any other sense like with
probability 1), then R X ( ) is also periodic.
For a discrete random sequence, we can define the autocorrelation sequence similarly.
If R_X(τ) drops quickly, the signal samples are weakly correlated, which in turn means
that the signal changes rapidly with time; such a signal has strong high-frequency
components. If R_X(τ) drops slowly, the signal samples are highly correlated and the
signal has fewer high-frequency components.
Figure: Frequency interpretation of a random process — for a slowly varying random
process, the autocorrelation decays slowly.
Let us define the truncated process
X_T(t) = X(t), −T < t < T
       = 0, otherwise
and its Fourier transform X_T(ω) = ∫_{−T}^{T} X_T(t) e^{−jωt} dt. Then
E|X_T(ω)|²/2T = (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} EX_T(t₁)X_T(t₂) e^{−jωt₁} e^{+jωt₂} dt₁ dt₂
= (1/2T) ∫_{−T}^{T} ∫_{−T}^{T} R_X(t₁ − t₂) e^{−jω(t₁−t₂)} dt₁ dt₂
Substituting τ = t₁ − t₂, so that t₂ = t₁ − τ is a line in the (t₁, t₂) plane, we get
E|X_T(ω)|²/2T = (1/2T) ∫_{−2T}^{2T} R_X(τ) e^{−jωτ} (2T − |τ|) dτ
= ∫_{−2T}^{2T} R_X(τ) (1 − |τ|/2T) e^{−jωτ} dτ
If R_X(τ) is absolutely integrable, then as T → ∞,
lim_{T→∞} E|X_T(ω)|²/2T = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ
The limit on the left-hand side is called the power spectral density S_X(ω). Thus
S_X(ω) = ∫_{−∞}^{∞} R_X(τ) e^{−jωτ} dτ
and
R_X(τ) = (1/2π) ∫_{−∞}^{∞} S_X(ω) e^{jωτ} dω
Properties:
- S_X(ω) = lim_{T→∞} E|X_T(ω)|²/2T is always real and nonnegative.
- The average power is EX²(t) = R_X(0) = (1/2π) ∫ S_X(ω) dω, so
  S_X(ω)/EX²(t) is a normalised power spectral density and has the properties of a pdf.
For two jointly WSS processes X(t) and Y(t), the crosscorrelation function is defined as
R_{X,Y}(τ) = EX(t+τ)Y(t)
so that
R_{Y,X}(τ) = EY(t+τ)X(t) = EX(t)Y(t+τ) = R_{X,Y}(−τ)
The cross power spectral density is
S_{X,Y}(ω) = ∫_{−∞}^{∞} R_{X,Y}(τ) e^{−jωτ} dτ
and it satisfies S_{X,Y}(ω) = S*_{Y,X}(ω).
For a discrete random sequence, the PSD is
S_X(ω) = Σ_{m=−∞}^{∞} R_X[m] e^{−jωm}
or, in terms of normalised frequency,
S_X(f) = Σ_{m=−∞}^{∞} R_X[m] e^{−j2πfm}, −1 ≤ f ≤ 1
with inverse
R_X[m] = (1/2π) ∫_{−π}^{π} S_X(ω) e^{jωm} dω
For a discrete sequence the generalized PSD is defined in the z-domain as
S_X(z) = Σ_{m=−∞}^{∞} R_X[m] z^{−m}
Example 1: R_X(τ) = e^{−a|τ|}, a > 0. Then
S_X(ω) = 2a/(a² + ω²), −∞ < ω < ∞
Example 2: R_X(m) = a^{|m|}, 0 < a < 1. Then
S_X(ω) = (1 − a²)/(1 − 2a cos ω + a²), −π ≤ ω ≤ π
White noise process: a white noise process has a flat PSD,
S_X(f) = N/2, −∞ < f < ∞
so its average power
P_avg = ∫_{−∞}^{∞} (N/2) df
is infinite; white noise is a mathematical idealization rather than a physically realizable
process.
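The continuous-time Fourier pair of Example 1, R_X(τ) = e^{−a|τ|} ↔ S_X(ω) = 2a/(a² + ω²), can be checked by direct numerical integration (a sketch, not from the original notes; a = 1.5 and ω = 2 are arbitrary test values):

```python
import math

def psd_from_acf(w, a, tau_max=60.0, step=0.001):
    # Numerically evaluate S_X(w) = integral of e^{-a|tau|} e^{-j w tau} d tau.
    # Only the real (cosine) part is needed, since the integrand's imaginary part is odd.
    s = 0.0
    tau = -tau_max
    while tau < tau_max:
        s += math.exp(-a * abs(tau)) * math.cos(w * tau) * step
        tau += step
    return s

a = 1.5
w = 2.0
numeric = psd_from_acf(w, a)
closed_form = 2 * a / (a * a + w * w)  # = 3 / 6.25 = 0.48
```

The Riemann sum agrees with the closed form to within the discretization error.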
If the system bandwidth (BW) is sufficiently narrower than the noise BW and the noise
PSD is flat over the system band, we can model the noise as a white noise process. Thermal
and shot noise are well modelled as white Gaussian noise, since they have a very flat PSD
over a very wide band (of the order of GHz).
For a zero-mean white noise process, the correlation of the process at any nonzero lag is
zero. White noise thus plays a role in the modelling of random signals similar to that of the
impulse function in the modelling of deterministic signals.
White noise process: S_X(ω) = N/2 for all ω. Therefore
R_X(τ) = (N/2) δ(τ)
where δ(τ) is the Dirac delta function.
White noise sequence: for a white noise sequence, S_X(ω) = N/2, −π ≤ ω ≤ π, and
R_X[m] = (N/2) δ[m]
where δ[m] is the unit impulse sequence.
Consider a linear shift-invariant (LSI) system with impulse response h[n], input x[n] and
output y[n], so that
y[n] = x[n] * h[n]
Taking expectations, Ey[n] = Ex[n] * h[n], i.e. μ_y = μ_x Σ_k h[k]. In the frequency
domain, the input and output PSDs are related by
S_Y(ω) = |H(ω)|² S_X(ω)
Example 3:
Suppose
H(ω) = 1, −ω_c ≤ ω ≤ ω_c
     = 0 otherwise
and the input is white noise with S_X(ω) = N/2. Then
S_Y(ω) = N/2, −ω_c ≤ ω ≤ ω_c (and 0 otherwise)
and
R_Y(τ) = (1/2π) ∫_{−ω_c}^{ω_c} (N/2) e^{jωτ} dω = (N ω_c / 2π) sinc(ω_c τ)
where sinc(x) = sin(x)/x.
Note that though the input is an uncorrelated process, the output is a correlated
process.
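The sinc-shaped output autocorrelation of Example 3 can be checked numerically (a sketch under stated assumptions, not from the original notes; N = 2 and ω_c = π/4 are arbitrary):

```python
import math

def output_acf(tau, N0=2.0, wc=math.pi / 4, step=1e-4):
    # R_Y(tau) = (1/2pi) * integral over (-wc, wc) of (N0/2) e^{j w tau} dw.
    # Only the cosine part contributes, by symmetry of the integration interval.
    s = 0.0
    w = -wc
    while w < wc:
        s += (N0 / 2) * math.cos(w * tau) * step
        w += step
    return s / (2 * math.pi)

r0 = output_acf(0.0)                # output power: N0*wc/(2*pi) = 0.25 here
r_zero_crossing = output_acf(4.0)   # first zero of the sinc at tau = pi/wc = 4
```

The output power matches Nω_c/2π, and the autocorrelation vanishes at the sinc zeros, confirming that the filtered white noise is correlated over lags of order π/ω_c.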
Consider the case of a discrete-time system with a random sequence x[n] as input and
impulse response h[n]. In the z-domain,
S_YY(z) = H(z) H(z^{−1}) S_XX(z)
and R_YY[m] is the inverse z-transform of S_YY(z).
Example 4:
If
H(z) = 1/(1 − a z^{−1})
and the input is unit-variance white noise, then
S_YY(z) = H(z) H(z^{−1}) S_XX(z) = 1/( (1 − a z^{−1})(1 − a z) )
so that
R_YY[m] = a^{|m|}/(1 − a²)
Spectral factorization theorem: Consider a WSS random sequence X[n]. If S_X(ω) is an
analytic function of ω, then
S_XX(z) = σ_v² H_c(z) H_a(z)
where
H_c(z) is the causal minimum-phase transfer function,
H_a(z) is the anticausal maximum-phase transfer function,
and σ_v² is the variance of the white innovation sequence.
Figure (innovation representation): v[n] → H_c(z) → X[n] (innovation filter), and
X[n] → 1/H_c(z) → v[n] (whitening filter).
To derive the factorization, expand
ln S_XX(z) = Σ_{k=−∞}^{∞} c[k] z^{−k}
where c[k] = (1/2π) ∫_{−π}^{π} ln S_XX(ω) e^{jωk} dω is the kth-order cepstral
coefficient; in particular c[0] = (1/2π) ∫_{−π}^{π} ln S_XX(ω) dω. Then
S_XX(z) = e^{ Σ_{k=−∞}^{∞} c[k] z^{−k} }
= e^{c[0]} · e^{ Σ_{k=1}^{∞} c[k] z^{−k} } · e^{ Σ_{k=−∞}^{−1} c[k] z^{−k} }
Let
H_C(z) = e^{ Σ_{k=1}^{∞} c[k] z^{−k} }, |z| > ρ (causal, minimum phase)
H_a(z) = e^{ Σ_{k=−∞}^{−1} c[k] z^{−k} } = e^{ Σ_{k=1}^{∞} c[−k] z^{k} } = H_C(z^{−1}), |z| < 1/ρ (anticausal, maximum phase)
Therefore,
S_XX(z) = σ_V² H_C(z) H_C(z^{−1})
where σ_V² = e^{c[0]}.
Salient points
- The filter H_C(z) generates the process X[n] from the white innovation sequence v[n];
  the inverse filter 1/H_C(z) can be used to filter the given signal to get the innovation
  sequence.
- X[n] and v[n] are related through an invertible transform, so they contain the same
  information.
Wold's decomposition: any WSS random sequence can be decomposed into a regular process,
realizable as the output of a causal stable filter with a white noise sequence as input, and a
predictable process X_p[n], that is, a process that can be predicted from its own past
samples with zero error.
Here v[n] is a zero-mean white noise sequence with S_V(ω) = σ_V². The moving average
MA(q) model is
X[n] = Σ_{i=0}^{q} b_i v[n−i]
The mean is EX[n] = μ_X = 0, and since v[n] is an uncorrelated sequence,
σ_X² = Σ_{i=0}^{q} b_i² σ_V²
The autocorrelation is
R_X[m] = EX[n]X[n−m] = Σ_{i=0}^{q} Σ_{j=0}^{q} b_i b_j R_V[m−i+j]
so that
R_X[m] = σ_V² Σ_{j=0}^{q−m} b_j b_{j+m}, 0 ≤ m ≤ q
       = 0, m > q
and R_X[−m] = R_X[m].
The PSD is
S_X(ω) = σ_V² |B(ω)|², where B(ω) = b₀ + b₁ e^{−jω} + ... + b_q e^{−jqω}
An FIR system contributes only zeros, so if the spectrum has valleys (notches) an MA
model will fit it well.
Figure: Autocorrelation function R_X[m] of an MA process (zero beyond lag q).
Example 1: For an MA(1) process X[n] = b₀v[n] + b₁v[n−1] with unit-variance white
input,
σ_X² = R_X[0] = b₀² + b₁²
R_X[1] = b₁b₀
From the above, b₀ and b₁ can be calculated using the variance and the autocorrelation at
lag 1 of the signal.
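The MA(1) fit just described reduces to solving two nonlinear equations; a minimal sketch (not from the original text — the numbers R_X[0] = 1.25, R_X[1] = 0.5 are an invented test case) is:

```python
import math

def ma1_coeffs(r0, r1):
    # Solve b0^2 + b1^2 = r0 and b0*b1 = r1 (unit-variance white input assumed).
    # Then t = b0^2 satisfies t^2 - r0*t + r1^2 = 0; taking the larger root
    # corresponds to the minimum-phase (invertible) choice of the MA(1) model.
    disc = r0 * r0 - 4 * r1 * r1
    t = (r0 + math.sqrt(disc)) / 2
    b0 = math.sqrt(t)
    b1 = r1 / b0
    return b0, b1

b0, b1 = ma1_coeffs(1.25, 0.5)
# a consistent pair: b0 = 1.0, b1 = 0.5 (since 1 + 0.25 = 1.25 and 1*0.5 = 0.5)
```

In practice r0 and r1 would be the estimated variance and lag-1 autocorrelation of the observed signal; a solution exists only when r0 ≥ 2|r1|.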
The autoregressive model is an IIR filter driven by white noise:
X[n] = Σ_{i=1}^{p} a_i X[n−i] + v[n]
The corresponding frequency response is
A(ω) = 1/(1 − Σ_{i=1}^{p} a_i e^{−jωi})
and the PSD of the process is
S_X(ω) = σ_V² |A(ω)|²
If there are sharp peaks in the spectrum, the AR(p) model may be suitable, since an
all-pole filter contributes only poles.
The autocorrelation satisfies
R_X[m] = EX[n]X[n−m] = Σ_{i=1}^{p} a_i EX[n−i]X[n−m] + Ev[n]X[n−m]
= Σ_{i=1}^{p} a_i R_X[m−i] + σ_V² δ[m]
The above relation gives a set of linear equations which can be solved to find the a_i.
These sets of equations are known as the Yule-Walker equations.
Example 2: AR(1) process
X[n] = a₁ X[n−1] + v[n]
R_X[m] = a₁ R_X[m−1] + σ_V² δ[m]
For m = 0 and m = 1 (using R_X[−1] = R_X[1]):
R_X[0] = a₁ R_X[1] + σ_V²      (1)
R_X[1] = a₁ R_X[0]             (2)
Solving, R_X[0] = σ_V²/(1 − a₁²), and after some arithmetic we get
R_X[m] = a₁^{|m|} σ_V²/(1 − a₁²)
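The AR(1) autocorrelation formula can be checked by simulation (a sketch, not from the original notes; a₁ = 0.5, unit noise variance and the sample size are arbitrary choices):

```python
import random

def ar1_sample_acf(a, n=200000, seed=5):
    # Simulate X[n] = a*X[n-1] + v[n] with unit-variance white Gaussian v[n],
    # then estimate R_X[0] and R_X[1] from the sample path.
    rng = random.Random(seed)
    x = 0.0
    xs = []
    for _ in range(n):
        x = a * x + rng.gauss(0, 1)
        xs.append(x)
    r0 = sum(v * v for v in xs) / n
    r1 = sum(xs[i] * xs[i + 1] for i in range(n - 1)) / n
    return r0, r1

a = 0.5
r0, r1 = ar1_sample_acf(a)
# theory: R[0] = 1/(1-a^2) = 4/3, R[1] = a/(1-a^2) = 2/3
```

The empirical lags match σ_V²/(1−a₁²) and a₁σ_V²/(1−a₁²); note also that r1/r0 estimates a₁ itself, which is the Yule-Walker solution for p = 1.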
The ARMA(p,q) model combines the AR and MA parts; with white input v[n], the signal is
modelled as
x[n] = Σ_{i=1}^{p} a_i X[n−i] + Σ_{i=0}^{q} b_i v[n−i]      (ARMA 1)
with PSD
S_X(ω) = σ_V² |B(ω)|² / |A(ω)|²
For m ≥ max(p, q+1) the autocorrelation satisfies
R_X[m] = Σ_{i=1}^{p} a_i R_X[m−i]
From a set of such equations the AR parameters a_i can be found.
With X̃[n] = Σ_{i=0}^{q} b_i v[n−i], the ARMA model can also be written in
state-variable form,
z[n+1] = A z[n] + B v[n], X[n] = C z[n]
where
A = [ 0 1 ... 0
      ...........
      0 0 ... 1 ]
is a companion matrix built from the AR coefficients, B = [1 0 ... 0]' and
C = [b₀ b₁ ... b_q].
General ARMA(p,q) model building begins with the identification of p and q.
Other models, used for nonstationary random processes:
- ARIMA model: here, after differencing, the data can be fed to an ARMA model.
- SARMA model (seasonal ARMA): the signal contains a seasonal fluctuation term. The
  signal, after differencing by a step equal to the seasonal period, becomes stationary and
  an ARMA model can be fitted to the resulting data.
SECTION II
ESTIMATION THEORY
For speech, we have the LPC (linear predictive coding) model; the LPC parameters are
to be estimated from observed data.
We may also have to estimate the correct value of a signal from noisy observations.
Figure: An array of sensors receives the signals generated by the mechanical movements
of a submarine; the task is to estimate the location of the submarine.
Generally, estimation includes parameter estimation and signal estimation.
We will discuss the problem of parameter estimation here.
We have a sequence of observable random variables X₁, X₂, ..., X_N, represented by the
vector
X = [X₁ X₂ ... X_N]'
and characterised by the joint density function, parameterised by θ:
f_X(x₁, x₂, ..., x_N / θ) = f_X(x / θ)
An estimator θ̂(X) is a rule by which we guess the value of an unknown θ on the basis
of X. Note that θ̂(X) is itself random, being a function of random variables. For example,
X̄ = (1/N) Σ_{i=1}^{N} X_i is an estimator for μ_X, and
σ̂_X² = (1/N) Σ_{i=1}^{N} (X_i − X̄)² is an estimator for σ_X².
An estimator θ̂ is unbiased if Eθ̂ = θ. Consider the two variance estimators
σ̂₁² = (1/N) Σ_{i=1}^{N} (X_i − X̄)²
σ̂₂² = (1/(N−1)) Σ_{i=1}^{N} (X_i − X̄)²
To check unbiasedness, write
E(X_i − X̄)² = E(X_i − μ_X + μ_X − X̄)²
= E{(X_i − μ_X)² + (X̄ − μ_X)² − 2(X_i − μ_X)(X̄ − μ_X)}
Now E(X_i − μ_X)² = σ², and
E(X̄ − μ_X)² = (1/N²) E( Σ_i (X_i − μ_X) )²
= (1/N²) [ Σ_i E(X_i − μ_X)² + Σ_i Σ_{j≠i} E(X_i − μ_X)(X_j − μ_X) ]
= (1/N²) Σ_i E(X_i − μ_X)² (because of independence)
= σ²/N
also E(X_i − μ_X)(X̄ − μ_X) = (1/N) E(X_i − μ_X)² = σ²/N.
Hence
E Σ_{i=1}^{N} (X_i − X̄)² = Nσ² + σ² − 2σ² = (N−1)σ²
so that
E (1/(N−1)) Σ_{i=1}^{N} (X_i − X̄)² = σ²
Therefore σ̂₂² is an unbiased estimator of σ², while σ̂₁² is biased.
Similarly, for the sample mean X̄ = (1/N) Σ_{i=1}^{N} X_i,
E X̄ = (1/N) Σ_{i=1}^{N} E X_i = N μ_X / N = μ_X
so the sample mean is an unbiased estimator of the mean.
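The bias of the 1/N variance estimator and the unbiasedness of the 1/(N−1) version can be seen numerically (a sketch, not from the original notes; N = 5, σ² = 4 and the trial count are arbitrary):

```python
import random

def variance_estimators(n=5, trials=40000, sigma2=4.0, seed=7):
    # Average the two variance estimators over many independent data sets;
    # the 1/(n-1) version should average to sigma2, the 1/n version to (n-1)/n * sigma2.
    rng = random.Random(seed)
    s_biased = 0.0
    s_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        m = sum(xs) / n
        ss = sum((x - m) ** 2 for x in xs)
        s_biased += ss / n
        s_unbiased += ss / (n - 1)
    return s_biased / trials, s_unbiased / trials

b, u = variance_estimators()
# theory: E(biased) = (N-1)/N * sigma^2 = 3.2, E(unbiased) = 4.0
```

The averaged estimates reproduce (N−1)σ²/N and σ² respectively, matching the derivation above.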
The MSE should be as small as possible. Among all unbiased estimators, the minimum
variance unbiased estimator (MVUE) has the minimum mean square error.
MSE is related to the bias and variance as shown below:
MSE = E(θ̂ − θ)² = E(θ̂ − Eθ̂ + Eθ̂ − θ)²
= E(θ̂ − Eθ̂)² + E(Eθ̂ − θ)² + 2E(θ̂ − Eθ̂)(Eθ̂ − θ)
= E(θ̂ − Eθ̂)² + E(Eθ̂ − θ)² + 2(Eθ̂ − Eθ̂)(Eθ̂ − θ)
= var(θ̂) + b²(θ̂) + 0 (the cross term vanishes because Eθ̂ − Eθ̂ = 0)
So
MSE = var(θ̂) + b²(θ̂)
An estimator is consistent if θ̂ converges to θ in probability as the number of
observations N → ∞; a sufficient condition is that it is asymptotically unbiased and
E(θ̂ − θ)² → 0.
Example 1: Let X₁, ..., X_N be iid with mean μ_X and variance σ_X², and let
X̄ = (1/N) Σ_{i=1}^{N} X_i be an estimator for μ_X. We have already shown that X̄ is
unbiased, with var(X̄) = σ_X²/N. Is it a consistent estimator? Clearly
lim_{N→∞} var(X̄) = lim_{N→∞} σ_X²/N = 0.
Therefore X̄ is a consistent estimator of μ_X.
A statistic T(X₁, X₂, ..., X_N) is sufficient for θ if the conditional distribution of
X₁, X₂, ..., X_N given T does not depend on θ: the statistic carries all the information
about θ contained in the data (x₁, x₂, ..., x_N).
Example 2: Let X₁, ..., X_N be iid Gaussian with unknown mean μ and known variance.
Then X̄ = (1/N) Σ_{i=1}^{N} X_i is a sufficient statistic for μ.
Because
f_{X₁,X₂,...,X_N/μ}(x₁, x₂, ..., x_N) = ∏_{i=1}^{N} (1/√2π) e^{−(1/2)(x_i − μ)²}
= (1/(√2π)^N) e^{−(1/2) Σ_{i=1}^{N} (x_i − μ)²}
= (1/(√2π)^N) e^{−(1/2) Σ_{i=1}^{N} [ (x_i − x̄)² + (x̄ − μ)² + 2(x_i − x̄)(x̄ − μ) ]}
= (1/(√2π)^N) e^{−(1/2) Σ_{i=1}^{N} (x_i − x̄)²} · e^{−(N/2)(x̄ − μ)²} · e⁰
(the cross term vanishes because Σ(x_i − x̄) = 0), the density factors into a term that
depends on μ only through x̄ and a term independent of μ; hence x̄ is sufficient for μ.
θ̂ is an MVUE if
E(θ̂) = θ and Var(θ̂) ≤ Var(θ̃)
where θ̃ is any other unbiased estimator of θ.
Can we reduce the variance of an unbiased estimator indefinitely? The answer is given by
the Cramér-Rao theorem.
Suppose θ̂ is an unbiased estimator based on the random sequence denoted by the vector
X = [X₁ X₂ ... X_N]'
Let f_X(x₁, ..., x_N / θ) be the joint density function which characterises X. This
function is also called the likelihood function. θ may itself be random; in that case the
likelihood function represents a conditional joint density function.
Cramér-Rao theorem: Let L(x/θ) = ln f_X(x/θ) be the log-likelihood. Then
var(θ̂) ≥ 1/I(θ), where I(θ) = E(∂L/∂θ)²
and I(θ) is a measure of the average information in the random observations about θ.
Equality holds when ∂L/∂θ = c(θ)(θ̂ − θ), where c is a constant for each θ.
Proof: Since θ̂ is unbiased,
∫ (θ̂ − θ) f_X(x/θ) dx = 0
Differentiating with respect to θ,
(∂/∂θ) ∫ (θ̂ − θ) f_X(x/θ) dx = 0
⇒ −∫ f_X(x/θ) dx + ∫ (θ̂ − θ) (∂/∂θ) f_X(x/θ) dx = 0
Note that ∫ f_X(x/θ) dx = 1, and
(∂/∂θ) f_X(x/θ) = [ (∂/∂θ) ln f_X(x/θ) ] f_X(x/θ) = (∂L/∂θ) f_X(x/θ)
So that
∫ (θ̂ − θ) (∂L(x/θ)/∂θ) f_X(x/θ) dx = 1      (1)
Squaring and applying the Cauchy-Schwarz inequality,
1 = [ ∫ (θ̂ − θ) (∂L(x/θ)/∂θ) f_X(x/θ) dx ]²
≤ ∫ (θ̂ − θ)² f_X(x/θ) dx · ∫ (∂L(x/θ)/∂θ)² f_X(x/θ) dx
= var(θ̂) · I(θ)
Hence var(θ̂) I(θ) ≥ 1, i.e.
var(θ̂) ≥ 1/I(θ)
which is the Cramér-Rao inequality.
The equality will hold when
(∂L(x/θ)/∂θ) f_X(x/θ) = c(θ)(θ̂ − θ) f_X(x/θ)
so that
∂L(x/θ)/∂θ = c(θ)(θ̂ − θ)
Also, from ∫ f_X(x/θ) dx = 1, we get
(∂/∂θ) ∫ f_X(x/θ) dx = 0
⇒ ∫ (∂L/∂θ) f_X(x/θ) dx = 0
Differentiating once more,
∫ [ (∂²L/∂θ²) f_X(x/θ) + (∂L/∂θ)² f_X(x/θ) ] dx = 0
⇒ E(∂L/∂θ)² = −E(∂²L/∂θ²)
which gives the alternative expression I(θ) = −E(∂²L/∂θ²).
Remark:
(1) The information is additive for iid observations: if
I₁(θ) = E[ (∂/∂θ) ln f_{X/θ}(x) ]²
is the information in one observation, then
I_N(θ) = E[ (∂/∂θ) ln f_{X₁,X₂,...,X_N/θ}(x₁, x₂, ..., x_N) ]² = N I₁(θ)
Example 3:
Let X₁, ..., X_N be an iid Gaussian random sequence with known variance σ² and
unknown mean μ. Suppose μ̂ = (1/N) Σ_{i=1}^{N} X_i, which is unbiased.
f_X(x/μ) = (1/(√2π σ)^N) e^{−(1/2σ²) Σ_{i=1}^{N} (x_i − μ)²}
so that
L(X/μ) = −N ln(√2π σ) − (1/2σ²) Σ_{i=1}^{N} (x_i − μ)²
Now
∂L/∂μ = (1/σ²) Σ_{i=1}^{N} (X_i − μ)
∂²L/∂μ² = −N/σ²
So that E(−∂²L/∂μ²) = N/σ², and
CR bound = 1/I(μ) = 1/( −E ∂²L/∂μ² ) = σ²/N
Moreover,
∂L/∂μ = (N/σ²)( Σ X_i/N − μ ) = (N/σ²)(μ̂ − μ)
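The Fisher information N/σ² of Example 3 can be estimated by Monte Carlo from the score function (a sketch, not from the original notes; N = 10, σ = 2, μ = 1 are arbitrary):

```python
import random

def fisher_info_mc(n=10, sigma=2.0, mu=1.0, trials=20000, seed=9):
    # Monte Carlo estimate of I(mu) = E (dL/dmu)^2,
    # where the score is dL/dmu = sum_i (x_i - mu) / sigma^2 for iid N(mu, sigma^2) data.
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        score = sum(rng.gauss(mu, sigma) - mu for _ in range(n)) / sigma ** 2
        acc += score * score
    return acc / trials

info = fisher_info_mc()   # theory: N/sigma^2 = 10/4 = 2.5
crb = 1.0 / info          # theory: sigma^2/N = 0.4
```

The reciprocal of the estimated information reproduces the Cramér-Rao bound σ²/N, which the sample mean attains.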
which is of the form ∂L/∂θ = c(θ)(θ̂ − θ); hence the sample mean attains the Cramér-Rao
bound and is the MVUE.
There are two broad approaches to parameter estimation: the maximum likelihood method
and the Bayes method.
Maximum likelihood estimation: given observations X₁, ..., X_N, the maximum likelihood
estimate θ̂_MLE maximizes the likelihood, i.e.
(∂/∂θ) f_X(x₁, ..., x_N / θ) |_{θ = θ̂_MLE} = 0
or
(∂/∂θ) L(x/θ) |_{θ = θ̂_MLE} = 0
Thus the MLE is given by the solution of the likelihood equation given above.
If we have a number of unknown parameters given by θ = [θ₁ θ₂ ... θ_M]', the MLE is
given by the simultaneous solution of
(∂L/∂θ_i) |_{θ₁ = θ̂₁MLE, θ₂ = θ̂₂MLE, ..., θ_M = θ̂_MMLE} = 0, i = 1, ..., M
Example 4:
Let X₁, ..., X_N be an independent identically distributed sequence of N(μ, σ²)
random variables. Find the MLE for μ and σ².
f_X(x₁, x₂, ..., x_N / μ, σ) = ∏_{i=1}^{N} (1/(√2π σ)) e^{−(x_i − μ)²/2σ²}
L(X/μ, σ²) = ln f_X(X₁, ..., X_N / μ, σ²)
= −(N/2) ln 2π − N ln σ − (1/2σ²) Σ_{i=1}^{N} (x_i − μ)²
∂L/∂μ = 0 ⇒ Σ_{i=1}^{N} (x_i − μ̂_MLE) = 0
∂L/∂σ = 0 ⇒ −N/σ̂_MLE + (1/σ̂³_MLE) Σ_{i=1}^{N} (x_i − μ̂_MLE)² = 0
Solving, we get
μ̂_MLE = (1/N) Σ_{i=1}^{N} x_i
and
σ̂²_MLE = (1/N) Σ_{i=1}^{N} (x_i − μ̂_MLE)²
Example 5:
Let X have the Laplacian density f_{X/θ}(x) = (1/2) e^{−|x−θ|}, −∞ < x < ∞. For iid
observations,
f_{X₁,X₂,...,X_N/θ}(x₁, x₂, ..., x_N) = (1/2^N) e^{−Σ_{i=1}^{N} |x_i − θ|}
L(X/θ) = ln f_{X/θ}(x₁, ..., x_N) = −N ln 2 − Σ_{i=1}^{N} |x_i − θ|
so the MLE minimizes Σ |x_i − θ|; the minimizer is the sample median of x₁, ..., x_N.
Recall that the Cramér-Rao equality condition is ∂L(x/θ)/∂θ = c(θ)(θ̂ − θ); at θ = θ̂
we then have ∂L/∂θ = 0, so when an efficient estimator exists it is given by the MLE
(and the property carries over to an invertible function of θ̂).
Bayes method: the parameter θ is modelled as a random variable with prior density
f_θ(θ). The likelihood function will now be the conditional density f_{X/θ}(x/θ), and
f_{X,θ}(x, θ) = f_θ(θ) f_{X/θ}(x/θ)
f_{θ/X}(θ/x) = f_θ(θ) f_{X/θ}(x/θ) / f_X(x)
is the a posteriori density of θ.
A cost function C(θ̂, θ) penalises the estimation error; typical choices are the quadratic
cost (θ̂ − θ)², the uniform (hit-or-miss) cost, which is 0 for |θ̂ − θ| ≤ Δ/2 and 1
otherwise, and the absolute-error cost |θ̂ − θ|. The average cost (risk) is
C̄ = EC(θ̂, θ) = ∫∫ C(θ̂, θ) f_{X,θ}(x, θ) dx dθ
Case I (quadratic cost): the estimation problem is to minimize
∫∫ (θ̂ − θ)² f_{X,θ}(x, θ) dx dθ
with respect to θ̂. This is equivalent to minimizing
∫∫ (θ̂ − θ)² f_{θ/X}(θ/x) f_X(x) dθ dx = ∫ ( ∫ (θ̂ − θ)² f_{θ/X}(θ/x) dθ ) f_X(x) dx
Since f_X(x) is always nonnegative, the above integral will be minimum if the inner
integral is minimum. This results in the problem: minimize
∫ (θ̂ − θ)² f_{θ/X}(θ/x) dθ
with respect to θ̂:
(∂/∂θ̂) ∫ (θ̂ − θ)² f_{θ/X}(θ/x) dθ = 0
⇒ 2 ∫ (θ̂ − θ) f_{θ/X}(θ/x) dθ = 0
⇒ θ̂ ∫ f_{θ/X}(θ/x) dθ = ∫ θ f_{θ/X}(θ/x) dθ
⇒ θ̂ = ∫ θ f_{θ/X}(θ/x) dθ = E(θ/X = x)
which is the conditional mean, i.e. the mean of the a posteriori density. Since we are
minimizing a quadratic cost, it is also called the minimum mean square error (MMSE)
estimator.
Salient points: the risk can be written as
C̄ = EC(θ̂, θ) = ∫∫ C(θ̂, θ) f_{X,θ}(x, θ) dx dθ
= ∫∫ C(θ̂, θ) f_{θ/X}(θ/x) f_X(x) dθ dx
= ∫ ( ∫ C(θ̂, θ) f_{θ/X}(θ/x) dθ ) f_X(x) dx
so we have to minimize the inner integral
∫ C(θ̂, θ) f_{θ/X}(θ/x) dθ
with respect to θ̂.
Case II (uniform cost): minimizing the inner integral is equivalent to maximizing
∫_{θ̂ − Δ/2}^{θ̂ + Δ/2} f_{θ/X}(θ/x) dθ ≈ Δ f_{θ/X}(θ̂/x) for small Δ
This will be maximum if f_{θ/X}(θ̂/x) is maximum. That means: select that value of θ
that maximizes the a posteriori density. So this is known as the maximum a posteriori
estimation (MAP) principle. This estimator is denoted by θ̂_MAP.
Case III (absolute-error cost): C(θ̂, θ) = |θ̂ − θ|.
C̄ = average cost = E|θ̂ − θ| = ∫∫ |θ̂ − θ| f_{θ,X}(θ, x) dθ dx
Minimizing the inner integral ∫ |θ̂ − θ| f_{θ/X}(θ/x) dθ,
(∂/∂θ̂) [ ∫_{−∞}^{θ̂} (θ̂ − θ) f_{θ/X}(θ/x) dθ + ∫_{θ̂}^{∞} (θ − θ̂) f_{θ/X}(θ/x) dθ ] = 0
At the minimum absolute-error estimate θ̂_MAE,
∫_{−∞}^{θ̂_MAE} f_{θ/X}(θ/x) dθ − ∫_{θ̂_MAE}^{∞} f_{θ/X}(θ/x) dθ = 0
so the posterior probability mass on either side of θ̂_MAE is equal: θ̂_MAE is the median
of the a posteriori density.
Example 6:
Let X₁, X₂, ..., X_N be an iid Gaussian sequence with unit variance and unknown mean
μ. Further, μ is known to be a zero-mean Gaussian with unit variance. Find the MAP
estimator for μ.
Solution: We are given
f_μ(μ) = (1/√2π) e^{−μ²/2}
f_{X/μ}(x/μ) = (1/(√2π)^N) e^{−(1/2) Σ_{i=1}^{N} (x_i − μ)²}
Therefore f_{μ/X}(μ/x) = f_μ(μ) f_{X/μ}(x/μ) / f_X(x).
The MAP estimate maximizes ln f_μ(μ) f_{X/μ}(x/μ), i.e.
−μ²/2 − (1/2) Σ_{i=1}^{N} (x_i − μ)² is maximum
⇒ [ −μ + Σ_{i=1}^{N} (x_i − μ) ]_{μ = μ̂_MAP} = 0
⇒ μ̂_MAP = (1/(N+1)) Σ_{i=1}^{N} x_i
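The MAP estimate of Example 6 is the ML estimate shrunk toward the prior mean 0 by the factor N/(N+1); a minimal sketch (not from the original text; the seed and N = 50 are arbitrary):

```python
import random

def map_estimate(xs):
    # MAP estimate of the mean under a N(0,1) prior and unit-variance Gaussian likelihood:
    # mu_MAP = sum(x_i) / (N + 1), the ML estimate pulled toward the prior mean 0.
    return sum(xs) / (len(xs) + 1)

random.seed(11)
true_mu = random.gauss(0, 1)                       # draw the parameter from its prior
data = [random.gauss(true_mu, 1) for _ in range(50)]
mle = sum(data) / len(data)                        # mu_MLE = sample mean
mapest = map_estimate(data)                        # mu_MAP = mu_MLE * N/(N+1)
```

The MAP estimate always has smaller magnitude than the MLE here, reflecting the pull of the zero-mean prior; as N grows the two coincide.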
Example 7:
Let the prior be f_θ(θ) = λ e^{−λθ} for θ ≥ 0, λ > 0, and
f_{X/θ}(x) = θ e^{−θx}, x > 0
With f_{θ/X}(θ/x) = f_θ(θ) f_{X/θ}(x) / f_X(x), setting
(∂/∂θ) [ ln f_θ(θ) + ln f_{X/θ}(x) ] |_{θ = θ̂_MAP} = 0
gives −λ + 1/θ̂_MAP − x = 0, so
θ̂_MAP = 1/(λ + X)
Example 8: Consider again the channel Y = X + V with X = ±1, each with probability
1/2, and V Gaussian noise with mean 0 and variance σ². Here
f_X(x) = (1/2)[δ(x − 1) + δ(x + 1)]
f_{Y/X}(y/x) = (1/√(2πσ²)) e^{−(y−x)²/2σ²}
Hence
X̂_MMSE = E(X/Y = y) = ∫ x f_{X/Y}(x/y) dx
= ( e^{−(y−1)²/2σ²} − e^{−(y+1)²/2σ²} ) / ( e^{−(y−1)²/2σ²} + e^{−(y+1)²/2σ²} )
= tanh(y/σ²)
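The posterior-weighted mean of Example 8 can be computed directly and compared with the tanh closed form (a sketch, not from the original text; y = 0.5 and σ² = 1 are arbitrary test values):

```python
import math

def mmse_estimate(y, sigma2):
    # E(X | Y=y) for X = +/-1 equally likely and Y = X + N(0, sigma2):
    # weight each level by its (unnormalised) posterior and normalise.
    p_plus = math.exp(-(y - 1) ** 2 / (2 * sigma2))
    p_minus = math.exp(-(y + 1) ** 2 / (2 * sigma2))
    return (p_plus - p_minus) / (p_plus + p_minus)

est = mmse_estimate(0.5, 1.0)   # algebraically equal to tanh(y / sigma2) = tanh(0.5)
```

For large |y| the estimate saturates at ±1, while near y = 0 it behaves like the linear estimate y/σ² — the soft-decision character of the MMSE estimator.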
To summarise:
- MLE: maximize the likelihood f_{X/θ}(x/θ). This is the simplest method, requiring no
  prior; the estimator is θ̂_MLE.
- MMSE: θ̂_MMSE = E(θ/X), the mean of the a posteriori density; it minimizes the mean
  square error but requires the a posteriori density.
- MAP: maximize the a posteriori density f_θ(θ) f_{X/θ}(x/θ)/f_X(x); θ̂_MAP is given by
  (∂/∂θ) [ ln f_θ(θ) + ln f_{X/θ}(x) ] = 0
If the prior f_θ(θ) is uniform over an interval [θ_MIN, θ_MAX] (so that
∂ ln f_θ(θ)/∂θ = 0 inside it), the MAP estimate is the MLE clipped to the interval:
if θ_MIN ≤ θ̂_MLE ≤ θ_MAX, then θ̂_MAP = θ̂_MLE
if θ̂_MLE < θ_MIN, then θ̂_MAP = θ_MIN
if θ̂_MLE > θ_MAX, then θ̂_MAP = θ_MAX
SECTION III
OPTIMAL FILTERING
Consider estimating a signal value X[n] from noisy observations
Y[i] = X[n] + V[i], i = 1, 2, ..., n
where V[i] is iid zero-mean Gaussian noise with unit variance (consistent with the
densities below). Maximum likelihood estimation for X[n] determines that value of X[n]
for which the sequence Y[i], i = 1, 2, ..., n, is most likely. Denote the observations
Y[i], i = 1, 2, ..., n, by the random vector Y[n] and the value sequence
y[1], y[2], ..., y[n] by y[n]. Then
f_{Y[n]/X[n]}(y[n]/x[n]) = (1/(√2π)^n) e^{−(1/2) Σ_{i=1}^{n} (y[i] − x[n])²}
Maximizing with respect to x[n] gives the ML estimate
x̂_ML[n] = (1/n) Σ_{i=1}^{n} y[i]
If X[n] is further modelled as a zero-mean unit-variance Gaussian random variable, the a
posteriori density is
f_{X[n]/Y[n]}(x[n]/y[n]) = (1/(√2π)^{n+1}) e^{−(1/2) x²[n] − (1/2) Σ_{i=1}^{n} (y[i] − x[n])²} / f_{Y[n]}(y[n])
Taking the logarithm,
log_e f_{X[n]/Y[n]}(x[n]) = −(1/2) x²[n] − (1/2) Σ_{i=1}^{n} (y[i] − x[n])² − log_e f_{Y[n]}(y[n]) + constant
Setting the derivative with respect to x[n] to zero at x̂_MAP[n],
x̂_MAP[n] = (1/(n+1)) Σ_{i=1}^{n} y[i]
For the MMSE estimate we have to know the joint probability structure of the channel and
the source, and hence the a posteriori pdf.
[Figure: the desired signal x[n] drives a system whose noisy output is observed as y[n]; a filter operating on y[n−M+1], ..., y[n], ..., y[n+N] produces the estimate x̂[n].]

The linear minimum mean square error criterion is illustrated in the above figure. The problem can be stated as follows:
Given observations of data y[n−M+1], y[n−M+2], ..., y[n], ..., y[n+N], determine a set of parameters h[M−1], h[M−2], ..., h[0], ..., h[−N] such that
  x̂[n] = Σ_{i=−N}^{M−1} h[i] y[n−i]
and E(x[n] − x̂[n])² is minimum with respect to each h[i].
The problem of determining the estimator parameters by the LMMSE criterion is also called the Wiener filtering problem. Three subclasses of the problem are identified:
1. The optimal smoothing problem: N > 0
2. The optimal filtering problem: N = 0
3. The optimal prediction problem: N < 0
In the smoothing problem, an estimate of the signal inside the duration of observation of the signal is made. The filtering problem estimates the current value of the signal on the basis of the present and past observations. The prediction problem addresses the issue of optimal prediction of the future value of the signal on the basis of present and past observations.

Define the estimation error
  e[n] = x[n] − x̂[n] = x[n] − Σ_{i=−N}^{M−1} h[i] y[n−i]
We have to minimise E e²[n] with respect to each h[i] to get the optimal estimator. The corresponding minimisation is given by
  ∂E{e²[n]}/∂h[j] = 0, for j = −N, ..., 0, ..., M−1
(the expectation and the differentiation can be interchanged), giving
  E e[n] y[n−j] = 0, j = −N, ..., 0, 1, ..., M−1                (1)
or
  E (x[n] − Σ_{i=−N}^{M−1} h[i] y[n−i]) y[n−j] = 0, j = −N, ..., 0, 1, ..., M−1   (2)
so that
  R_XY[j] = Σ_{i=−N}^{M−1} h[i] R_YY[j−i], j = −N, ..., 0, 1, ..., M−1            (3)
This set of N + M equations in (3) is called the Wiener-Hopf equations or Normal equations.
The result in (1) is the orthogonality principle, which implies that the error is orthogonal to the observed data; x̂[n] is the projection of x[n] onto the observations. The estimation uses second-order statistics, i.e. autocorrelation and cross-correlation functions.
If x[n] and y[n] are jointly Gaussian, then the MMSE and LMMSE estimators are equivalent; otherwise we get a suboptimal result.
We can write x[n] = x̂[n] + e[n], where x̂[n] and e[n] are the parts of x[n] respectively correlated and uncorrelated with y[n]. Thus the LMMSE estimator separates out that part of x[n] which is correlated with y[n]. Hence the Wiener filter can also be interpreted as a correlation canceller (see Orfanidis).
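The Wiener-Hopf equations and the orthogonality principle can be illustrated numerically. The sketch below, under assumed model parameters of our own choosing (AR(1) signal in unit-variance white noise), estimates the correlations from data, solves R_YY h = r_XY, and checks that the resulting error is (approximately) orthogonal to the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate an AR(1) signal observed in unit-variance white noise
# (illustrative parameters, not from the notes)
N, M, a = 100_000, 4, 0.8
w = rng.normal(scale=np.sqrt(0.68), size=N)
x = np.zeros(N)
for n in range(1, N):
    x[n] = a * x[n - 1] + w[n]
y = x + rng.normal(size=N)

def xcorr(u, v, m):
    """Sample estimate of E u[n] v[n-m], m >= 0."""
    return np.mean(u[m:] * v[:len(v) - m]) if m > 0 else np.mean(u * v)

# build R_YY and r_XY and solve the Wiener-Hopf equations R_YY h = r_XY
R = np.array([[xcorr(y, y, abs(i - j)) for j in range(M)] for i in range(M)])
r = np.array([xcorr(x, y, j) for j in range(M)])
h = np.linalg.solve(R, r)

xhat = np.convolve(y, h)[:N]     # x_hat[n] = sum_i h[i] y[n-i]
e = x - xhat                     # estimation error
```

The sample correlations E e[n] y[n−j] come out near zero for j = 0, ..., M−1, which is exactly the orthogonality principle (1).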
For the FIR Wiener filter of length M (N = 0), the Wiener-Hopf equations become
  Σ_{i=0}^{M−1} h[i] R_YY[j−i] = R_XY[j], j = 0, 1, ..., M−1
In matrix notation, R_YY h = r_XY, where

  R_YY = [ R_YY[0]     R_YY[1]     ...  R_YY[M−1]
           R_YY[1]     R_YY[0]     ...  R_YY[M−2]
           ...
           R_YY[M−1]   R_YY[M−2]   ...  R_YY[0]   ]

  r_XY = [ R_XY[0]  R_XY[1]  ...  R_XY[M−1] ]'   and   h = [ h[0]  h[1]  ...  h[M−1] ]'

Therefore,
  h = R_YY⁻¹ r_XY
The minimum mean-square error is
  E e²[n] = E x²[n] − Σ_{i=0}^{M−1} h[i] R_XY[i] = R_XX[0] − h' r_XY

[Figure: direct-form realisation of the M-tap FIR Wiener filter — y[n] passes through a delay line of z⁻¹ elements, the taps are weighted by h[0], h[1], ..., h[M−1] and summed to give x̂[n].]
Example 1:
Suppose x[n] = A cos(w₀n + φ), w₀ = π/4, with φ uniformly distributed in [0, 2π], is observed in zero-mean unit-variance white noise: y[n] = x[n] + v[n], with v[n] independent of x[n]. Then
  R_XX[m] = (A²/2) cos w₀m
  R_YY[m] = E y[n] y[n−m] = (A²/2) cos w₀m + δ[m]
  R_XY[m] = E x[n] y[n−m] = R_XX[m]
For a 3-tap FIR Wiener filter, the normal equations R_YY h = r_XY are

  [ A²/2 + 1        (A²/2)cos w₀    (A²/2)cos 2w₀ ] [h[0]]   [ A²/2 ]
  [ (A²/2)cos w₀    A²/2 + 1        (A²/2)cos w₀  ] [h[1]] = [ (A²/2)cos w₀ ]
  [ (A²/2)cos 2w₀   (A²/2)cos w₀    A²/2 + 1      ] [h[2]]   [ (A²/2)cos 2w₀ ]

Suppose A = 5 V; then A²/2 = 12.5, cos w₀ = 1/√2, cos 2w₀ = 0, and

  [ 13.5       12.5/√2    0       ] [h[0]]   [ 12.5 ]
  [ 12.5/√2    13.5       12.5/√2 ] [h[1]] = [ 12.5/√2 ]
  [ 0          12.5/√2    13.5    ] [h[2]]   [ 0 ]

Solving,
  h[0] ≈ 0.707, h[1] ≈ 0.34, h[2] ≈ −0.226
Plot the filter performance for the above values of h[0], h[1] and h[2]. The following figure shows the performance of a 20-tap FIR Wiener filter for noise filtering.
Example 2 (noise cancellation):
Suppose the observed signal is x[n] + v₁[n], where v₁[n] is an MA(1) noise. We want to cancel v₁[n] with the help of another correlated noise v₂[n] given by
  v₂[n] = 0.8 v[n−1] + v[n]
where v[n] is zero-mean unit-variance white noise. A 2-tap FIR filter driven by v₂[n] produces the estimate v̂₁[n], which is subtracted from x[n] + v₁[n]. Here
  R_{V₂V₂} = [ 1.64  0.8
               0.8   1.64 ]   and   r_{V₁V₂} = [ 1.48  0.6 ]'
so that
  [ h[0]  h[1] ]' = R_{V₂V₂}⁻¹ r_{V₁V₂} = [ 0.9500  −0.0976 ]'
Example 3:
(Continuous-time prediction) Suppose we want to predict the continuous-time process X(t) at time t + λ by
  X̂(t + λ) = a X(t)
Then by the orthogonality principle
  E (X(t + λ) − a X(t)) X(t) = 0
so that
  a = R_XX(λ)/R_XX(0)
For a first-order process satisfying
  dX(t)/dt = −A X(t) + v(t)
we have R_XX(τ) = R_XX(0) e^{−A|τ|}, so that
  a = R_XX(λ)/R_XX(0) = e^{−Aλ}
Therefore, the linear prediction of such a process based on any past value is the same as the linear prediction based on the current value.
For the causal IIR Wiener filter h[n], n ≥ 0, the estimate is
  x̂[n] = Σ_{i=0}^{∞} h[i] y[n−i]
We have to minimise E e²[n] with respect to each h[i], i ≥ 0, to get the optimal estimator, giving the Wiener-Hopf equations
  Σ_{i=0}^{∞} h[i] R_YY[j−i] = R_XY[j], j ≥ 0
These cannot be solved by taking the z-transform directly, because they hold only for j ≥ 0. The solution uses the innovation (whitening) approach.
Factorise the power spectrum of y[n] as
  S_YY(z) = σ_v² H_c(z) H_c(z⁻¹)
where H_c(z) is the minimum-phase causal factor. Passing y[n] through the whitening filter
  H₁(z) = 1/H_c(z)
produces the white innovation sequence v[n]. The Wiener filter is then realised as the cascade

  y[n] → [ H₁(z) ] → v[n] → [ H₂(z) ] → x̂[n]
         whitening filter      Wiener filter

Now h₂[n] is the coefficient of the Wiener filter to estimate x[n] from the innovation sequence v[n]. Applying the orthogonality principle to
  x̂[n] = Σ_{i=0}^{∞} h₂[i] v[n−i]
results in the Wiener-Hopf equations
  Σ_{i=0}^{∞} h₂[i] R_VV[j−i] = R_XV[j], j = 0, 1, ...
Since R_VV[j−i] = σ_v² δ[j−i],
  h₂[j] = R_XV[j]/σ_v², j ≥ 0
so that
  H₂(z) = [S_XV(z)]₊ / σ_v²
where [·]₊ denotes the causal part. Also
  R_XV[j] = E x[n] v[n−j] = Σ_{i=0}^{∞} h₁[i] E x[n] y[n−j−i] = Σ_{i=0}^{∞} h₁[i] R_XY[j+i]
so that
  S_XV(z) = H₁(z⁻¹) S_XY(z) = S_XY(z)/H_c(z⁻¹)
and
  H₂(z) = (1/σ_v²) [ S_XY(z)/H_c(z⁻¹) ]₊
Therefore,
  H(z) = H₁(z) H₂(z) = (1/(σ_v² H_c(z))) [ S_XY(z)/H_c(z⁻¹) ]₊
To design the causal Wiener filter we have to
  find the power spectrum of the data and the cross power spectrum of the desired signal and the data from the available model, or estimate them from the data;
  factorise the power spectrum of the data using the spectral factorization theorem.
The minimum mean-square error is
  E e²[n] = E e[n] x[n]     (the error is orthogonal to the data)
          = E x²[n] − E x̂[n] x[n]
          = (1/2π) ∫_{−π}^{π} S_XX(w) dw − (1/2π) ∫_{−π}^{π} H(w) S*_XY(w) dw
          = (1/2π) ∫_{−π}^{π} (S_XX(w) − H(w) S*_XY(w)) dw
          = (1/2πj) ∮ (S_XX(z) − H(z) S_XY(z⁻¹)) z⁻¹ dz
Example 4:
Consider the observation model y[n] = x[n] + v₁[n] with
  x[n] = 0.8 x[n−1] + w[n]
where v₁[n] is an additive zero-mean white Gaussian noise with variance 1 and w[n] is zero-mean white noise with variance 0.68. Signal and noise are uncorrelated. Find the optimal causal Wiener filter to estimate x[n].
Solution:
The signal model corresponds to passing w[n] through 1/(1 − 0.8z⁻¹), so
  S_XX(z) = 0.68/((1 − 0.8z⁻¹)(1 − 0.8z))
and
  S_YY(z) = S_XX(z) + 1
Factorise:
  S_YY(z) = 0.68/((1 − 0.8z⁻¹)(1 − 0.8z)) + 1
          = 2(1 − 0.4z⁻¹)(1 − 0.4z)/((1 − 0.8z⁻¹)(1 − 0.8z))
so that
  H_c(z) = (1 − 0.4z⁻¹)/(1 − 0.8z⁻¹)   and   σ_v² = 2
Also, R_XY[m] = E x[n] y[n−m] = R_XX[m], so
  S_XY(z) = S_XX(z) = 0.68/((1 − 0.8z⁻¹)(1 − 0.8z))
Then
  H(z) = (1/(σ_v² H_c(z))) [ S_XY(z)/H_c(z⁻¹) ]₊
       = (1/2) ((1 − 0.8z⁻¹)/(1 − 0.4z⁻¹)) [ 0.68/((1 − 0.8z⁻¹)(1 − 0.4z)) ]₊
The causal part of the bracketed term (partial fraction, keeping the pole inside the unit circle) is 1/(1 − 0.8z⁻¹), so
  H(z) = 0.5/(1 − 0.4z⁻¹)
  h[n] = 0.5 (0.4)ⁿ, n ≥ 0
For the noncausal (unrealisable) IIR Wiener filter with impulse response h[n] ↔ H(z),
  x̂[n] = Σ_{i=−∞}^{∞} h[i] y[n−i]
The orthogonality principle gives
  E (x[n] − Σ_{i=−∞}^{∞} h[i] y[n−i]) y[n−j] = 0 for all j
i.e. Σ_{i=−∞}^{∞} h[i] R_YY[j−i] = R_XY[j] for all j. Since the equations now hold for every j, we can take the z-transform on both sides:
  H(z) S_YY(z) = S_XY(z)
so that
  H(z) = S_XY(z)/S_YY(z)
or
  H(w) = S_XY(w)/S_YY(w)
The minimum mean-square error is
  E e²[n] = E e[n] x[n]     (the error is orthogonal to the data)
          = R_XX[0] − Σ_{i=−∞}^{∞} h[i] R_XY[i]
          = (1/2π) ∫_{−π}^{π} (S_XX(w) − H(w) S*_XY(w)) dw
          = (1/2πj) ∮ (S_XX(z) − H(z) S_XY(z⁻¹)) z⁻¹ dz
Consider the model y[n] = x[n] + v[n], where v[n] is an additive zero-mean white Gaussian noise with variance σ_V². Signal and noise are uncorrelated. Then
  S_YY(w) = S_XX(w) + S_VV(w)   and   S_XY(w) = S_XX(w)
so that
  H(w) = S_XX(w)/(S_XX(w) + S_VV(w)) = (S_XX(w)/S_VV(w)) / (S_XX(w)/S_VV(w) + 1)
Suppose the SNR S_XX(w)/S_VV(w) is very high at a frequency w; then H(w) ≈ 1. When the SNR is very low, H(w) ≈ S_XX(w)/S_VV(w) ≈ 0 (i.e. if the noise is high, the corresponding signal component will be attenuated in proportion to the estimated SNR).

[Figure: (a) a high-SNR band is passed unattenuated by the IIR Wiener filter; (b) variation of SNR with frequency.]

Note also that
  H(w) = S_XX(w)/(S_XX(w) + S_VV(w)) = (S_YY(w) − S_VV(w))/S_YY(w)
so the smoother can be built from the observed spectrum and the noise spectrum.
Example 7:
Consider the signal in presence of white noise given by y[n] = x[n] + v[n] with
  x[n] = 0.8 x[n−1] + w[n]
where v[n] is an additive zero-mean white Gaussian noise with variance 1 and w[n] is zero-mean white noise with variance 0.68. Signal and noise are uncorrelated. Find the optimal noncausal Wiener filter to estimate x[n].
Solution:
As in Example 4,
  S_XY(z) = S_XX(z) = 0.68/((1 − 0.8z⁻¹)(1 − 0.8z))
  S_YY(z) = 2(1 − 0.4z⁻¹)(1 − 0.4z)/((1 − 0.8z⁻¹)(1 − 0.8z))
so that
  H(z) = S_XY(z)/S_YY(z) = 0.68/(2(1 − 0.4z⁻¹)(1 − 0.4z)) = 0.34/((1 − 0.4z⁻¹)(1 − 0.4z))
Expanding in partial fractions,
  H(z) = 0.4048/(1 − 0.4z⁻¹) + 0.4048 (0.4z)/(1 − 0.4z)
  h[n] = 0.4048(0.4)ⁿ u(n) + 0.4048(0.4)⁻ⁿ u(−n−1) = 0.4048(0.4)^{|n|}

[Figure: the two-sided impulse response h[n] = 0.4048(0.4)^{|n|}, symmetric about n = 0.]
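The two-sided impulse response of this noncausal Wiener filter can be checked numerically by sampling H(e^{jw}) on a dense grid and inverting with the FFT (a quick sketch, assuming the model of the example above):

```python
import numpy as np

# sample H(e^{jw}) = 0.34 / ((1 - 0.4 e^{-jw})(1 - 0.4 e^{jw})) and invert
Nfft = 4096
w = 2 * np.pi * np.arange(Nfft) / Nfft
H = 0.34 / ((1 - 0.4*np.exp(-1j*w)) * (1 - 0.4*np.exp(1j*w)))
h = np.real(np.fft.ifft(H))          # h[0], h[1], ...; negative lags wrap to the end

assert abs(h[0] - 0.4048) < 1e-3     # 0.34/0.84 = 0.40476...
assert abs(h[1] - 0.4048*0.4) < 1e-3
assert abs(h[-1] - 0.4048*0.4) < 1e-3  # symmetry: h[-n] = h[n]
```

With Nfft this large, the time-domain aliasing of the (0.4)^{|n|} tails is negligible, so the FFT samples reproduce 0.4048(0.4)^{|n|} to high accuracy.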
Given the past samples y[n−1], y[n−2], ..., y[n−M], what is the best prediction for y[n]? (one-step-ahead prediction)
The minimum mean square error linear prediction ŷ[n] for y[n] is given by
  ŷ[n] = Σ_{i=1}^{M} h[i] y[n−i]
It is called the linear prediction (LP) model of order M. For an exact AR(M) process, the linear prediction model of order M and the corresponding AR model have the same parameters. For other signals the LP model gives an approximation.
Applications include:
  Speech modelling
  DPCM coding
  Speech recognition
  ECG modelling
LPC(10) is the popular linear prediction model used for speech coding. For a frame of speech samples, the prediction parameters are estimated and coded. In CELP (Code-book Excited Linear Prediction) the prediction error e[n] = y[n] − ŷ[n] is vector quantized and transmitted.
Therefore
  e[n] = y[n] − ŷ[n] = y[n] − Σ_{i=1}^{M} h[i] y[n−i]
is the prediction error and the corresponding filter is called the prediction error filter.
Linear minimum mean square error estimates for the prediction parameters are given by the orthogonality relation
  E e[n] y[n−j] = 0 for j = 1, 2, ..., M
so that
  R_YY[j] − Σ_{i=1}^{M} h[i] R_YY[j−i] = 0, j = 1, 2, ..., M
i.e.
  R_YY[j] = Σ_{i=1}^{M} h[i] R_YY[j−i], j = 1, 2, ..., M
which is the Wiener-Hopf equation for the linear prediction problem and the same as the Yule-Walker equation for an AR(M) process.
In matrix notation,

  [ R_YY[0]     R_YY[1]    ...  R_YY[M−1] ] [h[1]]   [ R_YY[1] ]
  [ R_YY[1]     R_YY[0]    ...  R_YY[M−2] ] [h[2]] = [ R_YY[2] ]
  [ ...                                   ] [ .  ]   [   .     ]
  [ R_YY[M−1]   R_YY[M−2]  ...  R_YY[0]   ] [h[M]]   [ R_YY[M] ]

or R_YY h = r_YY, with solution h = (R_YY)⁻¹ r_YY. The minimum mean-square prediction error is
  ε = E y[n] e[n] = R_YY[0] − Σ_{i=1}^{M} h[i] R_YY[i]
Backward prediction: given y[n], y[n−1], ..., y[n−M+1], estimate y[n−M]. The forward predictor is
  ŷ[n] = Σ_{i=1}^{M} h_M[i] y[n−i]
and the backward predictor is
  ŷ[n−M] = Σ_{i=1}^{M} b_M[i] y[n+1−i]
The orthogonality relation E e_b[n] y[n+1−j] = 0, j = 1, 2, ..., M gives the normal equations

  [ R_YY[0]     R_YY[1]    ...  R_YY[M−1] ] [b_M[1]]   [ R_YY[M]   ]
  [ R_YY[1]     R_YY[0]    ...  R_YY[M−2] ] [b_M[2]] = [ R_YY[M−1] ]
  [ ...                                   ] [  .   ]   [    .      ]
  [ R_YY[M−1]   R_YY[M−2]  ...  R_YY[0]   ] [b_M[M]]   [ R_YY[1]   ]      (1)

Comparing with the forward-prediction equations
  R_YY h_M = [ R_YY[1]  R_YY[2]  ...  R_YY[M] ]'                          (2)
and using the Toeplitz symmetry of R_YY, we get
  b_M[i] = h_M[M+1−i], i = 1, 2, ..., M
Thus the forward prediction parameters in reverse order give the backward prediction parameters. The mean-square backward prediction error is
  ε_M = E (y[n−M] − Σ_{i=1}^{M} b_M[i] y[n+1−i]) y[n−M]
Example 1:
Find the second-order predictor for y[n] given y[n] = x[n] + v[n], where v[n] is a zero-mean white noise with variance 1 and uncorrelated with x[n], and x[n] = 0.8x[n−1] + w[n], w[n] being zero-mean white noise with variance 0.68. Here
  R_XX[m] = (0.68/(1 − 0.64)) (0.8)^{|m|} = 1.8889 (0.8)^{|m|}
so that R_YY[0] = 2.8889, R_YY[1] = 1.5111, R_YY[2] = 1.2089. Solving
  [ R_YY[0]  R_YY[1] ] [h₂[1]]   [ R_YY[1] ]
  [ R_YY[1]  R_YY[0] ] [h₂[2]] = [ R_YY[2] ]
gives h₂[1] ≈ 0.4188 and h₂[2] ≈ 0.1994.
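The 2×2 system of this predictor example takes only a few lines to solve numerically (a sketch using the model values stated above):

```python
import numpy as np

# R_XX[m] = (0.68/(1-0.64)) 0.8^|m|;  R_YY[m] = R_XX[m] + delta[m]
Rxx = lambda m: (0.68 / 0.36) * 0.8**abs(m)
Ryy = [Rxx(0) + 1.0, Rxx(1), Rxx(2)]

R = np.array([[Ryy[0], Ryy[1]],
              [Ryy[1], Ryy[0]]])
r = np.array([Ryy[1], Ryy[2]])
h2 = np.linalg.solve(R, r)      # second-order predictor coefficients
print(h2)                        # approx [0.4188, 0.1994]
```

Note that the additive noise inflates only R_YY[0], which is what shrinks the predictor coefficients relative to the noiseless AR(1) case (where h₂ would be [0.8, 0]).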
Levinson-Durbin algorithm: solving the normal equations directly requires O(M³) operations; the Levinson-Durbin algorithm exploits the Toeplitz structure of R_YY to solve them order-recursively in O(M²) operations.
Let h_m[1], ..., h_m[m] denote the m-th order forward predictor, satisfying
  Σ_{i=1}^{m} h_m[i] R_YY[j−i] = R_YY[j], j = 1, ..., m                      (1)
Because R_YY is symmetric and Toeplitz, the same coefficients in reverse order satisfy the backward equations
  Σ_{i=1}^{m} h_m[m+1−i] R_YY[j−i] = R_YY[m+1−j], j = 1, ..., m              (2)
Now consider the (m+1)-th order equations. Partitioning the (m+1)×(m+1) Toeplitz matrix into its leading m×m block R_YY^{(m)} plus the last row and column gives
  R_YY^{(m)} [h_{m+1}[1] ... h_{m+1}[m]]' + h_{m+1}[m+1] [R_YY[m] R_YY[m−1] ... R_YY[1]]' = [R_YY[1] ... R_YY[m]]'   (3)
and
  Σ_{i=1}^{m} h_{m+1}[i] R_YY[m+1−i] + h_{m+1}[m+1] R_YY[0] = R_YY[m+1]      (4)
From equation (3), premultiplying by R_YY^{(m)−1}, we get
  [h_{m+1}[1] ... h_{m+1}[m]]' + h_{m+1}[m+1] R_YY^{(m)−1} [R_YY[m] ... R_YY[1]]' = R_YY^{(m)−1} [R_YY[1] ... R_YY[m]]'
Using the forward and backward solutions of order m,
  [h_{m+1}[1] ... h_{m+1}[m]]' + h_{m+1}[m+1] [h_m[m] h_m[m−1] ... h_m[1]]' = [h_m[1] ... h_m[m]]'
Writing the reflection coefficient k_{m+1} = h_{m+1}[m+1], the equations can be rewritten as
  h_{m+1}[i] = h_m[i] − k_{m+1} h_m[m+1−i], i = 1, 2, ..., m                 (5)
Substituting (5) into (4) gives
  k_{m+1} = (R_YY[m+1] − Σ_{i=1}^{m} h_m[i] R_YY[m+1−i]) / ε[m]
where ε[m] is the m-th order mean-square prediction error,
  ε[m] = R_YY[0] − Σ_{i=1}^{m} h_m[i] R_YY[i]
The error recursion is
  ε[m+1] = ε[m](1 − k²_{m+1})
Since ε[m] ≥ 0 for every order, |k_m| ≤ 1. If |k_m| < 1 for all m, the LPC error filter will be minimum-phase, and hence the corresponding synthesis filter will be stable.
k_m represents the direct correlation of the data y[n−m] on y[n] when the correlation due to the intermediate data y[n−m+1], y[n−m+2], ..., y[n−1] is removed; the k_m are therefore also called partial correlation (PARCOR) coefficients. In terms of the forward and backward prediction errors,
  k_m = E e_f^{m−1}[n] e_b^{m−1}[n−1] / ε[m−1]
and equivalently
  k_m = (R_YY[m] − Σ_{i=1}^{m−1} h_{m−1}[i] R_YY[m−i]) / ε[m−1]
The recursion is therefore: initialise ε[0] = R_YY[0], and for m = 1, 2, ...:
  k_m = (R_YY[m] − Σ_{i=1}^{m−1} h_{m−1}[i] R_YY[m−i]) / ε[m−1]
  h_m[m] = k_m
  h_m[i] = h_{m−1}[i] − k_m h_{m−1}[m−i], i = 1, 2, ..., m−1
  ε[m] = ε[m−1](1 − k_m²)
Go on computing up to the given final value of m.
Some salient points:
The reflection coefficients and the mean-square error completely determine the LPC coefficients. Alternatively, given the reflection coefficients and the final mean-square prediction error, we can determine the LPC coefficients.
The algorithm is order-recursive: by solving the M-th order linear prediction problem we get all the lower-order solutions as well.
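A compact implementation of the Levinson-Durbin recursion can be checked against a direct Toeplitz solve. (Sign conventions vary between texts; this sketch uses predictor coefficients h with h_m[m] = k_m and the update h_m[i] = h_{m−1}[i] − k_m h_{m−1}[m−i].)

```python
import numpy as np

def levinson_durbin(R, M):
    """Solve the order-M prediction normal equations given the
    autocorrelations R = [R[0], ..., R[M]].
    Returns predictor coeffs h, reflection coeffs ks, final error eps."""
    h = np.zeros(0)
    eps = R[0]
    ks = []
    for m in range(1, M + 1):
        # k_m = (R[m] - sum_i h[i] R[m-i]) / eps
        k = (R[m] - np.dot(h, R[m-1:0:-1])) / eps
        h = np.concatenate([h - k * h[::-1], [k]])
        ks.append(k)
        eps *= (1.0 - k * k)
    return h, ks, eps

# check against the direct Toeplitz solve (illustrative R values)
R = np.array([2.8889, 1.5111, 1.2089, 0.9671])
M = 3
h, ks, eps = levinson_durbin(R, M)
T = np.array([[R[abs(i - j)] for j in range(M)] for i in range(M)])
assert np.allclose(h, np.linalg.solve(T, R[1:M+1]))
```

The order recursion also exposes every lower-order solution and all reflection coefficients along the way, which is exactly what the lattice realisation below consumes.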
Lattice realisation: define the m-th order forward and backward prediction errors
  e_f^m[n] = y[n] − Σ_{i=1}^{m} h_m[i] y[n−i]
  e_b^m[n] = y[n−m] − Σ_{i=1}^{m} b_m[i] y[n+1−i]
Then, using the Levinson-Durbin order updates,
  e_f^m[n] = e_f^{m−1}[n] − k_m e_b^{m−1}[n−1]
  e_b^m[n] = e_b^{m−1}[n−1] − k_m e_f^{m−1}[n]

[Figure: one lattice stage — e_f^{m−1}[n] and e_b^{m−1}[n] enter, cross-coupled through the gains −k_m, producing e_f^m[n] and e_b^m[n]; the backward path contains a unit delay z⁻¹.]

We have
  e_f^0[n] = y[n] and e_b^0[n] = y[n]
Hence
  e_f^0[n] = e_b^0[n] = y[n]
and the prediction error filter of order M is realised as a cascade of M such lattice stages driven by y[n].
The modular structure can be extended by simply cascading another section; new stages can be added without modifying the earlier stages.
The same elements are used in each stage, so the structure is efficient for VLSI implementation.
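The lattice recursions are easy to run directly on a data vector (a sketch with the same sign convention as above — e_f^m[n] = e_f^{m−1}[n] − k_m e_b^{m−1}[n−1]; function name is ours):

```python
import numpy as np

def lattice_errors(y, ks):
    """Run the prediction-error lattice for reflection coefficients ks,
    starting from e_f^0 = e_b^0 = y. Returns final (e_f, e_b)."""
    ef = np.asarray(y, float).copy()
    eb = ef.copy()
    for k in ks:
        eb_d = np.concatenate([[0.0], eb[:-1]])   # e_b^{m-1}[n-1]
        ef, eb = ef - k * eb_d, eb_d - k * ef     # RHS uses old ef, eb
    return ef, eb

# order-1 check: e_f^1[n] = y[n] - k1 y[n-1]
y = np.array([1.0, 2.0, -1.0, 0.5])
ef, eb = lattice_errors(y, [0.5])
assert np.allclose(ef[1:], y[1:] - 0.5 * y[:-1])
```

Because each stage only needs its own k_m, changing the filter order means appending or removing stages, with all earlier stages untouched — the modularity property noted above.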
It follows from the fact that, for a W.S.S. signal, the backward prediction errors e_b^m[n], considered as a function of the order m, are uncorrelated:
  E e_b^m[n] e_b^i[n] = 0 for 0 ≤ i < m                       (i)
so the lattice stages successively orthogonalise the data. Also
  k_m = E e_f^{m−1}[n] e_b^{m−1}[n−1] / E (e_f^{m−1}[n])²      (ii)
Proof (sketch): expanding E e_f^m[n] e_b^i[n] with the order-update relation
  e_f^m[n] = e_f^{m−1}[n] − k_m e_b^{m−1}[n−1]
and using the orthogonality of each prediction error to the data it is built from gives (i); setting E e_f^m[n] e_b^{m−1}[n−1] = 0 in the same expansion gives (ii).
Example 2:
Consider the random signal model y[n] = x[n] + v[n], where v[n] is a zero-mean white noise with variance 1 and uncorrelated with x[n], and x[n] = 0.8x[n−1] + w[n], w[n] being a zero-mean white noise with variance 0.68.
a) Find the second-order linear predictor for y[n].
b) Obtain the lattice structure for the prediction error filter.
c) Use the above structure to design a second-order FIR Wiener filter to estimate x[n].
The optimal filters above assume stationarity; within a certain data length stationarity may hold, but buffering of data is then required, and this works only in some applications. When the statistics are unknown or slowly varying, one solution is adaptive filtering: the filter coefficients are updated as a function of the filtering error. The basic filter structure is as shown in Fig. 1.

[Fig. 1: the input y[n] drives an adjustable filter structure whose output x̂[n] is compared with the reference x[n]; the error e[n] drives the adaptive algorithm that updates the coefficients.]

The filter structure is FIR of known tap length, because the adaptation algorithm updates each filter coefficient individually.
Let h[n] = [h₀[n] h₁[n] ... h_{M−1}[n]]' be the coefficient vector at time n. We want to find the filter coefficients so as to minimise the mean-square error E e²[n], where
  e[n] = x[n] − Σ_{i=0}^{M−1} hᵢ[n] y[n−i] = x[n] − h'[n] y[n]
with the data vector y[n] = [y[n] y[n−1] ... y[n−M+1]]'. Therefore the optimal coefficients satisfy R_YY h = r_XY, where
  r_XY = [ R_XY[0]  R_XY[1]  ...  R_XY[M−1] ]'
and R_YY is the M×M symmetric Toeplitz autocorrelation matrix with (i, j)-th element R_YY[i−j].
The steepest-descent method updates the coefficient vector along the negative gradient of the mean-square error:
  h[n+1] = h[n] − (μ/2) ∇(E e²[n])
where
  ∇(E e²[n]) = [ ∂E e²[n]/∂h₀  ...  ∂E e²[n]/∂h_{M−1} ]' = −2 r_XY + 2 R_YY h[n]
so that
  h[n+1] = (I − μ R_YY) h[n] + μ r_XY
To analyse convergence, write R_YY = Q Λ Q', where Q is the orthogonal matrix of the eigenvectors of R_YY and Λ is a diagonal matrix with the corresponding eigenvalues λ₁, ..., λ_M as the diagonal elements. Also I = QQ' = Q'Q. In the rotated coordinates h̃[n] = Q'(h[n] − h_opt) the update decouples into M scalar recursions
  h̃ᵢ[n+1] = (1 − μλᵢ) h̃ᵢ[n], i = 1, ..., M
which can easily be solved for stability. The stability condition is given by
  |1 − μλᵢ| < 1, i.e. −1 < 1 − μλᵢ < 1, i.e. 0 < μ < 2/λᵢ, i = 1, ..., M
Note that all the eigenvalues of R_YY are positive. Let λ_max be the maximum eigenvalue. Then the coefficients converge to the Wiener solution if the step size is within the range
  0 < μ < 2/λ_max
Since λ_max ≤ Trace(R_YY) = M R_YY[0], a conservative sufficient condition is
  0 < μ < 2/Trace(R_YY) = 2/(M R_YY[0])
The i-th mode decays as (1 − μλᵢ)ⁿ. Thus the rate of convergence depends on the statistics of the data and is related to the eigenvalue spread of the autocorrelation matrix. This rate is expressed using the condition number of R_YY, defined as k = λ_max/λ_min, where λ_max and λ_min are respectively the maximum and the minimum eigenvalues of R_YY. The fastest convergence of this system occurs when k = 1, corresponding to white noise.
The steepest-descent update requires knowledge of R_YY and r_XY. The LMS algorithm replaces the gradient of E e²[n] by the gradient of the instantaneous squared error e²[n]:
  ∇e²[n] = 2 e[n] [ ∂e[n]/∂h₀  ...  ∂e[n]/∂h_{M−1} ]'
Now consider
  e[n] = x[n] − Σ_{i=0}^{M−1} hᵢ y[n−i]
so that
  ∂e[n]/∂h_j = −y[n−j], j = 0, 1, ..., M−1
and therefore
  ∇e²[n] = −2 e[n] y[n], y[n] = [ y[n]  y[n−1]  ...  y[n−M+1] ]'
The steepest-descent update now becomes
  h[n+1] = h[n] + μ e[n] y[n]
This modification is due to Widrow and Hoff, and the corresponding adaptive filter is known as the LMS filter.
Hence the LMS algorithm is as follows.
Given the input signal y[n], the reference signal x[n] and the step size μ:
1. Initialisation: hᵢ[0] = 0, i = 0, 1, 2, ..., M−1
2. For n ≥ 0:
  Filter output: x̂[n] = h'[n] y[n]
  Error: e[n] = x[n] − x̂[n]
  Update: h[n+1] = h[n] + μ e[n] y[n]
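The three steps of the LMS loop translate almost line-for-line into code. A small system-identification sketch (the 2-tap channel and all parameter values are assumptions for illustration):

```python
import numpy as np

def lms(x, y, M, mu):
    """LMS adaptive FIR filter: input y[n], reference x[n], M taps, step mu."""
    h = np.zeros(M)
    for n in range(M, len(y)):
        yv = y[n:n - M:-1]          # data vector [y[n], ..., y[n-M+1]]
        e = x[n] - h @ yv           # filtering error
        h = h + mu * e * yv         # h[n+1] = h[n] + mu e[n] y[n]
    return h

# identify an assumed 2-tap system x[n] = 0.7 y[n] - 0.2 y[n-1]
rng = np.random.default_rng(1)
y = rng.normal(size=20_000)
x = 0.7 * y - 0.2 * np.concatenate([[0.0], y[:-1]])
h = lms(x, y, M=2, mu=0.01)
print(h)        # converges near [0.7, -0.2]
```

Because the training input here is white, the condition number of R_YY is 1 and convergence is as fast as LMS allows, consistent with the eigenvalue-spread discussion above.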
[Figure: the LMS adaptive filter — y[n] feeds an FIR filter with coefficients hᵢ[n], i = 0, 1, ..., M−1, producing x̂[n]; the error e[n] = x[n] − x̂[n] drives the LMS algorithm, which updates the coefficients.]
The LMS algorithm is convergent in the mean if the step-size parameter μ satisfies the condition
  0 < μ < 2/λ_max
Proof outline: taking the expectation of h[n+1] = h[n] + μ e[n] y[n] (under the standard independence assumption) yields the steepest-descent recursion for E h[n], whose stability condition was derived above. In the practical situation, knowledge of λ_max is not available, and Trace(R_YY) can be taken as a conservative estimate of λ_max, so that for convergence of the mean of the coefficient vector of the LMS filter
  0 < μ < 2/Trace(R_YY)
Generally, too small a value of μ results in slower convergence, whereas big values of μ will result in larger fluctuations from the mean. Choosing a proper value of μ is very important for the performance of the LMS algorithm.
In addition, the rate of convergence depends on the statistics of the data and is related to the eigenvalue spread of the autocorrelation matrix, expressed through the condition number of R_YY, k = λ_max/λ_min, where λ_min is the minimum eigenvalue of R_YY. The fastest convergence of this system occurs when k = 1, corresponding to white noise. This states that the fastest way to train an LMS adaptive system is to use white noise as the training input. As the noise becomes more and more coloured, the speed of the training will decrease.
The average of each filter tap weight converges to the corresponding optimal filter tap weight, but this does not ensure that the coefficients themselves converge to the optimal values. We have seen that the mean of the LMS coefficients converges to the steepest-descent solution. This does not guarantee that the mean-square error of the LMS estimator will converge to the mean-square error corresponding to the Wiener solution: there is a fluctuation of the LMS coefficients about the Wiener filter coefficients.
Let h_opt = the optimal Wiener filter impulse response. The instantaneous deviation of the LMS coefficient vector from h_opt is
  Δh[n] = h[n] − h_opt
An exact analysis of the excess mean-square error is quite complicated; its approximate value is given by
  ε_excess = ε_min Σ_{i=1}^{M} (μλᵢ/(2 − μλᵢ)) / (1 − Σ_{i=1}^{M} μλᵢ/(2 − μλᵢ))
The LMS algorithm is said to converge in the mean-square sense provided the step-length parameter satisfies the relations
  0 < μ < 2/λ_max   and   Σ_{i=1}^{M} μλᵢ/(2 − μλᵢ) < 1
Further, if μλᵢ << 1 for every i,
  ε_excess ≈ ε_min (μ/2) Σ_{i=1}^{M} λᵢ = ε_min (μ/2) Trace(R_YY)
The factor
  ε_excess/ε_min ≈ (μ/2) Trace(R_YY)
is called the misadjustment factor for the LMS filter. It is large unless μ is much smaller; thus the selection of the step-size parameter is crucial in the case of the LMS algorithm. When the input signal is nonstationary, the eigenvalues also change with time and the selection of μ becomes more difficult.
Example 1 (adaptive channel equalisation):
[Figure: x[n] → channel (+ noise) → y[n] → adaptive equaliser → x̂[n].]
An adaptive FIR equaliser follows a noisy channel, with the transmitted signal as the reference. For the source model used here,
  R_XX[m] = (σ_W²/(1 − 0.8²)) (0.8)^{|m|}, so that R_XX[0] = 2.78
Solving the 2-tap normal equations gives the optimal equaliser coefficients. The eigenvalues of R_YY are λ₁, λ₂ = 2.79, 1.72, so convergence of the LMS equaliser requires
  0 < μ < 2/λ_max = 2/2.79 = 0.72
and the excess mean-square error follows from
  ε_excess = ε_min Σᵢ (μλᵢ/(2 − μλᵢ)) / (1 − Σᵢ μλᵢ/(2 − μλᵢ))
A leakage term bounds the modulus of the LMS weight vector: with γ a small positive quantity, the corresponding (leaky LMS) algorithm is given by
  h[n+1] = (1 − μγ) h[n] + μ e[n] y[n]
where μγ is chosen to be less than 1. In such a situation the pole will be inside the unit circle, the instability problem will not arise and the algorithm will converge.
The convergence conditions
  0 < μ < 2/λ_MAX   and   0 < μ < 2/(M R_YY[0]) = 2/(M E Y²[n])
suggest normalising the step size by the signal power. Replacing M E Y²[n] by its instantaneous estimate Σ_{i=0}^{M−1} y²[n−i] = ||y[n]||² gives the condition
  0 < μ̃ < 2, with effective step size μ = μ̃/||y[n]||²
The resulting normalised LMS (NLMS) update is
  h[n+1] = h[n] + (μ̃/||y[n]||²) e[n] y[n]
Notice that the NLMS algorithm does not change the direction of the update used in the steepest-descent algorithm, only its magnitude. If y[n] is close to zero, the denominator term ||y[n]||² in the NLMS equation becomes very small and
  h[n+1] = h[n] + (μ̃/||y[n]||²) e[n] y[n]
may diverge. To overcome this drawback, a small positive number ψ is added to the denominator term of the NLMS equation. Thus
  h[n+1] = h[n] + (μ̃/(ψ + ||y[n]||²)) e[n] y[n]
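The role of the regularising constant ψ is easy to demonstrate: with a zero data vector the update term vanishes instead of producing 0/0 (a minimal sketch; names are ours):

```python
import numpy as np

def nlms_update(h, x_n, yv, mu=0.5, psi=1e-8):
    """One NLMS step: h <- h + mu/(psi + ||y||^2) * e * y."""
    e = x_n - h @ yv
    return h + (mu / (psi + yv @ yv)) * e * yv

h = np.zeros(2)
# a zero data vector leaves the weights unchanged (psi prevents 0/0)
h2 = nlms_update(h, 1.0, np.zeros(2))
assert np.allclose(h2, h)
```

For nonzero data the update is the ordinary LMS step with the step size scaled by the instantaneous input power, so the same direction is taken with a data-independent convergence condition 0 < μ̃ < 2.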
For computational efficiency, other modifications of the LMS algorithm have been suggested; some of the modified algorithms are the block-LMS algorithm, the signed-LMS algorithm, etc. An LMS algorithm can also be derived for an IIR filter, adaptively updating both its feedback parameters aᵢ[n], i = 1, ..., N−1 and its feed-forward parameters bᵢ[n], i = 0, ..., M−1. However, the IIR LMS algorithm has poor performance compared to the FIR LMS filter.
The RLS algorithm considers all the available data for determining the filter parameters; the filter should be optimal with respect to all the available data in a certain sense. It minimises the cost function
  ξ[n] = Σ_{k=0}^{n} λ^{n−k} e²[k]
with respect to the filter parameter vector h[n] = [h₀[n] h₁[n] ... h_{M−1}[n]]', where 0 < λ ≤ 1 is a weighting factor known as the forgetting factor. Setting the gradient to zero,
  −2 Σ_{k=0}^{n} λ^{n−k} (x[k] − y'[k] h[n]) y[k] = 0
Let us define
  R̂_YY[n] = Σ_{k=0}^{n} λ^{n−k} y[k] y'[k]
Similarly,
  r̂_XY[n] = Σ_{k=0}^{n} λ^{n−k} x[k] y[k]
Hence h[n] = R̂_YY[n]⁻¹ r̂_XY[n]. Matrix inversion is involved, which makes the direct solution difficult; we look for a recursive solution.
Now
  R̂_YY[n] = Σ_{k=0}^{n} λ^{n−k} y[k] y'[k]
          = λ Σ_{k=0}^{n−1} λ^{n−1−k} y[k] y'[k] + y[n] y'[n]
          = λ R̂_YY[n−1] + y[n] y'[n]
This shows that the autocorrelation matrix can be recursively computed from its previous value and the present data vector. Similarly,
  r̂_XY[n] = λ r̂_XY[n−1] + x[n] y[n]
so that
  h[n] = R̂_YY[n]⁻¹ r̂_XY[n] = (λ R̂_YY[n−1] + y[n] y'[n])⁻¹ (λ r̂_XY[n−1] + x[n] y[n])
For the matrix inversion above, the matrix inversion lemma
  (A + uv')⁻¹ = A⁻¹ − (A⁻¹ u v' A⁻¹)/(1 + v' A⁻¹ u)
will be useful. Taking A = λ R̂_YY[n−1] and u = v = y[n], we have
  R̂_YY[n]⁻¹ = (1/λ) [ R̂_YY[n−1]⁻¹ − (R̂_YY[n−1]⁻¹ y[n] y'[n] R̂_YY[n−1]⁻¹) / (λ + y'[n] R̂_YY[n−1]⁻¹ y[n]) ]
Rename P[n] = R̂_YY[n]⁻¹. Then
  P[n] = (1/λ) [ P[n−1] − (P[n−1] y[n] y'[n] P[n−1]) / (λ + y'[n] P[n−1] y[n]) ]
The gain vector
  k[n] = (P[n−1] y[n]) / (λ + y'[n] P[n−1] y[n])
which is important in interpreting the adaptation, is also related to the current data vector y[n] by
  k[n] = P[n] y[n]
Multiplying the recursion for r̂_XY[n] by P[n],
  h[n] = R̂_YY[n]⁻¹ r̂_XY[n]
       = P[n] (λ r̂_XY[n−1] + x[n] y[n])
       = λ P[n] r̂_XY[n−1] + x[n] P[n] y[n]
       = [P[n−1] − k[n] y'[n] P[n−1]] r̂_XY[n−1] + x[n] k[n]
       = h[n−1] − k[n] y'[n] h[n−1] + x[n] k[n]
       = h[n−1] + k[n] (x[n] − y'[n] h[n−1])
At n = 0:
  Choose P[0] = δ I_{M×M}, δ a large positive number; y[0] = 0, h[0] = 0.
Operation:
For n = 1 to n = final, do
  1. Get x[n], y[n]
  2. Compute the gain k[n] = (P[n−1] y[n]) / (λ + y'[n] P[n−1] y[n])
  3. Update the coefficients h[n] = h[n−1] + k[n] (x[n] − y'[n] h[n−1])
  4. Update P[n] = (1/λ) (P[n−1] − k[n] y'[n] P[n−1])
end do
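The four-step loop above can be sketched directly. With λ = 1 and a large δ, the RLS solution essentially coincides with the batch least-squares solution, which makes it easy to test (the 2-tap system below is an assumed example):

```python
import numpy as np

def rls(x, y, M, lam=1.0, delta=1e4):
    """RLS adaptive FIR filter following the recursions above."""
    h = np.zeros(M)
    P = delta * np.eye(M)
    for n in range(M, len(y)):
        yv = y[n:n - M:-1]                      # [y[n], ..., y[n-M+1]]
        k = P @ yv / (lam + yv @ P @ yv)        # gain vector
        h = h + k * (x[n] - yv @ h)             # coefficient update
        P = (P - np.outer(k, yv @ P)) / lam     # inverse-correlation update
    return h

rng = np.random.default_rng(2)
y = rng.normal(size=500)
x = 0.7 * y - 0.2 * np.concatenate([[0.0], y[:-1]])
h = rls(x, y, M=2)
print(h)        # essentially [0.7, -0.2] after a few hundred samples
```

The contrast with LMS is visible even in this toy run: RLS locks onto the exact coefficients within a number of samples of the order of the filter length, independent of the eigenvalue spread.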
Convergence: with λ = 1,
  R̂_YY[n]/(n+1) = (1/(n+1)) Σ_{k=0}^{n} y[k] y'[k]
is the sample autocorrelation matrix, and
  lim_{n→∞} R̂_YY[n]/(n+1) = R_YY
so for large n the RLS solution approaches the Wiener solution.
Corresponding to the initialisation P[0] = R̂_YY[0]⁻¹ = δ I we have R̂_YY[0] = δ⁻¹ I. With this initial condition the matrix difference equation has the solution
  R̂_YY[n] = λ^{n+1} δ⁻¹ I + Σ_{k=0}^{n} λ^{n−k} y[k] y'[k]
so the filter actually solves
  (λ^{n+1} δ⁻¹ I + R̂_YY[n]) h̃[n] = r̂_XY[n]
where h̃[n] is the modified solution due to the assumed initial value of the P-matrix, i.e.
  λ^{n+1} δ⁻¹ R̂_YY[n]⁻¹ h̃[n] + h̃[n] = h[n]
If we take λ less than 1, then the bias term on the left-hand side of the above equation will eventually die down and we will get
  h̃[n] = h[n]
The filter coefficients converge in the mean to the corresponding Wiener filter coefficients.
Unlike the LMS filter, which converges in the mean only asymptotically, the RLS filter converges in finite time, and its convergence is much less sensitive to the eigenvalue spread — a remarkable feature of the RLS algorithm. The RLS filter can also be shown to converge to the Wiener filter in the mean-square sense, so that there is zero excess mean-square error.
[Figure: x[n] passes through a linear system; noise is added to give y[n]; the filter produces x̂[n].]

The FIR Wiener filter is optimal when the data length and the filter length are equal. The IIR Wiener filter is based on the assumption that an infinite length of data sequence is available. Neither of the above filters represents the physical situation; we need a filter that effectively adds a tap with each addition of data. The basic mechanism of the Kalman filter is to estimate the signal recursively. The signal is modelled by the state equation
  x[n] = A x[n−1] + b w[n]                (1)
and the observations can be represented as a linear combination of the states and the observation noise,
  y[n] = c x[n] + v[n]                    (2)
Equations (1) and (2) have a direct relation with the state-space model in control systems, where one has to estimate the unobservable states of the system through an observer that performs well against noise.
Example 1:
An AR(M) process can be put in state-variable form with
  x[n] = [x₁[n] x₂[n] ... x_M[n]]', x₁[n] = x[n], x₂[n] = x[n−1], ..., x_M[n] = x[n−M+1]
and

  A = [ a₁  a₂  ...  a_M
        1   0   ...  0
        0   1   ...  0
        ...
        0   0  ... 1  0 ]   and   b = [ 1  0  ...  0 ]'

Our analysis will include only the simple (scalar) Kalman filter.
The Kalman filter does recursively what the IIR Wiener filter does with the whole data record. The innovation representation is shown in the following diagram:

  y[0], y[1], ..., y[n] → [ Orthogonalisation ] → ỹ[0], ỹ[1], ..., ỹ[n]

In the above representation ỹ[n] is the innovation of y[n], and ỹ[0], ..., ỹ[n] contain the same information as y[0], ..., y[n]. The estimate is expanded on the innovations:
  x̂[n] = Σ_{i=0}^{n} kᵢ ỹ[i]
The orthogonality principle E(x[n] − x̂[n]) ỹ[j] = 0, j = 0, 1, ..., n gives
  k_j = E x[n] ỹ[j] / σ_j², j = 0, 1, ..., n
where σ_j² = E ỹ²[j].
Similarly,
  x[n−1] = x̂[n−1] + e[n−1], with x̂[n−1] = Σ_{i=0}^{n−1} kᵢ′ ỹ[i]
where
  k_j′ = E x[n−1] ỹ[j] / σ_j², j = 0, 1, ..., n−1
For the scalar state model x[n] = a x[n−1] + w[n], with w[n] orthogonal to the past observations,
  k_j′ = E (x[n] − w[n]) ỹ[j] / (a σ_j²) = E x[n] ỹ[j] / (a σ_j²) = k_j/a, j = 0, 1, ..., n−1
Again,
  x̂[n] = Σ_{i=0}^{n} kᵢ ỹ[i] = Σ_{i=0}^{n−1} kᵢ ỹ[i] + k_n ỹ[n] = a Σ_{i=0}^{n−1} kᵢ′ ỹ[i] + k_n ỹ[n]
or, since ỹ[n] = y[n] − a x̂[n−1],
  x̂[n] = a x̂[n−1] + k_n (y[n] − a x̂[n−1])
The current estimate is the prediction a x̂[n−1], corrected by the Kalman gain k_n times the innovation.

[Figure: recursive realisation — y[n] enters through the gain k_n; x̂[n] is fed back through z⁻¹ and the gain a.]
The gain involves the mean-square estimation error σ²[n] = E x[n] e[n], through k_n = σ²[n]/σ_V². We have to estimate σ²[n] at every value of n. How to do it? Consider
  σ²[n] = E x[n] e[n]
        = E x[n] (x[n] − (1 − k_n) a x̂[n−1] − k_n y[n])
        = (1 − k_n) σ_X² − (1 − k_n) a E x[n] x̂[n−1]
        = (1 − k_n) σ_X² − (1 − k_n) a E (a x[n−1] + w[n]) x̂[n−1]
        = (1 − k_n) σ_X² − (1 − k_n) a² E x[n−1] x̂[n−1]
Again,
  σ²[n−1] = E x[n−1] e[n−1] = E x[n−1] (x[n−1] − x̂[n−1]) = σ_X² − E x[n−1] x̂[n−1]
Therefore,
  E x[n−1] x̂[n−1] = σ_X² − σ²[n−1]
Hence, substituting and using k_n = σ²[n]/σ_V²,
  σ²[n] = (σ_W² + a² σ²[n−1]) σ_V² / (σ_W² + σ_V² + a² σ²[n−1])
where we have substituted σ_W² = (1 − a²) σ_X². We have still to find σ²[0]. For this, assume x[−1] = x̂[−1] = 0. Hence from the relation
  σ²[n] = (1 − k_n) σ_X² − (1 − k_n) a² E x[n−1] x̂[n−1]
we get
  σ²[0] = (1 − k₀) σ_X²
Substituting k₀ = σ²[0]/σ_V² in this expression, we get
  σ²[0] = σ_X² σ_V² / (σ_X² + σ_V²)
The scalar Kalman filter algorithm is therefore:
Initialisation: x̂[−1] = 0.
Step 1. n = 0. Calculate σ²[0] = σ_X² σ_V² / (σ_X² + σ_V²).
Step 2. Calculate the gain k_n = σ²[n]/σ_V².
Step 3. Update the estimate: x̂[n] = a x̂[n−1] + k_n (y[n] − a x̂[n−1]).
Step 4. n = n + 1.
Step 5. Calculate σ²[n] = (σ_W² + a² σ²[n−1]) σ_V² / (σ_W² + σ_V² + a² σ²[n−1]).
Step 6. Go to Step 2.
Example 2:
Given
  x[n] = 0.6 x[n−1] + w[n], n ≥ 0
  y[n] = x[n] + v[n], n ≥ 0
with σ_W² = 0.25, σ_V² = 0.5, find the expression for the Kalman filter equations at convergence and the corresponding mean-square error.
Using
  σ²[n] = (σ_W² + a² σ²[n−1]) σ_V² / (σ_W² + σ_V² + a² σ²[n−1])
at convergence σ²[n] = σ²[n−1] = σ², so
  σ² = (0.25 + 0.36 σ²)(0.5) / (0.75 + 0.36 σ²)
Solving the resulting quadratic gives σ² = 0.195. The predicted-error variance is then
  σ_W² + a² σ² = 0.25 + 0.36 × 0.195 = 0.320
and the steady-state gain is
  k_n = σ²/σ_V² = 0.390
so that, at convergence, x̂[n] = 0.6 x̂[n−1] + 0.390 (y[n] − 0.6 x̂[n−1]). We have still to initialise σ²[0] = σ_X² σ_V²/(σ_X² + σ_V²) to run the recursion from n = 0.
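The steady-state values of this example can be checked by simply iterating the variance recursion from its initial value:

```python
# iterate the error-variance recursion for Example 2 until convergence
a, sw2, sv2 = 0.6, 0.25, 0.5
sx2 = sw2 / (1 - a * a)               # signal variance = 0.25/0.64
s2 = sx2 * sv2 / (sx2 + sv2)          # sigma^2[0]
for _ in range(100):
    s2 = (sw2 + a*a*s2) * sv2 / (sw2 + sv2 + a*a*s2)
k = s2 / sv2                          # steady-state Kalman gain
print(round(s2, 3), round(k, 3))      # 0.195 0.39
```

The fixed-point iteration converges in a handful of steps, which mirrors how quickly the time-varying gain k_n settles to its steady-state value in practice.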
SECTION IV
SPECTRAL ESTIMATION
Spectral analysis is a very old problem: it started with the Fourier series (1807), introduced to solve the wave equation.
Consider the definition of the power spectrum of a random sequence {x[n], −∞ < n < ∞}:
  S_XX(w) = Σ_{m=−∞}^{∞} R_XX[m] e^{−jwm}
The power spectral density is the discrete-time Fourier transform of the autocorrelation sequence. But we have to use only finite data — not only because of our inability to handle infinite data, but also because the assumption of stationarity is valid only for a short duration. For example, the speech signal is stationary for 20 to 80 ms.
Spectral analysis is a preliminary tool: it says that a particular frequency content may be present in the observed signal. A final decision is to be made by hypothesis testing.
From the data x[n], n = 0, 1, ..., N−1, two estimators of the autocorrelation are commonly used:
  R̂_XX[m] = (1/N) Σ_{n=0}^{N−1−m} x[n] x[n+m]        (biased)
  R̂_XX[m] = (1/(N−m)) Σ_{n=0}^{N−1−m} x[n] x[n+m]     (unbiased)
Note that for the biased estimator
  E R̂_XX[m] = (1/N) Σ_{n=0}^{N−1−m} E x[n] x[n+m] = ((N−m)/N) R_XX[m] = R_XX[m] − (m/N) R_XX[m]
so the bias of R̂_XX[m] goes to zero as N → ∞; R̂_XX[m] is asymptotically unbiased. The covariance of the estimates is approximately
  cov(R̂_XX[m₁], R̂_XX[m₂]) ≈ (1/N) Σ_{n=−∞}^{∞} (R_XX[n] R_XX[n + m₂ − m₁] + R_XX[n − m₁] R_XX[n + m₂])
This means that the estimated autocorrelation values are highly correlated. The variance of R̂_XX[m] is obtained from the above as
  var(R̂_XX[m]) ≈ (1/N) Σ_{n=−∞}^{∞} (R²_XX[n] + R_XX[n − m] R_XX[n + m])
Note that the variance of R̂_XX[m] is large for large lag m, especially as m approaches N.
Also, as N → ∞, var(R̂_XX[m]) → 0 provided Σ_{n=−∞}^{∞} R²_XX[n] < ∞, so the estimator is consistent.
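The two autocorrelation estimators differ only in their normalisation, which a tiny sketch makes concrete (function names are ours):

```python
import numpy as np

def acf_biased(x, m):
    """(1/N) sum_{n} x[n] x[n+m] -- biased but lower-variance estimate."""
    x = np.asarray(x, float); N = len(x)
    return np.dot(x[:N - m], x[m:]) / N

def acf_unbiased(x, m):
    """(1/(N-m)) sum_{n} x[n] x[n+m] -- unbiased but noisy at large m."""
    x = np.asarray(x, float); N = len(x)
    return np.dot(x[:N - m], x[m:]) / (N - m)

x = [1.0, 1.0, 1.0, 1.0]
assert acf_biased(x, 1) == 0.75     # scaled by (N-m)/N = 3/4
assert acf_unbiased(x, 1) == 1.0
```

The biased form is the one that appears inside the periodogram and guarantees a nonnegative spectral estimate, which is why it is usually preferred despite the bias.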
The periodogram estimator of the power spectrum is
  S^p_XX(w) = (1/N) |Σ_{n=0}^{N−1} x[n] e^{−jwn}|², −π ≤ w ≤ π
S^p_XX(w) gives the power output of band-pass filters of impulse response
  hᵢ[n] = (1/N) e^{jwᵢn} rect(n/N)
The periodogram can equivalently be written in terms of the estimated autocorrelation function:
  S^p_XX(w) = Σ_{m=−(N−1)}^{N−1} R̂_XX[m] e^{−jwm}
where R̂_XX[m] = (1/N) Σ_{n=0}^{N−1−|m|} x[n] x[n+|m|], since
  S^p_XX(w) = (1/N) |Σ_{n=0}^{N−1} x[n] e^{−jwn}|²
            = (1/N) (Σ_{n=0}^{N−1} x[n] e^{−jwn}) (Σ_{n=0}^{N−1} x[n] e^{−jwn})*
So
  E S^p_XX(w) = Σ_{m=−(N−1)}^{N−1} E R̂_XX[m] e^{−jwm} = Σ_{m=−(N−1)}^{N−1} (1 − |m|/N) R_XX[m] e^{−jwm}
As N → ∞, the right-hand side approaches the true power spectral density S_XX(w). Thus the periodogram is an asymptotically unbiased estimator of the power spectral density.
To prove consistency of the periodogram is a difficult problem. We consider the simple case of a sequence of Gaussian white noise in the following example, examining the periodogram only at the DFT frequencies w_k = 2πk/N, k = 0, 1, ..., N−1.
Example 1:
The periodogram of a zero-mean white Gaussian sequence x[n], n = 0, ..., N−1, with power spectral density
  S_XX(w) = σ_X², −π < w ≤ π
is, at w_k = 2πk/N,
  S^p_XX(w_k) = (1/N) |Σ_{n=0}^{N−1} x[n] e^{−jw_k n}|²
             = ((1/√N) Σ_{n=0}^{N−1} x[n] cos w_k n)² + ((1/√N) Σ_{n=0}^{N−1} x[n] sin w_k n)²
             = C_X²(w_k) + S_X²(w_k)
where w_k = 2πk/N, k = 0, 1, ..., N−1, and C_X(w_k) and S_X(w_k) are the cosine and sine parts of S^p_XX(w).
Let us consider
  C_X(w_k) = (1/√N) Σ_{n=0}^{N−1} x[n] cos w_k n
which is a linear combination of a Gaussian process, hence Gaussian. Clearly E C_X(w_k) = 0 and
  var(C_X(w_k)) = (1/N) E (Σ_{n=0}^{N−1} x[n] cos w_k n)²
               = (1/N) Σ_{n=0}^{N−1} E x²[n] cos² w_k n + E(cross terms)
Assume the sequence x[n] to be independent, so the cross terms vanish. Therefore,
  var(C_X(w_k)) = (σ_X²/N) Σ_{n=0}^{N−1} cos² w_k n = (σ_X²/N) Σ_{n=0}^{N−1} (1 + cos 2w_k n)/2
               = σ_X²/2 + (σ_X²/2N) cos((N−1)w_k) (sin Nw_k / sin w_k)
For w_k = 2πk/N, the second term is 0 for k ≠ 0, N/2, while (sin Nw_k)/(N sin w_k) = 1 for k = 0 (and, N even, for k = N/2). Hence
  var(C_X(w_k)) = σ_X²/2 for k ≠ 0, N/2
  var(C_X(w_k)) = σ_X² for k = 0, k = N/2
Similarly, for S_X(w_k) = (1/√N) Σ_{n=0}^{N−1} x[n] sin w_k n: S_X(w_k) = 0 for k = 0, N/2, and var(S_X(w_k)) = σ_X²/2 for k ≠ 0, N/2.
Therefore for k = 0 (and k = N/2): C_X(w_k) ~ N(0, σ_X²) and S_X(w_k) = 0.
For k ≠ 0, N/2:
  C_X(w_k) ~ N(0, σ_X²/2), S_X(w_k) ~ N(0, σ_X²/2)
Recall that if X₁, ..., X_ν are i.i.d. N(0, σ_X²) then Y = X₁² + X₂² + ... + X_ν² has EY = ν σ_X² and var(Y) = 2ν σ_X⁴. For k ≠ 0, N/2,
  S^p_XX[k] = C_X²[k] + S_X²[k]
is (σ_X²/2) times a χ₂² variable. Hence
  E S^p_XX[k] = σ_X²/2 + σ_X²/2 = σ_X² = S_XX[k]
so S^p_XX[k] is unbiased, and
  var(S^p_XX[k]) = 2 × 2 (σ_X²/2)² = σ_X⁴ = S²_XX[k], which is independent of N.
For k = 0, S^p_XX[0] is unbiased and var(S^p_XX[0]) = 2σ_X⁴ = 2 S²_XX[0].
It can be shown that for the Gaussian independent white-noise sequence, at any frequency w,
  var(S^p_XX(w)) ≈ S²_XX(w)
In general,
  S^p_XX(w) = Σ_{m=−(N−1)}^{N−1} R̂_XX[m] e^{−jwm}, R̂_XX[m] = (1/N) Σ_{n=0}^{N−1−|m|} x[n] x[n+|m|]
so that
  E S^p_XX(w) = Σ_{m=−(N−1)}^{N−1} (1 − |m|/N) R_XX[m] e^{−jwm} = Σ_{m=−(N−1)}^{N−1} w_B[m] R_XX[m] e^{−jwm}
where w_B[m] = 1 − |m|/N is the Bartlett (triangular) window. Therefore
  E S^p_XX(w) = (1/2π) ∫_{−π}^{π} W_B(w − λ) S_XX(λ) dλ
i.e. the mean of the periodogram is the true spectrum convolved with the transform W_B(w) of the Bartlett window. As N → ∞,
  E S^p_XX(w) → S_XX(w)
Now var(S^p_XX(w)) cannot be found out exactly (there is no analytical tool), but an approximate expression for Gaussian data is
  cov(S^p_XX(w₁), S^p_XX(w₂)) ≈ S_XX(w₁) S_XX(w₂) [ (sin(N(w₁+w₂)/2) / (N sin((w₁+w₂)/2)))² + (sin(N(w₁−w₂)/2) / (N sin((w₁−w₂)/2)))² ]
Putting w₁ = w₂ = w,
  var(S^p_XX(w)) ≈ S²_XX(w) [1 + (sin Nw / (N sin w))²]
so var(S^p_XX(w)) ≈ 2 S²_XX(w) for w = 0, ±π and var(S^p_XX(w)) ≈ S²_XX(w) elsewhere.
Consider w₁ = 2πk₁/N and w₂ = 2πk₂/N, with k₁, k₂ integers, k₁ ≠ k₂. Then
  cov(S^p_XX(w₁), S^p_XX(w₂)) ≈ 0
This means that there will be no correlation between two neighbouring spectral estimates. Therefore the periodogram is not a reliable estimator of the power spectrum, for the following two reasons:
(1) The periodogram is not a consistent estimator, in the sense that var(S^p_XX(w)) does not go to zero as N → ∞, even though E S^p_XX(w) → S_XX(w).
(2) Neighbouring periodogram values are uncorrelated, so the estimate fluctuates erratically with frequency.
We have to modify S^p_XX(w) to get a consistent estimator for S_XX(w). One modification is to multiply the data by a more suitable window, rather than the rectangular one, before finding the periodogram; the resulting modified periodogram will have less (side-lobe) bias.
Bartlett method (averaging periodograms): divide the data into K nonoverlapping segments of length L = N/K and compute the periodogram of the k-th segment,
  S^{(k)}_XX(w) = (1/L) |Σ_{n=0}^{L−1} x_k[n] e^{−jwn}|², k = 0, 1, ..., K−1
Then
  S^{(av)}_XX(w) = (1/K) Σ_{k=0}^{K−1} S^{(k)}_XX(w)
As shown earlier,
  E S^{(k)}_XX(w) = (1/2π) ∫_{−π}^{π} W_B(w − λ) S_XX(λ) dλ
where the Bartlett window is now
  w_B[m] = 1 − |m|/L, |m| ≤ L−1, and 0 otherwise
with
  W_B(w) = (1/L) (sin(wL/2) / sin(w/2))²
so that
  E S^{(av)}_XX(w) = (1/K) Σ_{k=0}^{K−1} E S^{(k)}_XX(w) = (1/2π) ∫_{−π}^{π} W_B(w − λ) S_XX(λ) dλ
To find the mean of the averaged periodogram, the true spectrum is thus convolved with the frequency response W_B(w) of the Bartlett window. The effect of reducing the length of the data from N points to L = N/K results in a window whose spectral width has been increased by a factor K; consequently the frequency resolution has been reduced by a factor K.

[Figure: the original window main lobe versus the widened main lobe W_B(w) of the shortened window.]
Simplification: assume the K data segments are independent. Then
  var(S^{(av)}_XX(w)) = (1/K²) Σ_{k=0}^{K−1} var(S^{(k)}_XX(w)) ≈ (1/K) [1 + (sin wL / (L sin w))²] S²_XX(w)
i.e. 1/K times the original variance of the periodogram. In practice the variance will be reduced by a factor less than K, because the data segments will not be independent. For large L and large K, var(S^{(av)}_XX(w)) will tend to zero and S^{(av)}_XX(w) will be a consistent estimator.
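The variance reduction from averaging segment periodograms is easy to see on simulated white noise (a sketch; segment count and data length are our choices):

```python
import numpy as np

def periodogram(x):
    """(1/N) |FFT of x|^2, evaluated at the DFT frequencies."""
    X = np.fft.fft(x)
    return np.abs(X)**2 / len(x)

def bartlett_psd(x, K):
    """Average the periodograms of K nonoverlapping length-L segments."""
    L = len(x) // K
    segs = x[:K * L].reshape(K, L)
    return np.mean([periodogram(s) for s in segs], axis=0)

rng = np.random.default_rng(3)
x = rng.normal(size=4096)            # white noise, true S_XX(w) = 1
S1 = periodogram(x)                   # raw periodogram: var ~ S^2 = 1
S16 = bartlett_psd(x, 16)             # averaged: variance reduced ~ 16x
assert S1.var() > 5 * S16.var()       # clear variance reduction
assert abs(S16.mean() - 1.0) < 0.1    # still (nearly) unbiased
```

The averaged estimate is computed on a coarser frequency grid (L = N/K bins), which is precisely the resolution-for-variance trade-off described above.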
Welch method:
(1) Divide the data into K (possibly overlapping) segments of length L.
(2) Window each segment and compute the modified periodogram
  S^{(mod)}_XX(w) = (1/(UL)) |Σ_{n=0}^{L−1} x_k[n] w[n] e^{−jwn}|²
where
  U = (1/L) Σ_{n=0}^{L−1} w²[n]
normalises the window power. The window w[n] need not be an even function and is used to control spectral leakage.
(3) Compute
  S^{(Welch)}_XX(w) = (1/K) Σ_{k=0}^{K−1} S^{(mod)}_XX(w)
Blackman-Tukey method: keep only the lower-order autocorrelation estimates (which are reliable) and window them:
  S^{BT}_XX(w) = Σ_{m=−(M−1)}^{M−1} w[m] R̂_XX[m] e^{−jwm}
Issues concerned:
1. How to select w[m]? There is a large number of windows available; use a window with small side lobes. This will reduce bias and improve resolution.
2. How to select M? Normally M ~ N/5 (mostly based on experience); if N is large, say 10,000, M ~ 1000.
S^{BT}_XX(w) is the convolution of S^p_XX(w) and W(w), the F.T. of the window sequence — i.e. a smoothing of the periodogram, decreasing the variance of the estimate at the expense of reduced resolution.
The mean is
  E S^{BT}_XX(w) = E S^p_XX(w) * W(w), where E S^p_XX(w) = (1/2π) ∫ S_XX(λ) W_B(w − λ) dλ
so S^{BT}_XX(w) can be proved to be asymptotically unbiased, and the variance of S^{BT}_XX(w) is approximately
  var(S^{BT}_XX(w)) ≈ (S²_XX(w)/N) Σ_{k=−(M−1)}^{M−1} w²[k]
Some of the popular windows are the rectangular window, Bartlett window, Hamming window, Hanning window, etc.
Procedure:
1. Given the data x[n], n = 0, 1, ..., N−1,
2. find the periodogram S^p_XX(2πk/N) = (1/N) |Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N}|²
3. and smooth it with the window transform W(w) (equivalently, window the estimated autocorrelation up to lag M−1 and take the DFT).
Parametric methods model the signal as the output of a linear system driven by white noise:
  x[n] = Σ_{i=1}^{p} aᵢ x[n−i] + Σ_{i=0}^{q} bᵢ v[n−i]

[Figure: white noise v[n] → H(z) → x[n].]

with
  H(z) = B(z)/A(z) = (Σ_{i=0}^{q} bᵢ z⁻ⁱ) / (1 − Σ_{i=1}^{p} aᵢ z⁻ⁱ)
and
  S_XX(w) = |H(w)|² σ_V²
Find the power spectrum by substituting the values of the model parameters in the expression for the power spectrum of the model. For an AR(p) model,
  S_XX(w) = σ_V² / |1 − Σ_{i=1}^{p} aᵢ e^{−jwi}|²

[Figure: a typical AR spectrum.]
Autocorrelation (Yule-Walker) method: the AR parameters satisfy

  [ R_X[0]     R_X[1]    ...  R_X[p−1] ] [a₁]   [ R_X[1] ]
  [ R_X[1]     R_X[0]    ...  R_X[p−2] ] [a₂] = [ R_X[2] ]
  [ ...                                ] [..]   [   ..   ]
  [ R_X[p−1]   R_X[p−2]  ...  R_X[0]   ] [a_p]  [ R_X[p] ]

with
  σ_V² = R_X[0] − Σ_{i=1}^{p} aᵢ R_X[i]
in which the autocorrelations are replaced by their estimates.
Covariance method: minimise the prediction-error energy ε = Σ_{n=p}^{N} e²[n] directly. The normal equations become

  [ R_XX[1,1]  R_XX[1,2]  ...  R_XX[1,p] ] [a₁]   [ R_XX[0,1] ]
  [ ...                                  ] [..] = [    ..     ]
  [ R_XX[p,1]  R_XX[p,2]  ...  R_XX[p,p] ] [a_p]  [ R_XX[0,p] ]

with
  σ_V² = R_XX[0,0] − Σ_{i=1}^{p} aᵢ R_XX[0,i]
where
  R_XX[k,l] = (1/(N−p)) Σ_{n=p}^{N} x[n−k] x[n−l]
AR spectral estimation procedure:
1. Given the data x[n], n = 0, 1, ..., N−1, estimate R̂_XX[m], m = 0, 1, ..., p.
2. Select an order p and solve for aᵢ, i = 1, 2, ..., p and σ_V² (e.g. by the Levinson-Durbin algorithm).
3. Find
  S_XX(w) = σ_V² / |1 − Σ_{i=1}^{p} aᵢ e^{−jwi}|²
Some questions: can an AR(p) model represent any random signal? What happens if we use an AR(1) model when the signal is really AR(2), or an AR(2) model when the signal is AR(1)? The mean-square prediction error gives some guidance regarding whether the selected order is proper or not. For spectral estimation, some criterion function with respect to the order parameter p is to be minimised.
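Steps 1–3 above reduce to a few lines when the (true or estimated) autocorrelations are available. Here is a sketch using the exact AR(1) autocorrelations of the earlier examples, so the recovered parameters can be checked exactly:

```python
import numpy as np

# AR(1) with a1 = 0.8, sigma_w^2 = 0.68: R_XX[m] = (0.68/0.36) * 0.8^|m|
R = (0.68 / 0.36) * 0.8**np.arange(2)

# order p = 1 Yule-Walker: R[0] a1 = R[1]; sigma_v^2 = R[0] - a1 R[1]
a1 = R[1] / R[0]
sv2 = R[0] - a1 * R[1]
assert abs(a1 - 0.8) < 1e-12
assert abs(sv2 - 0.68) < 1e-12

# step 3: spectral estimate S_XX(w) = sv2 / |1 - a1 e^{-jw}|^2
w = np.linspace(-np.pi, np.pi, 512)
S = sv2 / np.abs(1 - a1 * np.exp(-1j * w))**2
assert np.all(S > 0)
```

With estimated (rather than exact) autocorrelations, the same equations yield a consistent estimate, and the order-selection criteria below indicate how far to push p.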
For example, Akaike's final prediction error criterion
  FPE(p) = σ̂_p² (N + p)/(N − p) ≈ σ̂_p² (1 + 2p/N)
where N = number of data points and σ̂_p² = the mean-square prediction error (variance, for the zero-mean case) of the order-p model, is minimised with respect to p.