Yan-Bin Jia
Dec 9, 2014
Estimation of a Constant
We start with estimation of a constant based on several noisy measurements. Suppose we have a
resistor but do not know its resistance. So we measure it several times using a cheap (and noisy)
multimeter. How do we come up with a good estimate of the resistance based on these noisy
measurements?
More formally, suppose x = (x_1, x_2, …, x_n)^T is a constant but unknown vector, and y = (y_1, y_2, …, y_l)^T is an l-element noisy measurement vector. Our task is to find the best estimate of x. Here we look at perhaps the simplest case, where each y_i is a linear combination of the x_j, 1 ≤ j ≤ n, plus some measurement noise ε_i. Thus we are working with the following linear system,

    y = Hx + ε,

where ε = (ε_1, ε_2, …, ε_l)^T and H is an l × n matrix; or, with all terms listed,
    ( y_1 )   ( H_11 ⋯ H_1n ) ( x_1 )   ( ε_1 )
    (  ⋮  ) = (  ⋮   ⋱   ⋮  ) (  ⋮  ) + (  ⋮  ).
    ( y_l )   ( H_l1 ⋯ H_ln ) ( x_n )   ( ε_l )
Given an estimate x̂, we consider the difference between the noisy measurements and the projected values H x̂:

    ε_y = y − H x̂.

Under the least squares principle, we will try to find the value of x̂ that minimizes the cost function

    J(x̂) = ε_y^T ε_y
         = (y − H x̂)^T (y − H x̂)
         = y^T y − x̂^T H^T y − y^T H x̂ + x̂^T H^T H x̂.
The necessary condition for the minimum is the vanishing of the partial derivative of J with respect to x̂, that is,

    ∂J/∂x̂ = −2 y^T H + 2 x̂^T H^T H = 0.

Solving for x̂ yields the least squares estimate

    x̂ = (H^T H)^{-1} H^T y.    (1)

The inverse (H^T H)^{-1} exists if l ≥ n and H has full column rank; in other words, when the number of measurements is no fewer than the number of variables, and these measurements are linearly independent.

*The material is adapted from Sections 3.1–3.3 in Dan Simon's book Optimal State Estimation [1].
Example 1. Suppose we are trying to estimate the resistance x of an unmarked resistor based on l noisy
measurements using a multimeter. In this case,
    y = Hx + ε,    (2)

where

    H = (1, …, 1)^T.    (3)

Substitution of the above into equation (1) gives us the optimal estimate of x as

    x̂ = (H^T H)^{-1} H^T y
      = (1/l) H^T y
      = (y_1 + ⋯ + y_l)/l.
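As a sanity check, equation (1) can be evaluated numerically. The sketch below uses made-up resistance readings; it confirms that for H = (1, …, 1)^T the least squares estimate reduces to the sample mean.

```python
import numpy as np

# Made-up noisy resistance readings (ohms), for illustration only.
y = np.array([98.7, 101.2, 100.4, 99.1, 100.6])
l = len(y)

H = np.ones((l, 1))                 # H = (1, ..., 1)^T, equation (3)

# Least squares estimate, equation (1): x_hat = (H^T H)^{-1} H^T y.
x_hat = np.linalg.solve(H.T @ H, H.T @ y)[0]

print(x_hat, y.mean())              # the two values agree
```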
So far we have placed equal confidence in all the measurements. Now we look at varying confidence in the measurements. For instance, some of our measurements of an unmarked resistor were taken with an expensive multimeter with low noise, while others were taken with a cheap multimeter by a tired student late at night. Even though the second set of measurements is less reliable, it still carries some information about the resistance. We should never throw away measurements, no matter how unreliable they may seem. This will be shown in this section.
We assume that each measurement y_i, 1 ≤ i ≤ l, may be taken under a different condition, so that the variance of the measurement noise ε_i may be distinct too:

    E(ε_i^2) = σ_i^2,    1 ≤ i ≤ l.

Assume that the noise for each measurement has zero mean and is independent of the others. The covariance matrix for all measurement noise is

    R = E(ε ε^T)
      = diag(σ_1^2, …, σ_l^2).

The cost is now modified to weight each residual by our confidence in the corresponding measurement:

    J = ε_{y,1}^2/σ_1^2 + ε_{y,2}^2/σ_2^2 + ⋯ + ε_{y,l}^2/σ_l^2,

where ε_y = y − H x̂ = (ε_{y,1}, …, ε_{y,l})^T is the residual as before.
If a measurement y_i is noisy, we care less about the discrepancy between it and the ith element of H x̂, because we do not have much confidence in this measurement. In matrix form, the cost function J can be expanded as follows:

    J(x̂) = (y − H x̂)^T R^{-1} (y − H x̂)
         = y^T R^{-1} y − x̂^T H^T R^{-1} y − y^T R^{-1} H x̂ + x̂^T H^T R^{-1} H x̂.
At a minimum, the partial derivative of J must vanish, yielding

    ∂J/∂x̂ = −2 y^T R^{-1} H + 2 x̂^T H^T R^{-1} H = 0.

Immediately, we solve the above equation for the best estimate of x:

    x̂ = (H^T R^{-1} H)^{-1} H^T R^{-1} y.    (4)
Note that the measurement noise matrix R must be non-singular for a solution to exist. In other
words, each measurement yi must be corrupted by some noise for the estimation method to work.
Example 2. We get back to the problem in Example 1 of resistance estimation, for which the equations are
given in (2) and (3). Suppose the ith of the l noisy measurements has variance

    E(ε_i^2) = σ_i^2.

The measurement noise covariance is given as

    R = diag(σ_1^2, …, σ_l^2).
Substituting H, R, and y into (4), we obtain the estimate

    x̂ = ( (1, …, 1) diag(1/σ_1^2, …, 1/σ_l^2) (1, …, 1)^T )^{-1} (1, …, 1) diag(1/σ_1^2, …, 1/σ_l^2) (y_1, …, y_l)^T

       = ( Σ_{i=1}^l 1/σ_i^2 )^{-1} Σ_{i=1}^l y_i/σ_i^2.
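Numerically, equation (4) with a diagonal R reduces to this inverse-variance weighted mean; the sketch below checks the two expressions against each other on made-up readings (three from a low-noise meter, two from a noisy one).

```python
import numpy as np

# Made-up readings: three from a low-noise meter, two from a noisy one.
y = np.array([100.1, 99.9, 100.2, 97.0, 104.0])
sigma = np.array([0.1, 0.1, 0.1, 2.0, 2.0])   # noise standard deviations

H = np.ones((len(y), 1))
R_inv = np.diag(1.0 / sigma**2)               # R^{-1} for R = diag(sigma_i^2)

# Weighted least squares estimate, equation (4).
x_hat = np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ y)[0]

# Closed form from Example 2: inverse-variance weighted mean.
x_ref = (y / sigma**2).sum() / (1.0 / sigma**2).sum()
print(x_hat, x_ref)
```

Note how the two unreliable readings barely move the estimate away from the low-noise cluster.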
Equation (4) is adequate when we have made all the measurements. More often, we obtain measurements sequentially and want to update our estimate with each new measurement. In this case, the matrix H needs to be augmented, and we would have to recompute the estimate x̂ according to (4) for every new measurement. This update can become very expensive, and the overall computation can become prohibitive as the number of measurements grows large.

This section shows how to recursively compute the weighted least squares estimate. More specifically, suppose we have an estimate x̂_{k−1} after k − 1 measurements, and obtain a new measurement y_k. To be general, every measurement is now an m-vector with values yielded by, say, several measuring instruments. How can we update the estimate to x̂_k without solving equation (4)?
A natural scheme is a linear recursive estimator of the form

    x̂_k = x̂_{k−1} + K_k (y_k − H_k x̂_{k−1}),    (5)

where y_k = H_k x + ε_k and the gain matrix K_k is to be determined. The estimation error ε_{x,k} = x − x̂_k then satisfies

    ε_{x,k} = (I − K_k H_k) ε_{x,k−1} − K_k ε_k.    (6)

We choose K_k to minimize the sum of the variances of the estimation errors,

    J_k = E(‖x − x̂_k‖^2) = Tr(P_k),    (7)

where Tr is the trace operator¹, and the n × n matrix P_k = E(ε_{x,k} ε_{x,k}^T) is the estimation-error covariance. Next, we obtain P_k with a substitution of (6):

    P_k = E( ((I − K_k H_k) ε_{x,k−1} − K_k ε_k) ((I − K_k H_k) ε_{x,k−1} − K_k ε_k)^T )
        = (I − K_k H_k) E(ε_{x,k−1} ε_{x,k−1}^T) (I − K_k H_k)^T − K_k E(ε_k ε_{x,k−1}^T) (I − K_k H_k)^T
          − (I − K_k H_k) E(ε_{x,k−1} ε_k^T) K_k^T + K_k E(ε_k ε_k^T) K_k^T.

The estimation error ε_{x,k−1} at time k − 1 is independent of the measurement noise ε_k at time k, which implies that

    E(ε_k ε_{x,k−1}^T) = E(ε_k) E(ε_{x,k−1}^T) = 0,
    E(ε_{x,k−1} ε_k^T) = E(ε_{x,k−1}) E(ε_k^T) = 0.

The covariance expression therefore reduces to

    P_k = (I − K_k H_k) P_{k−1} (I − K_k H_k)^T + K_k R_k K_k^T,    (8)

where R_k = E(ε_k ε_k^T) is the covariance of the measurement noise ε_k. To carry out the minimization of (7) we will need two facts about derivatives of traces (Theorem 1, proved at the end of these notes): for matrices C and X of compatible dimensions,

    ∂Tr(C X^T)/∂X = C,    (9)
    ∂Tr(X C X^T)/∂X = XC + XC^T.    (10)

¹The trace of a square matrix is the sum of its diagonal entries.
Expanding (8) inside the trace, the cost function becomes

    J_k = Tr( P_{k−1} − K_k H_k P_{k−1} − P_{k−1} H_k^T K_k^T + K_k (H_k P_{k−1} H_k^T) K_k^T ) + Tr(K_k R_k K_k^T).

Its partial derivative with respect to K_k is

    ∂J_k/∂K_k = −2 ∂Tr(P_{k−1} H_k^T K_k^T)/∂K_k + 2 K_k (H_k P_{k−1} H_k^T) + 2 K_k R_k    (by (10))
              = −2 P_{k−1} H_k^T + 2 K_k H_k P_{k−1} H_k^T + 2 K_k R_k.    (by (9))

Setting the derivative to zero and solving for the gain, we obtain

    K_k = P_{k−1} H_k^T (H_k P_{k−1} H_k^T + R_k)^{-1}    (11)
        = P_{k−1} H_k^T S_k^{-1},    (12)

where S_k = H_k P_{k−1} H_k^T + R_k.
Substitute the above expression (12) for K_k into equation (8) for P_k. The operation, followed by an expansion, leads to a few steps of manipulation as follows:

    P_k = (I − P_{k−1} H_k^T S_k^{-1} H_k) P_{k−1} (I − P_{k−1} H_k^T S_k^{-1} H_k)^T + P_{k−1} H_k^T S_k^{-1} R_k S_k^{-1} H_k P_{k−1}
        = P_{k−1} − P_{k−1} H_k^T S_k^{-1} H_k P_{k−1} − P_{k−1} H_k^T S_k^{-1} H_k P_{k−1}
          + P_{k−1} H_k^T S_k^{-1} H_k P_{k−1} H_k^T S_k^{-1} H_k P_{k−1} + P_{k−1} H_k^T S_k^{-1} R_k S_k^{-1} H_k P_{k−1}
        = P_{k−1} − 2 P_{k−1} H_k^T S_k^{-1} H_k P_{k−1} + P_{k−1} H_k^T S_k^{-1} S_k S_k^{-1} H_k P_{k−1}
          (after merging H_k P_{k−1} H_k^T + R_k into S_k)
        = P_{k−1} − P_{k−1} H_k^T S_k^{-1} H_k P_{k−1}    (13)
        = P_{k−1} − K_k H_k P_{k−1}    (by (12))
        = (I − K_k H_k) P_{k−1}.    (14)

An application of the matrix inversion lemma to (14) (see [1]) yields the information form of the covariance update:

    P_k^{-1} = P_{k−1}^{-1} + H_k^T R_k^{-1} H_k,    (15)

or equivalently,

    P_k = (P_{k−1}^{-1} + H_k^T R_k^{-1} H_k)^{-1}.    (16)
This expression is more complicated than (14) since it requires three matrix inversions. Nevertheless, it has computational advantages in certain situations in practice [1, pp. 156–158].

We can also derive an alternate form for the gain K_k as follows. Start by multiplying the right-hand side of (11) on the left by P_k P_k^{-1}. Then substitute (15) for P_k^{-1} in the resulting expression. Multiply the factor P_{k−1} H_k^T into the parenthesized factor on its left, and extract H_k^T R_k^{-1} out of the parentheses. The last two parenthesized factors will cancel each other, yielding

    K_k = P_k H_k^T R_k^{-1}.    (17)
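The equivalence of the gain forms (11) and (17), and of the covariance forms (14) and (15), can be checked numerically; the sketch below does so on arbitrary random matrices (dimensions and seed are my choices).

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary dimensions: n = 2 states, m = 3 measurements.
n, m = 2, 3
Hk = rng.standard_normal((m, n))
A = rng.standard_normal((n, n))
P_prev = A @ A.T + np.eye(n)          # symmetric positive definite P_{k-1}
B = rng.standard_normal((m, m))
Rk = B @ B.T + np.eye(m)              # symmetric positive definite R_k

Sk = Hk @ P_prev @ Hk.T + Rk
Kk = P_prev @ Hk.T @ np.linalg.inv(Sk)        # gain, equations (11)-(12)
Pk = (np.eye(n) - Kk @ Hk) @ P_prev           # covariance update, equation (14)

# Alternate gain form, equation (17): K_k = P_k H_k^T R_k^{-1}.
Kk_alt = Pk @ Hk.T @ np.linalg.inv(Rk)
print(np.max(np.abs(Kk - Kk_alt)))            # essentially zero

# Information form, equation (15): P_k^{-1} = P_{k-1}^{-1} + H_k^T R_k^{-1} H_k.
Pk_info = np.linalg.inv(np.linalg.inv(P_prev) + Hk.T @ np.linalg.inv(Rk) @ Hk)
print(np.max(np.abs(Pk - Pk_info)))           # essentially zero
```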
We now summarize the recursive least squares algorithm.

1. Initialize the estimate and the estimation-error covariance:

       x̂_0 = E(x),
       P_0 = E((x − x̂_0)(x − x̂_0)^T).

   In the case of no prior knowledge about x, simply let P_0 = ∞I. In the case of perfect prior knowledge, let P_0 = 0.

2. Iterate the following two steps.

   (a) Obtain a new measurement y_k, assuming that it is given by the equation

           y_k = H_k x + ε_k,

       where the noise ε_k has zero mean and covariance R_k. The measurement noise at each time step k is independent of that at every other step, so

           E(ε_i ε_j^T) = 0 if i ≠ j,   and   E(ε_j ε_j^T) = R_j.

       Essentially, we assume white measurement noise.
   (b) Update the estimate of x and the estimation-error covariance:

           K_k = P_{k−1} H_k^T (H_k P_{k−1} H_k^T + R_k)^{-1}
               = (P_{k−1}^{-1} + H_k^T R_k^{-1} H_k)^{-1} H_k^T R_k^{-1}
               = P_k H_k^T R_k^{-1},    (18)
           x̂_k = x̂_{k−1} + K_k (y_k − H_k x̂_{k−1}),    (19)
           P_k = (I − K_k H_k) P_{k−1}.    (20)

Note that (19) and (20) can switch their order in one round of update.
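A minimal sketch of the update round (18)–(20) on synthetic data; the function name rls_update and all numeric values below are mine, not from [1]. With a huge P_0 (weak prior) and constant R_k, the recursive estimate should agree with the batch solution of (4), which here reduces to ordinary least squares.

```python
import numpy as np

def rls_update(x_hat, P, H, y, R):
    """One round of equations (18)-(20): gain, estimate update, covariance update."""
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # (18)
    x_hat = x_hat + K @ (y - H @ x_hat)            # (19)
    P = (np.eye(len(x_hat)) - K @ H) @ P           # (20)
    return x_hat, P

rng = np.random.default_rng(1)
x_true = np.array([10.0, 5.0])        # made-up true parameter vector
R = np.array([[0.01]])                # measurement noise variance

x_hat = np.zeros(2)
P = 1e6 * np.eye(2)                   # huge P0: almost no prior knowledge
Hs, ys = [], []
for _ in range(50):
    H = rng.standard_normal((1, 2))
    y = H @ x_true + 0.1 * rng.standard_normal(1)
    x_hat, P = rls_update(x_hat, P, H, y, R)
    Hs.append(H)
    ys.append(y)

# Batch least squares on the same data (unweighted, since R is constant).
x_batch = np.linalg.lstsq(np.vstack(Hs), np.concatenate(ys), rcond=None)[0]
print(x_hat, x_batch)
```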
Example 3. We revisit the resistance estimation problem presented in Examples 1 and 2. Now we want to iteratively improve our estimate of the resistance x. At the kth sampling, our measurement is

    y_k = H_k x + ε_k = x + ε_k,
    R_k = E(ε_k^2).

Here the measurement matrix H_k reduces to the scalar 1. Furthermore, we suppose that each measurement has the same variance, so R_k is a constant written as R.
Before the first measurement, we have some idea about the resistance x. This becomes our initial estimate. Also, we have some uncertainty about this initial estimate, which becomes our initial covariance. Together we have

    x̂_0 = E(x),
    P_0 = E((x − x̂_0)^2).

If we have no idea about the resistance, set P_0 = ∞. If we are certain about the resistance value, set P_0 = 0. (Of course, then there would be no need to take measurements.)
After the first measurement (k = 1), we update the estimate and the error covariance according to equations (18)–(20) as follows:

    K_1 = P_0/(P_0 + R),
    x̂_1 = x̂_0 + (P_0/(P_0 + R)) (y_1 − x̂_0),
    P_1 = (1 − P_0/(P_0 + R)) P_0 = P_0 R/(P_0 + R).
After the second measurement (k = 2), we similarly obtain

    K_2 = P_1/(P_1 + R) = P_0/(2P_0 + R),
    x̂_2 = x̂_1 + (P_1/(P_1 + R)) (y_2 − x̂_1)
        = ((P_0 + R)/(2P_0 + R)) x̂_1 + (P_0/(2P_0 + R)) y_2,
    P_2 = P_1 R/(P_1 + R) = P_0 R/(2P_0 + R).
Continuing in this fashion, after the kth measurement we have, by induction,

    K_k = P_0/(kP_0 + R),
    x̂_k = (((k − 1)P_0 + R)/(kP_0 + R)) x̂_{k−1} + (P_0/(kP_0 + R)) y_k,
    P_k = P_0 R/(kP_0 + R).
Note that if x is known perfectly a priori, then P_0 = 0, which implies that K_k = 0 and x̂_k = x̂_0 for all k. The optimal estimate of x is then independent of any measurements that are obtained. At the opposite end of the spectrum, if x is completely unknown a priori, then P_0 = ∞. The above equation for x̂_k becomes

    x̂_k = lim_{P_0→∞} [ (((k − 1)P_0 + R)/(kP_0 + R)) x̂_{k−1} + (P_0/(kP_0 + R)) y_k ]
        = ((k − 1)/k) x̂_{k−1} + (1/k) y_k
        = (1/k) ((k − 1) x̂_{k−1} + y_k).
The right-hand side of the last equation above is just the running average

    ȳ_k = (1/k) Σ_{j=1}^k y_j

of the measurements. To see this, we first have

    k ȳ_k = Σ_{j=1}^k y_j
          = Σ_{j=1}^{k−1} y_j + y_k
          = (k − 1) ( (1/(k − 1)) Σ_{j=1}^{k−1} y_j ) + y_k
          = (k − 1) ȳ_{k−1} + y_k.

Since x̂_1 = y_1, the recurrences for x̂_k and ȳ_k are the same. Hence x̂_k = ȳ_k for all k.
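This limiting behavior is easy to check numerically: with a very large P_0, the scalar recursion of Example 3 reproduces the running average. The sketch below uses synthetic readings; the true value, noise variance, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
x_true, R = 100.0, 1.0                 # made-up resistance (ohms) and noise variance
ys = x_true + rng.standard_normal(200)

# Scalar recursion of Example 3 with K_k = P0/(k P0 + R) and a huge P0.
P0, x_hat = 1e12, 0.0
estimates = []
for k, y in enumerate(ys, start=1):
    K = P0 / (k * P0 + R)
    x_hat = x_hat + K * (y - x_hat)
    estimates.append(x_hat)

running_mean = np.cumsum(ys) / np.arange(1, len(ys) + 1)
print(np.max(np.abs(np.array(estimates) - running_mean)))   # essentially zero
```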
Example 4. Suppose that a tank contains a concentration x_1 of chemical 1 and a concentration x_2 of chemical 2. We have an instrument that detects the combined concentration x_1 + x_2 of the two chemicals but cannot tell the individual values of x_1 and x_2. Chemical 2 leaks from the tank so that its concentration decreases by 1% from one measurement to the next. The measurement equation is given as

    y_k = x_1 + 0.99^{k−1} x_2 + ε_k,

where H_k = (1, 0.99^{k−1}), and ε_k is a random variable with zero mean and variance R = 0.01.

Let the real values be x = (x_1, x_2)^T = (10, 5)^T. Suppose the initial estimate is x̂_0 = (8, 7)^T, with P_0 equal to the identity matrix. We apply the recursive least squares algorithm. The next figure shows the evolution of the estimates of x_1 and x_2, along with that of the variances of the estimation errors. It can be seen that after a couple dozen measurements the estimates get very close to the true values 10 and 5. The variances of the estimation errors asymptotically approach zero. This means that we have increasingly more confidence in the estimates as more measurements are obtained.
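The figure itself is not reproduced here, but the experiment is easy to rerun. The sketch below simulates Example 4 with a synthetic noise sequence (the seed and number of steps are my choices).

```python
import numpy as np

rng = np.random.default_rng(3)
x_true = np.array([10.0, 5.0])        # true concentrations (x1, x2)
R = 0.01                              # measurement noise variance

x_hat = np.array([8.0, 7.0])          # initial estimate from Example 4
P = np.eye(2)                         # P0 = identity

for k in range(1, 101):
    Hk = np.array([[1.0, 0.99 ** (k - 1)]])
    yk = Hk @ x_true + np.sqrt(R) * rng.standard_normal(1)
    K = P @ Hk.T / (Hk @ P @ Hk.T + R)        # gain (18); the inverse is scalar here
    x_hat = x_hat + K @ (yk - Hk @ x_hat)     # (19)
    P = (np.eye(2) - K @ Hk) @ P              # (20)

print(x_hat)   # close to (10, 5)
```

Note that x_2 is identifiable only through the early measurements, since its coefficient 0.99^{k−1} decays toward zero.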
Proof of Theorem 1

Proof. Denote C = (c_ij), X = (x_ij), and CX^T = (d_ij), where C and X are both r × s matrices. The trace of CX^T is

    Tr(CX^T) = Σ_{t=1}^r d_tt = Σ_{t=1}^r Σ_{k=1}^s c_tk x_tk.

From the above, we easily obtain its partial derivatives with respect to the entries of X:

    ∂Tr(CX^T)/∂x_ij = c_ij.

This establishes (9).
To prove (10), we treat the two occurrences of X separately:

    ∂Tr(XCX^T)/∂X = ∂Tr(XCY^T)/∂X |_{Y=X} + ∂Tr(YCX^T)/∂X |_{Y=X}
                  = ∂Tr(YC^T X^T)/∂X |_{Y=X} + YC |_{Y=X}    (by (9))
                  = YC^T |_{Y=X} + XC
                  = XC^T + XC.
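Identities (9) and (10) can also be verified against finite differences; the helper num_grad and the dimensions below are my own illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
r, s = 3, 4
C = rng.standard_normal((r, s))       # for (9): Tr(C X^T), with X also r x s
X = rng.standard_normal((r, s))

def num_grad(f, X, h=1e-6):
    """Central finite-difference gradient of a scalar function of a matrix."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

# Identity (9): d/dX Tr(C X^T) = C.
g9 = num_grad(lambda Z: np.trace(C @ Z.T), X)
print(np.max(np.abs(g9 - C)))                           # essentially zero

# Identity (10): d/dX Tr(X C X^T) = XC + XC^T, with C now s x s.
C2 = rng.standard_normal((s, s))
g10 = num_grad(lambda Z: np.trace(Z @ C2 @ Z.T), X)
print(np.max(np.abs(g10 - (X @ C2 + X @ C2.T))))        # essentially zero
```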
References
[1] D. Simon. Optimal State Estimation. John Wiley & Sons, Inc., Hoboken, New Jersey, 2006.