
Linear Least Squares Problem

Consider an equation for a stretched beam:

$$Y = x_1 + x_2 T$$

where $x_1$ is the original length, $T$ is the force applied, and $x_2$ is the inverse coefficient of stiffness.

Suppose that the following measurements were taken:

T    10      15      20
Y    11.60   11.85   12.25

Corresponding to the overcomplete system

$$\begin{aligned}
11.60 &= x_1 + 10\,x_2 \\
11.85 &= x_1 + 15\,x_2 \\
12.25 &= x_1 + 20\,x_2
\end{aligned}$$

which cannot be satisfied exactly.
Linear Least Squares Problem
Problem:
Given A (m × n), m ≥ n, and b (m × 1), find x (n × 1) to minimize $\|Ax - b\|_2$.

If m > n, we have more equations than unknowns, and there is generally no x satisfying Ax = b exactly.
This is an overcomplete system.
Linear Least Squares
There are three different algorithms for computing the least squares solution:
1. Normal equations (cheap, less accurate).
2. QR decomposition.
3. SVD (expensive, more reliable).
The first algorithm is the fastest and the least accurate of the three; the SVD is the slowest and most accurate.
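As a preview, here is a minimal Matlab sketch of all three approaches; it assumes an overdetermined system with matrix A (m × n, m > n) and right-hand side b already defined:

```matlab
x1 = (A' * A) \ (A' * b);   % 1. normal equations: cheapest, squares the condition number
[Q, R] = qr(A, 0);          % 2. economy-size QR factorization
x2 = R \ (Q' * b);          %    solve the triangular system R*x = Q'*b
x3 = pinv(A) * b;           % 3. SVD-based pseudo-inverse: slowest, most reliable
```

For a full-rank, well-conditioned A the three answers agree to machine precision.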
Normal Equations 1

To minimize $\|Ax - b\|_2^2$, take the derivative with respect to x and set it to zero:

$$\nabla_x \|Ax - b\|_2^2 = 2A^TAx - 2A^Tb = 0$$

This yields an n × n symmetric linear system, known as the normal equations:

$$A^TAx = A^Tb$$
Normal Equations 2
$$\underbrace{\begin{bmatrix} 1 & 10 \\ 1 & 15 \\ 1 & 20 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} 11.60 \\ 11.85 \\ 12.25 \end{bmatrix}}_{b},
\qquad \min_x \|Ax - b\|_2$$
Normal Equations 3

$$A = \begin{bmatrix} 1 & 10 \\ 1 & 15 \\ 1 & 20 \end{bmatrix}, \qquad
A^T = \begin{bmatrix} 1 & 1 & 1 \\ 10 & 15 & 20 \end{bmatrix}, \qquad
A^TA = \begin{bmatrix} 3 & 45 \\ 45 & 725 \end{bmatrix}, \qquad
b = \begin{bmatrix} 11.60 \\ 11.85 \\ 12.25 \end{bmatrix}$$

We must solve the system $A^TAx = A^Tb$ for these values, which gives

$$x = (A^TA)^{-1}A^Tb = \begin{bmatrix} 10.925 \\ 0.065 \end{bmatrix}$$

$(A^TA)^{-1}A^T$ is called a pseudo-inverse.
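The same computation as a minimal Matlab sketch, using the measured values above:

```matlab
A = [1 10; 1 15; 1 20];
b = [11.60; 11.85; 12.25];
x = (A' * A) \ (A' * b)   % returns [10.925; 0.065]
```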
QR factorization 1
A matrix Q is said to be orthogonal if its columns are orthonormal, i.e. $Q^TQ = I$.
Orthogonal transformations preserve the Euclidean norm, since

$$\|Qv\|_2^2 = (Qv)^T(Qv) = v^TQ^TQv = v^Tv = \|v\|_2^2$$

Orthogonal matrices can transform vectors in various ways, such as rotations or reflections, but they do not change the Euclidean length of the vector. Hence, they preserve the solution to a linear least squares problem.
QR factorization 2
Any matrix A (m × n) can be represented as A = QR, where Q (m × n) is orthonormal and R (n × n) is upper triangular:

$$\begin{bmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{bmatrix}
=
\begin{bmatrix} | & & | \\ q_1 & \cdots & q_n \\ | & & | \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & r_{nn}
\end{bmatrix}$$
QR factorization 3

Given A, let its QR decomposition be given as A = QR, where Q is an (m × n) orthonormal matrix and R is upper triangular. QR factorization transforms the linear least squares problem into a triangular least squares problem:

$$QRx = b \;\Rightarrow\; Rx = Q^Tb \;\Rightarrow\; x = R^{-1}Q^Tb$$

Matlab Code:
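A minimal sketch, assuming A and b are already in the workspace:

```matlab
[Q, R] = qr(A, 0);   % economy-size QR: Q is m-by-n, R is n-by-n
x = R \ (Q' * b);    % back-substitution on the triangular system R*x = Q'*b
```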
Singular Value Decomposition
Normal equations and QR decomposition only work for full-rank matrices (i.e. rank(A) = n). If A is rank-deficient, there are infinitely many solutions to the least squares problem, and we can use algorithms based on the SVD.

Given the SVD

$$A = U\Sigma V^T$$

where U (m × m) and V (n × n) are orthogonal and Σ is an (m × n) diagonal matrix (the singular values of A), the minimal solution corresponds to

$$x = V\Sigma^{+}U^Tb$$

where Σ⁺ is obtained by inverting the nonzero singular values of Σ.
Singular Value Decomposition
Matlab Code:
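A minimal sketch of the minimum-norm solution, assuming A and b are given; the rank tolerance mirrors the one used by Matlab's rank function:

```matlab
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(max(s));   % tolerance for treating a singular value as zero
r = sum(s > tol);                   % numerical rank
x = V(:, 1:r) * ((U(:, 1:r)' * b) ./ s(1:r))   % minimum-norm least squares solution
```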
The SVD: Fact I

Fact: for every A (m × n) with rank(A) = p, there exist Σ, U, V such that

$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p), \quad \sigma_i > 0,$$
$$U\ (m \times p),\ U^TU = I, \qquad V\ (n \times p),\ V^TV = I,$$

and

$$A = U\Sigma V^T = \sum_{i=1}^{p} \sigma_i u_i v_i^T$$

Consequently,

$$A^TA = V\Sigma^2V^T, \qquad AA^T = U\Sigma^2U^T$$
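A quick numerical check of the fact on a randomly generated rank-deficient matrix (a sketch, not part of the proof):

```matlab
A = randn(5, 2) * randn(2, 4);   % a 5-by-4 matrix of rank 2
[U, S, V] = svd(A, 'econ');
norm(U' * U - eye(4))            % ~ 0: orthonormal columns, U'U = I
norm(A' * A - V * S^2 * V')      % ~ 0: A'A = V * Sigma^2 * V'
```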
Matrix Approximation (Fact II)

Fact II: let A (m × n) have rank p, and let

$$A = U\Sigma V^T = \sum_{i=1}^{p} \sigma_i u_i v_i^T, \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p$$

be the SVD of A. Then

$$\tilde{A} = \sum_{i=1}^{r} \sigma_i u_i v_i^T = U\tilde{\Sigma}V^T, \qquad \tilde{\Sigma} = \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0), \quad r < p,$$

is the best rank-r approximation to A in the 2-norm:

$$\|A - \tilde{A}\|_2 = \min_{X:\ \mathrm{rank}(X) = r} \|A - X\|_2 = \sigma_{r+1}$$
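A sketch of the best rank-r approximation in Matlab, with r = 2 chosen purely for illustration:

```matlab
[U, S, V] = svd(A);
r = 2;                                      % target rank
Ar = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';  % truncated SVD, A_tilde
norm(A - Ar, 2)                             % equals sigma_{r+1}, i.e. S(r+1, r+1)
```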
Geometric Interpretation

A (m × n) maps $x \in \mathbb{R}^n$ to $b = Ax \in \mathbb{R}^m$.

The image of the unit sphere under any m × n matrix is a hyperellipse.

(Figure: the unit sphere with orthonormal vectors v1, v2, and its image, a hyperellipse.)
Singular Values and Singular Vectors

A (m × n) maps the unit sphere S to AS. We can define the properties of A in terms of the shape of AS.

(Figure: the unit sphere S with v1, v2, and its image AS with principal semiaxes along u1, u2.)

The singular values of A are the lengths of the principal semiaxes of AS, usually written in non-increasing order: $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$.

The n left singular vectors of A are the unit vectors $\{u_1, \ldots, u_n\}$, oriented in the directions of the principal semiaxes of AS, numbered in correspondence with $\{\sigma_i\}$.

The n right singular vectors of A are the unit vectors $\{v_1, \ldots, v_n\}$ of S, which are the preimages of the principal semiaxes of AS: $Av_i = \sigma_i u_i$.
The Singular Value Decomposition

Collecting the relations $Av_i = \sigma_i u_i$, $1 \le i \le n$, into matrix form:

$$\underbrace{A}_{(m,n)}
\underbrace{\begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix}}_{(n,n)}
=
\underbrace{\begin{bmatrix} | & & | \\ u_1 & \cdots & u_n \\ | & & | \end{bmatrix}}_{(m,n)}
\underbrace{\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix}}_{(n,n)}$$

that is, $AV = U\Sigma$. The matrices U, V are orthogonal and Σ is diagonal, so

$$A = U\Sigma V^*$$

— the Singular Value Decomposition.
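The relation $AV = U\Sigma$ is easy to verify numerically; a one-line sketch for any matrix A:

```matlab
[U, S, V] = svd(A);
norm(A * V - U * S)   % ~ 0: each A*v_i equals sigma_i * u_i
```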
Change of Bases

Every matrix is diagonal in an appropriate basis:

Any vector b (m × 1) can be expanded in the basis of left singular vectors of A, $\{u_i\}$; any vector x (n × 1) can be expanded in the basis of right singular vectors of A, $\{v_i\}$. Their coordinates in these new expansions are

$$b' = U^*b, \qquad x' = V^*x$$

Then the relation b = Ax can be expressed in terms of b' and x':

$$b = Ax \;\Rightarrow\; U^*b = U^*Ax = U^*U\Sigma V^*x \;\Rightarrow\; b' = \Sigma x'$$
Rank of A

Let p = min{m, n}, and let r ≤ p denote the number of nonzero singular values of A. Then:

The rank of A equals r, the number of nonzero singular values.

Proof: The rank of a diagonal matrix equals the number of its nonzero entries, and in the decomposition $A = U\Sigma V^*$, U and V are of full rank.
Determinant of A

For A (m × m),

$$|\det(A)| = \prod_{i=1}^{m} \sigma_i$$

Proof: The determinant of a product of square matrices is the product of their determinants. The determinant of a unitary matrix is 1 in absolute value, since $U^*U = I$. Therefore,

$$|\det(A)| = |\det(U\Sigma V^*)| = |\det(U)|\,|\det(\Sigma)|\,|\det(V^*)| = |\det(\Sigma)| = \prod_{i=1}^{m} \sigma_i$$
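A numerical check of this identity on a random square matrix:

```matlab
A = randn(4);                % any square matrix
abs(det(A)) - prod(svd(A))   % ~ 0 up to round-off
```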
A as a Sum of Rank-One Matrices

For A (m × n), A can be written as a sum of r rank-one matrices:

$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^* \qquad (1)$$

Proof: If we write Σ as a sum of $\Sigma_i$, where $\Sigma_i = \mathrm{diag}(0, \ldots, \sigma_i, \ldots, 0)$, then (1) follows from

$$A = U\Sigma V^* \qquad (2)$$
Norm of a Matrix

The L2 norm of a vector is defined as:

$$\|x\|_2^2 = x^Tx = \sum_{i=1}^{n} x_i^2 \qquad (1)$$

The L2 norm of a matrix is defined as:

$$\|A\|_2 = \sup_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2}$$

Therefore

$$\|A\|_2 = \max_i(\sigma_i),$$

where the $\sigma_i^2$ are the eigenvalues of $A^TA$: $A^TAv_i = \sigma_i^2 v_i$.
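In Matlab, norm(A, 2) computes exactly this quantity; a quick check:

```matlab
A = randn(5, 3);
norm(A, 2) - max(svd(A))   % ~ 0: the 2-norm equals the largest singular value
```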
Matrix Approximation in the L2 Norm

For any ν with 0 ≤ ν ≤ r, define

$$A_\nu = \sum_{j=1}^{\nu} \sigma_j u_j v_j^* \qquad (1)$$

If ν = p = min{m, n}, define $\sigma_{\nu+1} = 0$. Then

$$\|A - A_\nu\|_2 = \inf_{\substack{B \in \mathbb{C}^{m \times n} \\ \mathrm{rank}(B) \le \nu}} \|A - B\|_2 = \sigma_{\nu+1}$$