
Covariance Matrix

Applications
Dimensionality Reduction
Outline
What is the covariance matrix?
Example
Properties of the covariance matrix
Spectral Decomposition
Principal Component Analysis
Covariance Matrix
Covariance matrix captures the variance
and linear correlation in
multivariate/multidimensional data.
If the data is an N x D matrix, the covariance
matrix is a D x D square matrix.
Think of N as the number of data
instances (rows) and D as the number of
attributes (columns).
Covariance Formula
Let Data be an N x D matrix. Then

$$\mathrm{Cov}(\mathrm{Data}) =
\begin{bmatrix}
E[(X_1-\mu_1)^2] & \cdots & E[(X_1-\mu_1)(X_D-\mu_D)] \\
\vdots & \ddots & \vdots \\
E[(X_D-\mu_D)(X_1-\mu_1)] & \cdots & E[(X_D-\mu_D)^2]
\end{bmatrix}$$

Example
$$R = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 1 \\ 3 & 1 & 1 \\ 4 & 1 & 2 \end{bmatrix},
\qquad
\mathrm{COV}(R) = \begin{bmatrix} 1.67 & -1 & -0.5 \\ -1 & 2 & -0.33 \\ -0.5 & -0.33 & 0.9167 \end{bmatrix}$$
[Figure: three scatter plots with their 2 x 2 covariance matrices: [0.07 0.008; 0.008 0.087], [0.07 0.06; 0.06 0.078], and [0.07 0.0; 0.0 0.68]]

Moral: Covariance can only capture linear relationships
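As a quick sanity check, here is a minimal NumPy sketch (assuming the 4 x 3 matrix R above) that reproduces this covariance matrix; note that np.cov computes the sample covariance, dividing by N - 1:

```python
import numpy as np

# The 4 x 3 example matrix R (rows = data instances, columns = attributes).
R = np.array([[1, 2, 3],
              [2, 4, 1],
              [3, 1, 1],
              [4, 1, 2]], dtype=float)

# np.cov treats rows as variables by default, so pass rowvar=False
# to get the D x D covariance matrix of the columns.
C = np.cov(R, rowvar=False)
print(C)
# Expected (approximately):
# [[ 1.667 -1.    -0.5  ]
#  [-1.     2.    -0.333]
#  [-0.5   -0.333  0.917]]
```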


Dimensionality Reduction
If you work in data analytics, it is common
these days to be handed a data set with
lots of variables (dimensions).
The information in these variables is often
redundant: there are only a few sources
of genuine information.
Question: How can we identify these
sources automatically?
Hidden Sources of Variance

[Diagram: hidden sources H1 and H2 feeding the observed variables X1, X2, X3, X4 that make up the DATA]
Model: Hidden Sources are Linear Combinations of Original Variables
Hidden Sources
If the known variables each provided
different (non-redundant) information, then the
covariance matrix between the variables
would be a diagonal matrix, i.e., the non-zero
entries would appear only on the diagonal.
In particular, if $H_i$ and $H_j$ are independent
then $E[(H_i-\mu_i)(H_j-\mu_j)] = 0$.
Hidden Sources
So the question is: what should the
hidden sources be?
It turns out that the best hidden sources
are the eigenvectors of the covariance
matrix.
If A is a d x d matrix, then $\langle \lambda, x \rangle$ is an
eigenvalue-eigenvector pair if
$$Ax = \lambda x$$
Explanation
[Figure: 2-D scatter of the data along axes X1 and X2]

We have two axes, X1 and X2. We want to project the data along the direction
of maximum variance.
Covariance Matrix Properties
The Covariance matrix is symmetric.
Non-negative eigenvalues:
$0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_d$
Corresponding eigenvectors
$u_1, u_2, \ldots, u_d$
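A brief sketch of these properties in NumPy, reusing the covariance matrix from the earlier example (values rounded, so the checks hold only approximately); np.linalg.eigh is the eigensolver for symmetric matrices and returns eigenvalues in ascending order:

```python
import numpy as np

C = np.array([[ 1.667, -1.0,   -0.5  ],
              [-1.0,    2.0,   -0.333],
              [-0.5,   -0.333,  0.917]])

# The covariance matrix is symmetric.
print(np.allclose(C, C.T))          # True

# Eigenvalues come back in ascending order: 0 <= lambda_1 <= ... <= lambda_d.
eigvals, eigvecs = np.linalg.eigh(C)
print(eigvals)                      # all >= 0 (up to rounding)
print(eigvecs)                      # columns are the eigenvectors u_1, ..., u_d

# Verify A x = lambda x for the first eigenvalue-eigenvector pair.
print(np.allclose(C @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))
```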
Principal Component Analysis
Also known as
Singular Value Decomposition
Latent Semantic Indexing
A technique for data reduction: essentially,
reduce the number of columns while
losing minimal information.
Also think in terms of lossy compression.
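As a hedged illustration of this idea (not the slides' own code), a minimal PCA sketch: center the data, take the eigenvectors of the covariance matrix belonging to the k largest eigenvalues, and project onto them. The helper name pca_reduce and the choice k = 2 are purely illustrative:

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce an N x D data matrix X to N x k by projecting onto the
    eigenvectors of its covariance matrix with the k largest eigenvalues."""
    X_centered = X - X.mean(axis=0)
    C = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending order of eigenvalues
    top_k = eigvecs[:, ::-1][:, :k]        # columns for the k largest eigenvalues
    return X_centered @ top_k

# Example: the 4 x 3 matrix R from earlier, reduced to 2 columns.
R = np.array([[1, 2, 3], [2, 4, 1], [3, 1, 1], [4, 1, 2]], dtype=float)
print(pca_reduce(R, k=2))
```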
Motivation
Bulk of data has a time component
For example, retail transactions, stock
prices
The data set can be organized as an N x M table,
e.g., N customers and the prices of the calls they
made on each of 365 days.
Typically M << N.
Objective
Compress the data matrix X into Xc, such
that
The compression ratio is high and the
average error between the original and the
compressed matrix is low
N could be on the order of millions and M on
the order of hundreds.
Example database
        Wed    Thu    Fri    Sat    Sun
        7/10   7/11   7/12   7/13   7/14
ABC      1      1      1      0      0
DEF      2      2      2      0      0
GHI      1      1      1      0      0
KLM      5      5      5      0      0
smith    0      0      0      2      2
john     0      0      0      3      3
tom      0      0      0      1      1
Decision Support Queries
What was the amount of sales to GHI on
July 11?
Find the total sales to business customers
for the week ending July 12th.
Intuition behind SVD
[Figure: customers plotted as points in the x-y plane]

Customers are 2-D points.
SVD Definition
An N x M matrix X can be expressed as

$$X = U \Lambda V^t$$

$\Lambda$ (Lambda) is a diagonal r x r matrix, where r is the rank of X.


SVD Definition
More importantly X can be written as

X u v 2 u 2 v 2 r u r v r
t t t
1 1 1

Where the eigenvalues are in decreasing order.

X c u v 2u 2 v2 k u k vk
t t t
1 1 1

k,<r
Example

For the 7 x 5 customer-by-day matrix X from the example database above:

$$X = \begin{bmatrix} .18 & 0 \\ .36 & 0 \\ .18 & 0 \\ .90 & 0 \\ 0 & .53 \\ 0 & .80 \\ 0 & .27 \end{bmatrix}
\begin{bmatrix} 9.64 & 0 \\ 0 & 5.29 \end{bmatrix}
\begin{bmatrix} .58 & .58 & .58 & 0 & 0 \\ 0 & 0 & 0 & .71 & .71 \end{bmatrix}$$
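The same factorization can be checked with NumPy; this is just a verification sketch, and the signs of the singular vectors returned by np.linalg.svd may differ from those shown above:

```python
import numpy as np

# The 7 x 5 customer-by-day matrix from the example database.
X = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s[:2])          # approx [9.64, 5.29]; the remaining singular values are 0
print(U[:, :2])       # approx the U columns shown above (up to sign)
print(Vt[:2, :])      # approx the V^t rows shown above (up to sign)

# Rebuild X from the two non-zero terms.
X_rebuilt = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]
print(np.allclose(X, X_rebuilt))   # True
```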
Compression

$$X = \sum_{i=1}^{r} \lambda_i u_i v_i^t$$

$$X_c = \sum_{i=1}^{k} \lambda_i u_i v_i^t$$

where $k \le r \le M$.
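A short sketch of this truncation (continuing from the SVD check above; the helper name truncate_svd is illustrative): keep the k largest terms and measure the average per-entry error against the original matrix.

```python
import numpy as np

def truncate_svd(X, k):
    """Rank-k approximation X_c = sum_{i=1}^{k} lambda_i u_i v_i^t."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example: keep only the single largest term (k = 1) of the customer matrix.
X = np.array([[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 2, 2], [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)
X_c = truncate_svd(X, k=1)

# Average (per-entry) squared error between original and compressed matrix.
print(np.mean((X - X_c) ** 2))
```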
