Machine Learning for Signal Processing
Eigenfaces and Eigenrepresentations
Class 6. 16 Sep 2014
Instructor: Bhiksha Raj
Administrivia
• Project teams?
• Project proposals?
  – Please send proposals to the TA, and cc me
– Rahul Raj is no longer a TA
• Reminder: Assignment 1 due in ?? days
Recall: Representing images
• The most common element in the image: background
– Or rather large regions of relatively featureless shading
– Uniform sequences of numbers
Adding more bases
• Checkerboards with different variations
Image ≈ w1B1 + w2B2 + w3B3 + ...
In matrix form, with B = [B1 B2 B3 ...] and W = [w1 w2 w3 ...]T:
Image = B W
W = pinv(B) Image
PROJECTION = B W = B pinv(B) Image
[Figure: the first six checkerboard bases B1 B2 B3 B4 B5 B6]
Getting closer at 625 bases!
“Bases”
• “Bases” are the “standard” units such that all instances can be expressed as weighted combinations of these units
• Ideal requirements: Bases must be orthogonal
• Checkerboards are one choice of bases
  – Orthogonal
  – But not “smooth”
• Other choices of bases: Complex exponentials, Wavelets, etc..
[Figure: checkerboard bases B1 B2 B3 B4 B5 B6]
image ≈ w1B1 + w2B2 + w3B3 + ...
Data specific bases?
• Issue: All the bases we have considered so far are data agnostic
  – Checkerboards, Complex exponentials, Wavelets..
  – We use the same bases regardless of the data we analyze
    • Image of a face vs. image of a forest
    • Segment of speech vs. seismic rumble
• How about data-specific bases?
  – Bases that consider the underlying data
    • E.g. is there something better than checkerboards to describe faces?
    • Something better than complex exponentials to describe music?
The Energy Compaction Property
• Define “better”?
• The description: X = w1B1 + w2B2 + w3B3 + ... + wNBN
• The truncated description: X̂i = w1B1 + w2B2 + ... + wiBi
• The ideal:
  – If the description is terminated at any point, we should still get most of the information about the data
  – The error should be small: Errori = ||X − X̂i||2
  – The error shrinks as terms are added: Errori+1 ≤ Errori
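A minimal MATLAB sketch of computing this truncation error (toy data; the orthonormal basis here comes from a QR factorization of a random matrix, purely a stand-in for real bases such as checkerboards):

x = randn(100, 1);              % a 100-dimensional "signal"
[V, ~] = qr(randn(100));        % columns of V: an orthonormal basis
w = V' * x;                     % weights of x in this basis
for i = [1 10 50 100]
    xhat = V(:, 1:i) * w(1:i);  % description truncated at i terms
    fprintf('i = %3d  error = %g\n', i, norm(x - xhat)^2);
end

The printed error is non-increasing in i, and reaches zero once all 100 bases are used.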
Data-specific description of faces
• A collection of images
  – All normalized to 100x100 pixels
• What is common among all of them?
  – Do we have a common descriptor?
A typical face
• Assumption: There is a “typical” face that captures most of what is common to all faces
– Every face can be represented by a scaled version of a typical face
– We will denote this face as V
• Approximate every face f as f = wf V
• Estimate V to minimize the squared error
– How? What is V?
[Figure: the estimated typical face]
A collection of least squares typical faces
• Assumption: There is a set of K “typical” faces that together capture most of what is common to all faces
• Approximate every face f as f = wf,1 V1 + wf,2 V2 + wf,3 V3 +.. + wf,K VK
  – V2 is used to “correct”, on average, the errors resulting from using only V1
  – V3 corrects errors remaining after correction with V2
  – And so on..
  – V = [V1 V2 V3 ...]
• Estimate V to minimize the squared error
  – How? What is V?
||f − (wf,1V1 + wf,2V2)||2 ≤ ||f − wf,1V1||2
||f − (wf,1V1 + wf,2V2 + wf,3V3)||2 ≤ ||f − (wf,1V1 + wf,2V2)||2
A recollection
[Figure: music spectrogram M, notes N, score S = Pinv(N) M, approximation U]
• Finding the best explanation of music M in terms of notes N
• Also finds the score S of M in terms of N
S = pinv(N) M
U = N S ≈ M
How about the other way?
[Figure: music M, unknown notes N = M Pinv(S), known score S, approximation U]
• Finding the notes N given music M and score S
• Also finds the best explanation of M in terms of S
N = M pinv(S)
U = N S ≈ M
Finding Everything
[Figure: music M, unknown notes N, unknown score S, approximation U]
Find the four notes and their score that generate the closest approximation to M
U = N S ≈ M
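One simple way to attack this (a sketch, not necessarily the method developed in class) is to alternate the two pseudoinverse solutions from the previous slides until the approximation stops improving; the sizes and random initialization below are assumptions:

M = abs(randn(1024, 200));   % stand-in for a real magnitude spectrogram
N = abs(randn(1024, 4));     % random initial guess at the four notes
for iter = 1:100
    S = pinv(N) * M;         % best score for the current notes
    N = M * pinv(S);         % best notes for the current score
end
U = N * S;                   % closest approximation to M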
The same problem
• Here V, W and U are ALL unknown and must be determined
– Such that the squared error between U and F is minimum
• For each face
– f = wf,1 V1+ wf,2 V2 + wf,3 V3 +.. + wf,K VK
• For the collection of faces: F ≈ V W
– V is D x K and W is K x N
• D is the no. of pixels, N is the no. of faces in the set
  – Find V and W such that ||F – VW||2 is minimized
[Figure: F ≈ U = V W; the columns of V are the typical faces, W holds the weights]
Abstracting the problem: Finding the FIRST typical face
• Each “point” represents a face in “pixel space”
• Any “typical face” V is a vector in this space
• The approximation wf V for any face f is the projection of f onto V
• The distance between f and its projection wf V is the projection error for f
• Every face in our data will suffer error when approximated by its projection on V
• The total squared length of all error lines is the total squared projection error
• The problem of finding the first typical face V1: Find the V for which the total projection error is minimum!
• This “minimum squared error” V is our “best” first typical face
• It is also the first Eigen face
[Figure: faces as points in pixel space (axes Pixel 1, Pixel 2), a candidate vector V1, a face f, and its projection error]
Formalizing the Problem: Error from approximating a single vector
• Consider: approximating x = wv
– E.g. x is a face, and “v” is the “typical face”
• Finding an approximation wv which is closest to x
– In a Euclidean sense
– Basically projecting x onto v
[Figure: vector x in the x-y plane and a direction v; approximating x = wv]
Formalizing the Problem: Error from approximating a single vector
• Projection of a vector x onto a vector v
• Assuming v is of unit length:
x̂ = vvTx
error = x − x̂ = x − vvTx
squared error = ||x − vvTx||2
[Figure: x, its projection vvTx onto v, and the error x − vvTx]
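A tiny numerical check of this projection in MATLAB (a sketch; any unit vector v works):

v = [3; 4] / 5;          % a unit-length direction
x = [2; 1];
xhat = v * (v' * x);     % projection of x onto v: vvTx
e = norm(x - xhat)^2;    % squared projection error
% e equals x'*x - (v'*x)^2, the Pythagorean identity derived below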
Error from approximating a single vector
• Minimum squared approximation error from approximating x as wv
• Optimal value of w: w = vTx
e(x) = ||x − vvTx||2
Error from approximating a single vector
• Error from projecting a vector x onto a unit vector v:
e(x) = ||x − vvTx||2
     = (x − vvTx)T(x − vvTx)
     = xTx − xTvvTx − xTvvTx + xTvvTvvTx
Error from approximating a single vector
• Since v is a unit vector, vTv = 1, so the last term simplifies:
e(x) = xTx − 2 xTvvTx + xTv(vTv)vTx
     = xTx − xTvvTx
Error from approximating a single vector
e(x) = xTx − xTvvTx
• xTvvTx is the squared length of the projection of x onto v
• This is the very familiar Pythagoras’ theorem!!
[Figure: right triangle formed by x, its projection vvTx, and the error x − vvTx]
Error for many vectors
• Error for one vector: e(x) = xTx − xTvvTx
• Error for many vectors: E = Σi e(xi) = Σi xiTxi − Σi xiTvvTxi
• Goal: Estimate v to minimize this error!
Error for many vectors
• Total error: E = Σi xiTxi − Σi xiTvvTxi
• Add constraint: vTv = 1
• Constrained objective to minimize: E = Σi xiTxi − Σi xiTvvTxi + λ(vTv − 1)
Two Matrix Identities
• Derivatives w.r.t. v:
d(xTvvTx)/dv = 2xxTv
d(vTv)/dv = 2v
Minimizing error
• Differentiating w.r.t v and equating to 0
dE/dv = −2 Σi xixiT v + 2λv = 0
Σi xixiT v = λv
The correlation matrix
• The term Σi xixiT is the correlation matrix
X = [x1 x2 ... xN]   (the data matrix, one vector per column; XT is the transposed data matrix)
Σi xixiT = X XT = R
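In MATLAB, with one data vector per column of X, this is a single product (toy stand-in sizes):

X = randn(400, 300);   % 300 data vectors of dimension 400
R = X * X';            % correlation matrix, 400 x 400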
The best “basis”
• The minimum-error basis is found by solving: Rv = λv
• v is an Eigen vector of the correlation matrix R
  – λ is the corresponding Eigen value
What about the total error?
• xTv = vTx (inner product)
E = Σi xiTxi − Σi xiTvvTxi
  = Σi xiTxi − Σi vTxixiTv
  = Σi xiTxi − vTRv
  = Σi xiTxi − λvTv
  = Σi xiTxi − λ
Minimizing the error
• The total error is: E = Σi xiTxi − λ
• We already know that the optimal basis is an Eigen vector
• The total error depends on the negative of the corresponding Eigen value λ
• To minimize the error, we must maximize λ
• i.e. Select the Eigen vector with the largest Eigen value
The typical face
• Compute the correlation matrix for your data
  – Arrange the data in a matrix X and compute R = XXT
• Compute the principal Eigen vector of R
  – The Eigen vector with the largest Eigen value
• This is the typical face
[Figure: the computed typical face]
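A minimal MATLAB sketch of this recipe (stand-in data; real faces would be vectorized 100x100 images, and eig on a full 10000x10000 matrix is slow, which the SVD slides later address):

D = 400; N = 300;            % toy sizes: D pixels per face, N faces
X = randn(D, N);             % data matrix, one face per column
R = X * X';                  % correlation matrix
[V, L] = eig(R);             % Eigen vectors and Eigen values of R
[~, k] = max(diag(L));       % position of the largest Eigen value
typical_face = V(:, k);      % principal Eigen vector = the typical face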
The second typical face
• The first typical face models some of the characteristics of the faces
• Simply by scaling its grey level
• But the approximation has error
• The second typical face must explain some of this error
The second typical face
• Approximation with only the first typical face has error
• The second face must explain this error
• How do we find this face?
[Figure: the first typical face; the second typical face?]
Solution: Iterate
• Get the “error” faces by subtracting the first-level approximation from the original image
• Repeat the estimation on the “error” images
[Figure: each original face minus its scaled first typical face = an “error” face]
Abstracting the problem: Finding the second typical face
• Each “point” represents an error face in “pixel space”
• Find the vector V2 such that the projection of these error faces on V2 results in the least error
[Scatter: the error faces as points in pixel space (axes Pixel 1, Pixel 2)]
Minimizing error
• Differentiating w.r.t v and equating to 0
E = Σi eiTei − Σi eiTvvTei + λ(vTv − 1)
dE/dv = −2 Σi eieiT v + 2λv = 0
Σi eieiT v = λv
The same math applies, but now to the set of error data points.
Minimizing error
• The minimum-error basis is found by solving: Re v2 = λv2, where Re = Σi eieiT
• v2 is an Eigen vector of the correlation matrix Re corresponding to the largest eigen value of Re
Which gives us our second typical face
• But approximation with the two faces will still result in error
• So we need more typical faces to explain this error
• We can do this by subtracting the appropriately scaled version of the second “typical” face from the error images and repeating the process
Solution: Iterate
• Get the second-level “error” faces by subtracting the scaled second typical face from the first-level error
• Repeat the estimation on the second-level “error” images
[Figure: each error face minus its scaled second typical face = a second-level error face]
An interesting property
• Each “typical face” will be orthogonal to all other typical faces
– Because each of them is learned to explain what the rest could not
– None of these faces can explain one another!
To add more faces
• We can continue the process, refining the error each time
– This is an instance of a procedure called “Gram-Schmidt” orthogonalization (see the sketch after this slide)
• OR… we can do it all at once
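A MATLAB sketch of this iteration (toy data; each round finds the principal Eigen vector of the current error faces, then subtracts what it explains):

D = 400; N = 300; K = 3;
X = randn(D, N);                       % stand-in face data
E = X;                                 % current "error" faces
V = zeros(D, K);
for k = 1:K
    [vecs, vals] = eig(E * E');        % correlation of the error faces
    [~, i] = max(diag(vals));
    V(:, k) = vecs(:, i);              % k-th typical face
    E = E - V(:, k) * (V(:, k)' * E);  % remove the part it explains
end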
With many typical faces
• Approximate every face f as f = wf,1 V1+ wf,2 V2 +... + wf,k Vk
• Here W, V and U are ALL unknown and must be determined
  – Such that the squared error between U and M is minimum
[Figure: M ≈ U = V W; the columns of V are the typical faces, W holds the weights]
With multiple bases
• Assumption: all bases v1 v2 v3.. are unit length
• Assumption: all bases are orthogonal to one another: viTvj = 0 if i != j
– We are trying to find the optimal K-dimensional subspace to project the data
– Any set of vectors in this subspace will define the subspace
– Constraining them to be orthogonal does not change this
• I.e. if V = [v1 v2 v3 … ], VTV = I
– Pinv(V) = VT
• Projection matrix for V = VPinv(V) = VVT
With multiple bases
• Projection for a vector: x̂ = VVTx
• Error vector = x − VVTx
• Error length: e(x) = xTx − xTVVTx
[Figure: V represents a K-dimensional subspace; x, its projection VVTx, and the error x − VVTx]
With multiple bases
• Error for one vector: e(x) = xTx − xTVVTx
• Error for many vectors: E = Σi xiTxi − Σi xiTVVTxi
• Goal: Estimate V to minimize this error!
Minimizing error
• With constraint VTV = I, the objective to minimize is:
E = Σi xiTxi − Σi xiTVVTxi + trace(Λ(VTV − I))
  – Note: Λ is now a diagonal matrix of Lagrange multipliers
  – The constraint simply ensures that vTv = 1 for every basis
• Differentiating w.r.t. V and equating to 0:
−2 Σi xixiT V + 2VΛ = 0, i.e. RV = VΛ
• Total error: E = Σi xiTxi − Σj=1..K λj
Finding the optimal K bases
• Compute the Eigendecomposition of the correlation matrix: RV = VΛ
• Select K Eigen vectors
• But which K?
• Total error = Σi xiTxi − Σj=1..K λj
• Select the K Eigen vectors corresponding to the K largest Eigen values
Eigen Faces!
• Arrange your input data into a matrix X
• Compute the correlation R = XXT
• Solve the Eigen decomposition: RV = VΛ
• The Eigen vectors corresponding to the K largest eigen values are our optimal bases
• We will refer to these as eigen faces.
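As a MATLAB sketch (toy sizes; in practice X holds one vectorized face per column):

X = randn(400, 300);                    % stand-in data matrix
R = X * X';                             % correlation matrix
[V, L] = eig(R);                        % RV = VL
[~, order] = sort(diag(L), 'descend');  % Eigen values, largest first
K = 50;
eigenfaces = V(:, order(1:K));          % the optimal K bases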
How many Eigen faces
• How to choose “K” (number of Eigen faces)
• Lay all faces side by side in vector form to form a matrix
  – In my example: 300 faces. So the matrix is 10000 x 300
• Multiply the matrix by its transpose
  – The correlation matrix is 10000 x 10000
[Figure: Correlation (10000 x 10000) = M (10000 x 300) · MT (300 x 10000)]
Eigen faces
• Compute the eigen vectors:
[U,S] = eig(correlation)
  – Only 300 of the 10000 eigen values are non-zero
    • Why?
• Retain eigen vectors with high eigen values (> 0)
  – Could use a higher threshold
[Figure: S = diag(λ1, λ2, ..., λ10000); the columns of U are eigenface1, eigenface2, ...]
Eigen Faces
• The eigen vector with the highest eigen value is the first typical face
• The vector with the second highest eigen value is the second typical face.
• Etc.
[Figure: the leading columns of U displayed as images: eigenface1, eigenface2, eigenface3]
Representing a face
• The weights with which the eigen faces must be combined to compose the face are used to represent the face!
[Figure: face = w1 × eigenface1 + w2 × eigenface2 + w3 × eigenface3]
Representation = [w1 w2 w3 ….]T
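Because the eigenfaces are orthonormal, the weights are just inner products; a sketch reusing the hypothetical eigenfaces matrix from the earlier sketch:

f = randn(400, 1);            % a vectorized face (stand-in)
w = eigenfaces' * f;          % representation: one weight per eigenface
f_approx = eigenfaces * w;    % face recomposed from the weights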
Energy Compaction Example
• One outcome of the “energy compaction principle”: the approximations are recognizable
• Approximating a face with one Eigenface: f ≈ w1v1
• With 10 eigenfaces: f ≈ w1v1 + w2v2 + ... + w10v10
• With 30 eigenfaces: f ≈ w1v1 + w2v2 + ... + w30v30
• With 60 eigenfaces: f ≈ w1v1 + w2v2 + ... + w60v60
How did I do this?
• Hint: only changing weights assigned to Eigen faces..
Class specificity
• The Eigenimages (bases) are very specific to the class of data they are trained on
– Faces here
• They will not be useful for other classes
Class specificity
• Eigen bases are class specific
• Composing a fishbowl from Eigenfaces:
  – With 1 basis: f ≈ w1v1
  – With 10 bases: f ≈ w1v1 + w2v2 + ... + w10v10
  – With 30 bases: f ≈ w1v1 + w2v2 + ... + w30v30
  – With 100 bases: f ≈ w1v1 + w2v2 + ... + w100v100
Universal bases
• Universal bases end up looking a lot like discrete cosine transforms!!!!
• DCTs are the best “universal” bases
  – If you don’t know what your data are, use the DCT
SVD instead of Eigen
• Do we need to compute a 10000 x 10000 correlation matrix and then perform Eigen analysis?
  – Will take a very long time on your laptop
• SVD
  – Only need to perform “Thin” SVD. Very fast
• U = 10000 x 300
  – The columns of U are the eigen faces!
  – The Us corresponding to the “zero” eigen values are not computed
• S = 300 x 300
• V = 300 x 300
[Figure: M (10000 x 300) = U (10000 x 300) · S (300 x 300) · VT (300 x 300); the columns of U are the eigenfaces]
Using SVD to compute Eigenbases
[U, S, V] = svd(X)
• U will have the Eigenvectors
• Thin SVD for 100 bases:
[U,S,V] = svds(X, 100)
• Much more efficient
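A quick sanity check of the equivalence (toy sizes; the squared singular values of X should match the leading Eigen values of XXT):

X = randn(400, 300);
[U, S, W] = svds(X, 10);               % 10 leading singular vectors/values
[V, L] = eig(X * X');
vals = sort(diag(L), 'descend');
disp(norm(vals(1:10) - diag(S).^2))    % ~0 up to numerical precision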
Eigen Decomposition of data
• Nothing magical about faces – can be applied to any data.
– Eigen analysis is one of the key components of data compression and representation
– Represent N-dimensional data by the weights of the K leading Eigen vectors
• Reduces effective dimension of the data from N to K
• But requires knowledge of Eigen vectors
Eigen decomposition of what?
• Eigen decomposition of the correlation matrix
• Is there an alternate way?
Linear vs. Affine
• The model we saw
– Approximate every face f as f = wf,1 V1+ wf,2 V2 +... + wf,k Vk
– Linear combination of bases
• If you add a constant
f = wf,1 V1+ wf,2 V2 +... + wf,k Vk + m
– Affine combination of bases
Estimation with the constant
• Estimate
f = wf,1 V1+ wf,2 V2 +... + wf,k Vk + m
• Let’s do this incrementally first:
• f ≈ m
– For every face
– Find m to optimize the approximation
Estimation with the constant
• Estimate
f ≈ m
– for every f!
• Error over all faces: E = Σf ||f − m||2
• Minimizing the error with respect to m, we simply get:
  – m = (1/N) Σf f
• The mean of the data
Estimating the remaining
• Same procedure as before:
  – The remaining “typical faces” must model what the constant m could not
• Subtract the constant from every data point
  – f̄ = f − m
• Now apply the model:
  – f̄ = wf,1 V1 + wf,2 V2 +... + wf,k Vk
• This is just Eigen analysis of the “mean-normalized” data
  – Also called the “centered” data
Estimating the Affine model
f = wf,1 V1 + wf,2 V2 +... + wf,k Vk + m
• First estimate the mean m:
  m = (1/N) Σf f
• Compute the correlation matrix of the “centered” data f̄ = f − m:
  C = Σf f̄ f̄T = Σf (f − m)(f − m)T
  – This is the covariance matrix of the set of f
• Eigen decompose: CV = VΛ
• The Eigen vectors corresponding to the top k Eigen values give us the bases Vk
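A MATLAB sketch of the whole affine recipe (toy stand-in data; bsxfun is used for compatibility, newer releases can subtract m directly):

F = randn(400, 300);               % one face per column (stand-in)
m = mean(F, 2);                    % the mean face
Fc = bsxfun(@minus, F, m);         % centered data
C = Fc * Fc';                      % covariance (scatter) matrix
[V, L] = eig(C);
[~, order] = sort(diag(L), 'descend');
K = 10;
V = V(:, order(1:K));              % the top-K affine (PCA) bases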
Properties of the affine model
• The bases V1, V2 ,...,Vk are all orthogonal to one another
– Eigen vectors of the symmetric Covariance matrix
• But they are not orthogonal to m
– Because m is an unscaled constant
• We could jointly estimate all V1, V2 ,...,Vk and m by minimizing
E = Σf ||f − (Σi wf,i Vi + m)||2 + trace(Λ(VTV − I))
Linear vs. Affine
• The model we saw
– Approximate every face f as f = wf,1 V1+ wf,2 V2 +... + wf,k Vk
– The Karhunen Loeve Transform
– Retains maximum Energy for any order k
• If you add a constant
f = wf,1 V1+ wf,2 V2 +... + wf,k Vk + m
– Principal Component Analysis
– Retains maximum Variance for any order k
How do they relate
• Relationship between correlation matrix and covariance matrix
R = C + mmT
• Karhunen Loeve bases are Eigen vectors of R
• PCA bases are Eigen vectors of C
• How do they relate
– Not easy to say..
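The relationship is easy to verify numerically (a sketch; it assumes both matrices are normalized by the number of samples):

X = randn(5, 200);
m = mean(X, 2);
Xc = bsxfun(@minus, X, m);
R = (X * X') / 200;                    % correlation
C = (Xc * Xc') / 200;                  % covariance
disp(norm(R - (C + m * m'), 'fro'))    % ~0: R = C + mmT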
The Eigen vectors
• The Eigen vectors of C are the major axes of the ellipsoid Cv, where v are the vectors on the unit sphere
The Eigen vectors
• The Eigen vectors of R are the major axes of the ellipsoid Cv + mmTv
• Note that mmT has rank 1 and mmTv is a line
The Eigen vectors
• The principal Eigenvector of R lies between the principal Eigen vector of C and m
• Similarly the principal Eigen value
• Similar logic is not easily extendable to the other Eigenvectors, however
[Figure: principal axes of C, of mmT, and of R]
eR = (1 − γ) eC + γ m/||m||,   0 ≤ γ ≤ 1
λR = (1 − γ) λC + γ ||m||2
Eigenvectors
• Turns out: Eigenvectors of the correlation matrix represent the major and minor axes of an ellipse centered at the origin which encloses the data most compactly
• The SVD of data matrix X uncovers these vectors
• KLT
Eigenvectors
• Turns out: Eigenvectors of the covariance represent the major and minor axes of an ellipse centered at the mean which encloses the data most compactly
• PCA uncovers these vectors
• In practice, “Eigen faces” refers to PCA faces, and not KLT faces
What about sound?
• Finding Eigen bases for speech signals:
• Look like DFT/DCT
• Or wavelets
• DFTs are pretty good most of the time
Eigen Analysis
• Can often find surprising features in your data
• Trends, relationships, more
• Commonly used in recommender systems
• An interesting example..
Eigen Analysis
• Cheng Liu’s research on pipes..
• SVD automatically separates useful and uninformative features
Eigen Analysis
• But for all of this, we need to “preprocess” data
• Eliminate unnecessary aspects
– E.g. noise, other externally caused variations..