Machine Intelligence lab - School of Electronics Engineering - Kyungpook National University
ELEC801 Pattern Recognition
PRINCIPAL COMPONENT ANALYSIS
ELEC801 Pattern Recognition, Fall 2017, KNU
Instructor: Gil-Jin Jang
Slide credits:
Srinivasa Narasimhan, CS, CMU; James J. Cochran, Louisiana Tech University;
Barnabás Póczos, University of Alberta
Slide credit: Narasimhan, Cochran, Póczos 10/23/2017 1
Problem: Hi-Dimensional Features
• Example:
– Input data type: 53 blood and urine measurements (wet chemistry)
– Samples: 65 people (33 alcoholics, 32 non-alcoholics)
• Data given in a matrix format
Slide credit: S. Narasimhan
      H-WBC   H-RBC   H-Hgb   H-Hct    H-MCV   H-MCH  H-MCHC
A1   8.0000  4.8200 14.1000 41.0000  85.0000 29.0000 34.0000
A2   7.3000  5.0200 14.7000 43.0000  86.0000 29.0000 34.0000
A3   4.3000  4.4800 14.1000 41.0000  91.0000 32.0000 35.0000
A4   7.5000  4.4700 14.9000 45.0000 101.0000 33.0000 33.0000
A5   7.3000  5.5200 15.4000 46.0000  84.0000 28.0000 33.0000
A6   6.9000  4.8600 16.0000 47.0000  97.0000 33.0000 34.0000
A7   7.8000  4.6800 14.7000 43.0000  92.0000 31.0000 34.0000
A8   8.6000  4.8200 15.8000 42.0000  88.0000 33.0000 37.0000
A9   5.1000  4.7100 14.0000 43.0000  92.0000 30.0000 32.0000
Data Representation
[Figures: univariate plot of H-Bands per person; bivariate scatter of C-Triglycerides vs. C-LDH; trivariate scatter of C-Triglycerides, C-LDH, and M-EPI]
Multivariate data distributions can be represented by visualizing distributions of pairs or triplets.
Data Representation
• Issues
– Is there any better representation?
– Do we need all 53 dimensions?
– How do we find the BEST lower-dimensional space that conveys maximum useful information?
• One answer: find PRINCIPAL COMPONENTS
– Goal: we wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables.
Principal Component Analysis
Given N points in a p-dimensional space, for large N:
– how does one project onto a 1-dimensional space while preserving broad trends in the data and allowing it to be visualized?
Choose a line that fits the data so the points are spread out well along it, i.e., one that
– maximizes the variance of the projected data (purple line)
– minimizes the mean squared distance between each data point and its projection (sum of blue lines)
Algebraic Interpretation – 1D
• Minimize the sum of squares of distances to the line.
– Minimizing the sum of squares of distances to the line is the same as maximizing the sum of squares of the projections on that line, thanks to Pythagoras.
PCA: Mathematical Formulation
Let us say we have x_i, i = 1…N data points in p dimensions (p is large).
If we want to represent the data set by a single point x_0, a natural criterion is

    J_0(x_0) = Σ_{i=1}^{N} ||x_0 − x_i||^2

Can we justify this choice mathematically? It turns out that if you minimize J_0, you get

    x_0 = m = (1/N) Σ_{i=1}^{N} x_i

viz., the sample mean.
Source: Chapter 3 of [DHS]
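The minimization claim can be checked numerically: J_0 is smallest exactly at the sample mean. A minimal sketch (Python/NumPy here for illustration, while the course code is MATLAB; the data are randomly generated):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # N = 100 points in p = 5 dimensions

def J0(x0, X):
    # sum of squared distances from a single representative point x0
    return float(np.sum(np.linalg.norm(X - x0, axis=1) ** 2))

m = X.mean(axis=0)                 # sample mean
# perturbing the mean can only increase J0, since
# J0(x0) = J0(m) + N * ||x0 - m||^2
for _ in range(10):
    x0 = m + rng.normal(scale=0.1, size=5)
    assert J0(m, X) < J0(x0, X)
```

The identity in the comment follows by expanding the square, which is exactly the argument sketched in [DHS] Chapter 3.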
PCA: An Intuitive Approach
Representing the data set x_i, i = 1…N by its mean is quite uninformative.
So let's try to represent the data by a straight line of the form:

    x = m + a e

This is the equation of a straight line that passes through m; e is a unit vector along the straight line, and the signed distance of a point x from m is a. The individual points projected onto this straight line would be

    x_i = m + a_i e,   i = 1…N
PCA: An Intuitive Approach
Let us now determine the a_i's:

    J_1(a_1, …, a_N, e) ≡ Σ_{i=1}^{N} ||(m + a_i e) − x_i||^2
                        = Σ_{i=1}^{N} a_i^2 ||e||^2 − 2 Σ_{i=1}^{N} a_i e^T (x_i − m) + Σ_{i=1}^{N} ||x_i − m||^2

Partially differentiating with respect to a_i, we get:

    a_i = e^T (x_i − m)

Plugging in this expression for a_i in J_1, we get:

    J_1(e) = − e^T S e + Σ_{i=1}^{N} ||x_i − m||^2

where

    S = Σ_{i=1}^{N} (x_i − m)(x_i − m)^T

is called the scatter matrix (in this case, covariance matrix).
PCA: An Intuitive Approach…
So minimizing J_1 is equivalent to maximizing:

    e^T S e

subject to the constraint that e is a unit vector:

    e^T e = 1

Use the Lagrange multiplier method to form the objective function:

    e^T S e − λ (e^T e − 1)

Differentiate to obtain the equation:

    2 S e − 2 λ e = 0,   or   S e = λ e

The solution is that e is the eigenvector of S corresponding to the largest eigenvalue.
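The result can be checked numerically: the eigenvector of S with the largest eigenvalue attains a larger e^T S e than any other unit vector. A minimal sketch (Python/NumPy used for illustration; synthetic data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.5])  # anisotropic cloud
m = X.mean(axis=0)
S = (X - m).T @ (X - m)            # scatter matrix

# eigh returns eigenvalues in ascending order for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(S)
e = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue

# e^T S e equals the largest eigenvalue, and beats random unit vectors
assert np.isclose(e @ S @ e, eigvals[-1])
for _ in range(20):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    assert u @ S @ u <= e @ S @ e + 1e-9
```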
Eigensystem
e is an eigenvector and λ an eigenvalue if

    S e = λ e

• S x points in some other direction in general.
PCA: Extension to Multi-dimensions
The preceding analysis can be extended in the following way. Instead of projecting the data points onto a straight line, we may now want to project them onto a d-dimensional plane of the form:

    x = m + a_1 e_1 + … + a_d e_d

where d is much smaller than the original dimension p. In this case one can form the objective function:

    J_d = Σ_{i=1}^{N} ||(m + Σ_{k=1}^{d} a_{ki} e_k) − x_i||^2

It can also be shown that the vectors e_1, e_2, …, e_d are the d eigenvectors corresponding to the d largest eigenvalues of the scatter matrix

    S = Σ_{i=1}^{N} (x_i − m)(x_i − m)^T
Eigenvalues and Eigenvectors
• Given a square matrix S and some scalar λ (an eigenvalue), a non-zero vector e is an eigenvector if it satisfies

    S e = λ e  ↔  (S − λI) e = 0

• Characteristics
– There are n eigenvectors for a non-singular n × n matrix S
– solutions to the characteristic equation are obtained by finding det(S − λI) = 0 (MATLAB function 'eig')
– example: 2×2 case
    S = [ 2  1 ]       det(S − λI) = (2 − λ)^2 − 1 = 0   ⇒   λ = 1, 3
        [ 1  2 ],
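The 2×2 example can be verified directly. A sketch (Python/NumPy for illustration; the matrix is reconstructed from the garbled slide as S = [[2, 1], [1, 2]], and the eigenvalues 1 and 3 come out the same even if the off-diagonal sign were negative):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam = np.linalg.eigvalsh(S)        # eigenvalues in ascending order
# characteristic equation: det(S - lam*I) = (2 - lam)^2 - 1 = 0
for l in lam:
    assert np.isclose(np.linalg.det(S - l * np.eye(2)), 0.0)
assert np.allclose(lam, [1.0, 3.0])
```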
Eigenvectors on Covariance Matrix
• If S is a covariance matrix (S = E[x x^T]) and e is a unit eigenvector of S (|e| = 1), then E[(e^T x)^2] = λ, since e^T S e = λ e^T e = λ
• E = [e_1; e_2; …; e_n] decorrelates x, i.e., zeros the off-diagonal terms of the covariance matrix of y = E^T x:

    E[y y^T] = E[E^T x x^T E] = E^T E[x x^T] E = E^T S E = [ λ_1  0  ⋯  0
                                                             0   λ_2 ⋯  0
                                                             ⋮        ⋱  ⋮
                                                             0    0  ⋯ λ_n ]
Dimension Reduction
• Suppose each data point is a vector of dimension d.
• E[y_k^2] = E[e_k^T x x^T e_k] = e_k^T E[x x^T] e_k = e_k^T S e_k = λ_k
– The eigenvectors of S define a new coordinate system
» the eigenvector with the largest eigenvalue captures the most variation among the training vectors x
» the eigenvector with the smallest eigenvalue has the least variation
– We can compress the data (represent it with little error) by using only the top few eigenvectors
» this corresponds to choosing a "linear subspace": represent points on a line, plane, or "hyper-plane"
» these eigenvectors are known as the principal components
Code Example: pca.m
function [W, eigvector, eigvalue] = pca(R)
% PCA Principal component analysis
% [W, EIGVECTOR, EIGVALUE] = PCA(covX)
% covX: covariance matrix
% EIGVECTOR: each column is an
%   eigenvector of covX
% EIGVALUE: eigenvalues of covX
% W: whitening transform,
%   E[(W*x)*(W*x)'] = W*covX*W' = I
[v, d] = eig(R);
eigvector = v;
eigvalue = diag(d);
% Sort in a descending order
[~, index] = sort(-eigvalue);
% or use sort(eigvalue,'descend');
eigvalue = eigvalue(index);
eigvector = eigvector(:, index);
N = length(eigvalue);
W = zeros(N,N);
for m = 1:N
    W(m,:) = ...
        eigvector(:,m)'/sqrt(eigvalue(m));
end
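For readers working outside MATLAB, here is a Python/NumPy analogue of pca.m (a sketch, not course code); like the original, it sorts eigenpairs in descending order and returns a whitening transform W with W*covX*W' = I:

```python
import numpy as np

def pca(R):
    """PCA of a covariance matrix R: eigenpairs sorted by descending
    eigenvalue, plus the whitening transform W (W @ R @ W.T == I)."""
    eigvalue, eigvector = np.linalg.eigh(R)   # ascending for symmetric R
    order = np.argsort(-eigvalue)             # descending sort, as in pca.m
    eigvalue = eigvalue[order]
    eigvector = eigvector[:, order]
    W = eigvector.T / np.sqrt(eigvalue)[:, None]
    return W, eigvector, eigvalue

# usage on a toy covariance matrix
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
W, V, d = pca(R)
```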
PCA Applications:
Face recognition
Facial expression recognition
Barnabás Póczos
University of Alberta
Challenge: Face Recognition
Task: identify a specific person, based on a facial image, regardless of wearing glasses, lighting, …
Can we use all the given 256 x 256 pixels?
• 256 x 256 = 2^16 pixels = 64 KB per image with 256 gray levels
Slide credit: Barnabás Póczos
The Space of Faces
• An image is a point in a high-dimensional space
– An N x M image is a point in R^{NM}
– We can define vectors in this space as we did in the 2D case
Key Idea
• Images in the possible set of faces are highly correlated.
• So, compress them to a low-dimensional subspace that captures key appearance characteristics.
• EIGENFACES [Turk and Pentland]: USE PCA!
Eigenfaces
Eigenfaces look somewhat like generic faces.
Eigenfaces – verbal summary
• Eigenfaces are the eigenvectors of the covariance matrix of the vector space of human faces
• Eigenfaces are the 'standardized face ingredients' derived from the statistical analysis of many pictures of human faces
• A human face may be considered to be a combination of these standardized faces
Generating Eigenfaces
1. A large set of images of human faces is taken.
2. The images are normalized to line up the eyes, mouths, and other features.
3. The eigenvectors of the covariance matrix of the face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.
Eigenfaces for Face Recognition
• When properly weighted, eigenfaces can be summed together to create an approximate gray-scale rendering of a human face.
• Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
• Hence eigenfaces provide a means of applying data compression to faces for identification purposes.
Dimensionality Reduction
The set of faces is a "subspace" of the set of images
– Suppose it is K-dimensional
– We can find the best subspace using PCA
– This is like fitting a "hyper-plane" to the set of faces
» spanned by vectors v_1, v_2, …, v_K
Any face can then be approximated by its projection onto this subspace.
Projecting onto the Eigenfaces
• The eigenfaces v_1, …, v_K span the space of faces
– A face x is converted to eigenface coordinates by projecting it onto each eigenface: w_i = v_i^T (x − m)
Choosing the Dimension K
[Figure: decay of the eigenvalues, i = 1 … NM]
• How many eigenfaces to use? Find the "knee."
• Look at the decay of the eigenvalues
– the eigenvalue tells you the amount of variance "in the direction" of that eigenface
– ignore eigenfaces with low variance
Applying PCA: Eigenfaces
• Example data set: images of faces
– Eigenface approach [Turk & Pentland], [Sirovich & Kirby]
• Each face x is …
– 256 × 256 values (luminance at each location)
» x in R^{256×256} (view as a 64K-dimensional vector)
• Form the centered data matrix X = [x_1, …, x_m] (256 × 256 real values, m faces)
• Compute Σ = XX^T
• Problem: Σ is 64K × 64K … HUGE!!!
Method A: Build a PCA subspace for each person and check which subspace can reconstruct the test image the best.
Method B: Build one PCA database for the whole dataset and then classify based on the weights.
Computational Complexity
• Suppose m instances, each of size N
– Eigenfaces: m = 500 faces, each of size N = 64K
• Given an N×N covariance matrix Σ, one can compute
– all N eigenvectors/eigenvalues in O(N^3)
– the first k eigenvectors/eigenvalues in O(kN^2)
• But if N = 64K, this often becomes computationally intractable.
A Clever Workaround
• Note that m << 64K
• Use L = X^T X instead of Σ = XX^T
• If v is an eigenvector of L, then Xv is an eigenvector of Σ
Proof:  L v = γ v
        X^T X v = γ v
        X (X^T X v) = X (γ v) = γ X v
        (X X^T) X v = γ (X v)
        Σ (X v) = γ (X v)
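The trick can be verified on synthetic data: the eigenvectors v of L = X^T X map to eigenvectors Xv of Σ = XX^T with the same eigenvalues. A sketch (Python/NumPy assumed; tiny sizes stand in for 64K-pixel faces):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 50, 6                     # N pixels per face, m faces (m << N)
X = rng.normal(size=(N, m))      # centered data matrix, one face per column

L = X.T @ X                      # small m x m matrix
gam, V = np.linalg.eigh(L)       # eigenpairs of L

Sigma = X @ X.T                  # huge N x N matrix (never needed in practice)
for g, v in zip(gam, V.T):
    u = X @ v                    # candidate eigenvector of Sigma
    assert np.allclose(Sigma @ u, g * u)
```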
Happiness subspace (Method A)
Method A: Build a PCA subspace for each person and check which subspace
can reconstruct the test image the best
Disgust subspace (Method A)
Principal Components (Method B)
Classification with Eigenfaces (Method B)
1. Process the image database (set of images with labels)
– Run PCA to compute eigenfaces
– Calculate the K coefficients for each image
2. Given a new image (to be recognized) x, calculate its K coefficients
3. Detect whether x is a face
4. If it is a face, who is it?
• Find the closest labeled face in the database
• nearest-neighbor search in the K-dimensional space
Reconstructing… (Method B)
• … faster if we train with…
– only people without glasses
– the same lighting conditions
Summary
• Advantages
– PCA is completely knowledge-free
– Reduction in computation and memory requirements
• Shortcomings
– PCA requires carefully controlled data:
» all faces centered in frame and of the same size; sensitive to angles
– Alternative:
» "learn" one set of PCA vectors for each angle
» use the one with the lowest error
LINEAR DISCRIMINANT ANALYSIS
PCA vs LDA
LDA for two classes and multi-classes
Example problem
Applications of LDA
Principal Component Analysis (PCA)
Find a transformation w such that w^T x is dispersed the most (maximum variance):

    Y = w^T X
Linear Discriminant Analysis (LDA)
Find a transformation w such that w^T X_1 and w^T X_2 are maximally separated and each class is minimally dispersed (maximum separation):

    Y_1 = w^T X_1,   Y_2 = w^T X_2
Linear Discriminant Analysis (LDA)
• Performs dimensionality reduction "while preserving as much of the class discriminatory information as possible."
• Seeks to find directions along which the classes are best separated.
• Takes into consideration the scatter within classes as well as the scatter between classes.
• More capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.
Linear Discriminant Analysis (LDA)
• Separation by mean, by variation, …
[Figure: three class-separation cases (1, 2, 3)]
Linear Discriminant Analysis (LDA)
• Two classes: ω_1, ω_2
• Introduce
• S_B: between-class scatter matrix
• S_W: within-class scatter matrix

    μ_c = (1/N_c) Σ_{x ∈ ω_c} x                      (class mean)
    S_c = Σ_{x ∈ ω_c} (x − μ_c)(x − μ_c)^T           (class covariance)
    S_W = S_1 + S_2                                  (sum of the covariances)
    S_B = (μ_1 − μ_2)(μ_1 − μ_2)^T                   (scatter of the means)
LDA
Project the data: Y = w^T X. The projected class mean is

    μ̃_c = (1/N_c) Σ_{y ∈ ω_c} y = w^T μ_c

The projected between-class scatter is

    S̃_B = (μ̃_1 − μ̃_2)(μ̃_1 − μ̃_2)^T = w^T (μ_1 − μ_2)(μ_1 − μ_2)^T w = w^T S_B w

and the projected class scatter is

    S̃_c = Σ_{y ∈ ω_c} (y − μ̃_c)(y − μ̃_c)^T = Σ_{x ∈ ω_c} (w^T x − w^T μ_c)(w^T x − w^T μ_c)^T = w^T S_c w

so that

    S̃_1 + S̃_2 = w^T S_W w

Objective function J(w):

    J(w) = (w^T S_B w) / (w^T S_W w)

which maximizes S_B of y and minimizes S_W of y.
LDA by Eigenvector

    J(w) = (w^T S_B w) / (w^T S_W w)

How to maximize this? Equivalently, minimize −(1/2) w^T S_B w such that w^T S_W w = 1.
Use a Lagrange multiplier to form

    Λ(w, λ) = −(1/2) w^T S_B w + (1/2) λ (w^T S_W w − 1)

    ∂Λ/∂w = −S_B w + λ S_W w = 0

    S_B w = λ S_W w   ⇒   S_W^{−1} S_B w = λ w

An eigenvalue problem!!
LDA Example
• Example
• X1 = {(4,1), (2,4), (2,3), (3,6), (4,4)}
• X2 = {(9,10), (6,8), (9,5), (8,7), (10,8)}
• Class statistics

    μ_1 = [3.0  3.6],   μ_2 = [8.4  7.6]

    S_1 = [  0.8  −0.4 ]      S_2 = [  1.84  −0.04 ]
          [ −0.4   2.64 ],          [ −0.04   2.64 ]

• Within- and between-class scatter

    S_W = [  2.64  −0.44 ]    S_B = [ 29.16  21.60 ]
          [ −0.44   5.28 ],         [ 21.60  16.00 ]

• Solve the eigenvalue problem

    S_W^{−1} S_B w = λ w
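The example's class statistics can be reproduced and the discriminant direction computed; a sketch (Python/NumPy assumed; class scatter normalized by N, matching the slide's numbers):

```python
import numpy as np

X1 = np.array([(4, 1), (2, 4), (2, 3), (3, 6), (4, 4)], dtype=float)
X2 = np.array([(9, 10), (6, 8), (9, 5), (8, 7), (10, 8)], dtype=float)

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - mu1).T @ (X1 - mu1) / len(X1)   # class covariances
S2 = (X2 - mu2).T @ (X2 - mu2) / len(X2)
SW = S1 + S2                               # within-class scatter
d = mu1 - mu2
SB = np.outer(d, d)                        # between-class scatter

assert np.allclose(mu1, [3.0, 3.6]) and np.allclose(mu2, [8.4, 7.6])
assert np.allclose(SW, [[2.64, -0.44], [-0.44, 5.28]])
assert np.allclose(SB, [[29.16, 21.6], [21.6, 16.0]])

# leading eigenvector of SW^-1 SB gives the discriminant direction
vals, vecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
w = vecs[:, np.argmax(vals.real)].real
```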
LDA on more than 2 classes

    J(w) = S̃_B / S̃_W = (w^T S_B w) / (w^T S_W w)
LDA on C > 2 classes
• C classes, with data matrix U = [x_1^1 x_2^1 … x_{n_c}^c] (samples grouped by class):

    S_w = Σ_{i=1}^{c} Σ_{j=1}^{n_i} (x_j^i − μ_i)(x_j^i − μ_i)^T

    S_b = Σ_{i=1}^{c} (μ_i − μ)(μ_i − μ)^T

Solve the generalized eigenvalue problem

    S_B V = λ S_W V,   and form W = [v_1 … v_k]
Comparison of PCA and LDA
• PCA finds axes of maximal variance
– Computed by eigenvalue decomposition
– Eigenfaces when applied to face recognition
• LDA finds axes of maximal separation
– Often referred to as Fisher's linear discriminant
– Fisherfaces when applied to face recognition
Which set is Eigenfaces and which is Fisherfaces?
Images from Wikipedia.org
Eigenfaces vs. Fisherfaces
• Independent Comparative Study of PCA and LDA on the FERET Data Set, by Kresimir Delac, Mislav Grgic, Sonja Grgic
– PCA: blurred, like average faces
– LDA finds more discriminant features from face images
References
• Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, July 1997.
• Kresimir Delac, Mislav Grgic, Sonja Grgic, "Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set."
• M. Turk, A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991.
• Gregory Shakhnarovich, Baback Moghaddam, "Face Recognition in Subspaces," Mitsubishi TR2004-041, May 2004.
• Naotoshi Seo, Project: Eigenfaces and Fisherfaces, http://note.sonots.com/SciSoftware/FaceRecognition.html
• http://www.face-rec.org/algorithms/
PCA Applications:
Image Compression and Denoising
Barnabás Póczos
University of Alberta
Original Image
• Divide the original 372 x 492 image into patches.
• Each patch is an instance that contains 12 x 12 pixels on a grid.
• View each patch as a 144-D vector.
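The patch pipeline described above can be sketched as follows (Python/NumPy assumed; a random array stands in for the actual 372x492 image, and 60 components are kept as in the later slides):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((372, 492))                 # stand-in for the original image

# cut the image into non-overlapping 12x12 patches, one 144-D row each
patches = np.array([img[i:i+12, j:j+12].ravel()
                    for i in range(0, 372, 12)
                    for j in range(0, 492, 12)])

mean = patches.mean(axis=0)
C = np.cov((patches - mean).T)               # 144 x 144 covariance
vals, vecs = np.linalg.eigh(C)
E = vecs[:, ::-1][:, :60]                    # top-60 eigenvectors

codes = (patches - mean) @ E                 # compress: 144-D -> 60-D
recon = codes @ E.T + mean                   # approximate reconstruction
err = np.mean((patches - recon) ** 2)        # L2 reconstruction error
```

Varying the 60 down to 16, 6, or 3 reproduces the compression sequence shown on the following slides.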
L2 error vs. PCA dimension
[Figure: reconstruction error as a function of the number of PCA components]
60 most important eigenvectors
Looks like the discrete cosine bases of JPEG!
2D Discrete Cosine Basis
http://en.wikipedia.org/wiki/Discrete_cosine_transform
PCA compression: 144D to 60D
16 most important eigenvectors
[Figure: the 16 most important eigenvectors, each displayed as a 12 x 12 patch]
PCA compression: 144D to 16D
6 most important eigenvectors
[Figure: the 6 most important eigenvectors, each displayed as a 12 x 12 patch]
PCA compression: 144D to 6D
3 most important eigenvectors
[Figure: the 3 most important eigenvectors, each displayed as a 12 x 12 patch]
PCA compression: 144D to 3D
Noise Filtering by Auto-Encoder
[Diagram: input x is projected onto the PCA basis U and reconstructed as x']
Noisy image
Denoised: 15 PCA components
Kernel PCA
Motivation
[Figure: two 2-D point sets (z vs. v and z vs. u) whose structure no single linear projection can capture]
Motivation
Linear projections will not detect the pattern.
Nonlinear PCA
Three popular methods are available:
1) Neural-network based PCA (E. Oja, 1982)
2) Method of Principal Curves (T. J. Hastie and W. Stuetzle, 1989)
3) Kernel based PCA (B. Schölkopf, A. Smola, and K. Müller, 1998)
[Figure: linear PCA vs. nonlinear PCA (NPCA) components fitted to the same data]
KPCA: Basic Idea
Kernel PCA Formulation…
• Let C be the scatter matrix of the centered mapping φ(x):

    C = Σ_{i=1}^{N} φ(x_i) φ(x_i)^T

• Let w be an eigenvector of C; then w can be written as a linear combination:

    w = Σ_{k=1}^{N} α_k φ(x_k)

• Also, we have:

    C w = λ w

• Combining, we get:

    Σ_{i=1}^{N} φ(x_i) φ(x_i)^T ( Σ_{k=1}^{N} α_k φ(x_k) ) = λ Σ_{k=1}^{N} α_k φ(x_k)
Kernel PCA Formulation…

    Σ_{i=1}^{N} φ(x_i) φ(x_i)^T ( Σ_{k=1}^{N} α_k φ(x_k) ) = λ Σ_{k=1}^{N} α_k φ(x_k)

Multiplying both sides by φ(x_l)^T on the left:

    Σ_{i=1}^{N} Σ_{k=1}^{N} α_k φ(x_l)^T φ(x_i) φ(x_i)^T φ(x_k) = λ Σ_{k=1}^{N} α_k φ(x_l)^T φ(x_k),   l = 1, 2, …, N

In matrix form, with K_ij = φ(x_i)^T φ(x_j) (the kernel or Gram matrix):

    K K α = λ K α   ⇒   K α = λ α
Kernel PCA Formulation…
From the eigen equation

    K α = λ α

and the fact that the eigenvector w is normalized to 1, we obtain:

    1 = ||w||^2 = ( Σ_{i=1}^{N} α_i φ(x_i) )^T ( Σ_{i=1}^{N} α_i φ(x_i) ) = α^T K α = λ α^T α
    ⇒ α^T α = 1/λ
KPCA Algorithm
Step 1: Compute the Gram matrix: K_ij = k(x_i, x_j), i, j = 1, …, N
Step 2: Compute the (eigenvalue, eigenvector) pairs of K: (λ_l, α^l), l = 1, …, M
Step 3: Normalize the eigenvectors: α^l ← α^l / √λ_l
Thus, an eigenvector w^l of C is now represented as:

    w^l = Σ_{k=1}^{N} α_k^l φ(x_k)

To project a test feature φ(x) onto w^l we need to compute:

    φ(x)^T w^l = Σ_{k=1}^{N} α_k^l φ(x)^T φ(x_k) = Σ_{k=1}^{N} α_k^l k(x, x_k)

So, we never need φ explicitly.
Feature Map Centering
So far we assumed that the feature map φ(x) is centered for the data points x_1, …, x_N, i.e.,

    Σ_{i=1}^{N} φ(x_i) = 0

Actually, this centering can be done on the Gram matrix without ever explicitly computing the feature map φ(x):

    K̃ = (I − 11^T/N) K (I − 11^T/N)

is the kernel matrix for centered features. A similar expression exists for projecting test features onto the feature eigenspace.
Schölkopf, Smola, Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Technical Report #44, Max Planck Institute, 1996.
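Putting the KPCA steps and the Gram-matrix centering together (a Python/NumPy sketch; the RBF kernel and the helper name kernel_pca are illustrative choices, not from the slides):

```python
import numpy as np

def kernel_pca(X, k, n_components):
    N = len(X)
    K = np.array([[k(a, b) for b in X] for a in X])   # Step 1: Gram matrix
    J = np.eye(N) - np.ones((N, N)) / N
    Kc = J @ K @ J                                    # center the feature map
    lam, alpha = np.linalg.eigh(Kc)                   # Step 2: eigenpairs of K
    lam, alpha = lam[::-1], alpha[:, ::-1]            # descending order
    keep = lam > 1e-10                                # drop numerically zero modes
    lam = lam[keep][:n_components]
    alpha = alpha[:, keep][:, :n_components]
    alpha = alpha / np.sqrt(lam)                      # Step 3: normalize
    return Kc @ alpha                                 # projections of training points

rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))      # an example kernel
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
Y = kernel_pca(X, rbf, 2)          # 30 points projected onto 2 kernel PCs
```

Note that only kernel evaluations k(x_i, x_j) are used; φ itself never appears, exactly as the derivation promises.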
KPCA: USPS Digit Recognition
Schölkopf, Smola, Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Technical Report #44, Max Planck Institute, 1996.
Kernel function: k(x, y) = (x^T y)^d
Classifier: linear SVM with kernel principal components as features
N = 3000, p = 16-by-16 images
(the case d = 1 is linear PCA)
Input points before kernel PCA
http://en.wikipedia.org/wiki/Kernel_principal_component_analysis
Output after kernel PCA
The three groups are distinguishable using the first component only
PCA Conclusions
• PCA
– finds an orthonormal basis for the data
– sorts dimensions in order of "importance"
– discards low-significance dimensions
• Uses:
– get a compact description
– ignore noise
– improve classification (hopefully)
• Not magic:
– doesn't know class labels
– can only capture linear variations
• One of many tricks to reduce dimensionality!
*Matrix and Vector Derivatives
Matrix and vector derivatives are obtained first by element-wise derivatives and then reforming them into matrices and vectors.
Slide credit: Tae-Kyun Kim
*Matrix and Vector Derivatives