Post on 21-Dec-2015
SVD and PCA
COS 323, Spring 05
SVD and PCA
• Principal Components Analysis (PCA): approximating a high-dimensional data set with a lower-dimensional subspace
[Figure: scatter of data points, labeled with the original axes, the first principal component, and the second principal component]
SVD and PCA
• Data matrix with points as rows, take SVD
– Subtract out mean (centering, loosely called “whitening” here)
• Columns of $V_k$ are principal components
• Value of $w_i$ gives importance of each component
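The recipe on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's code: the function name `pca_svd` and the synthetic 2-D points are assumptions made up for the example, and $V_k$ is read as the first $k$ columns of $V$.

```python
import numpy as np

def pca_svd(data, k):
    """PCA of an (n_points, n_dims) array via SVD.

    Returns the mean, the first k principal components (as columns),
    and the corresponding singular values w_i.
    """
    mean = data.mean(axis=0)
    centered = data - mean                      # subtract out the mean
    # Economy-size SVD: centered = U @ diag(w) @ Vt, with w sorted descending
    _, w, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k].T                       # columns of V
    return mean, components, w[:k]

# Synthetic data (an assumption for illustration): points stretched along (1, 1)
rng = np.random.default_rng(0)
t = rng.normal(size=200)
points = np.column_stack([t, t]) * 3.0 + rng.normal(scale=0.1, size=(200, 2)) + [5.0, -2.0]

mean, comps, w = pca_svd(points, 2)
print("first principal component:", comps[:, 0])
print("singular values:", w)
```

The sign of each component is arbitrary, so the first component may come out along (1, 1) or (-1, -1).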
PCA on Faces: “Eigenfaces”
[Figure: average face, first principal component, other components]
For all except average, “gray” = 0, “white” > 0, “black” < 0
Uses of PCA
• Compression: each new image can be approximated by projection onto first few principal components
• Recognition: for a new image, project onto first few principal components, match feature vectors
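Both uses come down to the same projection. The sketch below is a hedged illustration on synthetic 64-pixel "images" rather than real faces. The helper names (`fit_pca`, `project`, `reconstruct`, `recognize`) are hypothetical, and nearest-neighbor matching in Euclidean distance is only one possible way to "match feature vectors".

```python
import numpy as np

def fit_pca(data, k):
    """Return the mean and the first k principal components (as columns)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k].T

def project(x, mean, comps):
    """Feature vector: coordinates of x along the principal components."""
    return (x - mean) @ comps

def reconstruct(features, mean, comps):
    """Compression: approximate the image from its k coefficients."""
    return mean + features @ comps.T

def recognize(x, gallery_features, mean, comps):
    """Recognition: index of the gallery image with the nearest feature vector."""
    f = project(x, mean, comps)
    return int(np.argmin(np.linalg.norm(gallery_features - f, axis=1)))

# Synthetic gallery (assumption): 20 images lying near a 3-D subspace of 64-D space
rng = np.random.default_rng(1)
basis = rng.normal(size=(3, 64))
gallery = rng.normal(size=(20, 3)) @ basis + rng.normal(scale=0.01, size=(20, 64))

mean, comps = fit_pca(gallery, 3)
gallery_features = project(gallery, mean, comps)

probe = gallery[7] + rng.normal(scale=0.01, size=64)   # noisy copy of image 7
approx = reconstruct(project(probe, mean, comps), mean, comps)
match = recognize(probe, gallery_features, mean, comps)
print("matched gallery index:", match)
print("reconstruction error:", np.linalg.norm(probe - approx))
```

Here each image is stored as 3 coefficients instead of 64 pixel values, plus the shared mean and components.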
PCA for Relighting
• Images under different illumination
• Most variation captured by first 5 principal components; can re-illuminate by combining only a few images
[Matusik & McMillan]
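One way to check a claim like "most variation captured by the first 5 components" is to look at the cumulative share of squared singular values. The sketch below does this on synthetic data built to have five dominant directions. It is not the Matusik & McMillan data set, and the name `variance_captured` is an assumption for the example.

```python
import numpy as np

def variance_captured(data):
    """Cumulative fraction of total variance captured by the first 1, 2, ... components."""
    centered = data - data.mean(axis=0)
    w = np.linalg.svd(centered, compute_uv=False)   # singular values only
    return np.cumsum(w**2) / np.sum(w**2)

# Synthetic "images" (assumption): 40 samples, 100 pixels, 5 underlying factors plus noise
rng = np.random.default_rng(2)
images = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 100))
images += rng.normal(scale=0.05, size=(40, 100))

frac = variance_captured(images)
# Smallest number of components whose cumulative share reaches 99%
k = int(np.searchsorted(frac, 0.99) + 1)
print("components needed for 99% of the variance:", k)
```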
PCA for DNA Microarrays
• Measure gene activation under different conditions
[Troyanskaya]
PCA for DNA Microarrays
• PCA shows patterns of correlated activation
– Genes with same pattern might have similar function
[Wall et al.]
Multidimensional Scaling
• In some experiments, can only measure similarity or dissimilarity
– e.g., is response to stimuli similar or different?
• Want to recover absolute positions in k-dimensional space
Multidimensional Scaling
• Example: given pairwise distances between cities
– Want to recover locations
[Pellacini et al.]
Euclidean MDS
• Formally, let’s say we have $n \times n$ matrix $D$ consisting of squared distances $d_{ij}^2 = (x_i - x_j)^2$
• Want to recover $n \times d$ matrix $X$ of positions in $d$-dimensional space

$$
D = \begin{pmatrix}
0 & (x_1 - x_2)^2 & (x_1 - x_3)^2 & \cdots \\
(x_1 - x_2)^2 & 0 & (x_2 - x_3)^2 & \cdots \\
(x_1 - x_3)^2 & (x_2 - x_3)^2 & 0 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix},
\qquad
X = \begin{pmatrix}
(\cdots\; x_1 \;\cdots) \\
(\cdots\; x_2 \;\cdots) \\
\vdots
\end{pmatrix}
$$
Euclidean MDS
• Observe that

$$
d_{ij}^2 = (x_i - x_j)^2 = x_i^2 - 2\, x_i x_j + x_j^2
$$

• Strategy: convert matrix $D$ of $d_{ij}^2$ into matrix $B$ of $x_i x_j$
– “Centered” distance matrix
– $B = XX^T$
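The expansion of the square also gives a compact way to build $D$ from known positions, which is useful for testing the later steps. A minimal NumPy sketch follows. The function name and the three sample points are assumptions, and for points in more than one dimension $x_i^2$ and $x_i x_j$ are read as dot products.

```python
import numpy as np

def squared_distance_matrix(X):
    """Build D with D[i, j] = |x_i|^2 - 2 x_i . x_j + |x_j|^2 for the rows x_i of X."""
    sq = np.sum(X**2, axis=1)                  # |x_i|^2 for every point
    D = sq[:, None] - 2.0 * (X @ X.T) + sq[None, :]
    return np.maximum(D, 0.0)                  # clip tiny negatives caused by rounding

# Three collinear sample points (assumption), spaced 5 units apart
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])
D = squared_distance_matrix(X)
print(D)
```

Note that the matrix $X X^T$ inside this expression is exactly the $B$ that the strategy on this slide tries to recover.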
Euclidean MDS
• Centering:
– Sum of row $i$ of $D$ = sum of column $i$ of $D$ =

$$
s_i = \sum_j d_{ij}^2 = \sum_j \left( x_i^2 - 2\, x_i x_j + x_j^2 \right) = n\, x_i^2 - 2\, x_i \sum_j x_j + \sum_j x_j^2
$$

– Sum of all entries in $D$ =

$$
s = \sum_i s_i = 2n \sum_i x_i^2 - 2 \Big( \sum_i x_i \Big)^2
$$
Euclidean MDS
• Choose $\sum_i x_i = 0$
– Solution will have average position at origin
– Then,

$$
s_i = n\, x_i^2 + \sum_j x_j^2, \qquad s = 2n \sum_j x_j^2
$$

$$
d_{ij}^2 - \tfrac{1}{n}\, s_i - \tfrac{1}{n}\, s_j + \tfrac{1}{n^2}\, s = -2\, x_i x_j
$$

• So, to get $B$:
– compute row (or column) sums
– compute sum of sums
– apply above formula to each entry of $D$
– Divide by –2
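The four steps for getting $B$ translate almost line for line into NumPy. This is a sketch under the slide's assumption that the positions sum to zero. The function name `gram_from_squared_distances` and the sample points are made up for illustration.

```python
import numpy as np

def gram_from_squared_distances(D):
    """Convert a matrix of squared distances D into B = X X^T (positions centered at the origin)."""
    n = D.shape[0]
    s_i = D.sum(axis=1)                  # row sums (equal to column sums, D is symmetric)
    s = s_i.sum()                        # sum of sums = sum of all entries
    # d_ij^2 - s_i/n - s_j/n + s/n^2 equals -2 x_i . x_j
    centered = D - s_i[:, None] / n - s_i[None, :] / n + s / n**2
    return centered / -2.0               # divide by -2

# Sample points (assumption) whose mean is already the origin
X = np.array([[-3.0, -4.0],
              [ 0.0,  0.0],
              [ 3.0,  4.0]])
diff = X[:, None, :] - X[None, :, :]
D = np.sum(diff**2, axis=-1)             # squared pairwise distances

B = gram_from_squared_distances(D)
print(B)
print(np.allclose(B, X @ X.T))
```

The same operation is often written in matrix form as $B = -\tfrac{1}{2} J D J$ with $J = I - \tfrac{1}{n} \mathbf{1}\mathbf{1}^T$.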
Euclidean MDS
• Now have $B$, want to factor into $XX^T$
• If $X$ is $n \times d$, $B$ must have rank $d$
• Take SVD, set all but top $d$ singular values to 0
– Eliminate corresponding columns of $U$ and $V$
– Have $B_3 = U_3 W_3 V_3^T$
– $B$ is square and symmetric, so $U = V$
– Take $X = U_3$ times square root of $W_3$
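The factoring step can be sketched as follows, keeping the top $d$ singular values and scaling the kept columns of $U$ by their square roots. The function names and the random test points are assumptions for illustration. The check compares pairwise distances rather than coordinates, because the recovered positions are only determined up to rotation and reflection.

```python
import numpy as np

def positions_from_gram(B, d):
    """Factor a symmetric positive semidefinite B (n x n) into X (n x d) with X @ X.T close to B."""
    U, w, _ = np.linalg.svd(B)           # singular values w are sorted, largest first
    return U[:, :d] * np.sqrt(w[:d])     # keep top d columns of U, scale by sqrt(w)

def squared_distances(X):
    """Squared pairwise distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff**2, axis=-1)

# Random centered 2-D points (assumption) stand in for the unknown true positions
rng = np.random.default_rng(3)
true_X = rng.normal(size=(8, 2))
true_X -= true_X.mean(axis=0)
B = true_X @ true_X.T

X = positions_from_gram(B, 2)
print(np.allclose(squared_distances(X), squared_distances(true_X)))
```

With noisy or non-Euclidean input, $B$ can have negative eigenvalues. In that case the SVD still returns nonnegative singular values, and the result is only an approximation.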
Multidimensional Scaling
• Result ($d = 2$):
[Pellacini et al.]
Multidimensional Scaling
• Caveat: actual axes, center not necessarily what you want (can’t recover them!)
• This is “classical” or “Euclidean” MDS [Torgerson 52]
– Distance matrix assumed to be actual Euclidean distance
• More sophisticated versions available
– “Non-metric MDS”: not Euclidean distance, sometimes just relative distances
– “Weighted MDS”: account for observer bias