Methods for Sparse PCA - Stanford
Methods for Sparse PCA
May 4, 2012
Outline
1. Introduction
   - Principal Components Analysis
2. Three Methods from the Literature
   - Maximal Variance Approach
   - Minimal Reconstruction Error Approach
   - Rank-1 Matrix Approximation
3. Relationships Between these Methods
   - Efficient Algorithm for Maximal Variance Approach
   - Minimal Reconstruction Error as a Variance Criterion
4. Conclusions
Principal Components Analysis
Principal Components Analysis is a popular tool for exploratory data analysis and dimension reduction in applied statistics.
Principal Components Analysis: Example
Notation
Let X be an n × p matrix with standardized columns; that is, for each column j:
∑_{i=1}^n X_ij = 0  and  ∑_{i=1}^n X_ij^2 = 1.
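This standardization can be sketched and checked numerically; a minimal NumPy example (the random data matrix here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # n = 50 observations, p = 4 variables

# Center each column to mean zero, then scale so each column has unit
# sum of squares: sum_i X[i, j] = 0 and sum_i X[i, j]**2 = 1 for all j.
X = X - X.mean(axis=0)
X = X / np.sqrt((X ** 2).sum(axis=0))

print(np.allclose(X.sum(axis=0), 0.0))         # True
print(np.allclose((X ** 2).sum(axis=0), 1.0))  # True
```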
Three Ways to Arrive at First Principal Component
1. Maximal variance
2. Minimal reconstruction error
3. Best rank-1 approximation
Maximal Variance Approach
The first PC, v, is the direction of maximal variance:
v = argmax_v v^T X^T X v  subject to  ||v||_2 = 1
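This program is solved exactly by the leading eigenvector of X^T X, which can be verified numerically; a small NumPy check on simulated data (the matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)

# The maximizer of v^T X^T X v over unit vectors v is the eigenvector
# of X^T X with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)  # eigenvalues in ascending order
v = eigvecs[:, -1]

# Any other unit vector attains at most the same variance.
w = rng.normal(size=5)
w /= np.linalg.norm(w)
assert v @ (X.T @ X) @ v >= w @ (X.T @ X) @ w
```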
Minimal Reconstruction Error Approach
The first PC, v, minimizes the reconstruction error:
(u, v) = argmin_{u,v} ||X - X v u^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1
Best Rank-1 Approximation Approach
The first PC, v, follows from the best rank-1 approximation:
(u, v, d) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1
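This is the classical Eckart-Young problem, solved by the top singular triplet of X; a NumPy illustration on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 6))

# The best rank-1 approximation in Frobenius norm is d * u v^T, with d
# the largest singular value and u, v the corresponding singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
u, v, d = U[:, 0], Vt[0], s[0]

best_err = np.linalg.norm(X - d * np.outer(u, v), "fro")

# A rank-1 matrix built from arbitrary unit vectors (with its own
# optimal scale a^T X b) does no better.
a = rng.normal(size=30); a /= np.linalg.norm(a)
b = rng.normal(size=6);  b /= np.linalg.norm(b)
other_err = np.linalg.norm(X - (a @ X @ b) * np.outer(a, b), "fro")
assert best_err <= other_err
```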
Principal Components: Three Approaches
Sparse Principal Components Analysis
Suppose we want sparse principal components.
e.g., gene expression data: we want to identify a sparse set of genes along which most of the variation in the data takes place.
Example
From a study involving 569 elderly persons.
[Figure: a mid-sagittal brain slice, with the corpus callosum annotated with landmarks.]
Example - continued
[Figure: standard and sparse principal components from a study of corpus callosum variation, shown for walking speed and verbal fluency. The shape variations corresponding to significant principal components (red curves) are overlaid on the mean CC shape (black curves).]
Three Ways to Arrive at First Sparse Principal Component
1. Maximal variance ... subject to an L1 penalty
2. Minimal reconstruction error ... subject to an L1 penalty
3. Best rank-1 approximation ... subject to an L1 penalty
Maximal Variance Approach
v = argmax_v v^T X^T X v  subject to  ||v||_2 = 1, ||v||_1 ≤ c
Citation: “SCoTLASS” method of Jolliffe et al. (2003)
Maximal Variance Approach
1. The criterion follows naturally from the maximal variance description of principal components.
2. But we are maximizing a convex function subject to a penalty... not a convex problem.
Citation: Trendafilov and Jolliffe (2006)
Minimal Reconstruction Error Approach
(u, v) = argmin_{u,v} ||X - X v u^T||_F^2 + λ1 ||v||_1 + λ2 ||v||_2^2  subject to  ||u||_2 = 1
Citation: “SPCA” method of Zou, Hastie, and Tibshirani (2006)
Iterative algorithm to solve for u and v.
Rank-1 Matrix Approximation
(u, v, d) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1, ||u||_1 ≤ c1, ||v||_1 ≤ c2
Citations: "Low rank matrix decomposition" of Shen and Huang (2008); "Penalized matrix decomposition" of Witten, Hastie, and Tibshirani (2008)
A fast iterative algorithm solves for u and v using soft thresholding.
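Neither paper's exact algorithm is reproduced here; the following is a minimal sketch of the alternating soft-thresholding idea, written in a penalized (Lagrangian) form with fixed thresholds lam_u, lam_v rather than the constrained form with c1, c2. All names and parameter values are illustrative:

```python
import numpy as np

def soft(a, lam):
    """Soft-thresholding operator: shrinks entries of a toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_rank1(X, lam_u=0.0, lam_v=0.0, n_iter=100):
    """Alternating updates for a sparse rank-1 approximation of X.

    Each step maximizes u^T X v in one vector with the other held fixed,
    followed by soft thresholding and renormalization (a sketch of the
    Shen-Huang / Witten et al. style of algorithm, not their exact code).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]                  # warm start from the SVD
    for _ in range(n_iter):
        u = soft(X @ v, lam_u)
        u /= max(np.linalg.norm(u), 1e-12)
        v = soft(X.T @ u, lam_v)
        v /= max(np.linalg.norm(v), 1e-12)
    d = u @ X @ v
    return u, v, d

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))
X[:, 0] += 3 * rng.normal(size=40)         # one high-variance column
u, v, d = sparse_rank1(X, lam_v=2.0)
print(np.count_nonzero(np.abs(v) > 1e-8))  # only a few nonzero loadings
```

With lam_u = lam_v = 0 the iteration is exactly the power method for the top singular vectors, so the unpenalized problem is recovered as a special case.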
Rank-1 Approximation Leads to Maximal Variance Approach
It is not hard to show that the criterion for the rank-1 approximation can be rewritten so that it looks more like a variance criterion:
(u, v) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1, ||u||_1 ≤ c1, ||v||_1 ≤ c2
       = argmax_{u,v} u^T X v  subject to  ||u||_2 = ||v||_2 = 1, ||u||_1 ≤ c1, ||v||_1 ≤ c2
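Dropping the L1 constraints makes this identity easy to verify numerically: the unconstrained maximum of u^T X v over unit vectors is the largest singular value of X, attained at the top singular vectors. A small check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 8))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
u, v = U[:, 0], Vt[0]

# The top singular vectors attain u^T X v = s_1 ...
assert np.isclose(u @ X @ v, s[0])

# ... and no other pair of unit vectors exceeds that value.
a = rng.normal(size=20); a /= np.linalg.norm(a)
b = rng.normal(size=8);  b /= np.linalg.norm(b)
assert a @ X @ b <= s[0]
```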
Suppose we apply the Rank-1 approximation to X.
(u, v, d) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1, ||v||_1 ≤ c
Then the solution v solves the maximal variance criterion. So, rather than solving the maximal variance criterion by maximizing a convex function, we can use the fast iterative algorithm for the sparse rank-1 approximation.
Minimal Reconstruction Error as a Variance Criterion
In a similar way, one can also show an equivalence between the minimal reconstruction error and maximal variance criteria, if we add an L1 constraint on u to the former.
Conclusions
1. There is no unique definition of sparse PCA: at least three methods have been proposed.
2. There exist previously unknown connections between these (seemingly different) methods; in fact, they are almost identical.
3. These connections have not only improved our understanding of each of the different methods, but have also resulted in a new fast algorithm for a previously very difficult problem (the maximal variance criterion).
References
1. Jolliffe, Trendafilov, and Uddin (2003), 'A modified principal component technique based on the lasso', Journal of Computational and Graphical Statistics 12, 531-547.
2. Trendafilov and Jolliffe (2006), 'Projected gradient approach to the numerical solution of the SCoTLASS', Computational Statistics and Data Analysis 50, 242-253.
3. Zou, Hastie, and Tibshirani (2006), 'Sparse principal component analysis', Journal of Computational and Graphical Statistics 15, 262-286.
4. Shen and Huang (2008), 'Sparse principal component analysis via regularized low rank matrix approximation', Journal of Multivariate Analysis.
5. Witten, Hastie, and Tibshirani (2008), 'A penalized matrix decomposition, with applications to canonical correlation analysis and principal components', submitted.