Methods for Sparse PCA - Stanford
Methods for Sparse PCA
May 4, 2012
Outline
1. Introduction
   - Principal Components Analysis
2. Three Methods from the Literature
   - Maximal Variance Approach
   - Minimal Reconstruction Error Approach
   - Rank-1 Matrix Approximation
3. Relationships Between these Methods
   - Efficient Algorithm for Maximal Variance Approach
   - Minimal Reconstruction Error as a Variance Criterion
4. Conclusions
Principal Components Analysis
Principal Components Analysis is a popular tool for exploratory data analysis and dimension reduction in applied statistics.
Principal Components Analysis: Example
Notation
Let X be an n × p matrix with standardized columns; that is, for each column j:
∑_{i=1}^n X_ij = 0  and  ∑_{i=1}^n X_ij^2 = 1.
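This standardization can be sketched and checked numerically; a minimal NumPy example (the random data matrix here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # n = 50 observations, p = 4 variables

# Center each column to mean zero, then scale so each column has unit
# sum of squares: sum_i X[i, j] = 0 and sum_i X[i, j]**2 = 1 for all j.
X = X - X.mean(axis=0)
X = X / np.sqrt((X ** 2).sum(axis=0))

print(np.allclose(X.sum(axis=0), 0.0))         # True
print(np.allclose((X ** 2).sum(axis=0), 1.0))  # True
```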
Three Ways to Arrive at First Principal Component
1. Maximal variance
2. Minimal reconstruction error
3. Best rank-1 approximation
Maximal Variance Approach
The first PC, v, is the direction of maximal variance:
v = argmax_v v^T X^T X v  subject to  ||v||_2 = 1
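This program is solved exactly by the leading eigenvector of X^T X, which can be verified numerically; a small NumPy check on simulated data (the matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)

# The maximizer of v^T X^T X v over unit vectors v is the eigenvector
# of X^T X with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)  # eigenvalues in ascending order
v = eigvecs[:, -1]

# Any other unit vector attains at most the same variance.
w = rng.normal(size=5)
w /= np.linalg.norm(w)
assert v @ (X.T @ X) @ v >= w @ (X.T @ X) @ w
```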
Minimal Reconstruction Error Approach
The first PC, v, minimizes the reconstruction error:
(u, v) = argmin_{u,v} ||X - X v u^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1
Best Rank-1 Approximation Approach
The first PC, v, follows from the best rank-1 approximation:
(u, v, d) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1
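This is the classical Eckart-Young problem, solved by the top singular triplet of X; a NumPy illustration on simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 6))

# The best rank-1 approximation in Frobenius norm is d * u v^T, with d
# the largest singular value and u, v the corresponding singular vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
u, v, d = U[:, 0], Vt[0], s[0]

best_err = np.linalg.norm(X - d * np.outer(u, v), "fro")

# A rank-1 matrix built from arbitrary unit vectors (with its own
# optimal scale a^T X b) does no better.
a = rng.normal(size=30); a /= np.linalg.norm(a)
b = rng.normal(size=6);  b /= np.linalg.norm(b)
other_err = np.linalg.norm(X - (a @ X @ b) * np.outer(a, b), "fro")
assert best_err <= other_err
```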
Principal Components: Three Approaches
Sparse Principal Components Analysis
Suppose we want sparse principal components.
e.g., gene expression data: we want to identify a sparse set of genes along which most of the variation in the data takes place.
Example
From a study involving 569 elderly persons.
[Figure: a mid-sagittal brain slice, with the corpus callosum annotated with landmarks.]
Example - continued
[Figure: standard and sparse principal components from a study of corpus callosum variation, shown for walking speed and verbal fluency. The shape variations corresponding to significant principal components (red curves) are overlaid on the mean CC shape (black curves).]
Three Ways to Arrive at First Sparse Principal Component
1. Maximal variance ... subject to an L1 penalty
2. Minimal reconstruction error ... subject to an L1 penalty
3. Best rank-1 approximation ... subject to an L1 penalty
Maximal Variance Approach
v = argmax_v v^T X^T X v  subject to  ||v||_2 = 1, ||v||_1 ≤ c
Citation: “SCoTLASS” method of Jolliffe et al. (2003)
Maximal Variance Approach
1. The criterion follows naturally from the maximal variance description of principal components.
2. But we are maximizing a convex function subject to a penalty... not a convex problem.
Citation: Trendafilov and Jolliffe (2006)
Minimal Reconstruction Error Approach
(u, v) = argmin_{u,v} ||X - X v u^T||_F^2 + λ1 ||v||_1 + λ2 ||v||_2^2  subject to  ||u||_2 = 1
Citation: “SPCA” method of Zou, Hastie, and Tibshirani (2006)
Iterative algorithm to solve for u and v.
Rank-1 Matrix Approximation
(u, v, d) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1, ||u||_1 ≤ c1, ||v||_1 ≤ c2
Citations: "Low rank matrix decomposition" of Shen and Huang (2008); "Penalized matrix decomposition" of Witten, Hastie, and Tibshirani (2008)
A fast iterative algorithm solves for u and v using soft thresholding.
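Neither paper's exact algorithm is reproduced here; the following is a minimal sketch of the alternating soft-thresholding idea, written in a penalized (Lagrangian) form with fixed thresholds lam_u, lam_v rather than the constrained form with c1, c2. All names and parameter values are illustrative:

```python
import numpy as np

def soft(a, lam):
    """Soft-thresholding operator: shrinks entries of a toward zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_rank1(X, lam_u=0.0, lam_v=0.0, n_iter=100):
    """Alternating updates for a sparse rank-1 approximation of X.

    Each step maximizes u^T X v in one vector with the other held fixed,
    followed by soft thresholding and renormalization (a sketch of the
    Shen-Huang / Witten et al. style of algorithm, not their exact code).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]                  # warm start from the SVD
    for _ in range(n_iter):
        u = soft(X @ v, lam_u)
        u /= max(np.linalg.norm(u), 1e-12)
        v = soft(X.T @ u, lam_v)
        v /= max(np.linalg.norm(v), 1e-12)
    d = u @ X @ v
    return u, v, d

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))
X[:, 0] += 3 * rng.normal(size=40)         # one high-variance column
u, v, d = sparse_rank1(X, lam_v=2.0)
print(np.count_nonzero(np.abs(v) > 1e-8))  # only a few nonzero loadings
```

With lam_u = lam_v = 0 the iteration is exactly the power method for the top singular vectors, so the unpenalized problem is recovered as a special case.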
Rank-1 Approximation Leads to Maximal Variance Approach
It is not hard to show that the criterion for the rank-1 approximation can be rewritten so that it looks more like a variance criterion:
(u, v) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1, ||u||_1 ≤ c1, ||v||_1 ≤ c2
       = argmax_{u,v} u^T X v  subject to  ||u||_2 = ||v||_2 = 1, ||u||_1 ≤ c1, ||v||_1 ≤ c2
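Dropping the L1 constraints makes this identity easy to verify numerically: the unconstrained maximum of u^T X v over unit vectors is the largest singular value of X, attained at the top singular vectors. A small check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 8))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
u, v = U[:, 0], Vt[0]

# The top singular vectors attain u^T X v = s_1 ...
assert np.isclose(u @ X @ v, s[0])

# ... and no other pair of unit vectors exceeds that value.
a = rng.normal(size=20); a /= np.linalg.norm(a)
b = rng.normal(size=8);  b /= np.linalg.norm(b)
assert a @ X @ b <= s[0]
```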
Suppose we apply the Rank-1 approximation to X.
(u, v, d) = argmin_{u,v,d} ||X - d u v^T||_F^2  subject to  ||u||_2 = ||v||_2 = 1, ||v||_1 ≤ c
Then the solution v solves the maximal variance criterion. So, rather than solving the maximal variance criterion by maximizing a convex function, we can use the fast iterative algorithm for the sparse rank-1 approximation.
Minimal Reconstruction Error as a Variance Criterion
In a similar way, one can also show an equivalence between the minimal reconstruction error and maximal variance criteria, if we add an L1 constraint on u to the former.
Conclusions
1. There is no unique definition of sparse PCA: at least three methods have been proposed.
2. There exist previously unknown connections between these (seemingly different) methods; in fact, they are almost identical.
3. These connections have not only improved our understanding of each of the different methods, but have also resulted in a new fast algorithm for a previously very difficult problem (the maximal variance criterion).
References
1. Jolliffe, Trendafilov, and Uddin (2003), 'A modified principal component technique based on the lasso', Journal of Computational and Graphical Statistics 12, 531-547.
2. Trendafilov and Jolliffe (2006), 'Projected gradient approach to the numerical solution of the SCoTLASS', Computational Statistics and Data Analysis 50, 242-253.
3. Zou, Hastie, and Tibshirani (2006), 'Sparse principal component analysis', Journal of Computational and Graphical Statistics 15, 262-286.
4. Shen and Huang (2008), 'Sparse principal component analysis via regularized low rank matrix approximation', Journal of Multivariate Analysis.
5. Witten, Hastie, and Tibshirani (2008), 'A penalized matrix decomposition, with applications to canonical correlation analysis and principal components', submitted.