Sparse Generalized Principal Components Analysis with Applications to Neuroimaging
Genevera I. Allen
Department of Statistics, Rice University, Department of Pediatrics-Neurology, Baylor College of Medicine,
& Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital.
March 11, 2013
Outline
1 Motivation
2 Generalized PCA and Sparse GPCA
3 Results
Review: Principal Components Analysis
Principal Components Analysis (PCA):
Dimension reduction.
Exploratory data analysis.
PCA Problem:

maximize_{v_k}  Var(X v_k) = v_k^T X^T X v_k
subject to  v_k^T v_k = 1  &  v_k^T v_{k'} = 0  ∀ k' < k.

PC: z_k = X v_k.

Given by the singular value decomposition (SVD) of the data matrix: X = U D V^T, then Z = X V.
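For concreteness, here is a minimal numpy sketch of this SVD route to the principal components (the helper name pca_via_svd and the column-centering step are illustrative assumptions, not part of the slides):

```python
import numpy as np

def pca_via_svd(X, K):
    """Top-K principal components via the SVD of the
    column-centered data matrix: X = U D V^T, Z = X V."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:K].T                                 # loadings v_1, ..., v_K
    Z = Xc @ V                                   # scores z_k = X v_k
    return V, Z
```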
When does PCA (SVD) fail?
1 High-dimensional data.
  - Fix: Sparsity - Sparse PCA (Johnstone and Lu, 2004).
2 Structured factors.
  - Fix: Smoothness - Functional PCA (Rice and Silverman, 1991).
  - Fix: Sparsity - Sparse PCA (Jolliffe et al., 2003).
3 Strong dependencies among row and/or column variables? Structured data?
  - Transposable Data: dependencies among the rows and/or columns of a data matrix.
PCA in Neuroimaging
Multivariate analysis techniques are used for finding Regions of Interest and Activation Patterns and for understanding Functional Connectivity, but . . .
(Viviani et al., 2005)
PCA and Correlated Noise
StarPlus fMRI Data
StarPlus Data (Subject 04847): (Mitchell et al., 2004)
Task: Object identification.
20 tasks in which the sentence agrees with the image.
20 tasks in which the sentence opposes the image.
Each task lasted 27 seconds (55 time points).
Images: 64 × 64 × 8.
Data set: 4,698 voxels × 40 tasks × 55 time points.
Goal: Use pattern recognition techniques to find regions of interest and activation patterns related to object identification.
StarPlus PCA Results
Classical PCA:
StarPlus PCA Results
Sparse PCA:
Objectives
Goals:
1 Incorporate known noise structure and/or dependencies into PCA problems.
2 Develop a framework for regularization of PCA factors.
3 Provide computationally feasible solutions and algorithms in ultra high-dimensional settings.
PCA Model
X_{n×p} = Σ_{k=1}^K d_k u_k v_k^T + E_{n×p}.

Random: d_k. Fixed: U = [u_1, . . . , u_K] and V = [v_1, . . . , v_K].

Independent Noise: E_ij ~iid (0, σ^2), or Cov(vec(E)) = σ^2 I_(p) ⊗ I_(n).

SVD Loss Function: ||X − U D V^T||_F^2.
- || · ||_F is the Frobenius norm (sum of squared errors).
- Error terms are weighted equally.
- Cross-product errors between elements ij and i'j' are ignored.
Our Generative Model
X_{n×p} = Σ_{k=1}^K d_k u_k v_k^T + E_{n×p}.

Random: d_k. Fixed: U = [u_1, . . . , u_K] and V = [v_1, . . . , v_K].

Noise: Two-way (separable) dependencies:

Cov(vec(E)) = Δ ⊗ Σ,

with Δ ∈ ℝ^{p×p} the column covariance and Σ ∈ ℝ^{n×n} the row covariance:

Δ ⊗ Σ = [ Δ_{11}Σ   Δ_{12}Σ   . . .   Δ_{1p}Σ ]
        [ Δ_{21}Σ   Δ_{22}Σ           .       ]
        [    .                 . . .  .       ]
        [ Δ_{p1}Σ    . . .            Δ_{pp}Σ ]

Signal factors are assumed to be orthogonal to the noise covariance: U^T Σ U = I & V^T Δ V = I.
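One way to simulate from this noise model (a sketch under the stated assumptions; both helper names are hypothetical): for Z with iid N(0, 1) entries, vec(A Z B) has covariance (B B^T) ⊗ (A A^T), so E = Σ^{1/2} Z Δ^{1/2} has Cov(vec(E)) = Δ ⊗ Σ.

```python
import numpy as np

def sqrtm_psd(A):
    """Symmetric square root of a positive semi-definite matrix."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.sqrt(np.clip(w, 0, None))) @ Q.T

def sample_separable_noise(Sigma, Delta, rng=None):
    """Draw E (n x p) with Cov(vec(E)) = Delta kron Sigma,
    using E = Sigma^{1/2} Z Delta^{1/2} for iid standard normal Z."""
    rng = np.random.default_rng(rng)
    Z = rng.standard_normal((Sigma.shape[0], Delta.shape[0]))
    return sqrtm_psd(Sigma) @ Z @ sqrtm_psd(Delta)
```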
GPCA Optimization Problem
SVD Problem

minimize_{U,D,V}  ||X − U D V^T||_F^2
subject to  U^T U = I_(K), V^T V = I_(K), & diag(D) ≥ 0.
How can we modify the Frobenius norm?
GPCA Optimization Problem
Generalized Least Squares Matrix Decomposition (GMD / GPCA) Problem

minimize_{U,D,V}  ||X − U D V^T||_{Q,R}^2
subject to  U^T Q U = I_(K), V^T R V = I_(K), & diag(D) ≥ 0.

Quadratic Operators: Q ∈ ℝ^{n×n} and R ∈ ℝ^{p×p}, positive semi-definite.

Q,R-norm: ||X||_{Q,R} = sqrt( tr(Q X R X^T) ).

Generalization of the Frobenius norm: if Q = I and R = I, then the GMD is equivalent to the SVD.
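A direct numpy transcription of the Q,R-norm, just to make the generalization concrete (the function name qr_norm is an assumption for illustration):

```python
import numpy as np

def qr_norm(X, Q, R):
    """The Q,R-norm ||X||_{Q,R} = sqrt(tr(Q X R X^T)).
    With Q = I and R = I this reduces to the Frobenius norm."""
    return np.sqrt(np.trace(Q @ X @ R @ X.T))
```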
Quadratic Operators
Interpretations:
1 Matrix-variate Normal:
  - ||X − U D V^T||_{Q,R}^2 ∝ ℓ_{n,p}(U D V^T, Q^{-1}, R^{-1}).
  - Q and R behave like inverse row and column covariances.
2 Covariance Decomposition: under certain model assumptions,
  Cov(vec(X)) = Σ_k Var(d_k) (v_k v_k^T) ⊗ (u_k u_k^T) + R ⊗ Q,
  where V^T R V = I and U^T Q U = I.
3 Smoothing Matrices: factors U and V are as smooth as the smallest eigenvectors of Q and R.
4 Weighting Matrices: up-weight cross-product errors in the loss according to the covariance between variables.
Quadratic Operators
Classes of Quadratic Operators:
1 Model-Based Operators.
  - Random field covariances.
  - Temporal process covariances or inverse covariances.
  - Gaussian Markov random fields.
2 Smoothing Operators.
  - Functional data analysis.
3 Graphical Operators.
  - Graph Laplacians.

[Figures: a spatial graphical operator and a temporal smoothing operator.]
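As one concrete instance of a graphical operator, here is a sketch of the Laplacian of a chain graph, e.g. for temporally ordered variables (the helper name chain_laplacian is illustrative):

```python
import numpy as np

def chain_laplacian(p):
    """Graph Laplacian L = D - A of a 1-D chain graph: the quadratic
    form x^T L x sums squared differences between neighboring
    variables, so L acts as a roughness-penalizing quadratic operator."""
    A = np.diag(np.ones(p - 1), 1) + np.diag(np.ones(p - 1), -1)  # adjacency
    return np.diag(A.sum(axis=1)) - A
```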
GPCA Solution
GMD Solution

Let X̃ = Q^{1/2} X R^{1/2} and let X̃ = Ũ D̃ Ṽ^T be the SVD of X̃. Then the GMD solution, X = U* D* (V*)^T, is given by:

U* = Q^{-1/2} Ũ,  V* = R^{-1/2} Ṽ,  &  D* = D̃.

U* contains the sample GPCs (scores) and V* the GPCA loadings (directions).

Alternative Computational Approaches:
- Linear algebra tricks when n << p.
- Deflation via the Generalized Power Method: performs alternating generalized least squares regressions.
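A minimal numpy sketch of this closed-form solution, assuming Q and R are strictly positive definite (eigenvalues are clipped for numerical stability; exactly semi-definite operators would need pseudoinverses, which are not shown here):

```python
import numpy as np

def matrix_power_psd(A, power):
    """A^power for a symmetric positive definite A via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 1e-12, None) ** power) @ V.T

def gmd(X, Q, R, K):
    """Rank-K GMD: SVD of X_tilde = Q^{1/2} X R^{1/2}, then map the
    singular vectors back through Q^{-1/2} and R^{-1/2}."""
    Xt = matrix_power_psd(Q, 0.5) @ X @ matrix_power_psd(R, 0.5)
    Ut, d, Vtt = np.linalg.svd(Xt, full_matrices=False)
    U_star = matrix_power_psd(Q, -0.5) @ Ut[:, :K]   # sample GPCs (scores)
    V_star = matrix_power_psd(R, -0.5) @ Vtt[:K].T   # GPCA loadings
    return U_star, d[:K], V_star
```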
Regularized GPCA
Regularized GPCA Optimization Problem

maximize_{u,v}  u^T Q X R v − λ_v P_1(v) − λ_u P_2(u)
subject to  u^T Q u ≤ 1  &  v^T R v ≤ 1.

Theorem: If P(·) is any norm or semi-norm, then the factor-wise solutions are given by solving a penalized regression problem and re-scaling.

Options: P(x) = ||x||_1 (sparsity), group sparsity, ℓ_q balls, total variation, etc.

Multiple factors are computed via deflation (a simplified single-factor sketch is given below).
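To make this concrete, here is a rough single-factor sketch of the alternating scheme in the special case R = I with an ℓ1 penalty on v only (the function names and the SVD warm start are illustrative assumptions; the theorem above covers general Q, R and penalties via penalized regression, which this simplification only hints at):

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding: the proximal operator of lam * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_gpc_rank1(X, Q, lam_v, n_iter=100):
    """One sparse generalized PC by alternating maximization with R = I.
    The v-step soft-thresholds X^T Q u and rescales; the u-step sets
    u = X v, normalized in the Q-norm."""
    u = np.linalg.svd(X, full_matrices=False)[0][:, 0]  # warm start
    u = u / np.sqrt(u @ Q @ u)
    v = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = soft_threshold(X.T @ (Q @ u), lam_v)
        if not v.any():
            break                                       # lam_v zeroed v out
        v /= np.linalg.norm(v)
        u = X @ v
        u /= np.sqrt(u @ Q @ u)
    d = u @ Q @ X @ v                                   # generalized singular value
    return u, d, v
```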
StarPlus PCA Results
Classical PCA:
StarPlus PCA Results
Sparse PCA:
StarPlus GPCA Results
Generalized PCA:
StarPlus GPCA Results
Sparse Generalized PCA:
StarPlus GPCA Results
Results:
Identified the “Ventral Stream” (brain regions associated with object identification).
Anatomical Regions:
- Bilateral occipital.
- Left-lateralized inferior temporal.
- Inferior frontal.
(Pennick & Kana, 2012)

[Figures: SGPCA factor 1, axial slices 2 and 3.]
PCA Comparisons
Extent of dimension reduction achieved:
Other Extensions
Non-negative (and sparse) GPCA.
Tensor GPCA, or Higher-Order GPCA.
- Based on the Tucker decomposition.
Sparse Higher-Order GPCA.
- Computes sequential rank-one decompositions that are a relaxation of the CANDECOMP/PARAFAC decomposition.
Applications: multi-subject neuroimaging data.
Concluding Remarks
When to use GPCA vs. PCA:
Structured Data (variables are associated with a specific location).
Smooth or functional data.
Data with a low signal-to-noise ratio.
Future Work:
Statistical Work: how to choose Q and R, the rank of the decomposition, and the level of sparsity; consistency studies.
Applications in Neuroimaging: comparisons to ICA methods.
Other Applications: genomics, proteomics, image data, time series and longitudinal data, spatio-temporal data, climate studies, remote sensing.
Concluding Remarks
R Package & Matlab Toolbox
Coming Soon . . .
Code available from www.stat.rice.edu/~gallen.
Acknowledgments
Funding:
National Science Foundation DMS-1209017
Collaborators:
Logan Grosenick, Center for Mind and Brain, Stanford University.
Jonathan Taylor, Statistics, Stanford University.
Mirjana Maletic-Savatic, Jan and Dan Duncan Neurological Research Institute & Baylor College of Medicine.
Frederick Campbell, PhD Candidate, Statistics, Rice University.
References
G. I. Allen, L. Grosenick & J. Taylor, “A generalized least squares matrix decomposition”, arXiv:1102.3074, Rice University Technical Report No. TR2011-03, 2011.
G. I. Allen, “Regularized tensor decompositions and higher-order principal components analysis”, arXiv:1202.2476, 2012.
G. I. Allen, “Sparse higher-order principal components analysis”, in Artificial Intelligence and Statistics, 2012.
G. I. Allen & M. Maletic-Savatic, “Sparse non-negative generalized PCA with applications to metabolomics”, Bioinformatics, 27(21):3029-3035, 2011.
F. D. Campbell & G. I. Allen, “Algorithms and approaches for analyzing massive structured data with Sparse Generalized PCA”, in preparation.
G. I. Allen, C. Peterson, M. Vannucci & M. Maletic-Savatic, “Regularized partial least squares with an application to NMR spectroscopy”, Statistical Analysis and Data Mining, to appear, 2013.