Sparse Generalized Principal Components Analysis with Applications to Neuroimaging

Transcript of the slide deck.

Page 1

Sparse Generalized Principal Components Analysis with Applications to Neuroimaging

Genevera I. Allen

Department of Statistics, Rice University, Department of Pediatrics-Neurology, Baylor College of Medicine,

& Jan and Dan Duncan Neurological Research Institute, Texas Children’s Hospital.

March 11, 2013

Page 2

1 Motivation

2 Generalized PCA and Sparse GPCA

3 Results

Page 3

Review: Principal Components Analysis

Principal Components Analysis (PCA):

Dimension reduction.

Exploratory data analysis.

PCA Problem:

$$\underset{v_k}{\text{maximize}} \;\; \mathrm{Var}(X v_k) = v_k^T X^T X v_k \quad \text{subject to} \quad v_k^T v_k = 1 \;\&\; v_k^T v_{k'} = 0 \;\; \forall\, k' < k.$$

PC: $z_k = X v_k$.

Given by the singular value decomposition (SVD) of the data matrix: $X = U D V^T$; then $Z = X V$.
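
For concreteness, here is a minimal numpy sketch of PCA via the SVD (toy data; column-centering is standard practice and assumed here, not stated on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X = X - X.mean(axis=0)          # center columns

# SVD of the data matrix: X = U D V^T
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                        # loadings (right singular vectors)
Z = X @ V                       # principal components, Z = XV
print(np.allclose(Z, U * d))    # equivalently Z = UD; prints True
```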

Page 4

When does PCA (SVD) fail?

1 High-dimensional data.
   - Fix: Sparsity - Sparse PCA (Johnstone and Lu, 2004).

2 Structured factors.
   - Fix: Smoothness - Functional PCA (Rice and Silverman, 1991).
   - Fix: Sparsity - Sparse PCA (Jolliffe et al., 2003).

3 Strong dependencies among row and/or column variables? Structured data?
   - Transposable Data: dependencies among the rows and/or columns of a data matrix.

Page 6

PCA in Neuroimaging

Multivariate analysis techniques are used for finding Regions of Interest and Activation Patterns and for understanding Functional Connectivity, but . . .

(Viviani et al., 2005)

Page 7

PCA and Correlated Noise

Page 9

StarPlus fMRI Data

StarPlus Data (Subject 04847): (Mitchell et al., 2004)

Task: Object identification.

20 tasks in which the sentence agrees with the image.

20 tasks in which the sentence opposes the image.

Each task lasted 27 seconds (55 time points).

Images: 64 × 64 × 8.

Data Set: 4,698 voxels × 40 tasks × 55 time points.

Goal: Use pattern recognition techniques to find regions of interest and activation patterns related to object identification.

Page 10

Starplus PCA Results

Classical PCA:

Page 11

Starplus PCA Results

Sparse PCA:

Page 12

Objectives

Goals:

1 Incorporate known noise structure and/or dependencies into PCA problems.

2 Develop a framework for regularization of PCA factors.

3 Provide computationally feasible solutions and algorithms in ultra-high-dimensional settings.

Page 13

1 Motivation

2 Generalized PCA and Sparse GPCA

3 Results

Page 14

PCA Model

$$X_{n \times p} = \sum_{k=1}^{K} d_k u_k v_k^T + E_{n \times p}.$$

Random: $d_k$. Fixed: $U = [u_1, \ldots, u_K]$ and $V = [v_1, \ldots, v_K]$.

Independent Noise: $E_{ij} \overset{iid}{\sim} (0, \sigma^2)$, or $\mathrm{Cov}(\mathrm{vec}(E)) = \sigma^2 I_{(p)} \otimes I_{(n)}$.

SVD Loss Function: $\|X - U D V^T\|_F^2$.

$\|\cdot\|_F$ is the Frobenius norm (sum of squared errors).

Error terms are weighted equally.

Cross-product errors between elements $ij$ and $i'j'$ are ignored.

Page 15

Our Generative Model

$$X_{n \times p} = \sum_{k=1}^{K} d_k u_k v_k^T + E_{n \times p}.$$

Random: $d_k$. Fixed: $U = [u_1, \ldots, u_K]$ and $V = [v_1, \ldots, v_K]$.

Noise: two-way (separable) dependencies:

$$\mathrm{Cov}(\mathrm{vec}(E)) = \Delta \otimes \Sigma,$$

with $\Delta \in \Re^{p \times p}$ the column covariance and $\Sigma \in \Re^{n \times n}$ the row covariance.

$$\Delta \otimes \Sigma = \begin{pmatrix} \Delta_{11}\Sigma & \Delta_{12}\Sigma & \cdots & \Delta_{1p}\Sigma \\ \Delta_{21}\Sigma & \Delta_{22}\Sigma & & \\ \vdots & & \ddots & \vdots \\ \Delta_{p1}\Sigma & & \cdots & \Delta_{pp}\Sigma \end{pmatrix}$$

Signal factors are assumed to be orthogonal to the noise covariance: $U^T \Sigma U = I$ & $V^T \Delta V = I$.
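
To make the separable structure concrete, here is a small numpy check (toy $\Sigma$ and $\Delta$; the block comparison simply verifies the displayed Kronecker layout):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 3, 2
A = rng.standard_normal((n, n)); Sigma = A @ A.T + n * np.eye(n)   # row covariance
B = rng.standard_normal((p, p)); Delta = B @ B.T + p * np.eye(p)   # column covariance

# Separable covariance of vec(E) under the column-stacking convention:
cov_vecE = np.kron(Delta, Sigma)                  # (np x np)

# The (j, j') block of Delta (x) Sigma is Delta[j, j'] * Sigma:
assert np.allclose(cov_vecE[:n, n:2 * n], Delta[0, 1] * Sigma)
```

(Noise with this covariance can be generated as $E = \Sigma^{1/2} Z \Delta^{1/2}$ for $Z$ with iid standard entries.)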

Page 16

GPCA Optimization Problem

SVD Problem

$$\underset{U, D, V}{\text{minimize}} \;\; \|X - U D V^T\|_F^2 \quad \text{subject to} \quad U^T U = I_{(K)}, \; V^T V = I_{(K)} \;\&\; \mathrm{diag}(D) \geq 0.$$

How can we modify the Frobenius norm?

Page 17

GPCA Optimization Problem

Generalized Least Squares Matrix Decomposition (GMD / GPCA) Problem

$$\underset{U, D, V}{\text{minimize}} \;\; \|X - U D V^T\|_{Q,R}^2 \quad \text{subject to} \quad U^T Q U = I_{(K)}, \; V^T R V = I_{(K)} \;\&\; \mathrm{diag}(D) \geq 0.$$

Quadratic Operators: $Q \in \Re^{n \times n}$ and $R \in \Re^{p \times p}$ positive semi-definite.

Q,R-norm: $\|X\|_{Q,R} = \sqrt{\mathrm{tr}(Q X R X^T)}$.

Generalization of the Frobenius norm: if $Q, R = I$, then the GMD is equivalent to the SVD.
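
The Q,R-norm is one line of code; a quick sketch verifying the Frobenius special case (arbitrary toy data):

```python
import numpy as np

def qr_norm(X, Q, R):
    """||X||_{Q,R} = sqrt(tr(Q X R X^T))."""
    return np.sqrt(np.trace(Q @ X @ R @ X.T))

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))
# With Q = I and R = I the Q,R-norm reduces to the Frobenius norm.
assert np.isclose(qr_norm(X, np.eye(5), np.eye(4)), np.linalg.norm(X, 'fro'))
```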

Page 18

Quadratic Operators

Interpretations:

1 Matrix-variate Normal:
   - $\|X - U D V^T\|_{Q,R}^2 \propto \ell_{n,p}(U D V^T, Q^{-1}, R^{-1})$.
   - Q and R behave like inverse row and column covariances.

2 Covariance Decomposition: under certain model assumptions,
$$\mathrm{Cov}(\mathrm{vec}(X)) = \sum_k \mathrm{Var}(d_k)\,(v_k v_k^T) \otimes (u_k u_k^T) + R \otimes Q,$$
where $V^T R V = I$ and $U^T Q U = I$.

3 Smoothing Matrices: factors U and V are as smooth as the smallest eigenvectors of Q and R.

4 Weighting Matrices: up-weight cross-product errors in the loss according to the covariance between variables.

Page 19

Quadratic Operators

Classes of Quadratic Operators:

1 Model-Based Operators.
   - Random field covariances.
   - Temporal process covariances or inverse covariances.
   - Gaussian Markov random fields.

2 Smoothing Operators.
   - Functional data analysis.

3 Graphical Operators.
   - Graph Laplacians (see the sketch below).

(Figures: Spatial Graphical Operator; Temporal Smoothing Operator.)
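
As one concrete graphical operator, here is a hedged sketch of a chain-graph Laplacian (e.g., neighboring time points share an edge); the small ridge added to make Q strictly positive definite is an arbitrary choice for illustration, not one prescribed by the talk:

```python
import numpy as np

def chain_laplacian(n):
    """Unnormalized Laplacian L = D - A of a path graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0   # edges between neighbors
    return np.diag(A.sum(axis=1)) - A

n = 6
Q = chain_laplacian(n) + 1e-2 * np.eye(n)     # positive definite quadratic operator
print(np.linalg.eigvalsh(Q).min() > 0)        # True
```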

Page 20

GPCA Solution

GMD Solution

Let $\widetilde{X} = Q^{1/2} X R^{1/2}$ and let $\widetilde{X} = \widetilde{U} \widetilde{D} \widetilde{V}^T$ be the SVD of $\widetilde{X}$. Then the GMD solution, $\hat{X} = U^* D^* (V^*)^T$, is given by:

$$U^* = Q^{-1/2}\,\widetilde{U}, \quad V^* = R^{-1/2}\,\widetilde{V}, \;\&\; D^* = \widetilde{D}.$$

$U^*$: sample GPCs (scores); $V^*$: GPCA loadings (directions). (A numerical sketch follows below.)

Alternative Computational Approaches:

Linear algebra tricks when $n \ll p$.

Deflation via the Generalized Power Method: performs alternating generalized least squares regressions.
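
A minimal numpy sketch of this closed-form solution (assuming Q and R are strictly positive definite so the inverse square roots exist; `msqrt` is a helper written for this sketch, not a function from the talk's software):

```python
import numpy as np

def msqrt(M, inv=False):
    """Symmetric (inverse) square root of a positive definite M."""
    w, V = np.linalg.eigh(M)
    s = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
    return (V * s) @ V.T

def gpca(X, Q, R, K):
    """Rank-K GMD solution via the SVD of Q^{1/2} X R^{1/2}."""
    Xt = msqrt(Q) @ X @ msqrt(R)
    Ut, d, Vt = np.linalg.svd(Xt, full_matrices=False)
    U = msqrt(Q, inv=True) @ Ut[:, :K]    # U* = Q^{-1/2} U~
    V = msqrt(R, inv=True) @ Vt[:K].T     # V* = R^{-1/2} V~
    return U, d[:K], V

# Sanity check: U*^T Q U* = I, and with Q = R = I this is ordinary PCA/SVD.
rng = np.random.default_rng(2)
X = rng.standard_normal((20, 10))
Q, R = np.eye(20), np.eye(10)
U, d, V = gpca(X, Q, R, K=3)
print(np.allclose(U.T @ Q @ U, np.eye(3)))   # True
```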

Page 21

Regularized GPCA

Regularized GPCA Optimization Problem

$$\underset{u, v}{\text{maximize}} \;\; u^T Q X R v - \lambda_v P_1(v) - \lambda_u P_2(u) \quad \text{subject to} \quad u^T Q u \leq 1 \;\&\; v^T R v \leq 1.$$

Theorem: If $P(\cdot)$ is any norm or semi-norm, then the factor-wise solutions are given by solving a penalized regression problem and re-scaling. (A sketch of the sparse case follows below.)

Options: $P(x) = \|x\|_1$ (sparsity), group sparsity, $\ell_q$ balls, total variation, etc.

Multiple factors are computed via deflation.
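
Here is a hedged sketch of the rank-one sparse update via alternating penalized regression and re-scaling, under the simplifying assumption R = I so that the v-update is exact soft-thresholding (an illustration of the idea, not the talk's reference implementation):

```python
import numpy as np

def soft(a, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_gpca_rank1(X, Q, lam, n_iter=100):
    """Alternate: max u^T Q X v - lam ||v||_1  s.t. u^T Q u <= 1, v^T v <= 1."""
    n, p = X.shape
    v = np.linalg.svd(X, full_matrices=False)[2][0]   # warm start from SVD
    for _ in range(n_iter):
        a = X @ v
        u = a / np.sqrt(a @ Q @ a)       # re-scale so u^T Q u = 1
        b = soft(X.T @ Q @ u, lam)       # penalized regression step for v
        if not b.any():                  # penalty zeroed the whole factor
            return np.zeros(n), np.zeros(p)
        v = b / np.linalg.norm(b)        # re-scale so v^T v = 1
    return u, v
```

Larger `lam` yields sparser loadings v; with `lam = 0` the updates reduce to the generalized power method for the leading GMD factor (with R = I).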

Page 22

1 Motivation

2 Generalized PCA and Sparse GPCA

3 Results

Page 23

Starplus PCA Results

Classical PCA:

Page 24

Starplus PCA Results

Sparse PCA:

Page 25

Starplus GPCA Results

Generalized PCA:

Page 26

Starplus GPCA Results

Sparse Generalized PCA:

Page 27

Starplus GPCA Results

Results:

Identified the "Ventral Stream" (brain regions associated with object identification).

Anatomical Regions:
   - Bilateral occipital.
   - Left-lateralized inferior temporal.
   - Inferior frontal.

(Pennick & Kana, 2012)

(Figures: SGPCA 1, Axial Slice 2; SGPCA 1, Axial Slice 3.)

Page 28

PCA Comparisons

Extent of dimension reduction achieved:

Page 29

Other Extensions

Non-negative (and sparse) GPCA.

Tensor GPCA or Higher-Order GPCA.
   - Based on the Tucker decomposition.

Sparse Higher-Order GPCA.
   - Computes sequential rank-one decompositions that are a relaxation of the CANDECOMP/PARAFAC decomposition.

Applications: multi-subject neuroimaging data.

Page 30

Concluding Remarks

When to use GPCA vs. PCA:

Structured Data (variables are associated with a specific location).

Smooth or functional data.

Data with a low signal-to-noise ratio.

Future Work:

Statistical Work: how to choose Q and R, the rank of the decomposition, and the level of sparsity; consistency studies.

Applications in Neuroimaging: comparisons to ICA methods.

Other Applications: genomics, proteomics, image data, time series and longitudinal data, spatio-temporal data, climate studies, remote sensing.

Page 31

Concluding Remarks

R Package & Matlab Toolbox

Coming Soon . . .

Code available from www.stat.rice.edu/~gallen.

Page 32

Acknowledgments

Funding:

National Science Foundation DMS-1209017

Collaborators:

Logan Grosenick, Center for Mind and Brain, Stanford University.

Jonathan Taylor, Statistics, Stanford University.

Mirjana Maletic-Savatic, Jan and Dan Duncan Neurological Research Institute & Baylor College of Medicine.

Frederick Campbell, PhD Candidate, Statistics, Rice University.

Page 33

References

G. I. Allen, L. Grosenick & J. Taylor, A generalized least squares matrix decomposition, arXiv:1102.3074, Rice University Technical Report No. TR2011-03, 2011.

G. I. Allen, Regularized tensor decompositions and higher-order principal components analysis, arXiv:1202.2476, 2012.

G. I. Allen, Sparse higher-order principal components analysis, in Artificial Intelligence and Statistics, 2012.

G. I. Allen & M. Maletic-Savatic, Sparse non-negative generalized PCA with applications to metabolomics, Bioinformatics, 27:21, 3029-3035, 2011.

F. D. Campbell & G. I. Allen, Algorithms and approaches for analyzing massive structured data with sparse generalized PCA, in preparation.

G. I. Allen, C. Peterson, M. Vannucci & M. Maletic-Savatic, Regularized partial least squares with an application to NMR spectroscopy, Statistical Analysis and Data Mining, to appear, 2013.
