Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H...

32
Lyle Ungar, University of Pennsylvania PCA, PCR, CCA and friends Barry Slaff modified by Lyle Ungar Based on: CIS 520 Wiki Slides by Jia Li (PSU) Works cited throughout

Transcript of Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H...

Page 1: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle Ungar, University of Pennsylvania

PCA, PCR, CCA and friends

Barry Slaff modified by Lyle Ungar

Based on:

CIS 520 Wiki Slides by Jia Li (PSU)

Works cited throughout

Page 2: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Overview u  Ordinary Least Squares (OLS) Regression: Finds the projection

direction for which the x’s are maximally correlated with the y’s u  PCA: Finds projection directions of the x’s with maximal

covariance u  Principal Component Regression (PCR): Does PCA for

dimensionality reduction on X, and then OLS using PC features. l  Regularization with Ridge Regression vs. PCR.

u  Canonical Covariance Analysis: Finds the projection directions of X and Y that maximize their covariance. l  Related to Partial Least Squares (PLS)

u  Canonical Correlation Analysis (CCA): Finds the projection directions of X and Y that maximize their correlation.

u  All use SVD to minimize reconstruction error or maximize variance/covariance

Page 3: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Singular Value Decomposition u  Singular value decomposition of matrix X (n x p)

l  X = UDVT

u  U: orthogonal, UTU=I (n x n) l  Columns of U are the left singular vectors of X.

u  D: diagonal (n x p) l  Diagonal elements of D are the singular values of X. l  All non-negative; in decreasing order of magnitude down the

diagonal. u  V: orthogonal, VTV=I (p x p)

l  Columns of V are the right singular vectors of X.

Page 4: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Thin SVD

Page 5: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Singular Value Decomposition

Page 6: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Principal Component Analysis

Page 7: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Principal Component Analysis

Page 8: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Principal Component Analysis

= 100 λi / Σi λi

Page 9: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCA

True or false: If X is any matrix, and X has singular value decomposition

X = UDVT

then the principal component scores for X are the columns of Z = UD.

(a)   True (b)  False

Page 10: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCA

If X is mean-centered, then PCA finds…? (a)   Eigenvectors of XTX (b)  Right singular vectors of X (c)   Projection directions of maximum covariance of X (d)  All of the above

Page 11: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCA:ReconstructionProblem

Page 12: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Sparse PCA

Page 13: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCR: Principal Component Regression

PCR has two steps: 1.  Do a PCA for dimensionality reduction of X 2.  Do OLS regression using the PC features, Z, usually with

feature selection.

Page 14: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCR: Principal Component Regression

Page 15: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCR: Principal Component Regression

Page 16: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Ridge Regression in terms of SVD

λi here are singular values, not eigenvalues

Page 17: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

OLS vs. Ridge vs. PCR

Page 18: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Ridge Shrinkage

Page 19: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Ridge Shrinkage Example Suppose X, Y have a unique OLS solution.

Suppose X = UDVT and the nonzero singular values are 5, 4, 3, 2, and 1.

•  What are the nonzero eigenvalues of XXT?

•  When constructing the hat matrix, how are these eigenvalues shrunk by PCR?

•  When constructing the hat matrix, how are these eigenvalues shrunk by Ridge?

Page 20: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Canonical Covariance Analysis If Y is high-dimensional, we might want to do dimension reduction for both Y and X. Canonical covariance analysis finds the projection directions for both X and Y to maximize their covariance.

or to best reconstruct X from Y and to reconstruct Y for X (Comparison: PCA finds the projection directions of maximum covariance for X with itself.) This is one type of Partial Least Squares (PLS), which find projections of x that explain all the y’s.

Page 21: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PLS Finds the projection directions of maximum covariance for X and Y. Project Xc down to T. Project Yc down to U T and U are k-dimensional bases for X and Y, respectively “Inner model”: regress U on T

One scalar regression weight per pair ui, vi. Final model: to predict Y from X

Project each new x down into T-space Predict u’s based on t’s (inner model) Project each u up to each final y-hat.

*PLS can refer to many similar algorithms.

Page 22: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Canonical Covariance Analysis (PLS)

Page 23: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Canonical Covariance Analysis (PLS)

Page 24: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCR and PLS Feature Scores

principal component regression uses…? canonical covariance (PLS regression) uses…? (a)  The X matrix only (b) The Y matrix only (c)  Both the X and Y matrices

Page 25: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

OLS vs PCR vs PLS Suppose I have a data set with

p = 400 features, n = 100 observations

Then use: a)  Ordinary least squares (OLS) regression b)  Ridge regression c)  Principal component regression (PCR) d)  Partial least squares regression (PLS)

Page 26: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Canonical Correlation Analysis

Page 27: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Canonical Correlation Analysis

Page 28: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Canonical Correlation Analysis

Page 29: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Canonical Correlation Analysis

Page 30: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCA vs. CCA vs. PLS

Bie et al: http://www.ofai.at/~roman.rosipal/Papers/eig_book04.pdf

Page 31: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

PCA, PLS, CCA, MLR

From:Borga,M.2001.https://www.cs.cmu.edu/~tom/10701_sp11/slides/CCA_tutorial.pdf

Page 32: Barry Slaff modified by Lyle Ungar - Penn Engineeringcis520/lectures/CCA_and_friends.pdf · Lyle H Ungar, University of Pennsylvania Singular Value Decomposition u Singular value

Lyle H Ungar, University of Pennsylvania

Recap u  OLS: Finds the projection direction for which the x’s are maximally

correlated with the y’s u  PCA: Finds projection directions of the x’s with maximal covariance

l  SVD of X’X u  Principal Component Regression (PCR): Do PCA on X, and then

OLS using PC features. l  PCR zeros small eigenvectors; Ridge regression shrinks them all

u  Canonical Covariance Analysis: Finds the projection directions of X and Y that maximize their covariance. l  SVD of Y’X a form of Partial Least Squares (PLS)

u  Canonical Correlation Analysis (CCA): Finds the projection directions of X and Y that maximize their correlation. l  SVD of (X’X)-1/2X’Y(Y’Y) -1/2 l  The whitening makes it scale invariant

All minimize reconstruction error and maximize variance/covariance