Canonical Correlation Analysis (CCA)


Transcript of Canonical Correlation Analysis (CCA)

Page 1: Canonical Correlation Analysis (CCA)

Canonical Correlation Analysis (CCA)

Page 2: Canonical Correlation Analysis (CCA)

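CCA: This is it!

The mother of all linear statistical analyses.

When? We want to find a structural relation between a set of independent variables and a set of dependent variables.

$f_{ij}$ and $g_{ij}$: canonical coefficients

$F_k$ and $G_k$: canonical variates

First canonical variate pair:

$$F_1 = f_{11} x_1 + f_{12} x_2 + \dots + f_{1p} x_p$$
$$G_1 = g_{11} y_1 + g_{12} y_2 + \dots + g_{1q} y_q$$

Second canonical variate pair:

$$F_2 = f_{21} x_1 + f_{22} x_2 + \dots + f_{2p} x_p$$
$$G_2 = g_{21} y_1 + g_{22} y_2 + \dots + g_{2q} y_q$$

m-th canonical variate pair:

$$F_m = f_{m1} x_1 + f_{m2} x_2 + \dots + f_{mp} x_p$$
$$G_m = g_{m1} y_1 + g_{m2} y_2 + \dots + g_{mq} y_q$$

Here $x_1, \dots, x_p$ are the independent variables and $y_1, \dots, y_q$ the dependent variables. Variates on the same side are mutually orthogonal:

$$F_i^T F_j = 0 \quad \text{and} \quad G_i^T G_j = 0, \quad i \neq j$$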

Page 3: Canonical Correlation Analysis (CCA)

CCA: When? (part 2)

1. To what extent can one set of two or more variables be predicted or "explained" by another set of two or more variables?

2. What contribution does a single variable make to the explanatory power of the set of variables to which it belongs?

3. What contribution does a single variable make to predicting or "explaining" the composite of the variables in the set to which it does not belong?

4. What different dynamics are involved in the ability of one variable set to "explain", in different ways, different portions of the other variable set?

5. What relative power do different canonical functions have to predict or explain relationships?

6. How stable are canonical results across samples or sample subgroups?

7. How closely do obtained canonical results conform to expected canonical results?

Page 4: Canonical Correlation Analysis (CCA)

CCA: Assumptions

Linearity: if not, nonlinear canonical correlation analysis.

Absence of multicollinearity: if not, Partial Least Squares (PLS) regression to reduce the space.

Homoscedasticity: if not, data transformation.

Normality: if not, re-sampling.

A lot of data: Max(p, q) × 20 × number of pairs.

Absence of outliers.

Page 5: Canonical Correlation Analysis (CCA)

CCA: Toy example

X = [data matrix: IV columns, then DV columns]

Page 6: Canonical Correlation Analysis (CCA)

CCA: Z score transformation

Z = [z-scored data matrix: columns IV1 … DV2]

Page 7: Canonical Correlation Analysis (CCA)

CCA: Canonical Correlation Matrix

$$\frac{Z^T Z}{n - 1} = \begin{bmatrix} R_{pp} & R_{pc} \\ R_{cp} & R_{cc} \end{bmatrix}$$

$$R = R_{cc}^{-1} R_{cp} R_{pp}^{-1} R_{pc}$$
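The two steps on this slide (the z-score correlation matrix, then the product of its blocks) can be sketched in NumPy. This is only an illustration under stated assumptions: the data are randomly generated rather than taken from the toy example, the names `ivs`, `dvs` and `R_full` are made up for the sketch, and the subscripts p and c are read as the IV (predictor) and DV (criterion) blocks.

```python
import numpy as np

# Hypothetical data: n cases, p predictors (IVs), q criteria (DVs).
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
ivs = rng.normal(size=(n, p))
dvs = 0.8 * ivs[:, :q] + rng.normal(size=(n, q))  # DVs correlated with the IVs
data = np.hstack([ivs, dvs])

# Z-score every column (sample standard deviation, ddof=1).
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# Full correlation matrix Z^T Z / (n - 1) and its four blocks.
R_full = Z.T @ Z / (n - 1)
R_pp, R_pc = R_full[:p, :p], R_full[:p, p:]
R_cp, R_cc = R_full[p:, :p], R_full[p:, p:]

# R = R_cc^{-1} R_cp R_pp^{-1} R_pc, using solve() instead of explicit inverses.
R = np.linalg.solve(R_cc, R_cp) @ np.linalg.solve(R_pp, R_pc)
print(R.shape)  # (2, 2)
```

The result is a q × q matrix, so its eigen-decomposition yields at most q eigenvalues in this layout.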

Page 8: Canonical Correlation Analysis (CCA)

CCA: Relations with other subspace methods

$$B^{-1} A V = V \lambda$$

$\lambda$: eigenvalues matrix (diagonal)

$V$: eigenvectors matrix

Page 9: Canonical Correlation Analysis (CCA)

CCA: Eigenvalues and eigenvectors decomposition

$$R = \lambda_1 V_1 V_1^T + \lambda_2 V_2 V_2^T + \dots + \lambda_{\min(p,q)} V_{\min(p,q)} V_{\min(p,q)}^T$$

(as in PCA)

Page 10: Canonical Correlation Analysis (CCA)

CCA: Eigenvalues and eigenvectors decomposition

The square roots of the eigenvalues are the canonical correlations:

$$\lambda_i = r_{ci}^2, \quad \text{i.e.} \quad r_{ci} = \sqrt{\lambda_i}$$
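As a rough sketch of this step, the canonical correlations can be read off the eigenvalues of R = R_cc^{-1} R_cp R_pp^{-1} R_pc. The data are again synthetic and the variable names are assumptions made for the illustration, not values from the slides.

```python
import numpy as np

# Hypothetical data (not the toy example from the slides).
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
ivs = rng.normal(size=(n, p))
dvs = 0.8 * ivs[:, :q] + rng.normal(size=(n, q))

C = np.corrcoef(np.hstack([ivs, dvs]), rowvar=False)
R_pp, R_pc, R_cp, R_cc = C[:p, :p], C[:p, p:], C[p:, :p], C[p:, p:]
R = np.linalg.solve(R_cc, R_cp) @ np.linalg.solve(R_pp, R_pc)

# R is not symmetric, so use the general eigenvalue routine. Its eigenvalues
# are real in theory; .real only drops possible round-off imaginary parts.
eigenvalues = np.sort(np.linalg.eigvals(R).real)[::-1]

# Canonical correlations are the square roots of the eigenvalues.
canonical_corrs = np.sqrt(np.clip(eigenvalues, 0.0, 1.0))
print(canonical_corrs)
```

The eigenvalues are sorted in decreasing order so that the first canonical correlation is the largest.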

Page 11: Canonical Correlation Analysis (CCA)

CCA: Significance test for the canonical correlation

$$\chi^2 = -\left[N - 1 - \frac{p + q + 1}{2}\right] \ln \Lambda_m, \quad \text{with } df = p \times q$$

where $\Lambda_m = \prod_{i=1}^{m} (1 - \lambda_i)$ and $m$ = number of canonical variables = Min(p, q).

A significant result indicates that the IV and DV sets share variance.

Procedure:

1. Test all the canonical variables together (m = 1, …, min(p, q)).

2. If the test is significant, remove the first canonical variable (canonical correlate) and test the remaining ones (m = 2, …, min(p, q)).

3. Repeat.
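The test statistic and the peel-off procedure can be sketched as follows. The function name `bartlett_chi2`, the eigenvalues and the sample size are hypothetical. The slide only gives df = p × q for the first test, so the degrees of freedom used after removing k variables, (p − k)(q − k), are an assumption based on the usual convention.

```python
import math

def bartlett_chi2(eigenvalues, n_obs, p, q, skip=0):
    """Chi-square statistic and df for the canonical correlations that
    remain after removing the first `skip` ones."""
    # Lambda = product of (1 - eigenvalue) over the remaining eigenvalues.
    lam = 1.0
    for ev in eigenvalues[skip:]:
        lam *= 1.0 - ev
    chi2 = -(n_obs - 1 - (p + q + 1) / 2.0) * math.log(lam)
    # Assumption: df shrinks to (p - skip)(q - skip) at later steps.
    df = (p - skip) * (q - skip)
    return chi2, df

# Hypothetical example: two eigenvalues, 8 cases, 2 IVs and 2 DVs.
eigenvalues = [0.84, 0.58]
chi2_all, df_all = bartlett_chi2(eigenvalues, n_obs=8, p=2, q=2)
chi2_rest, df_rest = bartlett_chi2(eigenvalues, n_obs=8, p=2, q=2, skip=1)
print(round(chi2_all, 2), df_all)
print(round(chi2_rest, 2), df_rest)
```

Each statistic is then compared with the chi-square distribution with the matching df, for example from a table or with `scipy.stats.chi2.sf(chi2, df)`.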

Page 12: Canonical Correlation Analysis (CCA)

CCA: Significance test for the canonical correlation


Since all canonical variables are significant, we will keep them all.

Page 13: Canonical Correlation Analysis (CCA)

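CCA: Canonical Coefficients

Analogous to regression coefficients.

$$B_y = \left(R_{yy}^{-1/2}\right)^T V_y$$

where $V_y$ holds the eigenvectors and $R_{yy}$ is the correlation matrix of the dependent variables.

$$B_x = R_{xx}^{-1} R_{xy} B_y^*, \quad \text{where } B_y^* = B_y \, \mathrm{diag}\!\left(\frac{1}{r_{c1}}, \dots, \frac{1}{r_{cm}}\right)$$

That is, each column of $B_y$ is divided by the corresponding canonical correlation.

$B_y$ = [coefficient matrix for the DVs]

$B_x$ = [coefficient matrix for the IVs]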
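A minimal sketch of the coefficient computation, on synthetic data. One assumption goes beyond the slide: the eigenvectors V_y are taken from the symmetric matrix R_yy^{-1/2} R_yx R_xx^{-1} R_xy R_yy^{-1/2}, which has the same eigenvalues as R_yy^{-1} R_yx R_xx^{-1} R_xy and yields orthonormal eigenvectors. The helper `inv_sqrt` is a made-up name.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, U = np.linalg.eigh(S)
    return U @ np.diag(w ** -0.5) @ U.T

# Hypothetical data: p IVs (x side) and q DVs (y side).
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
ivs = rng.normal(size=(n, p))
dvs = 0.8 * ivs[:, :q] + rng.normal(size=(n, q))
C = np.corrcoef(np.hstack([ivs, dvs]), rowvar=False)
R_xx, R_xy, R_yx, R_yy = C[:p, :p], C[:p, p:], C[p:, :p], C[p:, p:]

# Assumption: eigenvectors come from the symmetrized matrix (see lead-in).
Ryy_is = inv_sqrt(R_yy)
M = Ryy_is @ R_yx @ np.linalg.solve(R_xx, R_xy) @ Ryy_is
lam, V_y = np.linalg.eigh(M)
order = np.argsort(lam)[::-1]          # largest eigenvalue first
lam, V_y = lam[order], V_y[:, order]
r_c = np.sqrt(lam)                     # canonical correlations

# B_y = (R_yy^{-1/2})^T V_y
B_y = Ryy_is.T @ V_y
# B_x = R_xx^{-1} R_xy B_y, with column i divided by r_ci
B_x = np.linalg.solve(R_xx, R_xy) @ B_y / r_c
print(B_x.shape, B_y.shape)  # (3, 2) (2, 2)
```

With this scaling the canonical variates on each side have unit variance, and the i-th pair correlates at r_ci.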

Page 14: Canonical Correlation Analysis (CCA)

CCA: Canonical Variates

Analogous to predicted scores in regression: the standardized scores weighted by the canonical coefficients.

$$X = Z_x B_x$$

$$Y = Z_y B_y$$
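To illustrate, the variate scores can be computed by multiplying each set of z-scores by its coefficient matrix. This sketch assumes synthetic data and a hypothetical helper `canonical_coefficients`, which obtains the coefficients through a symmetrized eigenproblem (an implementation choice, not something stated on the slides).

```python
import numpy as np

def canonical_coefficients(Zx, Zy):
    """Return (B_x, B_y, r_c) from z-scored IVs Zx and DVs Zy."""
    n = Zx.shape[0]
    R_xx = Zx.T @ Zx / (n - 1)
    R_yy = Zy.T @ Zy / (n - 1)
    R_xy = Zx.T @ Zy / (n - 1)
    w, U = np.linalg.eigh(R_yy)
    Ryy_is = U @ np.diag(w ** -0.5) @ U.T          # R_yy^{-1/2}
    M = Ryy_is @ R_xy.T @ np.linalg.solve(R_xx, R_xy) @ Ryy_is
    lam, V = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1]                  # largest eigenvalue first
    r_c = np.sqrt(lam[order])
    B_y = Ryy_is @ V[:, order]
    B_x = np.linalg.solve(R_xx, R_xy) @ B_y / r_c  # column i divided by r_ci
    return B_x, B_y, r_c

def zscore(a):
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

# Hypothetical data.
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
ivs = rng.normal(size=(n, p))
dvs = 0.8 * ivs[:, :q] + rng.normal(size=(n, q))
Zx, Zy = zscore(ivs), zscore(dvs)

B_x, B_y, r_c = canonical_coefficients(Zx, Zy)
X_var = Zx @ B_x   # canonical variate scores for the IV side
Y_var = Zy @ B_y   # canonical variate scores for the DV side

# Correlation within each pair should reproduce the canonical correlations.
pair_corrs = [np.corrcoef(X_var[:, i], Y_var[:, i])[0, 1] for i in range(q)]
print(np.round(pair_corrs, 3), np.round(r_c, 3))
```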

Page 15: Canonical Correlation Analysis (CCA)

CCA: Loading matrices

Matrices of correlations between the variables and the canonical variates.

$$A_x = R_{xx} B_x$$

$$A_y = R_{yy} B_y$$

$A_x$ = [loading matrix for the IVs]

$A_y$ = [loading matrix for the DVs]
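The loading formulas can be checked numerically: A_x = R_xx B_x should equal the correlations between each original IV and each IV-side canonical variate. The sketch below uses synthetic data and obtains the coefficients through a symmetrized eigenproblem; both are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data.
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
ivs = rng.normal(size=(n, p))
dvs = 0.8 * ivs[:, :q] + rng.normal(size=(n, q))
Zx = (ivs - ivs.mean(axis=0)) / ivs.std(axis=0, ddof=1)
Zy = (dvs - dvs.mean(axis=0)) / dvs.std(axis=0, ddof=1)

R_xx = Zx.T @ Zx / (n - 1)
R_yy = Zy.T @ Zy / (n - 1)
R_xy = Zx.T @ Zy / (n - 1)

# Canonical coefficients (symmetrized eigenproblem, an assumed implementation).
w, U = np.linalg.eigh(R_yy)
Ryy_is = U @ np.diag(w ** -0.5) @ U.T
lam, V = np.linalg.eigh(Ryy_is @ R_xy.T @ np.linalg.solve(R_xx, R_xy) @ Ryy_is)
order = np.argsort(lam)[::-1]
r_c = np.sqrt(lam[order])
B_y = Ryy_is @ V[:, order]
B_x = np.linalg.solve(R_xx, R_xy) @ B_y / r_c

# Loading matrices: A_x = R_xx B_x and A_y = R_yy B_y.
A_x = R_xx @ B_x
A_y = R_yy @ B_y

# Direct check: correlate the original variables with the variate scores.
X_var, Y_var = Zx @ B_x, Zy @ B_y
A_x_direct = np.corrcoef(Zx, X_var, rowvar=False)[:p, p:]
A_y_direct = np.corrcoef(Zy, Y_var, rowvar=False)[:q, q:]
print(np.allclose(A_x, A_x_direct), np.allclose(A_y, A_y_direct))
```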

Page 16: Canonical Correlation Analysis (CCA)

CCA: Loadings and canonical correlations for both canonical variate pairs

Only loadings with an absolute value above 0.3 are interpreted.

[Figure: loadings and canonical correlation for each canonical variate pair]

Page 17: Canonical Correlation Analysis (CCA)

CCA: Proportion of variance extracted

How much variance does each of the canonical variates extract from the variables on its own side of the equation?

$$pv_{xc} = \sum_{i=1}^{p} \frac{a_{ixc}^2}{p}$$

$$pv_{yc} = \sum_{i=1}^{q} \frac{a_{iyc}^2}{q}$$

[Results for the first and second canonical variates on each side]
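As a small worked sketch, the proportion of variance is the mean of the squared loadings in each column of a loading matrix. The function name and the loading values below are hypothetical, chosen only to show the arithmetic.

```python
def proportion_of_variance(loadings):
    """Mean squared loading per canonical variate.

    `loadings` is a list of rows: one row per variable,
    one column per canonical variate.
    """
    n_vars = len(loadings)
    n_variates = len(loadings[0])
    return [
        sum(row[c] ** 2 for row in loadings) / n_vars
        for c in range(n_variates)
    ]

# Hypothetical 2 x 2 loading matrix (rows: variables, columns: variates).
A_x = [[-0.74, 0.68],
       [0.79, 0.62]]
pv_x = proportion_of_variance(A_x)
print(pv_x)
```

Here the first variate gives (0.74² + 0.79²) / 2 ≈ 0.586 and the second (0.68² + 0.62²) / 2 ≈ 0.423.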

Page 18: Canonical Correlation Analysis (CCA)

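CCA: Redundancy

How much variance the canonical variates from the IVs extract from the DVs, and vice versa.

$$rd = pv \times r_c^2$$

where $r_c^2$ is the eigenvalue of the corresponding canonical variate pair.

$rd_{yx}$ = [redundancy values]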
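The redundancy formula is a one-line product, sketched here with made-up numbers. The values of `pv` and `r_c` are hypothetical and are not the ones behind the percentages reported in the deck.

```python
def redundancy(pv, r_c):
    """Redundancy of one canonical variate: pv times the squared
    canonical correlation (the eigenvalue)."""
    return pv * r_c ** 2

# Hypothetical proportions of variance and canonical correlations
# for two canonical variate pairs.
pv = [0.5, 0.4]
r_c = [0.8, 0.5]

rd = [redundancy(v, r) for v, r in zip(pv, r_c)]
total_rd = sum(rd)
print(rd, total_rd)
```

Summing the redundancies over the variates gives the total variance that one set extracts from the other, which is how the "together" figures in a summary are obtained.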

Page 19: Canonical Correlation Analysis (CCA)

CCA: Redundancy

How much variance the canonical variates from the IVs extract from the DVs, and vice versa.

Summary

The first canonical variate from the IVs extracts 40% of the variance in the y variables. The second canonical variate from the IVs extracts 30% of the variance in the y variables. Together they extract 70% of the variance in the DVs.

The first canonical variate from the DVs extracts 49% of the variance in the x variables. The second canonical variate from the DVs extracts 24% of the variance in the x variables. Together they extract 73% of the variance in the IVs.

Page 20: Canonical Correlation Analysis (CCA)

CCA: Rotation

A rotation does not influence the variance proportion or the redundancy.

[Loading matrix before and after rotation]