PCR, CCA, and other pattern-based regression techniques
PCR, CCA, and other pattern-based regression techniques

Michael K. Tippett

International Research Institute for Climate and Society, The Earth Institute, Columbia University

Statistical Methods in Seasonal Prediction, ICTP, Aug 2-13, 2010
Principal component analysis and regression

Key idea of PCA: data compression

- Many (collinear) variables are replaced by a few new variables.
- The new variables are optimally chosen to approximate the original variables.

[Assumption: "important" components have large variance]

How is PCA useful in regression problems?
Pop quiz

- What is the variance of a PC (time series)?
- What is the correlation between PCs?

Fact about regression

- (Easy) If y = ax is a regression between x and y, and x and y have unit variance, what does a measure?
- (Hard) Linear (invertible) transformations of the data transform the regression coefficients in the same way:

  y = Ax
  y' = Ly, x' = Mx
  y = Ax → y' = Ly = LAx = LAM⁻¹Mx = (LAM⁻¹)x'

  (LAM⁻¹) is the regression coefficient matrix between x' and y'.
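The transformation fact above can be checked numerically. The sketch below uses synthetic data (all sizes and names are illustrative, not from the lecture): least squares applied to the transformed variables recovers LAM⁻¹ exactly, since the relation is noise-free.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, l = 200, 3, 2                      # samples, predictor dim, predictand dim
X = rng.standard_normal((n, m))          # rows are samples of x
A = rng.standard_normal((l, m))
Y = X @ A.T                              # exact linear relation y = Ax

L_mat = rng.standard_normal((l, l))      # invertible transforms (almost surely)
M = rng.standard_normal((m, m))
Yp = Y @ L_mat.T                         # y' = Ly
Xp = X @ M.T                             # x' = Mx

# Least-squares regression of y' on x' recovers L A M^{-1}
A_prime = np.linalg.lstsq(Xp, Yp, rcond=None)[0].T
assert np.allclose(A_prime, L_mat @ A @ np.linalg.inv(M))
```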
Example: Philippines

Problem: predict gridded April-June precipitation over the Philippines from preceding (January-March) SST.

[Figure: maps of the SST anomaly, JFM 1971 (100E-140W, 20S-20N) and of the rainfall anomalies, AMJ 1971, over the Philippines (115E-130E, 5N-20N)]
Example: Philippines

Problem: predict gridded April-June precipitation over the Philippines from preceding (January-March) sea surface temperature.

Details:
- Data from 1971-2007 (37 years).
- 194 precipitation gridpoints.
- 1378 SST gridpoints.

What is the problem?
PCA and regression

For climate forecasts, the length of the historical record severely limits the number of predictors.

If the predictors are spatial fields such as SST or the output of a GCM, the number of gridpoint values (100s, 1000s) is large compared to the number of time samples (10s for climate).

We need to represent the information in the predictor spatial field using fewer numbers:

- Spatial averages, e.g., NINO 3.4.
- Principal component analysis (PCA).
  - A weighted spatial average.
  - Weights are chosen in an optimal manner to maximize explained variance.
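A minimal sketch of this compression, using random numbers in place of real SST (the 37-year, 1378-gridpoint sizes echo the example above; nothing here is the lecture's actual data): the SVD of the anomaly matrix yields the EOFs and PCs, and a few PCs stand in for the full field.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_grid = 37, 1378               # sizes echo the SST example
field = rng.standard_normal((n_years, n_grid))
anom = field - field.mean(axis=0)        # anomalies: remove the time mean

# SVD of the anomaly matrix: EOFs are spatial patterns, PCs are time series
U, s, Vt = np.linalg.svd(anom, full_matrices=False)
k = 3
pcs = U[:, :k] * np.sqrt(n_years - 1)    # unit-variance PC time series
eofs = s[:k, None] * Vt[:k] / np.sqrt(n_years - 1)  # patterns carry the amplitude

recon = pcs @ eofs                       # rank-k approximation of the anomalies
# k numbers per year now stand in for 1378 gridpoint values
```

With this scaling the PCs have unit variance and are exactly uncorrelated, which is the convention assumed throughout these slides.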
Example: PCA of SST

EOF 1 – Correlation with NINO 3.4 = -0.96

[Figure: map of EOF 1 (variance explained = 50%) over 100E-140W, 20S-20N, with its PC time series, 1971-2007]
Example: PCA of SST

EOF 2

[Figure: map of EOF 2 (variance explained = 16%) over 100E-140W, 20S-20N, with its PC time series, 1971-2007]
Example: PCA of SST

EOF 3

[Figure: map of EOF 3 (variance explained = 11%) over 100E-140W, 20S-20N, with its PC time series, 1971-2007]
Principal component regression

PCR:
- y = a1x1 + a2x2 + … + amxm + b
- The predictors xi are PCs.

In this example:
- y = observed precipitation at a gridpoint.
- Predictors are PCs of SST anomalies.

How many PCs to use?

This is a predictor selection problem.
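The PCR setup can be sketched in a few lines with synthetic data (the predictand below is manufactured to depend on PC 1, purely for illustration): compute the PCs of the predictor field, then fit the ordinary regression y = a1x1 + a2x2 + b.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_grid = 37, 200
X = rng.standard_normal((n, n_grid))     # synthetic predictor field
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
pcs = U[:, :2] * np.sqrt(n - 1)          # two leading unit-variance PCs

# Synthetic predictand tied to PC 1 plus noise (illustrative, not real rainfall)
y = 0.6 * pcs[:, 0] + 0.3 * rng.standard_normal(n)

# y = a1*x1 + a2*x2 + b with the PCs as the predictors x1, x2
design = np.column_stack([pcs, np.ones(n)])
a1, a2, b = np.linalg.lstsq(design, y, rcond=None)[0]
```

Because the PCs are uncorrelated, each coefficient can also be estimated by a simple one-predictor regression, which is part of what makes PCR convenient.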
Example: Philippines

Two models: climatology or ENSO PC as predictor.

Use AIC to select the model.

[Figure: map (115E-130E, 5N-20N) of the AIC-selected model at each gridpoint: CL or ENSO]
Example: Philippines

- This model seems to have some skill (cross-validated).
- Why the negative correlation? [Later]
- How are the two skill measures related in-sample?

[Figure: maps (115E-130E, 5N-20N) of correlation (-1 to 1) and of 1 - error var/clim var (0 to 0.5)]
PCA and regression

Is there any benefit to using PCA on the predictand as well as the predictors?

Is there any benefit to predicting the PCs of y rather than y?

Perhaps. One could imagine a spatial average (like a PC) being more predictable than a value at a gridpoint.
PCA and regression

- Predicting the PCs of y leads to a different predictor selection problem.
- Before: select a model for each gridpoint?
- Now: select a model for each PC?
Example: Philippines

36 PCs of y. (Why?)

Use AIC to select the model: ENSO or climatology.

[Figure: AIC-selected model (CL or ENSO) and ΔAIC for each of the 36 PCs of y]
Example: Philippines

First 2 EOFs of AMJ precipitation:

[Figure: maps (115E-130E, 5N-20N) of EOF 1 and EOF 2 of prcp, each with its PC time series, 1971-2007]
Example: Philippines

- Correlations of gridpoint and pattern regressions are similar.
- Normalized errors of gridpoint and pattern regressions are similar.

[Figure: maps (115E-130E, 5N-20N) of correlation (-1 to 1) and of 1 - error var/clim var (0 to 0.5)]
Example: Philippines

The pattern regression is:

PCy1 = 0.61 PCx1
PCy2 = −0.36 PCx1

What do these numbers mean? (Hint: PCs have unit variance.)

[Figure: maps (115E-130E, 5N-20N) of EOF 1 and EOF 2 of prcp, each with its PC time series, 1971-2007]
Example: Philippines

Reconstructing the spatial field:

Predicted rainfall = Climatology + PCy1 EOFy1 + PCy2 EOFy2

The difference between the prediction and climatology (the anomaly) is:

Predicted anomaly = PCy1 EOFy1 + PCy2 EOFy2 = 0.61 PCx1 EOFy1 − 0.36 PCx1 EOFy2
                  = PCx1 (0.61 EOFy1 − 0.36 EOFy2)

Is this simpler? Why?
Example: Philippines

One pattern of rainfall goes with one pattern of SST.

[Figure: map of SST EOF 1 (variance explained = 50%, 100E-140W, 20S-20N) with its time series, and the corresponding pattern of prcp (115E-130E, 5N-20N)]

What is the time series of the pattern?
Example: Philippines

Define the pattern to be

P ≡ (1/√(0.61² + 0.36²)) (0.61 EOFy1 − 0.36 EOFy2)

(Why this scaling of P?) The time series of the pattern P is:

TS = (1/√(0.61² + 0.36²)) (0.61 PCy1 − 0.36 PCy2)

What is the variance of TS? (Hint: PCs are independent.)

Key point:

Predicted anomaly = PCx1 (0.61 EOFy1 − 0.36 EOFy2) = 0.71 PCx1 P

TS = 0.71 PCx1

What is 0.71? Hint: TS and PCx1 have unit variance.
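A quick numeric check of the normalization on this slide: the scaling constant is √(0.61² + 0.36²) ≈ 0.71, and with unit-variance, uncorrelated PCs (simulated below as independent Gaussians) TS indeed has unit variance.

```python
import numpy as np

norm = np.hypot(0.61, 0.36)              # sqrt(0.61**2 + 0.36**2)
print(round(norm, 2))                    # 0.71

# With unit-variance, uncorrelated PCs, TS has unit variance
rng = np.random.default_rng(3)
pc1, pc2 = rng.standard_normal((2, 100_000))
ts = (0.61 * pc1 - 0.36 * pc2) / norm    # sample variance ~ 1, up to noise
```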
In summary,

[Figure: the pattern of prcp (115E-130E, 5N-20N) and its time series = 0.71 × SST EOF 1 (variance explained = 50%, 100E-140W, 20S-20N) and its time series]

In general, any pattern regression can be decomposed into pairs of patterns related by the correlation of their time series.

Canonical correlation analysis (CCA) is an example of such a decomposition.
Pattern regression

[y1]   [a11 a12 … a1m] [x1]
[y2] = [a21 a22 … a2m] [x2]
[⋮ ]   [ ⋮   ⋮     ⋮ ] [⋮ ]
[yl]   [al1 al2 … alm] [xm]

- l predictand PCs
- m predictor PCs
- l × m regression coefficients

A = Cov(PCy, PCx) [Cov(PCx, PCx^T)]⁻¹
Pattern regression

y = Ax

A = Cov(PCy, PCx) [Cov(PCx, PCx^T)]⁻¹

What is Cov(PCx, PCx^T)? Why? Hint: PCs are ....

A = Cov(PCy, PCx)

What do the elements of A measure?

A = Corr(PCy, PCx)
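Because the predictor PCs are (nearly) uncorrelated with unit variance, the regression matrix reduces to the correlation matrix. A synthetic check (all sizes and the matrix B are illustrative; the standardized columns stand in for PCs):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, l = 20_000, 3, 2
pcx = rng.standard_normal((n, m))        # stand-ins for unit-variance, uncorrelated PCs
B = rng.standard_normal((l, m))
pcy = pcx @ B.T + 0.5 * rng.standard_normal((n, l))
pcy = (pcy - pcy.mean(axis=0)) / pcy.std(axis=0)   # normalize like PCs

A_reg = np.linalg.lstsq(pcx, pcy, rcond=None)[0].T # full regression matrix
A_corr = (pcy.T @ pcx) / n                         # Corr(PCy, PCx) for unit-variance series
# A_reg and A_corr agree up to sampling noise
```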
Pattern regression

y = Ax

A = [ Corr(PCy1, PCx1)  Corr(PCy1, PCx2)  …  Corr(PCy1, PCxm)
      Corr(PCy2, PCx1)  Corr(PCy2, PCx2)  …  Corr(PCy2, PCxm)
      ⋮                  ⋮                     ⋮
      Corr(PCyl, PCx1)  Corr(PCyl, PCx2)  …  Corr(PCyl, PCxm) ]

In general, each predicted PC of y depends on all the PCs of x.

What if A were diagonal? Is it likely that A is diagonal? (Angle.)
Pattern regression

- To decompose the regression into pairs of patterns, diagonalize A.
- There are many ways to diagonalize A. The singular value decomposition (SVD) is:

  A = USV^T

  where U and V are orthogonal and S is diagonal. (Orthogonal matrix = columns are unit vectors = preserves angles and magnitudes.)

Substituting,

y = Ax = USV^T x

or

y' = Sx'  where y' = U^T y and x' = V^T x.
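The diagonalization above is exact and easy to verify on a small example (the entries of A below are illustrative): after rotating by U^T and V^T, each new predictand depends on exactly one new predictor.

```python
import numpy as np

A = np.array([[0.6, 0.2],
              [0.1, 0.4]])               # illustrative regression matrix
U, s, Vt = np.linalg.svd(A)              # A = U S V^T, U and V orthogonal

x = np.array([1.0, -2.0])                # some vector of predictor PCs
y = A @ x
x_p = Vt @ x                             # x' = V^T x
y_p = U.T @ y                            # y' = U^T y
assert np.allclose(y_p, s * x_p)         # y' = S x': one-to-one pairing
```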
Diagonalized pattern regression

What can we say about the new variables y' = U^T y and x' = V^T x?

- y' and x' have unit variance and are uncorrelated (like PCs).
  - PCs have unit variance and are uncorrelated.
  - An orthogonal transformation of PCs gives new uncorrelated (angle) variables with unit variance (magnitude).
- Each new predictand is related to just one new predictor.
  - y' = Sx' with S diagonal.
  - What do the values of S measure?

(Hint: what is the regression coefficient for two variables with unit variance?)
Diagonalized regression and CCA

This procedure is the same as canonical correlation analysis.

- Regress the PCs (uncorrelated, unit variance) of y on those of x: y = Ax.
- Use the SVD of A to get the diagonal relation y' = Sx'.
- The new variables (canonical variates) are linear (orthogonal) combinations of the PCs.
- The new variables have unit variance and are uncorrelated.
- The associated patterns are linear combinations of EOFs (generally not orthogonal).
- The elements of S are correlations (canonical correlations).
More CCA

CCA is usually described as finding linear combinations of the x's and the y's which have maximum correlation.

Did we do that???

Finding the maximum correlation between linear combinations of x and y is the same as finding the maximum correlation between linear combinations of x' and y'. Why?

This means we can look at the correlation between linear combinations of x' and y'.
A calculation

Corr(Σᵢ aᵢxᵢ', Σⱼ bⱼyⱼ') = Cov(Σᵢ aᵢxᵢ', Σⱼ bⱼyⱼ') / √(Var(Σᵢ aᵢxᵢ') Var(Σⱼ bⱼyⱼ'))

Var(Σᵢ aᵢxᵢ') = Σᵢ aᵢ² Var(xᵢ') = Σᵢ aᵢ² = ‖a‖²

Var(Σⱼ bⱼyⱼ') = Σⱼ bⱼ² Var(yⱼ') = Σⱼ bⱼ² = ‖b‖²

Cov(Σᵢ aᵢxᵢ', Σⱼ bⱼyⱼ') = Σᵢ aᵢbᵢ Cov(xᵢ', yᵢ') = Σᵢ aᵢbᵢSᵢ ≤ S₁ Σᵢ |aᵢbᵢ| ≤ S₁ ‖a‖ ‖b‖

(the last step is the Cauchy-Schwarz inequality), so

Corr(Σᵢ aᵢxᵢ', Σᵢ bᵢyᵢ') ≤ S₁
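The bound can be spot-checked with random coefficient vectors. Below is a synthetic canonical setup (the values in S and all names are illustrative): y'_i is constructed so that Corr(x'_i, y'_i) = S_i, and no linear combination noticeably exceeds S₁.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
S = np.array([0.8, 0.5, 0.2])            # assumed canonical correlations, S1 = 0.8
x = rng.standard_normal((n, 3))          # x'_i: unit variance, uncorrelated
y = S * x + np.sqrt(1 - S**2) * rng.standard_normal((n, 3))  # Corr(x'_i, y'_i) = S_i

# Largest correlation found over 50 random combinations sum a_i x'_i, sum b_i y'_i
worst = max(np.corrcoef(x @ a, y @ b)[0, 1]
            for a, b in rng.standard_normal((50, 2, 3)))
# worst stays at or below S1 = 0.8, up to tiny sampling noise
```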
More CCA components

Looking for the linear combination of x and y that maximizes correlation but is uncorrelated with the first component means

- looking for the linear combination of xᵢ' and yᵢ', i = 2, 3, …, that maximizes correlation. Why?
- The previous argument gives that this maximum is S₂.
CCA and regression

Knowing that CCA is a regression is useful ...

- What happens if many (compared to the sample size) PCs are included in a CCA calculation? What happens to the canonical correlations?
- How can the number of PCs included in CCA be decided?
Other pattern regression methods

Other diagonalizations of the regression coefficient matrix A (based on variants of the SVD) give other diagonalized pattern regressions with components that optimize other quantities.

For example, redundancy analysis gives components that maximize explained variance.

Maximum covariance analysis (MCA) finds components with maximum covariance. However, the regression between these patterns is generally not diagonal: there are no simple relations between pairs of patterns.
Summary

- PCA compresses data and is useful in regressions. In PCR, PCs are the predictors.
- It can be useful to use PCs as predictands, too.
- Diagonalizing the regression between PCs decomposes the regression into pairs of patterns.
- CCA diagonalizes the regression and finds the components with maximum correlation.