Techniques for studying correlation and covariance structure Principal Components Analysis (PCA)...
Techniques for studying correlation and covariance structure
Principal Components Analysis (PCA)
Factor Analysis
Principal Component Analysis
Let $x$ have a p-variate Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.

Definition: The linear combination

$$C_1 = a_1 x_1 + \cdots + a_p x_p = a'x$$

is called the first principal component if $a = (a_1, \ldots, a_p)'$ is chosen to maximize

$$\operatorname{Var}(C_1) = \operatorname{Var}(a'x) = a'\Sigma a$$

subject to $a'a = a_1^2 + \cdots + a_p^2 = 1$.
Consider maximizing $V = \operatorname{Var}(a'x) = a'\Sigma a$ subject to $a'a = a_1^2 + \cdots + a_p^2 = 1$, using the Lagrange multiplier technique. Let

$$g(a, \lambda) = V - \lambda(a'a - 1) = a'\Sigma a - \lambda(a'a - 1)$$
Now

$$\frac{\partial g(a,\lambda)}{\partial \lambda} = -(a'a - 1) = 0 \quad \text{if } a'a = 1$$

and

$$\frac{\partial g(a,\lambda)}{\partial a} = 2\Sigma a - 2\lambda a = 0 \quad \text{if } \Sigma a = \lambda a$$

Thus $a$ is an eigenvector of $\Sigma$ and $\lambda$ is the eigenvalue associated with $a$.
Also

$$\operatorname{Var}(a'x) = a'\Sigma a = \lambda a'a = \lambda$$

Hence $\operatorname{Var}(a'x)$ is maximized if $\lambda$ is the largest eigenvalue of $\Sigma$.

Summary

$$C_1 = a_1 x_1 + \cdots + a_p x_p = a'x$$

is the first principal component if $a = (a_1, \ldots, a_p)'$ is the eigenvector of length 1 (i.e. $a'a = a_1^2 + \cdots + a_p^2 = 1$) of $\Sigma$ associated with the largest eigenvalue $\lambda_1$ of $\Sigma$.
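As a minimal numerical sketch of the summary above (not part of the slides; NumPy is assumed), the first principal component of a covariance matrix is the unit-length eigenvector belonging to the largest eigenvalue:

```python
import numpy as np

def first_principal_component(sigma):
    """Return (lambda_1, a) for a symmetric covariance matrix sigma."""
    evals, evecs = np.linalg.eigh(sigma)   # eigenvalues in ascending order
    return evals[-1], evecs[:, -1]         # largest eigenvalue and its eigenvector

# Toy 2x2 covariance matrix (an illustrative example, not from the slides)
sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
lam1, a = first_principal_component(sigma)
# a has length 1 and satisfies the eigenvector equation: sigma @ a = lam1 * a
```

Here `numpy.linalg.eigh` is used because it is specialized to symmetric matrices, which every covariance matrix is.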
Let $x$ have a p-variate Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.

Definition: The set of linear combinations

$$\begin{aligned} C_1 &= a_{11} x_1 + \cdots + a_{1p} x_p \\ &\;\;\vdots \\ C_p &= a_{p1} x_1 + \cdots + a_{pp} x_p \end{aligned}$$

are called the principal components of $x$ if the vectors $a_i = (a_{i1}, \ldots, a_{ip})'$ are chosen such that $a_i'a_i = a_{i1}^2 + \cdots + a_{ip}^2 = 1$ and

1. $\operatorname{Var}(C_1)$ is maximized.
2. $\operatorname{Var}(C_i)$ is maximized subject to $C_i$ being independent of $C_1, \ldots, C_{i-1}$ (the previous $i-1$ principal components).

Note: we have already shown that $a_1 = (a_{11}, \ldots, a_{1p})'$ is the eigenvector of $\Sigma$ associated with the largest eigenvalue, $\lambda_1$, of the covariance matrix $\Sigma$, and

$$\operatorname{Var}(C_1) = \operatorname{Var}(a_1'x) = a_1'\Sigma a_1 = \lambda_1$$
We will now show that $a_i = (a_{i1}, \ldots, a_{ip})'$ is the eigenvector of $\Sigma$ associated with the i-th largest eigenvalue, $\lambda_i$, of the covariance matrix $\Sigma$, and

$$\operatorname{Var}(C_i) = \operatorname{Var}(a_i'x) = a_i'\Sigma a_i = \lambda_i$$

Proof (by induction: assume the result is true for $1, \ldots, i-1$, then prove it true for $i$).
Now consider the vector

$$\begin{pmatrix} C_1 \\ \vdots \\ C_{i-1} \\ a'x \end{pmatrix} = \begin{pmatrix} a_1'x \\ \vdots \\ a_{i-1}'x \\ a'x \end{pmatrix} = A x \quad \text{where } A = \begin{pmatrix} a_1' \\ \vdots \\ a_{i-1}' \\ a' \end{pmatrix}$$

which has covariance matrix

$$A \Sigma A' = \begin{pmatrix} a_1'\Sigma a_1 & \cdots & a_1'\Sigma a_{i-1} & a_1'\Sigma a \\ \vdots & & \vdots & \vdots \\ a_{i-1}'\Sigma a_1 & \cdots & a_{i-1}'\Sigma a_{i-1} & a_{i-1}'\Sigma a \\ a'\Sigma a_1 & \cdots & a'\Sigma a_{i-1} & a'\Sigma a \end{pmatrix} = \begin{pmatrix} \lambda_1 & \cdots & 0 & \lambda_1 a_1'a \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & \lambda_{i-1} & \lambda_{i-1} a_{i-1}'a \\ \lambda_1 a'a_1 & \cdots & \lambda_{i-1} a'a_{i-1} & a'\Sigma a \end{pmatrix}$$

using the induction hypothesis $\Sigma a_j = \lambda_j a_j$ and $a_j'a_k = 0$ for $j \neq k \leq i-1$.
Hence $C_i = a'x$ is independent of $C_1, \ldots, C_{i-1}$ (for the multivariate Normal, zero covariance implies independence) if

$$a'\Sigma a_j = \lambda_j a'a_j = 0, \quad \text{i.e. } a'a_j = 0 \text{ for } j = 1, \ldots, i-1.$$

We want to maximize $\operatorname{Var}(C_i) = a'\Sigma a$ subject to

1. $a'a_j = 0$ for $j = 1, \ldots, i-1$
2. $a'a = 1$

Let

$$g(a, \lambda, \gamma_1, \ldots, \gamma_{i-1}) = a'\Sigma a - \lambda(a'a - 1) - \gamma_1\, a'a_1 - \cdots - \gamma_{i-1}\, a'a_{i-1}$$
Now

$$\frac{\partial g(a, \lambda, \gamma_1, \ldots, \gamma_{i-1})}{\partial \gamma_j} = -\,a'a_j = 0 \quad \text{if } a'a_j = 0 \text{ for } j = 1, \ldots, i-1$$

and

$$\frac{\partial g(a, \lambda, \gamma_1, \ldots, \gamma_{i-1})}{\partial \lambda} = -(a'a - 1) = 0 \quad \text{if } a'a = 1$$
Now

$$\frac{\partial g(a, \lambda, \gamma_1, \ldots, \gamma_{i-1})}{\partial a} = 2\Sigma a - 2\lambda a - \gamma_1 a_1 - \cdots - \gamma_{i-1} a_{i-1} = 0 \qquad (1)$$

hence

$$\Sigma a = \lambda a + \tfrac{1}{2}\gamma_1 a_1 + \cdots + \tfrac{1}{2}\gamma_{i-1} a_{i-1}$$

Also, for $j < i$, multiplying on the left by $a_j'$ gives

$$a_j'\Sigma a = \lambda\, a_j'a + \tfrac{1}{2}\gamma_j\, a_j'a_j, \quad \text{i.e. } 0 = 0 + \tfrac{1}{2}\gamma_j$$

since $a_j'\Sigma a = \lambda_j a_j'a = 0$, $a_j'a = 0$ and $a_j'a_j = 1$. Hence $\gamma_j = 0$ for $j < i$, and equation (1) becomes

$$\Sigma a = \lambda a$$
Thus the principal components are

$$C_i = a_i'x \quad (\text{the } i\text{-th principal component}), \quad i = 1, \ldots, p$$

where $a_1, \ldots, a_p$ are the eigenvectors of $\Sigma$ associated with the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$, and

1. $\operatorname{Var}(C_1)$ is maximized.
2. $\operatorname{Var}(C_i)$ is maximized subject to $C_i$ being independent of $C_1, \ldots, C_{i-1}$ (the previous $i-1$ principal components).
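The full construction can be sketched numerically (an illustration with simulated data, not from the slides; NumPy assumed): extract all components from the sample covariance matrix and check that the covariance matrix of the component scores comes out diagonal, which is the uncorrelatedness (independence, under Normality) required above.

```python
import numpy as np

# Simulated correlated 3-variable data (any data set would do)
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.5, 0.0],
                                          [0.5, 0.3, 1.0]])

sigma = np.cov(x, rowvar=False)           # sample covariance matrix
evals, evecs = np.linalg.eigh(sigma)
order = np.argsort(evals)[::-1]           # sort so lambda_1 >= ... >= lambda_p
evals, evecs = evals[order], evecs[:, order]

scores = x @ evecs                        # i-th column holds C_i = a_i'x per observation
score_cov = np.cov(scores, rowvar=False)  # equals diag(lambda_1, ..., lambda_p)
```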
Recall that any positive definite matrix $\Sigma$ can be written

$$\Sigma = P D P' = (a_1, \ldots, a_p) \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{pmatrix} \begin{pmatrix} a_1' \\ \vdots \\ a_p' \end{pmatrix}$$

where $a_1, \ldots, a_p$ are eigenvectors of $\Sigma$ of length 1, $\lambda_1 \geq \cdots \geq \lambda_p > 0$ are the eigenvalues of $\Sigma$, and $P = (a_1, \ldots, a_p)$ is an orthogonal matrix ($P'P = PP' = I$).
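This spectral decomposition is easy to verify numerically (a sketch; the matrix below is an arbitrary positive definite example, not from the slides):

```python
import numpy as np

sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 2.0]])

evals, P = np.linalg.eigh(sigma)   # columns of P are unit-length eigenvectors
D = np.diag(evals)
# P'P = PP' = I and sigma = P D P'
```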
Example

In this example wildlife (moose) population density was measured over time (once a year) in three areas.

| Year | Area 1 | Area 2 | Area 3 | Year | Area 1 | Area 2 | Area 3 |
|------|--------|--------|--------|------|--------|--------|--------|
| 1 | 11.3 | 14.1 | 6.9 | 13 | 6.1 | 9.9 | 6.8 |
| 2 | 10.4 | 14.0 | 11.2 | 14 | 9.7 | 13.2 | 6.6 |
| 3 | 9.9 | 13.0 | 8.7 | 15 | 8.1 | 9.4 | 4.0 |
| 4 | 8.2 | 11.4 | 3.3 | 16 | 11.3 | 11.8 | 4.9 |
| 5 | 10.1 | 11.9 | 8.7 | 17 | 8.8 | 11.5 | 8.8 |
| 6 | 10.7 | 13.8 | 12.5 | 18 | 9.4 | 11.6 | 5.7 |
| 7 | 11.0 | 14.9 | 8.9 | 19 | 7.5 | 11.4 | 4.9 |
| 8 | 7.1 | 8.5 | 3.7 | 20 | 8.8 | 10.7 | 7.2 |
| 9 | 14.7 | 14.5 | 12.1 | 21 | 7.5 | 11.1 | 7.0 |
| 10 | 5.4 | 9.0 | 4.1 | 22 | 9.1 | 13.2 | 8.9 |
| 11 | 7.3 | 7.6 | 5.6 | 23 | 6.8 | 9.8 | 7.6 |
| 12 | 10.2 | 10.9 | 7.3 | | | | |
[Figure: three-dimensional plot of the data on axes Area 1, Area 2 and Area 3]
The Sample Statistics

The mean vector

$$\bar{x} = \begin{pmatrix} 9.10 \\ 11.62 \\ 7.19 \end{pmatrix}$$

The covariance matrix

$$S = \begin{pmatrix} 4.297 & 3.307 & 3.295 \\ 3.307 & 4.017 & 3.527 \\ 3.295 & 3.527 & 6.566 \end{pmatrix}$$

The correlation matrix

$$R = \begin{pmatrix} 1 & .796 & .620 \\ .796 & 1 & .687 \\ .620 & .687 & 1 \end{pmatrix}$$
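The correlation matrix follows from the covariance matrix as $R = D^{-1/2} S D^{-1/2}$, i.e. $r_{ij} = s_{ij}/\sqrt{s_{ii}\,s_{jj}}$. A sketch with NumPy (using the sample covariance matrix of the moose data):

```python
import numpy as np

S = np.array([[4.297, 3.307, 3.295],
              [3.307, 4.017, 3.527],
              [3.295, 3.527, 6.566]])

d = np.sqrt(np.diag(S))    # sample standard deviations
R = S / np.outer(d, d)     # r_ij = s_ij / (s_i * s_j)
# R reproduces the off-diagonal correlations .796, .620 and .687 to rounding
```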
Principal Component Analysis

The eigenvalues of S:

$$\lambda_1 = 11.85974, \quad \lambda_2 = 2.204232, \quad \lambda_3 = 0.814249$$

The eigenvectors of S:

$$a_1 = \begin{pmatrix} .522 \\ .523 \\ .674 \end{pmatrix}, \quad a_2 = \begin{pmatrix} .582 \\ .359 \\ -.730 \end{pmatrix}, \quad a_3 = \begin{pmatrix} .624 \\ -.773 \\ .117 \end{pmatrix}$$

The principal components:

$$\begin{aligned} C_1 &= .522\, x_1 + .523\, x_2 + .674\, x_3 \\ C_2 &= .582\, x_1 + .359\, x_2 - .730\, x_3 \\ C_3 &= .624\, x_1 - .773\, x_2 + .117\, x_3 \end{aligned}$$
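These values can be re-derived numerically from S (a sketch; eigenvector signs are arbitrary, so a computed $a_i$ may be the negative of the one listed above):

```python
import numpy as np

S = np.array([[4.297, 3.307, 3.295],
              [3.307, 4.017, 3.527],
              [3.295, 3.527, 6.566]])

evals, evecs = np.linalg.eigh(S)
evals = evals[::-1]        # largest first: about 11.86, 2.20, 0.81
evecs = evecs[:, ::-1]     # columns a_1, a_2, a_3 (up to sign)
```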
[Figure: the data plotted on axes Area 1, Area 2 and Area 3, showing the first principal component]

$$C_1 = .522\, x_1 + .523\, x_2 + .674\, x_3$$
[Figure: the data plotted on axes Area 1, Area 2 and Area 3, showing the second principal component]

$$C_2 = .582\, x_1 + .359\, x_2 - .730\, x_3$$
[Figure: the data plotted on axes Area 1, Area 2 and Area 3, showing the third principal component]

$$C_3 = .624\, x_1 - .773\, x_2 + .117\, x_3$$
Graphical Picture of Principal Components

Multivariate Normal data fall in an ellipsoidal pattern. The shape and orientation of the ellipsoid are determined by the covariance matrix $\Sigma$. The eigenvectors of $\Sigma$ give the directions of the axes of the ellipsoid; the eigenvalues give the lengths of these axes.
Recall that if $\Sigma$ is a positive definite matrix,

$$\Sigma = \lambda_1 a_1 a_1' + \cdots + \lambda_p a_p a_p' = (a_1, \ldots, a_p) \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{pmatrix} \begin{pmatrix} a_1' \\ \vdots \\ a_p' \end{pmatrix} = P D P'$$

where $P$ is an orthogonal matrix ($P'P = PP' = I$) with columns equal to the eigenvectors of $\Sigma$, and $D$ is a diagonal matrix with diagonal elements equal to the eigenvalues of $\Sigma$.
The vector of principal components

$$C = \begin{pmatrix} C_1 \\ \vdots \\ C_p \end{pmatrix} = \begin{pmatrix} a_1'x \\ \vdots \\ a_p'x \end{pmatrix} = P'x$$

has covariance matrix

$$\Sigma_C = P'\Sigma P = P'PDP'P = D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{pmatrix}$$
An orthogonal matrix rotates vectors; thus

$$C = P'x$$

rotates the vector $x$ into the vector of principal components $C$. Also

$$\operatorname{tr}(\Sigma_C) = \operatorname{tr}(P'\Sigma P) = \operatorname{tr}(\Sigma P P') = \operatorname{tr}(\Sigma)$$

i.e.

$$\operatorname{tr}(D) = \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} \sigma_{ii}$$

so that

$$\sum_{i=1}^{p} \operatorname{var}(C_i) = \sum_{i=1}^{p} \operatorname{var}(x_i) = \text{Total Variance of } x$$
The ratio

$$\frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j} = \frac{\operatorname{var}(C_i)}{\sum_{j=1}^{p} \sigma_{jj}} = \frac{\operatorname{var}(C_i)}{\text{Total Variance of } x}$$

denotes the proportion of variance explained by the i-th principal component $C_i$.
The Example

| i | $\lambda_i$ | % variance |
|---|---------|------------|
| 1 | 11.8597 | 79.71% |
| 2 | 2.20423 | 14.82% |
| 3 | 0.81425 | 5.47% |
| Total | 14.8782 | 100% |
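The table's percentages follow directly from the eigenvalues (a quick sketch):

```python
import numpy as np

evals = np.array([11.8597, 2.20423, 0.81425])  # eigenvalues of S from above
prop = evals / evals.sum()                     # proportion of variance explained
# prop is approximately (0.7971, 0.1482, 0.0547), matching the table
```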
Also

$$\operatorname{cov}(C_i, x_j) = \operatorname{cov}(a_i'x,\, e_j'x) = a_i'\Sigma e_j$$

where

$$e_j = (0, 0, \ldots, 0, 1, 0, \ldots, 0)' \quad (1 \text{ in the } j\text{-th position})$$

Now

$$a_i'\Sigma e_j = a_i'(\lambda_1 a_1 a_1' + \cdots + \lambda_p a_p a_p')\, e_j = \lambda_i a_i' e_j = \lambda_i a_{ij}$$

Hence

$$\operatorname{corr}(C_i, x_j) = \frac{\operatorname{cov}(C_i, x_j)}{\sqrt{\operatorname{Var}(C_i)}\sqrt{\operatorname{Var}(x_j)}} = \frac{\lambda_i\, a_{ij}}{\sqrt{\lambda_i}\sqrt{\sigma_{jj}}} = \frac{\sqrt{\lambda_i}\, a_{ij}}{\sqrt{\sigma_{jj}}}$$
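The component-variable correlations can be tabulated directly from this formula (a sketch; `lam[i]` and `A[:, i]` below hold $\lambda_i$ and $a_i$ for the moose covariance matrix):

```python
import numpy as np

S = np.array([[4.297, 3.307, 3.295],
              [3.307, 4.017, 3.527],
              [3.295, 3.527, 6.566]])

lam, A = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]    # columns of A are a_1, a_2, a_3

# corr(C_i, x_j) = sqrt(lambda_i) * a_ij / sqrt(sigma_jj)
corr = np.sqrt(lam)[None, :] * A / np.sqrt(np.diag(S))[:, None]
# corr[j, i] is corr(C_i, x_j); the squared entries in each row sum to 1,
# since sum_i lambda_i a_ij^2 = sigma_jj
```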
Comment: If, instead of the covariance matrix $\Sigma$, the correlation matrix $R$ is used to extract the principal components, then the principal components are defined in terms of the standard scores of the observations:

$$z_i = \frac{x_i - \mu_i}{\sqrt{\sigma_{ii}}}$$

The correlation matrix $R$ is the covariance matrix of the standard scores of the observations, and since the standard scores have unit variance,

$$\operatorname{corr}(C_i^*, x_j) = \sqrt{\lambda_i}\, a_{ij}$$

where $\lambda_i$ and $a_i$ now denote the eigenvalues and eigenvectors of $R$.
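A sketch of this equivalence (simulated data, NumPy assumed): standardizing the observations and taking their covariance matrix yields the correlation matrix, so PCA on $R$ is PCA on the standard scores.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])  # very different scales
x[:, 1] += 5.0 * x[:, 0]                                      # induce some correlation

z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standard scores
R = np.cov(z, rowvar=False)                       # covariance of z = correlation of x
# np.linalg.eigh(R) would now give the principal components of the standard scores
```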
More Examples
Example 2: Bone Lengths of White Leghorn Fowl

The correlation matrix of the complete set of six fowl bone measurements had the following form:

| | Skull Length | Skull Breadth | Humerus | Ulna | Femur | Tibia |
|---|---|---|---|---|---|---|
| Skull Length | 1.000 | 0.584 | 0.615 | 0.601 | 0.570 | 0.600 |
| Skull Breadth | | 1.000 | 0.576 | 0.530 | 0.526 | 0.555 |
| Humerus | | | 1.000 | 0.940 | 0.875 | 0.878 |
| Ulna | | | | 1.000 | 0.877 | 0.886 |
| Femur | | | | | 1.000 | 0.924 |
| Tibia | | | | | | 1.000 |
Table: Principal Components

| Dimension | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Skull: Length | 0.74 | 0.45 | 0.49 | -0.02 | 0.01 | 0.00 |
| Skull: Breadth | 0.70 | 0.59 | -0.41 | 0.00 | 0.00 | -0.01 |
| Wing: Humerus | 0.95 | -0.16 | -0.03 | 0.22 | 0.05 | 0.16 |
| Wing: Ulna | 0.94 | -0.21 | 0.01 | 0.20 | -0.04 | -0.17 |
| Leg: Femur | 0.93 | -0.24 | -0.04 | -0.21 | 0.18 | -0.03 |
| Leg: Tibia | 0.94 | -0.19 | -0.03 | -0.20 | -0.19 | 0.04 |
| Eigenvalue | 4.568 | 0.714 | 0.412 | 0.173 | 0.076 | 0.057 |
| % of Total Variance | 76.1 | 11.9 | 6.9 | 2.9 | 1.3 | 0.9 |
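Since this is a correlation matrix, its eigenvalues must sum to $p = 6$ (the trace of $R$); a quick numerical check of the table (a sketch):

```python
import numpy as np

# The six-variable correlation matrix from Example 2
R = np.array([
    [1.000, 0.584, 0.615, 0.601, 0.570, 0.600],
    [0.584, 1.000, 0.576, 0.530, 0.526, 0.555],
    [0.615, 0.576, 1.000, 0.940, 0.875, 0.878],
    [0.601, 0.530, 0.940, 1.000, 0.877, 0.886],
    [0.570, 0.526, 0.875, 0.877, 1.000, 0.924],
    [0.600, 0.555, 0.878, 0.886, 0.924, 1.000],
])

evals = np.sort(np.linalg.eigvalsh(R))[::-1]
# The eigenvalues sum to 6, and the largest (about 4.57) accounts
# for roughly 76% of the total variance, matching the table
```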
Identification of Components:

| Component | Description |
|---|---|
| 1 | General average of all bone dimensions (size) |
| 2 | Comparison of skull size with wing and leg lengths |
| 3 | Comparison of skull length and breadth (skull shape) |
| 4 | Comparison of wing and leg lengths |
| 5 | Comparison of femur and tibia |
| 6 | Comparison of humerus and ulna |
Example 3: Wechsler Adult Intelligence Scale Subtest Scores

Table: Principal Components

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| WAIS subtest: Information | 0.83 | 0.33 | -0.04 | -0.01 |
| Comprehension | 0.75 | 0.31 | 0.07 | -0.17 |
| Arithmetic | 0.72 | 0.25 | -0.08 | 0.35 |
| Similarities | 0.78 | 0.14 | 0.00 | -0.21 |
| Digit Span | 0.62 | 0.00 | -0.38 | 0.58 |
| Vocabulary | 0.83 | 0.38 | -0.03 | -0.16 |
| Digit Symbol | 0.72 | -0.36 | -0.26 | -0.01 |
| Picture Completion | 0.78 | -0.10 | -0.25 | -0.01 |
| Block Design | 0.72 | -0.26 | 0.36 | 0.18 |
| Picture Arrangement | 0.72 | -0.23 | 0.04 | -0.05 |
| Object Assembly | 0.65 | -0.30 | 0.47 | 0.13 |
| Age | -0.34 | 0.80 | 0.26 | 0.18 |
| Years of Education | 0.75 | 0.01 | -0.30 | -0.23 |
| Eigenvalue | 6.69 | 1.42 | 0.80 | 0.71 |
| % of Total Variance | 51.47 | 10.90 | 6.15 | 5.48 |
| Cum. % of Variance | 51.47 | 62.37 | 68.52 | 74.01 |
Identification of Components:

| Component | Description |
|---|---|
| 1 | General intellectual performance |
| 2 | Experiential or age factor: a bipolar dimension comparing verbal or informational skills, known to increase with advancing age, to subtests measuring spatial-perceptual qualities and other cognitive abilities known to decrease with age |
| 3 | Spatial imagery or perception dimension |
| 4 | Numerical facility |
Computation of the eigenvalues and eigenvectors of $\Sigma$

Recall:

$$\Sigma = \lambda_1 a_1 a_1' + \cdots + \lambda_p a_p a_p'$$

then

$$\Sigma^2 = \lambda_1^2 a_1 a_1' + \cdots + \lambda_p^2 a_p a_p'$$

(the cross terms vanish because $a_i'a_j = 0$ for $i \neq j$ and $a_i'a_i = 1$).
Continuing, we see that

$$\Sigma^n = \lambda_1^n a_1 a_1' + \cdots + \lambda_p^n a_p a_p' = \lambda_1^n \left[ a_1 a_1' + \left(\frac{\lambda_2}{\lambda_1}\right)^{\!n} a_2 a_2' + \cdots + \left(\frac{\lambda_p}{\lambda_1}\right)^{\!n} a_p a_p' \right]$$

For large values of $n$, since $\lambda_j/\lambda_1 < 1$ for $j > 1$,

$$\Sigma^n \approx \lambda_1^n\, a_1 a_1'$$
The algorithm for computing the eigenvector $a_1$:

1. Compute $\Sigma^2, \Sigma^4, \Sigma^8, \Sigma^{16}$, etc. (by repeated squaring), rescaling so that the elements do not become too large in value, i.e. rescale so that the largest element is 1.
2. Compute $a_1$ using the fact that $\Sigma^n \approx \lambda_1^n a_1 a_1'$ for large $n$: each column of $\Sigma^n$ is then proportional to $a_1$; rescale to length 1.
3. Compute $\lambda_1$ using $\lambda_1 = a_1'\Sigma a_1$.
4. Repeat using the matrix

$$\Sigma_2 = \Sigma - \lambda_1 a_1 a_1' = \lambda_2 a_2 a_2' + \cdots + \lambda_p a_p a_p'$$

5. Continue with $i = 2, \ldots, p-1$ using the matrix

$$\Sigma_{i+1} = \Sigma_i - \lambda_i a_i a_i' = \lambda_{i+1} a_{i+1} a_{i+1}' + \cdots + \lambda_p a_p a_p'$$

Example – Using Excel - Eigen
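Steps 1-5 can be sketched in Python (an illustrative sketch, not the Excel workbook the slides refer to; repeated matrix-vector products replace the repeated-squaring of step 1, with the same rescaling idea, which is the usual power-method variant):

```python
import numpy as np

def power_method(sigma, n_iter=100):
    """All eigenvalues/eigenvectors of a symmetric PSD matrix by power
    iteration with deflation, following steps 1-5 above."""
    sigma = np.array(sigma, dtype=float)
    p = sigma.shape[0]
    evals, evecs = [], []
    for _ in range(p):
        a = np.ones(p)
        for _ in range(n_iter):
            a = sigma @ a
            a = a / np.linalg.norm(a)          # rescale so a keeps length 1
        lam = a @ sigma @ a                    # step 3: lambda_i = a_i' Sigma_i a_i
        evals.append(lam)
        evecs.append(a)
        sigma = sigma - lam * np.outer(a, a)   # steps 4-5: deflate Sigma_i -> Sigma_{i+1}
    return np.array(evals), np.column_stack(evecs)

# The moose covariance matrix from the example
S = np.array([[4.297, 3.307, 3.295],
              [3.307, 4.017, 3.527],
              [3.295, 3.527, 6.566]])
lam, A = power_method(S)
# lam comes out in decreasing order: about 11.86, 2.20, 0.81
```

Deflation works because subtracting $\lambda_i a_i a_i'$ removes the leading term of the spectral decomposition, so the next-largest eigenvalue dominates the following round of iterations.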