Techniques for studying correlation and covariance structure
Principal Components Analysis (PCA)
Factor Analysis
Principal Component Analysis
Let x = (x_1, ..., x_p)' have a p-variate Normal distribution with mean vector μ and covariance matrix Σ.

Definition: The linear combination

C_1 = a_1 x_1 + ... + a_p x_p = a'x

is called the first principal component if a = (a_1, ..., a_p)' is chosen to maximize

Var(C_1) = Var(a'x) = a'Σa

subject to

a_1² + ... + a_p² = a'a = 1.
Consider maximizing V = Var(a'x) = a'Σa subject to a'a = 1, using the Lagrange multiplier technique. Let

g(a, λ) = V − λ(a'a − 1) = a'Σa − λ(a'a − 1).

Now

∂g(a, λ)/∂λ = −(a'a − 1) = 0 if a'a = 1

and

∂g(a, λ)/∂a = 2Σa − 2λa = 0 if Σa = λa.

Thus a is an eigenvector of Σ and λ is the eigenvalue associated with a.

Also Var(a'x) = a'Σa = λa'a = λ.

Hence Var(a'x) is maximized if λ is the largest eigenvalue of Σ.
Summary

C_1 = a_1 x_1 + ... + a_p x_p = a'x

is the first principal component if a is the eigenvector (length 1, i.e. a'a = a_1² + ... + a_p² = 1) of Σ associated with the largest eigenvalue λ_1 of Σ.
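As a quick numerical illustration of this summary, the sketch below scans unit vectors a = (cos t, sin t) for a hypothetical 2×2 covariance matrix (not one from these notes) and confirms that a'Σa peaks at the largest eigenvalue, attained at the corresponding eigenvector.

```python
import math

# Hypothetical 2x2 covariance matrix with known eigenvalues 3 and 1;
# the eigenvector for eigenvalue 3 is (1, 1)/sqrt(2), i.e. t = 45 degrees.
Sigma = [[2.0, 1.0], [1.0, 2.0]]

def quad_form(a):
    """a' Sigma a, the variance of the linear combination a'x."""
    return sum(a[i] * Sigma[i][j] * a[j] for i in range(2) for j in range(2))

# Scan unit vectors a = (cos t, sin t): every such a satisfies a'a = 1.
best_var, best_t = max(
    (quad_form((math.cos(t), math.sin(t))), t)
    for t in (k * math.pi / 1800 for k in range(1800))
)

print(round(best_var, 4))          # 3.0, the largest eigenvalue
print(round(best_t / math.pi, 4))  # 0.25, i.e. t = 45 degrees
```

The maximum of a'Σa over unit vectors is the largest eigenvalue of Σ, attained at its eigenvector, which is exactly what the derivation above establishes.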
Let x = (x_1, ..., x_p)' have a p-variate Normal distribution with mean vector μ and covariance matrix Σ.

Definition: The set of linear combinations

C_1 = a_11 x_1 + ... + a_1p x_p = a_1'x
...
C_p = a_p1 x_1 + ... + a_pp x_p = a_p'x

are called the principal components of x if the vectors a_i = (a_i1, ..., a_ip)' are chosen such that

a_i1² + ... + a_ip² = a_i'a_i = 1

and

1. Var(C_1) is maximized.
2. Var(C_i) is maximized subject to C_i being independent of C_1, ..., C_{i-1} (the previous i − 1 principal components).

Note: we have already shown that a_1 = (a_11, ..., a_1p)' is the eigenvector of Σ associated with the largest eigenvalue, λ_1, of the covariance matrix Σ, and

Var(C_1) = Var(a_1'x) = a_1'Σa_1 = λ_1.
We will now show that a_i = (a_i1, ..., a_ip)' is the eigenvector of Σ associated with the ith largest eigenvalue, λ_i, of the covariance matrix Σ, and

Var(C_i) = Var(a_i'x) = a_i'Σa_i = λ_i.
Proof (by induction: assume the result is true for 1, ..., i − 1, then prove it true for i).

Write

(C_1, ..., C_{i-1}, C_i)' = A_i x, where A_i = (a_1, ..., a_{i-1}, a_i)'.

Now A_i x has covariance matrix

A_i Σ A_i', whose (j, k) entry is a_j'Σa_k.

By the induction hypothesis Σa_j = λ_j a_j for j = 1, ..., i − 1, so for j < i

cov(C_i, C_j) = a_i'Σa_j = λ_j a_i'a_j.

Hence C_i is independent of C_1, ..., C_{i-1} if

a_i'a_1 = ... = a_i'a_{i-1} = 0.
We want to maximize Var(C_i) = a_i'Σa_i subject to

1. a_i'a_1 = ... = a_i'a_{i-1} = 0
2. a_i'a_i = 1.

Let

g(a_i, λ_i, γ_1, ..., γ_{i-1}) = a_i'Σa_i − λ_i(a_i'a_i − 1) − γ_1 a_i'a_1 − ... − γ_{i-1} a_i'a_{i-1}.

Now

∂g/∂γ_j = −a_i'a_j = 0 for j = 1, ..., i − 1

and

∂g/∂λ_i = −(a_i'a_i − 1) = 0.

Also

∂g/∂a_i = 2Σa_i − 2λ_i a_i − γ_1 a_1 − ... − γ_{i-1} a_{i-1} = 0.   (1)

Multiplying equation (1) on the left by a_j' for j < i gives

2 a_j'Σa_i − 2λ_i a_j'a_i − γ_j a_j'a_j = 0 − 0 − γ_j = 0,

since a_j'Σa_i = λ_j a_j'a_i = 0, a_j'a_i = 0 and a_j'a_j = 1.

Hence γ_j = 0 for j < i, and equation (1) becomes

Σa_i = λ_i a_i.
Thus the principal components

C_i = a_i'x, i = 1, ..., p   (the ith principal component),

where a_1, ..., a_p are the eigenvectors of Σ associated with the eigenvalues

λ_1 ≥ λ_2 ≥ ... ≥ λ_p ≥ 0,

satisfy:

1. Var(C_1) is maximized.
2. Var(C_i) is maximized subject to C_i being independent of C_1, ..., C_{i-1} (the previous i − 1 principal components).
Recall: any positive definite matrix Σ can be written

Σ = PDP'

where P = (a_1, ..., a_p) is an orthogonal matrix (P'P = PP' = I) whose columns a_1, ..., a_p are the eigenvectors of Σ of length 1, and

D = diag(λ_1, ..., λ_p)

is a diagonal matrix whose diagonal elements λ_1, ..., λ_p are the eigenvalues of Σ.
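The decomposition can be illustrated numerically; the sketch below (using a hypothetical 2×2 matrix, not one from these notes) rebuilds Σ from its eigenvectors and eigenvalues as PDP'.

```python
import math

# Hypothetical example: Sigma = [[2, 1], [1, 2]] has eigenvalues 3 and 1
# with unit eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
P = [[s, s], [s, -s]]          # columns are the eigenvectors a_1, a_2
D = [[3.0, 0.0], [0.0, 1.0]]   # diagonal matrix of eigenvalues

def matmul(A, B):
    """Plain 2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pt = [[P[j][i] for j in range(2)] for i in range(2)]  # transpose of P

Sigma = matmul(matmul(P, D), Pt)   # P D P' rebuilds Sigma
print([[round(v, 10) for v in row] for row in Sigma])  # [[2.0, 1.0], [1.0, 2.0]]
```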
Example
In this example wildlife (moose) population density was measured over time (once a year) in three areas.
Year  Area 1  Area 2  Area 3     Year  Area 1  Area 2  Area 3
  1    11.3    14.1     6.9       13     6.1     9.9     6.8
  2    10.4    14.0    11.2       14     9.7    13.2     6.6
  3     9.9    13.0     8.7       15     8.1     9.4     4.0
  4     8.2    11.4     3.3       16    11.3    11.8     4.9
  5    10.1    11.9     8.7       17     8.8    11.5     8.8
  6    10.7    13.8    12.5       18     9.4    11.6     5.7
  7    11.0    14.9     8.9       19     7.5    11.4     4.9
  8     7.1     8.5     3.7       20     8.8    10.7     7.2
  9    14.7    14.5    12.1       21     7.5    11.1     7.0
 10     5.4     9.0     4.1       22     9.1    13.2     8.9
 11     7.3     7.6     5.6       23     6.8     9.8     7.6
 12    10.2    10.9     7.3
(Figure: plot of the population densities for Area 1, Area 2 and Area 3)
The Sample Statistics

The mean vector:

x̄ = (9.10, 11.62, 7.19)'

The covariance matrix:

S = [ 4.297  3.307  3.295 ]
    [ 3.307  4.015  3.527 ]
    [ 3.295  3.527  6.566 ]

The correlation matrix:

R = [ 1      .796   .620 ]
    [ .796   1      .687 ]
    [ .620   .687   1    ]
Principal Component Analysis

The eigenvalues of S:

λ_1 = 11.85974, λ_2 = 2.204232, λ_3 = 0.814249

The eigenvectors of S:

a_1 = (.522, .523, .674)', a_2 = (.582, .359, −.730)', a_3 = (.624, −.773, .117)'

The principal components:

C_1 = .522 x_1 + .523 x_2 + .674 x_3
C_2 = .582 x_1 + .359 x_2 − .730 x_3
C_3 = .624 x_1 − .773 x_2 + .117 x_3
(Figures: each principal component plotted as a direction in the (Area 1, Area 2, Area 3) space:
C_1 = .522 x_1 + .523 x_2 + .674 x_3,
C_2 = .582 x_1 + .359 x_2 − .730 x_3,
C_3 = .624 x_1 − .773 x_2 + .117 x_3)
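The reported eigenpairs can be checked directly against the eigen-equation S a_i = λ_i a_i; the plain-Python sketch below does this. One caveat: the second diagonal entry of S, 4.015, is an assumption chosen so that tr(S) matches the eigenvalue total 14.8782 reported for this example.

```python
# Check S a_i ~ lambda_i a_i for the moose-density covariance matrix.
# Assumption: S[1][1] = 4.015 (so that tr(S) equals the eigenvalue
# total 14.8782 reported for this example).
S = [[4.297, 3.307, 3.295],
     [3.307, 4.015, 3.527],
     [3.295, 3.527, 6.566]]

eigenpairs = [
    (11.85974, [0.522, 0.523, 0.674]),
    (2.204232, [0.582, 0.359, -0.730]),
    (0.814249, [0.624, -0.773, 0.117]),
]

errs = []
for lam, a in eigenpairs:
    Sa = [sum(S[i][j] * a[j] for j in range(3)) for i in range(3)]
    # Each component of S a should match lambda * a to rounding accuracy.
    errs.append(max(abs(Sa[i] - lam * a[i]) for i in range(3)))
print([round(e, 3) for e in errs])  # small residuals from 3 dp rounding
```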
Graphical Picture of Principal Components

Multivariate Normal data falls in an ellipsoidal pattern. The shape and orientation of the ellipsoid is determined by the covariance matrix Σ. The eigenvectors of Σ give the directions of the axes of the ellipsoid; the eigenvalues give the lengths of these axes.

Recall that if Σ is a positive definite matrix, then

Σ = λ_1 a_1 a_1' + ... + λ_p a_p a_p' = PDP'

where P = (a_1, ..., a_p) is an orthogonal matrix (P'P = PP' = I) with columns equal to the eigenvectors of Σ, and D = diag(λ_1, ..., λ_p) is a diagonal matrix with diagonal elements equal to the eigenvalues of Σ.
The vector of principal components

C = (C_1, ..., C_p)' = P'x, where C_i = a_i'x,

has covariance matrix

Σ_C = P'ΣP = P'(PDP')P = (P'P)D(P'P) = D = diag(λ_1, ..., λ_p).

An orthogonal matrix rotates vectors; thus C = P'x rotates the vector x into the vector of principal components C.

Also

tr(Σ_C) = tr(P'ΣP) = tr(ΣPP') = tr(Σ)

so that

λ_1 + ... + λ_p = tr(D) = tr(Σ) = σ_11 + ... + σ_pp = var(x_1) + ... + var(x_p) = Total variance of x.

The ratio

λ_i / (λ_1 + ... + λ_p) = var(C_i) / Total variance of x

denotes the proportion of variance explained by the ith principal component C_i.
The Example

i      λ_i        % variance
1      11.8597    79.71%
2       2.20423   14.82%
3       0.81425    5.47%
Total  14.8782   100%
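These percentages follow directly from the eigenvalues; a minimal check:

```python
# Eigenvalues of S from the moose-density example.
eigenvalues = [11.85974, 2.204232, 0.814249]
total = sum(eigenvalues)                  # total variance, tr(S)
percents = [100 * l / total for l in eigenvalues]
print(round(total, 4))                    # 14.8782
print([round(p, 2) for p in percents])    # [79.71, 14.82, 5.47]
```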
Also

cov(C_i, x_j) = cov(a_i'x, e_j'x) = a_i'Σe_j

where

e_j = (0, 0, ..., 0, 1, 0, ..., 0)'   (1 in the jth position).

Then

a_i'Σe_j = a_i'(λ_1 a_1 a_1' + ... + λ_p a_p a_p')e_j = λ_i a_i'e_j = λ_i a_ij

since a_i'a_k = 0 for k ≠ i and a_i'a_i = 1. Hence

corr(C_i, x_j) = cov(C_i, x_j) / √(Var(C_i) Var(x_j)) = λ_i a_ij / (√λ_i √σ_jj) = √λ_i a_ij / √σ_jj.
Comment: If, instead of the covariance matrix Σ, the correlation matrix R is used to extract the principal components, then the principal components are defined in terms of the standard scores of the observations:

z_i = (x_i − μ_i) / √σ_ii

and corr(C_i*, z_j) = √λ_i* a_ij*, where λ_i* and a_i* are the eigenvalues and eigenvectors of R. The correlation matrix is the covariance matrix of the standard scores of the observations.
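The correlation formula corr(C_i, x_j) = √λ_i a_ij / √σ_jj can be evaluated for the moose-density example; a sketch, with the caveat that the second diagonal entry of S (4.015) is an assumption chosen so that tr(S) matches the eigenvalue total 14.8782:

```python
import math

# Correlation between each principal component and each original variable:
# corr(C_i, x_j) = sqrt(lambda_i) * a_ij / sqrt(sigma_jj).
# Assumption: diag_S[1] = 4.015 (inferred from the eigenvalue total 14.8782).
diag_S = [4.297, 4.015, 6.566]
eigenvalues = [11.85974, 2.204232, 0.814249]
A = [[0.522, 0.523, 0.674],     # rows are the eigenvectors a_1, a_2, a_3
     [0.582, 0.359, -0.730],
     [0.624, -0.773, 0.117]]

loadings = [[math.sqrt(eigenvalues[i]) * A[i][j] / math.sqrt(diag_S[j])
             for j in range(3)] for i in range(3)]
print([round(r, 3) for r in loadings[0]])  # C_1 correlates strongly with all areas
```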
More Examples
Example 2: Bone Lengths of White Leghorn Fowl

The correlation matrix of the complete set of six fowl bone measurements had the following form:

                Skull L  Skull B  Humerus  Ulna   Femur  Tibia
Skull Length     1.000    0.584    0.615   0.601  0.570  0.600
Skull Breadth             1.000    0.576   0.530  0.526  0.555
Humerus                            1.000   0.940  0.875  0.878
Ulna                                       1.000  0.877  0.886
Femur                                             1.000  0.924
Tibia                                                    1.000

Table: Principal Components

Dimension            1      2      3      4      5      6
Skull: Length       0.74   0.45   0.49  -0.02   0.01   0.00
       Breadth      0.70   0.59  -0.41   0.00   0.00  -0.01
Wing:  Humerus      0.95  -0.16  -0.03   0.22   0.05   0.16
       Ulna         0.94  -0.21   0.01   0.20  -0.04  -0.17
Leg:   Femur        0.93  -0.24  -0.04  -0.21   0.18  -0.03
       Tibia        0.94  -0.19  -0.03  -0.20  -0.19   0.04
Eigenvalue          4.568  0.714  0.412  0.173  0.076  0.057
% of Total Variance 76.1   11.9   6.9    2.9    1.3    0.9

Identification of Components:

Component  Description
1          General average of all bone dimensions (Size)
2          Comparison of skull size with wing and leg lengths
3          Comparison of skull length and breadth (Skull Shape)
4          Comparison of wing and leg lengths
5          Comparison of femur and tibia
6          Comparison of humerus and ulna
Example 3: Wechsler Adult Intelligence Scale Subtest Scores

Table: Principal Components

Component               1      2      3      4
WAIS subtest:
Information            0.83   0.33  -0.04  -0.01
Comprehension          0.75   0.31   0.07  -0.17
Arithmetic             0.72   0.25  -0.08   0.35
Similarities           0.78   0.14   0.00  -0.21
Digit Span             0.62   0.00  -0.38   0.58
Vocabulary             0.83   0.38  -0.03  -0.16
Digit Symbol           0.72  -0.36  -0.26  -0.01
Picture Completion     0.78  -0.10  -0.25  -0.01
Block Design           0.72  -0.26   0.36   0.18
Picture Arrangement    0.72  -0.23   0.04  -0.05
Object Assembly        0.65  -0.30   0.47   0.13
Age                   -0.34   0.80   0.26   0.18
Years of Education     0.75   0.01  -0.30  -0.23
Eigenvalue             6.69   1.42   0.80   0.71
% of Total Variance   51.47  10.90   6.15   5.48
Cum. % of Variance    51.47  62.37  68.52  74.01

Identification of Components:

Component  Description
1          General intellectual performance
2          Experiential or age factor: a bipolar dimension comparing verbal or informational skills known to increase with advancing age to subtests measuring spatial-perceptual qualities and other cognitive abilities known to decrease with age
3          Spatial imagery or perception dimension
4          Numerical facility
Computation of the eigenvalues and eigenvectors of Σ

Recall:

Σ = λ_1 a_1 a_1' + ... + λ_p a_p a_p'.

Then, since a_i'a_j = 0 for i ≠ j and a_i'a_i = 1,

Σ² = λ_1² a_1 a_1' + ... + λ_p² a_p a_p'.

Continuing, we see that:

Σⁿ = λ_1ⁿ a_1 a_1' + ... + λ_pⁿ a_p a_p'
   = λ_1ⁿ [a_1 a_1' + (λ_2/λ_1)ⁿ a_2 a_2' + ... + (λ_p/λ_1)ⁿ a_p a_p'].

For large values of n,

Σⁿ ≈ λ_1ⁿ a_1 a_1'.

The algorithm for computing the eigenvector a_1:

1. Compute Σ², Σ⁴, Σ⁸, Σ¹⁶, etc., rescaling so that the elements do not become too large in value, i.e. rescale so that the largest element is 1.
2. Compute a_1 using the fact that Σⁿ ≈ λ_1ⁿ a_1 a_1' for large n: each column of Σⁿ is proportional to a_1, so normalize a column to length 1 (a_1'a_1 = 1).
3. Compute λ_1 using λ_1 = a_1'Σa_1.
4. Repeat using the matrix Σ₂ = Σ − λ_1 a_1 a_1' = λ_2 a_2 a_2' + ... + λ_p a_p a_p'.
5. Continue with i = 2, ..., p − 1 using the matrix Σ_{i+1} = Σ_i − λ_i a_i a_i' = λ_{i+1} a_{i+1} a_{i+1}' + ... + λ_p a_p a_p'.
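The steps above can be sketched in plain Python. This variant applies Σ to a vector repeatedly (equivalent to using a column of Σⁿ), with the same rescale-then-normalize and deflation steps; the demo matrix is a hypothetical 2×2 example, not one from these notes.

```python
import math

def top_eigen(S, iters=200):
    """Power method: largest eigenvalue and unit eigenvector of symmetric S."""
    p = len(S)
    v = [1.0 / (i + 1) for i in range(p)]   # arbitrary starting vector
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(p)) for i in range(p)]
        scale = max(abs(x) for x in w)      # step 1: keep the largest element at 1
        v = [x / scale for x in w]
    norm = math.sqrt(sum(x * x for x in v))
    a = [x / norm for x in v]               # step 2: normalize to length 1
    lam = sum(a[i] * S[i][j] * a[j] for i in range(p) for j in range(p))  # step 3
    return lam, a

def deflate(S, lam, a):
    """Step 4: remove the found component, S - lam * a a'."""
    p = len(S)
    return [[S[i][j] - lam * a[i] * a[j] for j in range(p)] for i in range(p)]

# Hypothetical demo matrix with eigenvalues 3 and 1.
S = [[2.0, 1.0], [1.0, 2.0]]
lam1, a1 = top_eigen(S)
lam2, a2 = top_eigen(deflate(S, lam1, a1))  # step 5: repeat on the deflated matrix
print(round(lam1, 6), round(lam2, 6))       # 3.0 1.0
```

Convergence relies on the ratio (λ_2/λ_1)ⁿ vanishing, exactly as in the expansion of Σⁿ above; the closer λ_2 is to λ_1, the more iterations are needed.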
Example – Using Excel - Eigen