Principal component analysis - NDSU › faculty › horsley › Principal... · PRINCIPAL...


PRINCIPAL COMPONENTS ANALYSIS (PCA)

Introduction

• PCA is considered an exploratory technique that can be used to gain a better understanding of the interrelationships between variables.

• PCA is performed on a set of data with the hope of simplifying the description of a set of interrelated variables.

• Variables are treated equally and they are not separated into dependent and independent variables.

• In simplest terms, PCA transforms the original interrelated variables into a new set of uncorrelated variables called principal components.

• Each principal component is a linear combination of the original variables.

• The amount of information expressed by each principal component is its variance.

• Principal components often are displayed in rank order of decreasing variance.

• The principal component with the highest variance is termed the “first principal component.”

• An advantage of principal components is that the complexity of interpretation caused by a large number of interrelated variables can be reduced by using only the first few principal components, which together explain a large proportion of the total variation.

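• PCA can be used to check for normality. Because each principal component is a linear combination of the original variables, if the principal components are not normally distributed, then the original data were not multivariate normal either.

Basic Concepts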

• Suppose we have a random sample of N observations for two variables, X1 and X2.

o To simplify the description of these two variables, we subtract the mean of each variable from each of its observations; thus,

§ $x_1 = X_1 - \bar{X}_1$ and $x_2 = X_2 - \bar{X}_2$

o The values of x1 and x2 each have a mean of 0, and the sample variances $S_1^2$ and $S_2^2$ are unaffected by using the deviations.

o Our goal through PCA is to create two new uncorrelated variables, C1 and C2, called principal components.


o The new variables are linear functions of x1 and x2 that can be written as:

§ $C_1 = a_{11}x_1 + a_{12}x_2$ and $C_2 = a_{21}x_1 + a_{22}x_2$

§ Mean C1 = Mean C2 = 0

§ Variance C1 = $a_{11}^2 S_1^2 + a_{12}^2 S_2^2 + 2a_{11}a_{12}rS_1S_2$

§ Variance C2 = $a_{21}^2 S_1^2 + a_{22}^2 S_2^2 + 2a_{21}a_{22}rS_1S_2$

§ Here r is the sample correlation between X1 and X2. The variances of C1 and C2 are referred to as the first and second eigenvalues of the covariance matrix of X1 and X2.

o The coefficients are chosen such that

i. The variance of C1 is maximized and is at least as large as the variance of every other component.

• Var C1 ≥ Var C2 ≥ . . . ≥ Var CP, where P is the number of original variables.

ii. The N values of C1 and C2 are uncorrelated.

iii. $a_{11}^2 + a_{12}^2 = a_{21}^2 + a_{22}^2 = 1$ (i.e., the sum of the squares of the coefficients of each principal component is one).

o Hotelling originally derived the mathematical solution for the coefficients.

o PCA can be thought of as a rotation of the original x1 and x2 axes to new axes C1 and C2.

o The three conditions above on the coefficients determine the amount of rotation of the new C1 and C2 axes.

o The values of C1 and C2 for a given point (x1, x2) are found by drawing perpendicular lines from the point to the new axes.
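The link between the variance formulas and the eigenvalues can be written compactly in matrix form. The block below is a sketch of the standard argument rather than part of the original notes; the bold vector notation and the symbols $\lambda_1$, $\lambda_2$ for the eigenvalues are assumptions introduced here.

```latex
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad
\mathbf{S} = \begin{pmatrix} S_1^2 & r S_1 S_2 \\ r S_1 S_2 & S_2^2 \end{pmatrix}, \quad
\mathbf{a}_i = \begin{pmatrix} a_{i1} \\ a_{i2} \end{pmatrix}, \quad
C_i = \mathbf{a}_i^{\top}\mathbf{x}
\]
\[
\operatorname{Var}(C_i) = \mathbf{a}_i^{\top}\mathbf{S}\,\mathbf{a}_i
  = a_{i1}^2 S_1^2 + a_{i2}^2 S_2^2 + 2 a_{i1} a_{i2}\, r S_1 S_2
\]
\[
\text{maximize } \mathbf{a}_1^{\top}\mathbf{S}\,\mathbf{a}_1
\text{ subject to } \mathbf{a}_1^{\top}\mathbf{a}_1 = 1
\;\Longrightarrow\; \mathbf{S}\,\mathbf{a}_1 = \lambda_1 \mathbf{a}_1
\;\Longrightarrow\; \operatorname{Var}(C_1) = \lambda_1
\]
\[
\operatorname{Cov}(C_1, C_2) = \mathbf{a}_1^{\top}\mathbf{S}\,\mathbf{a}_2
  = \lambda_2\,\mathbf{a}_1^{\top}\mathbf{a}_2 = 0,
\qquad \lambda_1 + \lambda_2 = S_1^2 + S_2^2
\]
```

The last identity says that the rotation only redistributes the total variance between the two new axes; it does not change the total.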

     


Figure 1. Diagram showing the original x1 and x2 axes and the new C1 and C2 axes.

Figure 2. Plot showing principal components for two variables.
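The two-variable case can also be checked numerically. The following is a minimal Python sketch, not the SAS procedure used in these notes: the six observations are hypothetical, the helper names `mean` and `sample_cov` are made up for illustration, and the rotation angle uses the closed-form result for a 2 x 2 covariance matrix.

```python
import math

# Hypothetical sample: N = 6 observations on two correlated variables.
X1 = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
X2 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]


def mean(values):
    return sum(values) / len(values)


def sample_cov(u, v):
    """Sample covariance with divisor N - 1; sample_cov(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


# Deviations from the means, so x1 and x2 each have mean 0.
x1 = [a - mean(X1) for a in X1]
x2 = [b - mean(X2) for b in X2]

s11 = sample_cov(x1, x1)  # S1^2
s22 = sample_cov(x2, x2)  # S2^2
s12 = sample_cov(x1, x2)  # r * S1 * S2
S1, S2 = math.sqrt(s11), math.sqrt(s22)
r = s12 / (S1 * S2)

# Angle by which the x1 axis is rotated to reach the C1 axis.
theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
a11, a12 = math.cos(theta), math.sin(theta)    # coefficients of C1
a21, a22 = -math.sin(theta), math.cos(theta)   # coefficients of C2

# Component scores: perpendicular projections of each point onto the new axes.
C1 = [a11 * u + a12 * v for u, v in zip(x1, x2)]
C2 = [a21 * u + a22 * v for u, v in zip(x1, x2)]

# Variances computed from the formulas in the notes.
var_c1 = a11**2 * s11 + a12**2 * s22 + 2 * a11 * a12 * r * S1 * S2
var_c2 = a21**2 * s11 + a22**2 * s22 + 2 * a21 * a22 * r * S1 * S2

# Eigenvalues of the 2 x 2 covariance matrix in closed form.
half_trace = (s11 + s22) / 2.0
spread = math.sqrt(((s11 - s22) / 2.0) ** 2 + s12**2)
lam1, lam2 = half_trace + spread, half_trace - spread

print(f"rotation angle = {math.degrees(theta):.2f} degrees")
print(f"Var C1 = {var_c1:.4f}, first eigenvalue  = {lam1:.4f}")
print(f"Var C2 = {var_c2:.4f}, second eigenvalue = {lam2:.4f}")
print(f"Cov(C1, C2) = {sample_cov(C1, C2):.2e}")
```

With any data set the printed variances should match the eigenvalues and the covariance between C1 and C2 should be zero up to rounding, which is what the three conditions on the coefficients require.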


The Number of Components to Retain

• An important concept of PCA is to reduce the number of variables, or reduce dimensionality.

• An important decision that the researcher must make when using PCA is to determine the number of principal components to use.

• There are no hard-set rules for this decision, and it may seem subjective at times.

• Common methods to reduce the number of principal components include:

o Determine the minimum amount of variation that you want explained by the retained principal components. Some researchers use a cutoff of 80%; others go as low as 50%.

o Another option is to eliminate the principal components that explain insufficient variation. A common cutoff is <5%.

o Another method is to eliminate all principal components that explain less than 70/P percent of the variation, where P = the total number of variables.

o Scree plots from the SAS analysis also can be used. The place where the plot has an “elbow” can be used as the cutoff. An example of using scree plots is discussed in the next section.

Examples of SAS Analyses Using Proc Princomp

Example 1: Using PCA to reduce the number of variables. This example starts with 20 variables, X1 through X20.

• SAS commands

    /* Send the output, including graphics, to an RTF file */
    ods graphics on;
    ods rtf file='pca.rtf';
    /* PCA on the correlation matrix of the 20 variables */
    proc princomp;
      var x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20;
    run;
    ods rtf close;


PCA of the Depression Data Set

The PRINCOMP Procedure

Observations 294

Variables 20

Simple Statistics

x1 x2 x3 x4 x5 x6 x7

Mean 0.3639455782 0.5680272109 0.5442176871 0.1938775510 0.5510204082 0.2482993197 0.2448979592

StD 0.7573481288 0.8097938384 0.8916089113 0.5898717961 0.8194202018 0.6261485001 0.6515074328

Simple Statistics

x8 x9 x10 x11 x12 x13 x14

Mean 0.3503401361 0.5680272109 0.4625850340 0.3605442177 0.5136054422 0.3401360544 0.7210884354

StD 0.7770491263 0.9128200608 0.7990788967 0.6857454351 0.7738154621 0.7058661916 0.9690240890

Simple Statistics

x15 x16 x17 x18 x19 x20

Mean 0.6734693878 0.7482993197 0.6190476190 0.3095238095 0.2551020408 0.2482993197

StD 0.8910879595 0.8924678170 0.8885570105 0.6527268435 0.5720072643 0.5867541003

Correlation Matrix

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13

x1 1.0000 0.6245 0.4284 0.5749 0.5172 0.4061 0.5519 0.2002 0.3615 0.5217 0.4628 0.2099 0.2209

x2 0.6245 1.0000 0.6198 0.4832 0.7303 0.4344 0.5699 0.1328 0.3746 0.5841 0.4597 0.4043 0.1922

x3 0.4284 0.6198 1.0000 0.4411 0.6065 0.4846 0.5688 0.1672 0.2353 0.4646 0.4483 0.2415 0.1496

x4 0.5749 0.4832 0.4411 1.0000 0.5267 0.3774 0.5243 0.2534 0.2068 0.3883 0.3919 0.1923 0.2017

x5 0.5172 0.7303 0.6065 0.5267 1.0000 0.4708 0.4880 0.1353 0.2782 0.5737 0.4105 0.3649 0.2000

x6 0.4061 0.4344 0.4846 0.3774 0.4708 1.0000 0.3942 0.1503 0.1107 0.3154 0.2916 0.3065 0.1326

x7 0.5519 0.5699 0.5688 0.5243 0.4880 0.3942 1.0000 0.2884 0.3564 0.4831 0.4816 0.2574 0.2042

x8 0.2002 0.1328 0.1672 0.2534 0.1353 0.1503 0.2884 1.0000 0.1612 0.1723 0.1593 0.1368 0.1056

x9 0.3615 0.3746 0.2353 0.2068 0.2782 0.1107 0.3564 0.1612 1.0000 0.4808 0.3587 0.1557 0.0328

x10 0.5217 0.5841 0.4646 0.3883 0.5737 0.3154 0.4831 0.1723 0.4808 1.0000 0.5417 0.2934 0.2526

x11 0.4628 0.4597 0.4483 0.3919 0.4105 0.2916 0.4816 0.1593 0.3587 0.5417 1.0000 0.2802 0.1477

x12 0.2099 0.4043 0.2415 0.1923 0.3649 0.3065 0.2574 0.1368 0.1557 0.2934 0.2802 1.0000 0.0790

x13 0.2209 0.1922 0.1496 0.2017 0.2000 0.1326 0.2042 0.1056 0.0328 0.2526 0.1477 0.0790 1.0000

x14 0.2458 0.3200 0.2000 0.1785 0.2716 0.3227 0.2545 0.0940 0.1566 0.3523 0.2803 0.2736 0.2440

x15 0.3638 0.5228 0.3361 0.2247 0.3594 0.2437 0.3969 0.0918 0.1281 0.3279 0.2603 0.3331 0.2911


x16 0.2673 0.4110 0.2671 0.1449 0.3303 0.2466 0.3412 0.0439 0.2222 0.3026 0.2547 0.2867 0.2068

x17 0.3893 0.4441 0.2497 0.2195 0.3362 0.3424 0.3563 0.0457 0.1583 0.3404 0.2710 0.3054 0.2236

x18 0.3099 0.2990 0.3019 0.2868 0.2862 0.2706 0.2626 0.1152 0.1908 0.3396 0.2836 0.2248 0.2078

x19 0.1789 0.2682 0.2823 0.1159 0.1724 0.1275 0.2256 0.0132 0.2183 0.1964 0.2346 0.0577 0.0464

x20 0.4257 0.4276 0.4389 0.5113 0.3888 0.2682 0.5368 0.2951 0.2838 0.4020 0.3366 0.2519 0.1662

Correlation Matrix

x14 x15 x16 x17 x18 x19 x20

x1 0.2458 0.3638 0.2673 0.3893 0.3099 0.1789 0.4257

x2 0.3200 0.5228 0.4110 0.4441 0.2990 0.2682 0.4276

x3 0.2000 0.3361 0.2671 0.2497 0.3019 0.2823 0.4389

x4 0.1785 0.2247 0.1449 0.2195 0.2868 0.1159 0.5113

x5 0.2716 0.3594 0.3303 0.3362 0.2862 0.1724 0.3888

x6 0.3227 0.2437 0.2466 0.3424 0.2706 0.1275 0.2682

x7 0.2545 0.3969 0.3412 0.3563 0.2626 0.2256 0.5368

x8 0.0940 0.0918 0.0439 0.0457 0.1152 0.0132 0.2951

x9 0.1566 0.1281 0.2222 0.1583 0.1908 0.2183 0.2838

x10 0.3523 0.3279 0.3026 0.3404 0.3396 0.1964 0.4020

x11 0.2803 0.2603 0.2547 0.2710 0.2836 0.2346 0.3366

x12 0.2736 0.3331 0.2867 0.3054 0.2248 0.0577 0.2519

x13 0.2440 0.2911 0.2068 0.2236 0.2078 0.0464 0.1662

x14 1.0000 0.2459 0.3921 0.4232 0.1316 0.1842 0.2603

x15 0.2459 1.0000 0.3469 0.2303 0.1802 0.0568 0.2274

x16 0.3921 0.3469 1.0000 0.4253 0.1401 0.2532 0.2371

x17 0.4232 0.2303 0.4253 1.0000 0.3511 0.2792 0.2213

x18 0.1316 0.1802 0.1401 0.3511 1.0000 0.0620 0.2086

x19 0.1842 0.0568 0.2532 0.2792 0.0620 1.0000 0.3699

x20 0.2603 0.2274 0.2371 0.2213 0.2086 0.3699 1.0000


Eigenvalues of the Correlation Matrix

      Eigenvalue    Difference    Proportion    Cumulative
 1    7.05541769    5.56984843        0.3528        0.3528
 2    1.48556925    0.25405950        0.0743        0.4270
 3    1.23150975    0.16582473        0.0616        0.4886
 4    1.06568502    0.05305242        0.0533        0.5419
 5    1.01263260    0.04517187        0.0506        0.5925
 6    0.96746073    0.02064438        0.0484        0.6409
 7    0.94681635    0.17757881        0.0473        0.6883
 8    0.76923754    0.07460055        0.0385        0.7267
 9    0.69463699    0.03451014        0.0347        0.7614
10    0.66012685    0.05249140        0.0330        0.7945
11    0.60763545    0.05829812        0.0304        0.8248
12    0.54933734    0.01270376        0.0275        0.8523
13    0.53663358    0.02791603        0.0268        0.8791
14    0.50871755    0.05781498        0.0254        0.9046
15    0.45090258    0.07572163        0.0225        0.9271
16    0.37518095    0.05399659        0.0188        0.9459
17    0.32118435    0.02662157        0.0161        0.9619
18    0.29456278    0.02618829        0.0147        0.9767
19    0.26837449    0.06999633        0.0134        0.9901
20    0.19837816                      0.0099        1.0000


Eigenvectors

Prin1 Prin2 Prin3 Prin4 Prin5 Prin6 Prin7 Prin8 Prin9 Prin10 Prin11

x1 0.277438 -.144979 -.057702 0.002724 -.088268 -.118951 -.120184 -.180910 -.388430 -.197252 -.104234

x2 0.313183 0.027136 -.031630 -.247811 -.024397 0.100961 -.130714 0.042606 -.114186 -.008324 0.159686

x3 0.267798 -.154720 -.034590 -.247247 0.218305 -.042497 -.109251 0.111386 0.246253 0.392895 -.000589

x4 0.243554 -.319404 -.176943 0.071551 0.172925 -.141394 -.060163 -.152562 -.127640 -.410196 0.005257

x5 0.286784 -.049717 -.138388 -.279353 0.041111 0.013724 -.073581 -.097235 0.087690 0.008652 0.400198

x6 0.220570 0.053395 -.224213 -.182286 0.339874 -.150884 0.255087 -.273927 0.043571 0.339431 0.053202

x7 0.284370 -.164359 0.018960 0.076061 0.086999 0.096669 -.088212 0.036050 -.196078 0.080462 -.284635

x8 0.108096 -.304519 -.110325 0.556703 0.097606 0.320860 0.381442 0.066608 -.115406 0.374184 0.046032

x9 0.175781 -.168999 0.396225 0.014631 -.535478 0.126622 0.083942 -.070534 -.136258 0.120102 0.309623

x10 0.276625 -.045423 0.083456 -.008415 -.365054 -.017871 -.025569 -.163073 0.232840 0.022423 0.147770

x11 0.243270 -.104819 0.131395 -.041382 -.241930 -.030600 0.052013 -.123516 0.363904 0.008163 -.668991

x12 0.179019 0.229984 -.163432 -.145063 -.036843 0.367000 0.460876 0.303827 0.254277 -.390629 0.096363

x13 0.125906 0.212631 -.264529 0.540020 -.095286 -.190807 -.452218 0.040916 0.301849 0.039170 0.234516

x14 0.180253 0.401483 0.101409 0.246105 0.084712 0.045743 0.171322 -.507914 0.228825 -.092695 -.053963

x15 0.200363 0.209780 -.270323 -.031213 -.083408 0.390090 -.355778 0.287622 -.092704 0.050316 -.263350

x16 0.192430 0.417446 0.185012 0.046739 0.039931 0.202204 -.073058 0.033934 -.304982 0.199647 0.012610

x17 0.209684 0.390480 0.086023 0.068401 0.049913 -.318303 0.203555 0.027570 -.382192 -.080368 -.061428

x18 0.171712 0.015325 -.201930 0.062862 -.275225 -.549661 0.292328 0.474241 0.002129 0.088240 -.035984

x19 0.131489 0.056866 0.632607 0.023179 0.334922 -.157717 -.116624 0.316230 0.162291 0.040590 0.030883

x20 0.235700 -.228260 0.193254 0.240427 0.290938 0.082201 -.008931 0.176681 0.102174 -.374999 0.097475

Eigenvectors

Prin12 Prin13 Prin14 Prin15 Prin16 Prin17 Prin18 Prin19 Prin20

x1 0.286574 -.012444 0.125514 0.224443 -.224906 -.549164 -.265853 0.026949 0.238605

x2 0.182666 -.077524 -.189498 0.017131 0.132969 -.182041 0.091523 -.158884 -.781315

x3 -.133670 0.017281 -.142106 -.228717 0.188980 -.366600 0.032081 0.513338 0.166970

x4 -.192715 -.087906 0.044586 0.263833 0.360904 0.393507 -.014676 0.365430 -.074480

x5 -.028886 -.275472 -.241924 0.060494 0.086076 0.128577 0.084836 -.500952 0.460898

x6 0.102265 0.257438 0.506858 0.120893 -.240812 0.176439 0.113868 -.071734 -.105756

x7 -.179040 0.094940 0.117573 -.559414 0.102489 0.181793 -.468761 -.319636 -.027376

x8 0.210580 -.231186 -.193353 0.139832 0.016659 0.002992 0.020863 0.003605 -.012852

x9 -.002924 0.269038 0.335431 -.045605 0.312700 0.015411 0.207553 0.051066 0.094840

x10 0.053062 -.042353 -.242867 -.088813 -.572512 0.337189 -.255973 0.302199 -.084866

x11 -.030848 -.323119 0.110391 0.126584 0.028622 -.027464 0.300615 -.165863 -.019095


x12 0.055942 -.150989 0.295300 -.079415 0.058212 -.108218 -.223006 0.116261 0.047158

x13 0.041837 -.176416 0.336937 -.109570 0.074537 -.097993 0.015767 -.033185 -.055836

x14 -.007998 0.370796 -.331885 0.061372 0.282669 -.135965 -.148451 -.050682 0.023338

x15 0.259095 0.380741 -.094768 0.176289 0.007444 0.259252 0.175924 0.076887 0.191330

x16 -.622118 -.235488 0.079951 0.300664 -.151649 -.037694 -.038036 0.069687 -.017238

x17 0.251275 -.230142 -.099636 -.417317 0.012245 0.136290 0.355917 0.174599 0.114463

x18 -.208372 0.262276 -.177396 0.213171 0.049702 -.034009 -.111069 -.149200 -.029813

x19 0.343680 -.052032 0.074344 0.268064 0.082175 0.162671 -.255453 -.034123 0.036257

x20 -.232928 0.293801 -.051536 -.125060 -.376759 -.161253 0.414382 -.124051 -.002511


Initial PCA Analysis of the Depression Data to Determine the Number of Principal Components to Retain

• How many principal components should be retained?

1. Eleven principal components should be retained based on the rule of maintaining the total variation >80%.

2. Five principal components should be retained based on the rule of eliminating all principal components that explain less than 5% of the total variation.

3. Eight principal components should be retained based on the rule of eliminating all principal components that explain <70/P% of the variation (70/20 = 3.5%); the ninth component explains 3.47%.

4. Around three principal components should be retained based on the scree plots.
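The first three counts can be reproduced directly from the eigenvalues in the PROC PRINCOMP output above. The following Python sketch is an illustration rather than part of the SAS analysis; the function names are hypothetical, and the cutoffs are read as "at least" the stated percentage. Applied strictly, the 70/P rule keeps eight components, because the ninth falls just under the 3.5% cutoff.

```python
# Eigenvalues of the correlation matrix, copied from the PROC PRINCOMP output above.
eigenvalues = [
    7.05541769, 1.48556925, 1.23150975, 1.06568502, 1.01263260,
    0.96746073, 0.94681635, 0.76923754, 0.69463699, 0.66012685,
    0.60763545, 0.54933734, 0.53663358, 0.50871755, 0.45090258,
    0.37518095, 0.32118435, 0.29456278, 0.26837449, 0.19837816,
]


def components_for_cumulative(values, target):
    """Smallest number of leading components whose cumulative proportion reaches target."""
    total = sum(values)
    running = 0.0
    for count, value in enumerate(values, start=1):
        running += value
        if running / total >= target:
            return count
    return len(values)


def components_above(values, min_proportion):
    """Number of components that each explain at least min_proportion of the variation."""
    total = sum(values)
    return sum(1 for value in values if value / total >= min_proportion)


p = len(eigenvalues)  # number of original variables (20 here)

by_cumulative = components_for_cumulative(eigenvalues, 0.80)  # 80% rule
by_five_percent = components_above(eigenvalues, 0.05)         # 5% rule
by_70_over_p = components_above(eigenvalues, 0.70 / p)        # 70/P percent rule

print("80% cumulative rule:", by_cumulative)
print("5% rule:", by_five_percent)
print("70/P rule:", by_70_over_p)
```

The scree-plot rule is left out because locating an "elbow" is a visual judgment rather than a fixed calculation.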

• So what is the correct answer?

o Decision should be based on your knowledge of the subject area.

o I would select a number between 5 and 7.

Example 2: Using PCA to determine the interrelationships between variables related to malt quality, particularly malt extract.

• Malt quality of barley lines is determined using a large number of correlated traits.

• PCA will be used to:

o Reduce the dimensionality of the 10 variables that define malt quality.

o Determine which of the 10 variables contribute to explaining the “most” variability in each principal component, based on their loadings.


Initial PCA Analysis of Malt Data to Determine the Number of Principal Components to Retain (Abbreviated Output)

Eigenvalues of the Correlation Matrix

      Eigenvalue    Difference    Proportion    Cumulative
 1    3.25651383    1.32360184        0.3257        0.3257
 2    1.93291199    0.29774838        0.1933        0.5189
 3    1.63516362    0.45047335        0.1635        0.6825
 4    1.18469026    0.23755389        0.1185        0.8009
 5    0.94713637    0.52141083        0.0947        0.8956
 6    0.42572554    0.12523121        0.0426        0.9382
 7    0.30049433    0.12518352        0.0300        0.9683
 8    0.17531082    0.05986813        0.0175        0.9858
 9    0.11544268    0.08883213        0.0115        0.9973
10    0.02661056                      0.0027        1.0000
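The Difference, Proportion, and Cumulative columns follow from the eigenvalues alone. The Python sketch below is an illustration of that arithmetic, not SAS code; it assumes the analysis is on the correlation matrix, so the eigenvalues sum to the number of variables (10 here).

```python
# Eigenvalues of the correlation matrix for the 10 malt-quality variables,
# copied from the table above.
eigenvalues = [
    3.25651383, 1.93291199, 1.63516362, 1.18469026, 0.94713637,
    0.42572554, 0.30049433, 0.17531082, 0.11544268, 0.02661056,
]

# For a correlation matrix the eigenvalues sum to the number of variables.
total = sum(eigenvalues)

rows = []
cumulative = 0.0
for i, value in enumerate(eigenvalues):
    # Difference: gap to the next eigenvalue (left blank for the last one).
    difference = value - eigenvalues[i + 1] if i + 1 < len(eigenvalues) else None
    proportion = value / total      # share of total variation explained
    cumulative += proportion        # running share for components 1..i+1
    rows.append((i + 1, value, difference, proportion, cumulative))

for number, value, difference, proportion, cum in rows:
    diff_text = f"{difference:.8f}" if difference is not None else ""
    print(f"{number:2d}  {value:.8f}  {diff_text:>10}  {proportion:.4f}  {cum:.4f}")
```

Reading the Cumulative column this way shows, for example, that the first four components are the first to pass 80% of the total variation.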

Eigenvectors

Prin1 Prin2 Prin3 Prin4 Prin5 Prin6 Prin7 Prin8 Prin9 Prin10

kwt 0.460908 0.098994 -.191527 0.024931 -.208176 0.436128 0.416468 -.553296 -.001091 0.165322

plump 0.504392 0.093371 -.006892 -.054249 -.118362 0.021649 0.372624 0.726388 0.149609 -.174729

barcolor -.221913 -.226217 0.621107 -.046569 0.136818 0.175687 0.349279 -.099545 0.574247 -.019714

wrtcolor 0.107532 0.546630 -.186266 0.098715 0.493803 0.246585 -.334806 -.007926 0.473256 -.066470

protein 0.420932 -.168517 0.252873 0.408924 -.035322 -.235921 -.301670 0.038411 0.148166 0.626032

wrtprt 0.128291 0.440959 0.401698 0.424598 0.016312 -.388182 0.172970 -.235822 -.221777 -.402225

kolbach -.331075 0.526391 0.139328 -.107046 0.079364 0.056819 0.318532 0.201344 -.254311 0.605167

dp 0.308405 -.156832 0.400847 -.225186 0.484449 0.368787 -.173206 0.040258 -.512496 -.062732

alpha 0.033714 0.313990 0.370631 -.304447 -.640944 0.226079 -.446256 0.004503 0.064486 -.070536

bglucan -.269890 -.091582 -.006101 0.694381 -.163648 0.566538 -.045656 0.238714 -.145816 -.092158


• Based on the different methods of determining how many principal components to retain, I would keep five.

• The next step is to redo the analysis, keeping only five principal components.

• The different plots to interpret the results should be requested.

• SAS Commands

    /* Send the output, including graphics, to an RTF file */
    ods graphics on;
    ods rtf file='maltpca.rtf';
    /* Keep five components; pattern plots for the first three */
    proc princomp n=5 plots(ncomp=3)=pattern;
      id variety;
      var kwt plump barcolor wrtcolor protein wrtprt kolbach dp alpha bglucan;
      *title 'PCA of Malt Quality Using All Variables';
      title 'PCA of Malt Quality Analyses Using 5 Principal Components';
    run;
    ods rtf close;

• In the Proc Princomp statement, I use the option n=5 to have the PCA calculate only the first five principal components. Additionally, the option “plots=pattern” provides the graphical plots of the PCA. The suboption “(ncomp=3)” requests graphical output comparing only the first three principal components.


PCA of Malt Quality Analyses Using 5 Principal Components

The PRINCOMP Procedure

 

 

Observations 20

Variables 10

Simple Statistics

kwt plump barcolor wrtcolor protein wrtprt kolbach

Mean 36.00000000 97.09000000 38.50000000 1.810000000 13.67500000 5.947000000 45.55000000

StD 1.47362782 1.73384271 2.68524232 0.158612404 0.54567679 0.220838116 2.22911358

Simple Statistics

dp alpha bglucan

Mean 181.0000000 69.06000000 264.5000000

StD 17.7378572 2.68139791 73.1570049

Correlation Matrix

kwt plump barcolor wrtcolor protein wrtprt kolbach dp alpha bglucan

kwt 1.0000 0.7782 -.5134 0.2342 0.4569 0.1312 -.4251 0.2479 0.0974 -.2920

plump 0.7782 1.0000 -.3866 0.1879 0.6044 0.2405 -.3978 0.4145 0.1534 -.4576

barcolor -.5134 -.3866 1.0000 -.4325 -.0413 0.0799 0.1838 0.3028 0.1228 0.1932

wrtcolor 0.2342 0.1879 -.4325 1.0000 -.0639 0.3765 0.3811 0.0486 -.0324 -.1288

protein 0.4569 0.6044 -.0413 -.0639 1.0000 0.4147 -.6497 0.4834 -.0112 -.0560

wrtprt 0.1312 0.2405 0.0799 0.3765 0.4147 1.0000 0.3481 0.0949 0.3005 0.0508

kolbach -.4251 -.3978 0.1838 0.3811 -.6497 0.3481 1.0000 -.3280 0.3179 0.1166

dp 0.2479 0.4145 0.3028 0.0486 0.4834 0.0949 -.3280 1.0000 0.0238 -.4058

alpha 0.0974 0.1534 0.1228 -.0324 -.0112 0.3005 0.3179 0.0238 1.0000 -.1801

bglucan -.2920 -.4576 0.1932 -.1288 -.0560 0.0508 0.1166 -.4058 -.1801 1.0000

Eigenvalues of the Correlation Matrix

      Eigenvalue    Difference    Proportion    Cumulative
 1    3.25651383    1.32360184        0.3257        0.3257
 2    1.93291199    0.29774838        0.1933        0.5189
 3    1.63516362    0.45047335        0.1635        0.6825
 4    1.18469026    0.23755389        0.1185        0.8009
 5    0.94713637                      0.0947        0.8956


 

 

Eigenvectors

Prin1 Prin2 Prin3 Prin4 Prin5

kwt 0.460908 0.098994 -.191527 0.024931 -.208176

plump 0.504392 0.093371 -.006892 -.054249 -.118362

barcolor -.221913 -.226217 0.621107 -.046569 0.136818

wrtcolor 0.107532 0.546630 -.186266 0.098715 0.493803

protein 0.420932 -.168517 0.252873 0.408924 -.035322

wrtprt 0.128291 0.440959 0.401698 0.424598 0.016312

kolbach -.331075 0.526391 0.139328 -.107046 0.079364

dp 0.308405 -.156832 0.400847 -.225186 0.484449

alpha 0.033714 0.313990 0.370631 -.304447 -.640944

bglucan -.269890 -.091582 -.006101 0.694381 -.163648
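One way to read the eigenvector table is to list, for each component, the variables with the largest coefficients in absolute value. The Python sketch below does this for the five retained components; the 0.40 cutoff is an arbitrary illustrative threshold, not a rule from these notes, and the variable names in the code are taken from the table above.

```python
variables = ["kwt", "plump", "barcolor", "wrtcolor", "protein",
             "wrtprt", "kolbach", "dp", "alpha", "bglucan"]

# Eigenvector coefficients for Prin1..Prin5, copied from the table above
# (one row per variable, one column per component).
eigenvectors = [
    [0.460908, 0.098994, -0.191527, 0.024931, -0.208176],
    [0.504392, 0.093371, -0.006892, -0.054249, -0.118362],
    [-0.221913, -0.226217, 0.621107, -0.046569, 0.136818],
    [0.107532, 0.546630, -0.186266, 0.098715, 0.493803],
    [0.420932, -0.168517, 0.252873, 0.408924, -0.035322],
    [0.128291, 0.440959, 0.401698, 0.424598, 0.016312],
    [-0.331075, 0.526391, 0.139328, -0.107046, 0.079364],
    [0.308405, -0.156832, 0.400847, -0.225186, 0.484449],
    [0.033714, 0.313990, 0.370631, -0.304447, -0.640944],
    [-0.269890, -0.091582, -0.006101, 0.694381, -0.163648],
]

CUTOFF = 0.40  # illustrative threshold (an assumption, not from the notes)

dominant = {}
squared_lengths = {}
for j in range(5):
    name_coef = [(name, row[j]) for name, row in zip(variables, eigenvectors)]
    # Keep variables whose coefficient is large in absolute value, largest first.
    strong = [pair for pair in name_coef if abs(pair[1]) >= CUTOFF]
    strong.sort(key=lambda pair: abs(pair[1]), reverse=True)
    dominant[f"Prin{j + 1}"] = [name for name, _ in strong]
    # Each eigenvector has unit length: its squared coefficients sum to one.
    squared_lengths[f"Prin{j + 1}"] = sum(coef**2 for _, coef in name_coef)

for component, names in dominant.items():
    print(component, names, f"(sum of squares = {squared_lengths[component]:.4f})")
```

The sign of each coefficient still matters for interpretation: variables with opposite signs on the same component move in opposite directions along that axis.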
