3 principal components analysis

13
Biology Chemistr y Informat ics Principal Component Analysis (PCA) of metabolomic sample processing methods Principal Components Analysis Goal: Use PCA to identify the major modes of variance (Used DATA: Pumpkin data 1.csv) Topics: 1.Principal component number selection 2.Data pretreatment 3.PCA results visualization

Transcript of 3 principal components analysis

Page 1: 3  principal components analysis

Biology

Chemistry

Informatics

Principal Component Analysis (PCA) of metabolomic sample processing methods

Prin

cipa

l Com

pone

nts

Anal

ysis

Goal:Use PCA to identify the major modes of variance

(Used DATA: Pumpkin data 1.csv)

Topics: 1. Principal component number selection2. Data pretreatment3. PCA results visualization

Page 2: 3  principal components analysis

Biology

Chemistry

Informatics

Principal Components Analysis Pr

inci

pal C

ompo

nent

s An

alys

is

Steps1. Calculate a PCA model2. Select optimal model principal component (PC) 3. Overview PCA scores and loadings plots4. Repeat steps 1-2 using data centering and scaling

Visualize:5. Sample scores annotated by extraction and treatment6. Leverage and DmodX (distance from model plane)7. Variable loadings and biplots

Exercise:8. How many PCs are needed to capture 80% variance for raw data and

scaled data?9. Are their any moderate or extreme outliers?10.What variables contribute most to the variance for raw and scaled data?

Page 3: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Variance Explained(raw data)

Pri

nci

pal

Co

mp

on

ents

An

alys

is

• PCs can be selected to explain a minimum %variance in the data (~80%)

• PCs explaining below 1% variance can be excluded

• q2 is the cross-validated PCA prediction of left out data

Page 4: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Scores (raw data)P

rin

cip

al C

om

po

nen

ts A

nal

ysis

• Hotelling's T2 ellipse shows 95% CI for bivariate normal distribution

• Samples lying outside of the ellipse could be outliers

Page 5: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Loadings (raw data, centered)P

rin

cip

al C

om

po

nen

ts A

nal

ysis

• Unscaled data PCA loadings are highly correlated with magnitude

Page 6: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Biplot (raw data)P

rin

cip

al C

om

po

nen

ts A

nal

ysis

• Biplots can be used to rapidly overview the correlation between sample scores and variable loadings

Sample contains high maleic acid

Sample contains high sucrose (low maleic acid)

Page 7: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Leverage and DmodX(raw data)

Pri

nci

pal

Co

mp

on

ents

An

alys

is

Leverage is the distance to samples center in the PCA plane (extreme outliers)

Distance to model X (DmodX) is the orthogonal distance to the PCA plane (moderate outliers)

Page 8: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Variance Explained(autoscaled)

Pri

nci

pal

Co

mp

on

ents

An

alys

is

Page 9: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Scores (autoscaled)P

rin

cip

al C

om

po

nen

ts A

nal

ysis

• Loadings on PC1 describe differences due to extraction

• Loadings on PC2 describe differences due to treatment

Page 10: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Leverage and DmodX(autoscaled)

Pri

nci

pal

Co

mp

on

ents

An

alys

is

• Samples with both high leverage and DomdX are likely outliers

• Evaluate PCA results after their removal

Page 11: 3  principal components analysis

Biology

Chemistry

Informatics

PCA Loadings (autoscaled)P

rin

cip

al C

om

po

nen

ts A

nal

ysis

• Scaled loadings are independent of variable magnitude and show a rich variance structure of the data

Page 12: 3  principal components analysis

Biology

Chemistry

Informatics

Pri

nci

pal

Co

mp

on

ents

An

alys

isRelationship between scores and loadings (autoscaled)

Higher in 100% MeOH

ExtractionLower in 100% MeOH

Page 13: 3  principal components analysis

Biology

Chemistry

Informatics

Loadings and ScoresP

rin

cip

al C

om

po

nen

ts A

nal

ysis

Highest positive loading on PC1

Highest negative loading on PC1