PCA and admixture models - UCLA. Source: web.cs.ucla.edu/~sriram/courses/cm226.fall-2016/slides/pca.1.pdf


PCA and admixture models
CM226: Machine Learning for Bioinformatics

Fall 2016

Sriram Sankararaman
Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price

PCA and admixture models 1 / 57

Announcements

• HW1 solutions posted.

PCA and admixture models 2 / 57

Supervised versus Unsupervised Learning

Unsupervised Learning: learning from unlabeled observations

• Dimensionality Reduction. Last class.

• Other latent variable models. This class + review of PCA.

PCA and admixture models 3 / 57

Outline

Dimensionality reduction

Linear Algebra background

PCA
Practical issues
Probabilistic PCA

Admixture models

Population structure and GWAS

PCA and admixture models Dimensionality reduction 4 / 57

Raw data can be complex, high-dimensional

• If we knew what to measure, we could find simple relationships.

• Signals have redundancy.

• Genotypes measured at ≈ 500K SNPs.

• Genotypes at neighboring SNPs correlated.

PCA and admixture models Dimensionality reduction 5 / 57

Dimensionality reduction

Goal: Find a “more compact” representation of data. Why?

• Visualize and discover hidden patterns.

• Preprocessing for a supervised learning problem.

• Statistical: remove noise.

• Computational: reduce wasteful computation.

PCA and admixture models Dimensionality reduction 6 / 57


An example

• We measure parents’ and offspring’s heights.

• Two measurements.
• Points in R^2.

• How can we find a more “compact” representation?

• The two measurements are correlated, with some noise.

• Pick a direction and project.

PCA and admixture models Dimensionality reduction 7 / 57


Goal: Minimize reconstruction error

• Find the projection that minimizes the Euclidean distance between the original points and their projections.

• Principal Components Analysis solves this problem!

PCA and admixture models Dimensionality reduction 8 / 57
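A minimal sketch of this 2-D example in Python (the simulated heights and all numbers are illustrative, not from the slides): project the centered data onto the top eigenvector of its covariance and compare the reconstruction error against an arbitrary direction.

# Sketch of the 2-D example: simulated, centered parent/offspring heights,
# projected onto a single direction (data and numbers are illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N = 500
parent = rng.normal(0.0, 1.0, size=N)
offspring = 0.8 * parent + rng.normal(0.0, 0.5, size=N)   # correlated + noise
X = np.column_stack([parent, offspring])
X -= X.mean(axis=0)                                       # center the data

C = X.T @ X / N                                           # 2x2 covariance
evals, evecs = np.linalg.eigh(C)                          # ascending eigenvalues
w = evecs[:, -1]                                          # top eigenvector

def recon_error(X, w):
    """Mean squared distance between points and their projections onto w."""
    z = X @ w                                             # 1-D scores
    return np.mean(np.sum((X - np.outer(z, w)) ** 2, axis=1))

print(recon_error(X, w))                                  # small: best direction
print(recon_error(X, np.array([1.0, 0.0])))               # larger: axis-aligned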

Principal Components Analysis

PCA: find a lower-dimensional representation of the data

• Choose K.

• X is the N × M raw data matrix.

• X ≈ Z W^T, where Z is the N × K reduced representation (PC scores).

• W is M × K (its columns are the principal components).

PCA and admixture models Dimensionality reduction 9 / 57
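To make the shapes concrete, a small sketch using scikit-learn (the data is random and the sizes are arbitrary choices): Z holds the N × K PC scores and the columns of W are the principal components.

# Shape bookkeeping for X ≈ Z W^T: a sketch with scikit-learn on random data.
import numpy as np
from sklearn.decomposition import PCA

N, M, K = 100, 1000, 5                 # samples, features (e.g. SNPs), components
X = np.random.default_rng(1).normal(size=(N, M))

pca = PCA(n_components=K)
Z = pca.fit_transform(X)               # N x K  (PC scores)
W = pca.components_.T                  # M x K  (principal components as columns)

X_hat = Z @ W.T + pca.mean_            # rank-K reconstruction of X
print(Z.shape, W.shape, X_hat.shape)   # (100, 5) (1000, 5) (100, 1000)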

Outline

Dimensionality reduction

Linear Algebra background

PCA
Practical issues
Probabilistic PCA

Admixture models

Population structure and GWAS

PCA and admixture models Linear Algebra background 10 / 57

Covariance matrix

C = \frac{1}{N} X^T X

• Generalizes to many features.

• C_{i,i}: variance of feature i.

• C_{i,j}: covariance of features i and j.

• Symmetric.

PCA and admixture models Linear Algebra background 11 / 57

Covariance matrix

C = \frac{1}{N} X^T X

• Positive semi-definite (PSD). Sometimes written C ⪰ 0.

(Positive semi-definite matrix) A matrix A ∈ R^{n×n} is positive semi-definite iff v^T A v ≥ 0 for all v ∈ R^n.

PCA and admixture models Linear Algebra background 11 / 57

Covariance matrix

C = \frac{1}{N} X^T X

• Positive semi-definite (PSD). Sometimes written C ⪰ 0.

v^T C v ∝ v^T X^T X v = (Xv)^T (Xv) = \sum_{i=1}^{N} (Xv)_i^2 ≥ 0

PCA and admixture models Linear Algebra background 11 / 57

Covariance matrix

C = \frac{1}{N} X^T X

• All covariance matrices (being symmetric and PSD) have an eigendecomposition.

PCA and admixture models Linear Algebra background 11 / 57
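A quick numerical sketch of the two facts above (random, centered data; illustrative only): C = X^T X / N is symmetric, and v^T C v = \|Xv\|^2 / N ≥ 0, so C is PSD.

# Sketch: build the covariance matrix C = X^T X / N for centered data and
# check symmetry and positive semi-definiteness numerically.
import numpy as np

rng = np.random.default_rng(2)
N, M = 200, 10
X = rng.normal(size=(N, M))
X -= X.mean(axis=0)                    # center so C is the covariance

C = X.T @ X / N
print(np.allclose(C, C.T))             # symmetric

# v^T C v >= 0 for any v (PSD), since v^T C v = ||Xv||^2 / N
for _ in range(5):
    v = rng.normal(size=M)
    print(v @ C @ v >= -1e-12)

print(np.all(np.linalg.eigvalsh(C) >= -1e-12))   # all eigenvalues nonnegative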

Eigenvector and eigenvalue

(Eigenvector and eigenvalue) A nonzero vector v is an eigenvector of A ∈ R^{n×n} if Av = λv; the scalar λ is the eigenvalue associated with v.

PCA and admixture models Linear Algebra background 12 / 57

Eigendecomposition of a covariance matrix

• C is symmetric ⇒ its eigenvectors {u_i}, i ∈ {1, . . . , M}, can be chosen to be orthonormal:

• u_i^T u_j = 0, i ≠ j

• u_i^T u_i = 1

• We can order the eigenvectors so that the eigenvalues are in decreasing order: λ_1 ≥ λ_2 ≥ . . . ≥ λ_M.

PCA and admixture models Linear Algebra background 13 / 57

Eigendecomposition of a covariance matrix

C u_i = λ_i u_i,   i ∈ {1, . . . , M}

Arrange U = [u_1 . . . u_M]. Then

CU = C [u_1 . . . u_M]
   = [C u_1 . . . C u_M]
   = [λ_1 u_1 . . . λ_M u_M]
   = [u_1 . . . u_M] diag(λ_1, . . . , λ_M)
   = U Λ

PCA and admixture models Linear Algebra background 13 / 57

Eigendecomposition of a covariance matrix

CU = UΛ

Now U is an orthogonal matrix, so U U^T = I_M.

C = C U U^T = (CU) U^T = U Λ U^T

PCA and admixture models Linear Algebra background 14 / 57

Eigendecomposition of a covariance matrix

C = U Λ U^T

• U is an M × M orthogonal matrix. Its columns are the eigenvectors, sorted by eigenvalue.

• Λ is a diagonal matrix of eigenvalues.

PCA and admixture models Linear Algebra background 14 / 57
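A sketch verifying the eigendecomposition numerically with numpy (any symmetric PSD matrix works; the one below is random): the columns of U are orthonormal and C = U Λ U^T, with the eigenvalues re-sorted into decreasing order as in the slides.

# Sketch: eigendecomposition C = U Λ U^T, eigenvalues in decreasing order.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(50, 8))
C = A.T @ A / 50                                # symmetric PSD matrix

evals, U = np.linalg.eigh(C)                    # ascending eigenvalues
order = np.argsort(evals)[::-1]                 # re-sort: λ1 >= λ2 >= ... >= λM
evals, U = evals[order], U[:, order]
Lam = np.diag(evals)

print(np.allclose(U.T @ U, np.eye(8)))          # columns are orthonormal
print(np.allclose(C, U @ Lam @ U.T))            # C = U Λ U^T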

Eigendecomposition: Example

Covariance matrix: Ψ

PCA and admixture models Linear Algebra background 15 / 57


Alternate characterization of eigenvectors

• Eigenvectors are orthonormal directions of maximum variance

• Eigenvalues are the variance in these directions.

• The first eigenvector is the direction of maximum variance, with variance λ_1.

PCA and admixture models Linear Algebra background 16 / 57

Alternate characterization of eigenvectors

Given a covariance matrix C ∈ R^{M×M}, solve

x^* = \arg\max_{x} x^T C x   subject to   \|x\|_2 = 1

Solution: x^* = u_1, the first eigenvector of C.

• Example of a constrained optimization problem.

• Why do we need the constraint? (See the derivation below.)

PCA and admixture models Linear Algebra background 16 / 57
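Why the constraint is needed, and why the solution is an eigenvector: a short Lagrange-multiplier derivation, filling in the step the slide leaves implicit. Without \|x\|_2 = 1 the objective x^T C x could be made arbitrarily large by scaling x; with it, introduce a multiplier λ:

\mathcal{L}(x, λ) = x^T C x − λ (x^T x − 1)

∇_x \mathcal{L} = 2 C x − 2 λ x = 0 ⇒ C x = λ x

So every stationary point is an eigenvector of C, with objective value x^T C x = λ x^T x = λ; the maximum is attained at the largest eigenvalue, i.e. x^* = u_1 with value λ_1.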

Outline

Dimensionality reduction

Linear Algebra background

PCA
Practical issues
Probabilistic PCA

Admixture models

Population structure and GWAS

PCA and admixture models PCA 17 / 57

Back to PCA

Given N data points x_n ∈ R^M, n ∈ {1, . . . , N}, find a linear transformation to a lower-dimensional space (K < M): W ∈ R^{M×K} and projections z_n ∈ R^K, so that we can reconstruct the original data from the lower-dimensional projection.

x_n ≈ w_1 z_{n,1} + . . . + w_K z_{n,K}
    = [w_1 . . . w_K] (z_{n,1}, . . . , z_{n,K})^T
    = W z_n,   z_n ∈ R^K

• We assume the data is centered: \sum_n x_{n,m} = 0 for every feature m.

Compression
• We go from storing N × M numbers to M × K + N × K. (For example, with N = 1000 individuals, M = 500K SNPs, and K = 10: from 5 × 10^8 numbers to roughly 5 × 10^6.)

How do we define quality of reconstruction?

PCA and admixture models PCA 18 / 57

PCA

• Find z_n ∈ R^K and W ∈ R^{M×K} to minimize the reconstruction error

J(W, Z) = \frac{1}{N} \sum_n \|x_n − W z_n\|_2^2,   Z = [z_1, . . . , z_N]^T

• Require the columns of W to be orthonormal.

• The optimal solution is obtained by setting W = U_K, where U_K contains the K eigenvectors associated with the K largest eigenvalues of the covariance matrix C of X.

• The low-dimensional projection is z_n = W^T x_n.

PCA and admixture models PCA 19 / 57
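Putting the recipe together, a from-scratch PCA sketch in Python (the function name pca and the simulated data are illustrative, not from the course): center X, form C, take the top-K eigenvectors as W, and project with z_n = W^T x_n.

# Sketch of PCA as described above: center X, form C = X^T X / N,
# take the top-K eigenvectors as W, and project with Z = X W.
import numpy as np

def pca(X, K):
    """Return (Z, W, mean): N x K scores, M x K components, feature means."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # centered data, N x M
    C = Xc.T @ Xc / Xc.shape[0]                # covariance, M x M
    evals, evecs = np.linalg.eigh(C)           # ascending eigenvalues
    W = evecs[:, ::-1][:, :K]                  # top-K eigenvectors, M x K
    Z = Xc @ W                                 # PC scores, z_n = W^T x_n
    return Z, W, mean

# Illustrative data with low-dimensional structure plus noise.
rng = np.random.default_rng(4)
N, M, K = 300, 50, 2
Z_true = rng.normal(size=(N, K))
W_true = rng.normal(size=(M, K))
X = Z_true @ W_true.T + 0.1 * rng.normal(size=(N, M))

Z, W, mean = pca(X, K)
X_hat = Z @ W.T + mean
print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))   # small reconstruction error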


PCA: K = 1

J(w_1, z_1) = \frac{1}{N} \sum_n \|x_n − w_1 z_{n,1}\|_2^2

            = \frac{1}{N} \sum_n (x_n − w_1 z_{n,1})^T (x_n − w_1 z_{n,1})

            = \frac{1}{N} \sum_n (x_n^T x_n − 2 w_1^T x_n z_{n,1} + z_{n,1}^2 w_1^T w_1)

            = const + \frac{1}{N} \sum_n (−2 w_1^T x_n z_{n,1} + z_{n,1}^2)

using w_1^T w_1 = 1. To minimize this function, set the derivative with respect to z_{n,1} to zero:

\frac{∂ J(w_1, z_1)}{∂ z_{n,1}} = 0 ⇒ z_{n,1} = w_1^T x_n

PCA and admixture models PCA 20 / 57

PCA: K = 1

Plugging back z_{n,1} = w_1^T x_n:

J(w_1) = const + \frac{1}{N} \sum_n (−2 w_1^T x_n z_{n,1} + z_{n,1}^2)

       = const + \frac{1}{N} \sum_n (−2 z_{n,1}^2 + z_{n,1}^2)

       = const − \frac{1}{N} \sum_n z_{n,1}^2

Now, because the data is centered,

E[z_1] = \frac{1}{N} \sum_n z_{n,1} = \frac{1}{N} \sum_n w_1^T x_n = w_1^T \left( \frac{1}{N} \sum_n x_n \right) = 0

PCA and admixture models PCA 20 / 57

PCA: K = 1

J(w_1) = const − \frac{1}{N} \sum_n z_{n,1}^2

Var[z_1] = E[z_1^2] − E[z_1]^2 = \frac{1}{N} \sum_n z_{n,1}^2 − 0 = \frac{1}{N} \sum_n z_{n,1}^2

PCA and admixture models PCA 20 / 57

PCA: K = 1

Putting it together:

J(w_1) = const − \frac{1}{N} \sum_n z_{n,1}^2,   Var[z_1] = \frac{1}{N} \sum_n z_{n,1}^2

We have

J(w_1) = const − Var[z_1]

Two views of PCA: finding the direction that minimizes the reconstruction error ≡ finding the direction that maximizes the variance of the projected data.

\arg\min_{w_1} J(w_1) = \arg\max_{w_1} Var[z_1]

PCA and admixture models PCA 20 / 57

PCA: K = 1

\arg\min_{w_1} J(w_1) = \arg\max_{w_1} Var[z_1]

Var[z_1] = \frac{1}{N} \sum_n z_{n,1}^2
         = \frac{1}{N} \sum_n (w_1^T x_n)(w_1^T x_n)
         = \frac{1}{N} \sum_n w_1^T x_n x_n^T w_1
         = w_1^T \left( \frac{1}{N} \sum_n x_n x_n^T \right) w_1
         = w_1^T C w_1

PCA and admixture models PCA 21 / 57

PCA: K = 1

\arg\min_{w_1} J(w_1) = \arg\max_{w_1} Var[z_1]

So we need to solve

\arg\max_{w_1} w_1^T C w_1

Since we required the columns of W to be orthonormal, we add the constraint \|w_1\|_2 = 1.

This objective is maximized when w_1 is the first eigenvector of C.

PCA and admixture models PCA 21 / 57
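A small numerical check of this claim (illustrative random data): the variance along the top eigenvector equals λ_1, and no random unit direction exceeds it.

# Sketch: the top eigenvector u1 attains the largest value of w^T C w among
# unit vectors; compare against random unit directions.
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(500, 20))
C = A.T @ A / 500

evals, evecs = np.linalg.eigh(C)
u1, lam1 = evecs[:, -1], evals[-1]
print(u1 @ C @ u1, lam1)                       # equal: variance along u1 is λ1

best_random = 0.0
for _ in range(10000):
    w = rng.normal(size=20)
    w /= np.linalg.norm(w)                     # enforce the ||w||_2 = 1 constraint
    best_random = max(best_random, w @ C @ w)
print(best_random <= lam1 + 1e-12)             # never exceeds λ1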

PCA: K > 1

• We can repeat the argument for K > 1.

• Since we require the directions w_k to be orthonormal, we can repeat the argument, each time searching for the direction that maximizes the remaining variance and is orthogonal to the previously selected directions.

PCA and admixture models PCA 22 / 57

Computing eigendecompositions

• Numerical algorithms compute all eigenvalues and eigenvectors in O(M^3).

• Infeasible for genetic datasets.

• Computing only the largest eigenvalue/eigenvector: power iteration, O(M^2) per iteration (see the sketch below).

• Since we are interested in covariance matrices, we can instead compute the singular-value decomposition (SVD) of X: O(MN^2). (Will discuss later.)

PCA and admixture models PCA 23 / 57
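A sketch of power iteration in Python (the initialization, tolerance, and iteration count are arbitrary choices): each step is one matrix-vector product, O(M^2) for a dense M × M covariance matrix, and the Rayleigh quotient gives the leading eigenvalue.

# Sketch of power iteration for the leading eigenvector/eigenvalue of a PSD
# covariance matrix.
import numpy as np

def power_iteration(C, num_iters=1000, tol=1e-10):
    v = np.random.default_rng(6).normal(size=C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v_new = C @ v                     # one O(M^2) matrix-vector product
        v_new /= np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:
            break
        v = v_new
    lam = v @ C @ v                       # Rayleigh quotient = leading eigenvalue
    return lam, v

A = np.random.default_rng(7).normal(size=(400, 30))
C = A.T @ A / 400
lam, v = power_iteration(C)
print(np.isclose(lam, np.linalg.eigvalsh(C)[-1]))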

Practical issues

Choosing K

• For visualization, K = 2 or K = 3.

• For other analyses, pick K so that most of the variance in the data is retained. The fraction of variance retained by the top K eigenvectors is \frac{\sum_{k=1}^{K} λ_k}{\sum_{m=1}^{M} λ_m} (see the sketch below).

PCA and admixture models PCA 24 / 57
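A sketch of this rule in Python (the 90% target and the helper name choose_k are arbitrary choices): sort the eigenvalues, take the cumulative fraction of variance, and return the smallest K reaching the target.

# Sketch: pick K as the smallest number of eigenvectors whose eigenvalues
# retain a target fraction of the total variance.
import numpy as np

def choose_k(eigenvalues, target=0.90):
    lam = np.sort(eigenvalues)[::-1]                 # λ1 >= λ2 >= ...
    frac = np.cumsum(lam) / np.sum(lam)              # fraction retained by top-k
    return int(np.searchsorted(frac, target) + 1)

A = np.random.default_rng(8).normal(size=(200, 40))
C = A.T @ A / 200
print(choose_k(np.linalg.eigvalsh(C), target=0.90))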

PCA: Example

PCA and admixture models PCA 25 / 57


PCA on HapMap

PCA and admixture models PCA 26 / 57

PCA on Human Genome Diversity Project

PCA and admixture models PCA 27 / 57


PCA on European genetic data

Novembre et al., Nature 2008.

PCA and admixture models PCA 28 / 57

Probabilistic interpretation of PCA

z_n \overset{iid}{\sim} N(0, I_K)

p(x_n | z_n) = N(W z_n, σ^2 I_M)

PCA and admixture models PCA 29 / 57

Probabilistic interpretation of PCA

z_n \overset{iid}{\sim} N(0, I_K)

p(x_n | z_n) = N(W z_n, σ^2 I_M)

E[x_n | z_n] = W z_n

E[x_n] = E[E[x_n | z_n]] = E[W z_n] = W E[z_n] = 0

PCA and admixture models PCA 29 / 57

Probabilistic interpretation of PCA

z_n \overset{iid}{\sim} N(0, I_K)

p(x_n | z_n) = N(W z_n, σ^2 I_M)

Equivalently, x_n = W z_n + ε_n with ε_n ~ N(0, σ^2 I_M) independent of z_n.

Cov[x_n] = E[x_n x_n^T] − E[x_n] E[x_n]^T
         = E[(W z_n + ε_n)(W z_n + ε_n)^T] − 0
         = E[W z_n z_n^T W^T + 2 W z_n ε_n^T + ε_n ε_n^T]
         = E[W z_n z_n^T W^T] + E[2 W z_n ε_n^T] + E[ε_n ε_n^T]
         = W E[z_n z_n^T] W^T + 2 W E[z_n] E[ε_n]^T + σ^2 I_M
         = W I_K W^T + 2 W · 0 + σ^2 I_M
         = W W^T + σ^2 I_M

PCA and admixture models PCA 29 / 57
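A simulation sketch of the model above (all sizes and parameter values are illustrative): sample z_n and ε_n, form x_n = W z_n + ε_n, and check that the sample covariance approaches W W^T + σ^2 I_M.

# Sketch: sample from the probabilistic PCA model and compare the sample
# covariance of x_n with W W^T + σ^2 I_M.
import numpy as np

rng = np.random.default_rng(9)
M, K, sigma2, N = 6, 2, 0.25, 200000
W = rng.normal(size=(M, K))

Z = rng.normal(size=(N, K))                    # z_n ~ N(0, I_K)
E = np.sqrt(sigma2) * rng.normal(size=(N, M))  # ε_n ~ N(0, σ^2 I_M)
X = Z @ W.T + E                                # x_n = W z_n + ε_n

C_model = W @ W.T + sigma2 * np.eye(M)
C_sample = X.T @ X / N
print(np.max(np.abs(C_sample - C_model)))      # small for large N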

Probabilistic PCA

Log likelihood

LL(W, σ^2) ≡ log P(D | W, σ^2)

Maximize over W and σ^2; to fix the rotation ambiguity, take the columns of W to be orthogonal. The maximum likelihood estimators are

W_{ML} = U_K (Λ_K − σ^2 I_K)^{1/2}

U_K = [u_1 . . . u_K],   Λ_K = diag(λ_1, . . . , λ_K)

σ^2_{ML} = \frac{1}{M − K} \sum_{j=K+1}^{M} λ_j

PCA and admixture models PCA 30 / 57
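A sketch of these closed-form estimates in Python (the helper name ppca_mle is an arbitrary choice): take the eigendecomposition of C, set σ^2_ML to the mean of the discarded eigenvalues, and scale the top-K eigenvectors by \sqrt{λ_k − σ^2_ML}.

# Sketch of the closed-form ML estimates: W_ML = U_K (Λ_K − σ^2 I_K)^{1/2}
# and σ^2_ML = mean of the M − K smallest eigenvalues of C.
import numpy as np

def ppca_mle(X, K):
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / Xc.shape[0]
    evals, evecs = np.linalg.eigh(C)
    evals, evecs = evals[::-1], evecs[:, ::-1]           # decreasing eigenvalues
    sigma2 = evals[K:].mean()                            # average discarded variance
    W = evecs[:, :K] * np.sqrt(np.maximum(evals[:K] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(10)
X = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))
W_ml, s2_ml = ppca_mle(X, K=3)
print(W_ml.shape, s2_ml)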


Probabilistic PCA

Computing the MLE

• Compute the eigenvalues and eigenvectors of C and plug them into the closed-form estimators above, or

• Treat it as a hidden/latent variable problem and use EM (see the sketch below).

PCA and admixture models PCA 31 / 57
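A sketch of the EM route, using one standard form of the updates for this model (posterior moments of z_n in the E-step, then W and σ^2 in the M-step). Initialization, the fixed iteration count, the absence of a convergence check, and the helper name ppca_em are all arbitrary choices here.

# EM sketch for probabilistic PCA.
import numpy as np

def ppca_em(X, K, num_iters=200, seed=0):
    Xc = X - X.mean(axis=0)
    N, M = Xc.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(M, K))
    sigma2 = 1.0
    for _ in range(num_iters):
        # E-step: posterior moments of z_n given x_n
        G = W.T @ W + sigma2 * np.eye(K)          # K x K
        Ginv = np.linalg.inv(G)
        Ez = Xc @ W @ Ginv                        # N x K, E[z_n | x_n]
        Ezz = N * sigma2 * Ginv + Ez.T @ Ez       # Σ_n E[z_n z_n^T | x_n]
        # M-step: update W, then σ^2 using the new W
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(Ezz @ W.T @ W)) / (N * M)
    return W, sigma2

X = np.random.default_rng(11).normal(size=(400, 10))
W_em, s2_em = ppca_em(X, K=2)
print(W_em.shape, s2_em)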


Other advantages of Probabilistic PCA

Can use model selection to infer K.

• Choose K to maximize the marginal likelihood P(D | K).

• Use cross-validation and pick the K that maximizes the likelihood on held-out data (see the sketch below).

• Other model selection criteria such as AIC or BIC (see lecture 6 on clustering).

PCA and admixture models PCA 32 / 57
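A sketch of the cross-validation idea (it reuses the ppca_mle helper from the earlier sketch; the train/test split and the range of K are arbitrary choices): fit on a training split, then score each K by the held-out log likelihood under x_n ~ N(μ, W W^T + σ^2 I_M).

# Sketch: choose K by held-out log-likelihood under the probabilistic PCA model.
import numpy as np
from scipy.stats import multivariate_normal

def heldout_loglik(X_train, X_test, K):
    mu = X_train.mean(axis=0)
    W, sigma2 = ppca_mle(X_train, K)                 # from the earlier sketch
    cov = W @ W.T + sigma2 * np.eye(X_train.shape[1])
    return multivariate_normal(mean=mu, cov=cov).logpdf(X_test).sum()

rng = np.random.default_rng(12)
X = rng.normal(size=(600, 2)) @ rng.normal(size=(2, 12)) + \
    0.3 * rng.normal(size=(600, 12))
X_train, X_test = X[:400], X[400:]
scores = {K: heldout_loglik(X_train, X_test, K) for K in range(1, 6)}
print(max(scores, key=scores.get))                   # K with highest held-out LL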

Mini-Summary

• Dimensionality reduction: linear methods.
  • Exploratory analysis and visualization.
  • Downstream inference: the low-dimensional features can be used for other tasks.

• Principal Components Analysis finds a linear subspace that minimizes the reconstruction error or, equivalently, maximizes the variance.
  • Eigenvalue problem.
  • The probabilistic interpretation also leads to an EM algorithm.

• Why may PCA not be appropriate for genetic data?

PCA and admixture models PCA 33 / 57