PCA and admixture models
CM226: Machine Learning for Bioinformatics
Fall 2016
Sriram Sankararaman
Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price
PCA and admixture models 1 / 57
Announcements
• HW1 solutions posted.
PCA and admixture models 2 / 57
Supervised versus Unsupervised Learning
Unsupervised Learning from unlabeled observations
• Dimensionality Reduction. Last class.
• Other latent variable models. This class + review of PCA.
PCA and admixture models 3 / 57
Outline
Dimensionality reduction
Linear Algebra background
PCA
Practical issues
Probabilistic PCA
Admixture models
Population structure and GWAS
PCA and admixture models Dimensionality reduction 4 / 57
Raw data can be complex, high-dimensional
• If we knew what to measure, we could find simple relationships.
• Signals have redundancy.
• Genotype measured at ≈ 500K SNPs.
• Genotypes at neighboring SNPs correlated.
PCA and admixture models Dimensionality reduction 5 / 57
Dimensionality reduction
Goal: Find a “more compact” representation of data. Why?
• Visualize and discover hidden patterns.
• Preprocessing for a supervised learning problem.
• Statistical: remove noise.
• Computational: reduce wasteful computation.
PCA and admixture models Dimensionality reduction 6 / 57
An example
• We measure parents’ and offspring heights.
• Two measurements: points in R2.
• How can we find a more “compact” representation?
• Two measurements are correlated with some noise.
• Pick a direction and project.
PCA and admixture models Dimensionality reduction 7 / 57
Goal: Minimize reconstruction error
• Find the projection that minimizes the Euclidean distance between the original points and their projections.
• Principal Components Analysis solves this problem!
PCA and admixture models Dimensionality reduction 8 / 57
Principal Components Analysis
PCA: find lower dimensional representation of data
• Choose K.
• X is the N × M raw data matrix.
• X ≈ ZW^T, where Z is the N × K reduced representation (PC scores).
• W is the M × K matrix whose columns are the principal components.
PCA and admixture models Dimensionality reduction 9 / 57
Outline
Dimensionality reduction
Linear Algebra background
PCA
Practical issues
Probabilistic PCA
Admixture models
Population structure and GWAS
PCA and admixture models Linear Algebra background 10 / 57
Covariance matrix
C = (1/N) X^T X
• Generalizes to many features
• Ci,i: variance of feature i
• Ci,j : covariance of feature i and j
• Symmetric
PCA and admixture models Linear Algebra background 11 / 57
Covariance matrix
C = (1/N) X^T X
• Positive semi-definite (PSD). Sometimes written C ⪰ 0.
(Positive semi-definite matrix) A matrix A ∈ R^{n×n} is positive semi-definite iff v^T A v ≥ 0 for all v ∈ R^n.
PCA and admixture models Linear Algebra background 11 / 57
Covariance matrix
C = (1/N) X^T X
• Positive semi-definite (PSD). Sometimes written C ⪰ 0.
v^T C v ∝ v^T X^T X v = (Xv)^T (Xv) = ∑_{i=1}^{N} (Xv)_i² ≥ 0
PCA and admixture models Linear Algebra background 11 / 57
Covariance matrix
C = (1/N) X^T X
• All covariance matrices (being symmetric and PSD) have an eigendecomposition.
PCA and admixture models Linear Algebra background 11 / 57
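As a sanity check, the symmetry and PSD properties above can be verified numerically. A minimal NumPy sketch (the data and dimensions are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # N = 50 samples, M = 4 features
X = X - X.mean(axis=0)                # center each feature
C = X.T @ X / X.shape[0]              # C = (1/N) X^T X, an M x M covariance matrix

assert np.allclose(C, C.T)            # symmetric
for _ in range(100):
    v = rng.normal(size=4)
    assert v @ C @ v >= -1e-12        # v^T C v >= 0, up to numerical error
```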
Eigenvector and eigenvalue
(Eigenvector and eigenvalue) A vector v is an eigenvector of A ∈ R^{n×n} if Av = λv for some scalar λ; λ is the eigenvalue associated with v.
PCA and admixture models Linear Algebra background 12 / 57
Eigendecomposition of a covariance matrix
• C is symmetric ⇒ its eigenvectors {u_i}, i ∈ {1, . . . , M}, can be chosen to be orthonormal:
• u_i^T u_j = 0, i ≠ j
• u_i^T u_i = 1
• We can choose the eigenvectors so that the eigenvalues are in decreasing order: λ_1 ≥ λ_2 ≥ . . . ≥ λ_M.
PCA and admixture models Linear Algebra background 13 / 57
Eigendecomposition of a covariance matrix
Cu_i = λ_i u_i, i ∈ {1, . . . , M}
Arrange U = [u_1 . . . u_M]. Then
CU = C[u_1 . . . u_M]
   = [Cu_1 . . . Cu_M]
   = [λ_1 u_1 . . . λ_M u_M]
   = [u_1 . . . u_M] diag(λ_1, . . . , λ_M)
   = UΛ
PCA and admixture models Linear Algebra background 13 / 57
Eigendecomposition of a covariance matrix
CU = UΛ
Since U is an orthogonal matrix, UU^T = I_M, so
C = CUU^T = UΛU^T
PCA and admixture models Linear Algebra background 14 / 57
Eigendecomposition of a covariance matrix
C = UΛUT
• U is an M × M orthogonal matrix whose columns are eigenvectors sorted by eigenvalue.
• Λ is a diagonal matrix of eigenvalues.
PCA and admixture models Linear Algebra background 14 / 57
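The decomposition C = UΛU^T can be computed directly with NumPy; a short sketch (random data as a stand-in for a real covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)
C = X.T @ X / X.shape[0]                  # symmetric PSD covariance matrix

eigvals, U = np.linalg.eigh(C)            # eigh: for symmetric matrices; ascending order
order = np.argsort(eigvals)[::-1]         # re-sort in decreasing order
eigvals, U = eigvals[order], U[:, order]

assert np.allclose(U.T @ U, np.eye(3))                 # orthonormal eigenvectors
assert np.allclose(C, U @ np.diag(eigvals) @ U.T)      # C = U Lambda U^T
```

Note that `eigh` (for symmetric/Hermitian matrices) returns eigenvalues in ascending order, hence the re-sort.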
Eigendecomposition: Example
Covariance matrix : Ψ
PCA and admixture models Linear Algebra background 15 / 57
Alternate characterization of eigenvectors
• Eigenvectors are orthonormal directions of maximum variance
• Eigenvalues are the variance in these directions.
• The first eigenvector is the direction of maximum variance, with variance λ_1.
PCA and admixture models Linear Algebra background 16 / 57
Alternate characterization of eigenvectors
Given a covariance matrix C ∈ R^{M×M},
x* = argmax_x x^T C x subject to ‖x‖_2 = 1
Solution: x* = u_1, the first eigenvector of C.
• Example of a constrained optimization problem.
• Why do we need the constraint? Without it the objective is unbounded: scaling x by c scales x^T C x by c².
PCA and admixture models Linear Algebra background 16 / 57
Outline
Dimensionality reduction
Linear Algebra background
PCA
Practical issues
Probabilistic PCA
Admixture models
Population structure and GWAS
PCA and admixture models PCA 17 / 57
Back to PCA
Given N data points x_n ∈ R^M, n ∈ {1, . . . , N}, find a linear transformation to a lower dimensional space K < M: W ∈ R^{M×K} and a projection z_n ∈ R^K, so that we can reconstruct the original data from the lower dimensional projection.
x_n ≈ w_1 z_{n,1} + . . . + w_K z_{n,K}
    = [w_1 . . . w_K] [z_{n,1}, . . . , z_{n,K}]^T
    = W z_n,  z_n ∈ R^K
• We assume the data is centered: ∑_n x_{n,m} = 0 for each feature m.
• Compression: we go from storing N × M values to M × K + N × K.
How do we define the quality of the reconstruction?
PCA and admixture models PCA 18 / 57
PCA
• Find z_n ∈ R^K and W ∈ R^{M×K} to minimize the reconstruction error
  J(W, Z) = (1/N) ∑_n ‖x_n − W z_n‖²_2,  Z = [z_1, . . . , z_N]^T
• Require the columns of W to be orthonormal.
• The optimal solution is obtained by setting W = U_K, where U_K contains the K eigenvectors associated with the K largest eigenvalues of the covariance matrix C of X.
• The low-dimensional projection is z_n = W^T x_n.
PCA and admixture models PCA 19 / 57
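The recipe above (center, form the covariance, take the top-K eigenvectors, project) can be sketched in a few lines of NumPy; the function name and test data are illustrative:

```python
import numpy as np

def pca(X, K):
    """PCA via eigendecomposition of the covariance matrix.

    X : (N, M) data matrix; K : number of components.
    Returns W (M, K), columns = top-K principal components,
    and Z (N, K), the PC scores z_n = W^T x_n.
    """
    X = X - X.mean(axis=0)                 # center each feature
    C = X.T @ X / X.shape[0]               # M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # sort descending
    W = eigvecs[:, order[:K]]              # K eigenvectors with largest eigenvalues
    Z = X @ W                              # project: z_n = W^T x_n
    return W, Z
```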
PCA: K = 1
J(w_1, z_1) = (1/N) ∑_n ‖x_n − w_1 z_{n,1}‖²_2
            = (1/N) ∑_n (x_n − w_1 z_{n,1})^T (x_n − w_1 z_{n,1})
            = (1/N) ∑_n (x_n^T x_n − 2 w_1^T x_n z_{n,1} + z_{n,1}² w_1^T w_1)
            = const + (1/N) ∑_n (−2 w_1^T x_n z_{n,1} + z_{n,1}²)   (using w_1^T w_1 = 1)
To minimize this function, take derivatives with respect to z_{n,1}:
∂J(w_1, z_1) / ∂z_{n,1} = 0
⇒ z_{n,1} = w_1^T x_n
PCA and admixture models PCA 20 / 57
PCA: K = 1
Plugging back z_{n,1} = w_1^T x_n:
J(w_1) = const + (1/N) ∑_n (−2 w_1^T x_n z_{n,1} + z_{n,1}²)
       = const + (1/N) ∑_n (−2 z_{n,1} z_{n,1} + z_{n,1}²)
       = const − (1/N) ∑_n z_{n,1}²
Now, because the data is centered,
E[z_1] = (1/N) ∑_n z_{n,1}
       = (1/N) ∑_n w_1^T x_n
       = w_1^T (1/N) ∑_n x_n = 0
PCA and admixture models PCA 20 / 57
PCA: K = 1
J(w_1) = const − (1/N) ∑_n z_{n,1}²
Var[z_1] = E[z_1²] − E[z_1]²
         = (1/N) ∑_n z_{n,1}² − 0
         = (1/N) ∑_n z_{n,1}²
PCA and admixture models PCA 20 / 57
PCA: K = 1
Putting it together:
J(w_1) = const − (1/N) ∑_n z_{n,1}²
Var[z_1] = (1/N) ∑_n z_{n,1}²
We have
J(w_1) = const − Var[z_1]
Two views of PCA: finding the direction that minimizes the reconstruction error ≡ finding the direction that maximizes the variance of the projected data:
argmin_{w_1} J(w_1) = argmax_{w_1} Var[z_1]
PCA and admixture models PCA 20 / 57
PCA: K = 1
argmin_{w_1} J(w_1) = argmax_{w_1} Var[z_1]
Var[z_1] = (1/N) ∑_n z_{n,1}²
         = (1/N) ∑_n (w_1^T x_n)(w_1^T x_n)
         = (1/N) ∑_n w_1^T x_n x_n^T w_1
         = w_1^T ((1/N) ∑_n x_n x_n^T) w_1
         = w_1^T C w_1
PCA and admixture models PCA 21 / 57
PCA: K = 1
argmin_{w_1} J(w_1) = argmax_{w_1} Var[z_1]
So we need to solve
argmax_{w_1} w_1^T C w_1 subject to ‖w_1‖_2 = 1
(the constraint comes from requiring W to be orthonormal).
This objective function is maximized when w_1 is the first eigenvector of C.
PCA and admixture models PCA 21 / 57
PCA: K > 1
• We can repeat the argument for K > 1.
• Since we require the directions w_k to be orthonormal, we can repeat the argument, searching for the direction that maximizes the remaining variance and is orthogonal to the previously selected directions.
PCA and admixture models PCA 22 / 57
Computing eigendecompositions
• Numerical algorithms compute all eigenvalues and eigenvectors in O(M³).
• Infeasible for genetic datasets.
• Computing the largest eigenvalue and eigenvector: power iteration, O(M²) per iteration.
• Since we are interested in covariance matrices, we can use algorithms that compute the singular-value decomposition (SVD) in O(MN²). (Will discuss later.)
PCA and admixture models PCA 23 / 57
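Power iteration is simple enough to sketch directly; each step is one O(M²) matrix-vector product. A minimal version (function name and stopping rule are illustrative choices, not from the lecture):

```python
import numpy as np

def power_iteration(C, num_iters=1000, tol=1e-10):
    """Estimate the largest eigenvalue/eigenvector of a symmetric PSD matrix C."""
    v = np.random.default_rng(0).normal(size=C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = C @ v                          # one O(M^2) matrix-vector product
        w /= np.linalg.norm(w)             # renormalize
        if np.linalg.norm(w - v) < tol:    # stop when the direction stabilizes
            v = w
            break
        v = w
    lam = v @ C @ v                        # Rayleigh quotient = eigenvalue estimate
    return lam, v
```

Convergence is geometric with rate λ₂/λ₁, so it is fast when the top eigenvalue dominates.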
Practical issues
Choosing K
• For visualization, K = 2 or K = 3.
• For other analyses, pick K so that most of the variance in the data is retained. The fraction of variance retained in the top K eigenvectors is
  (∑_{k=1}^{K} λ_k) / (∑_{m=1}^{M} λ_m)
PCA and admixture models PCA 24 / 57
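The variance-retained rule can be implemented as a cumulative sum over the sorted eigenvalues; a small sketch (the function name and the 95% threshold are illustrative):

```python
import numpy as np

def choose_K(X, threshold=0.95):
    """Smallest K whose top-K eigenvalues retain `threshold` of total variance."""
    X = X - X.mean(axis=0)
    C = X.T @ X / X.shape[0]
    eigvals = np.linalg.eigvalsh(C)[::-1]        # eigenvalues, descending
    frac = np.cumsum(eigvals) / eigvals.sum()    # fraction retained by top K
    return int(np.searchsorted(frac, threshold) + 1)
```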
PCA: Example
PCA and admixture models PCA 25 / 57
PCA on HapMap
PCA and admixture models PCA 26 / 57
PCA on Human Genome Diversity Project
PCA and admixture models PCA 27 / 57
PCA on European genetic data
Novembre et al., Nature 2008.
PCA and admixture models PCA 28 / 57
Probabilistic interpretation of PCA
z_n ~ N(0, I_K), i.i.d.
p(x_n | z_n) = N(W z_n, σ² I_M)
PCA and admixture models PCA 29 / 57
Probabilistic interpretation of PCA
z_n ~ N(0, I_K), i.i.d.
p(x_n | z_n) = N(W z_n, σ² I_M)
E[x_n | z_n] = W z_n
E[x_n] = E[E[x_n | z_n]]
       = E[W z_n]
       = W E[z_n]
       = 0
PCA and admixture models PCA 29 / 57
Probabilistic interpretation of PCA
z_n ~ N(0, I_K), i.i.d.
p(x_n | z_n) = N(W z_n, σ² I_M)
Cov[x_n] = E[x_n x_n^T] − E[x_n] E[x_n]^T
         = E[(W z_n + ε_n)(W z_n + ε_n)^T] − 0
         = E[W z_n z_n^T W^T + 2 W z_n ε_n^T + ε_n ε_n^T]
         = E[W z_n z_n^T W^T] + E[2 W z_n ε_n^T] + E[ε_n ε_n^T]
         = W E[z_n z_n^T] W^T + 2 W E[z_n ε_n^T] + σ² I_M
         = W E[z_n z_n^T] W^T + 2 W E[z_n] E[ε_n]^T + σ² I_M
         = W I_K W^T + 2 W · 0 + σ² I_M
         = W W^T + σ² I_M
PCA and admixture models PCA 29 / 57
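The identity Cov[x_n] = WW^T + σ²I_M can be checked by simulating from the generative model; a sketch with illustrative dimensions (W is random here, not estimated):

```python
import numpy as np

# Simulate from the probabilistic PCA model and check that the empirical
# covariance of x_n approaches W W^T + sigma^2 I_M for large N.
rng = np.random.default_rng(0)
M, K, N, sigma = 4, 2, 200_000, 0.5
W = rng.normal(size=(M, K))
Z = rng.normal(size=(N, K))               # z_n ~ N(0, I_K)
E = sigma * rng.normal(size=(N, M))       # eps_n ~ N(0, sigma^2 I_M)
X = Z @ W.T + E                           # x_n | z_n ~ N(W z_n, sigma^2 I_M)

C_emp = X.T @ X / N                       # empirical covariance (E[x_n] = 0)
C_model = W @ W.T + sigma**2 * np.eye(M)  # model covariance from the derivation
```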
Probabilistic PCA
Log likelihood:
LL(W, σ²) ≡ log P(D | W, σ²)
Maximize over W subject to the constraint that the columns of W are orthonormal. The maximum likelihood estimator:
W_ML = U_K (Λ_K − σ² I_K)^{1/2}
U_K = [u_1 . . . u_K]
Λ_K = diag(λ_1, . . . , λ_K)
σ²_ML = (1/(M − K)) ∑_{j=K+1}^{M} λ_j
PCA and admixture models PCA 30 / 57
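The closed-form MLE above translates directly into code: eigendecompose C, average the discarded eigenvalues to get σ²_ML, and rescale the top-K eigenvectors. A sketch (function name illustrative):

```python
import numpy as np

def ppca_mle(X, K):
    """Closed-form maximum likelihood estimates for probabilistic PCA.

    Returns W_ML (M, K) and sigma2_ML, computed from the eigendecomposition
    of the covariance matrix of the centered data X (N, M).
    """
    X = X - X.mean(axis=0)
    M = X.shape[1]
    C = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    lam, U = eigvals[order], eigvecs[:, order]       # descending eigenvalues
    sigma2 = lam[K:].sum() / (M - K)                 # average discarded eigenvalue
    W = U[:, :K] * np.sqrt(np.maximum(lam[:K] - sigma2, 0.0))
    return W, sigma2
```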
Probabilistic PCA
Computing the MLE
• Compute eigenvalues, eigenvectors
• Hidden/latent variable problem: Use EM
PCA and admixture models PCA 31 / 57
Other advantages of Probabilistic PCA
Can use model selection to infer K.
• Choose K to maximize the marginal likelihood P (D|K).
• Use cross-validation and pick the K that maximizes the likelihood on held-out data.
• Other model selection criteria such as AIC or BIC (see lecture 6 on clustering).
PCA and admixture models PCA 32 / 57
Mini-Summary
• Dimensionality reduction: linear methods
  • Exploratory analysis and visualization.
  • Downstream inference: can use the low-dimensional features for other tasks.
• Principal Components Analysis finds a linear subspace that minimizes reconstruction error or, equivalently, maximizes the variance.
  • Eigenvalue problem.
  • Probabilistic interpretation also leads to EM.
• Why might PCA not be appropriate for genetic data?
PCA and admixture models PCA 33 / 57