PCA from noisy linearly reduced measurements

Amit Singer

Princeton University

Department of Mathematics and Program in Applied and Computational Mathematics

September 27, 2016

Amit Singer (Princeton University) September 2016 1 / 39


Single particle reconstruction using cryo-EM

Schematic drawing of the imaging process:

The cryo-EM problem:


New detector technology: Exciting times for cryo-EM

www.sciencemag.org SCIENCE VOL 343 28 MARCH 2014 1443

The Resolution Revolution

BIOCHEMISTRY

Werner Kühlbrandt

Advances in detector technology and image processing are yielding high-resolution electron cryo-microscopy structures of biomolecules.

Precise knowledge of the structure of macromolecules in the cell is essential for understanding how they function. Structures of large macromolecules can now be obtained at near-atomic resolution by averaging thousands of electron microscope images recorded before radiation damage accumulates. This is what Amunts et al. have done in their research article on page 1485 of this issue (1), reporting the structure of the large subunit of the mitochondrial ribosome at 3.2 Å resolution by electron cryo-microscopy (cryo-EM). Together with other recent high-resolution cryo-EM structures (2–4) (see the figure), this achievement heralds the beginning of a new era in molecular biology, where structures at near-atomic resolution are no longer the prerogative of x-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy.

Ribosomes are ancient, massive protein-RNA complexes that translate the linear genetic code into three-dimensional proteins. Mitochondria—semi-autonomous organelles

[Figure, panels A, B, C] Near-atomic resolution with cryo-EM. (A) The large subunit of the yeast mitochondrial ribosome at 3.2 Å reported by Amunts et al. In the detailed view below, the base pairs of an RNA double helix and a magnesium ion (blue) are clearly resolved. (B) TRPV1 ion channel at 3.4 Å (2), with a detailed view of residues lining the ion pore on the four-fold axis of the tetrameric channel. (C) F420-reducing [NiFe] hydrogenase at 3.36 Å (3). The detail shows an α helix in the FrhA subunit with resolved side chains. The maps are not drawn to scale.


Cryo-EM in the news...

March 31, 2016

REPORTS

Cite as: Sirohi et al., Science

10.1126/science.aaf5316 (2016).

The 3.8 Å resolution cryo-EM structure of Zika virus

Devika Sirohi,1* Zhenguo Chen,1* Lei Sun,1 Thomas Klose,1 Theodore C. Pierson,2 Michael G. Rossmann,1† Richard J. Kuhn1†

1 Markey Center for Structural Biology and Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue University, West Lafayette, IN 47907, USA.

2 Viral Pathogenesis Section, Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA.


Method of the Year 2015

January 2016 Volume 13 No 1

Single-particle cryo-electron microscopy (cryo-EM) is our choice for Method of the Year 2015 for its newfound ability to solve protein structures at near-atomic resolution. Featured is the 2.2-Å cryo-EM structure of β-galactosidase as recently reported by Bartesaghi et al. (Science 348, 1147–1151, 2015). Cover design by Erin Dewalt.

Special feature starts on p19.


Big “movie” data, publicly available

http://www.ebi.ac.uk/pdbe/emdb/empiar/


Image formation model and inverse problem

[Figure: schematic of the imaging process showing the electron source, the molecule $\phi_k$ with rotation $R_k \in SO(3)$, and the projection $y_k$]

Projection images $y_k = T_k P_k \phi_k + \epsilon_k$, $k = 1, 2, \ldots, n$.

$\phi_k : \mathbb{R}^3 \to \mathbb{R}$ is the scattering potential of the $k$-th molecule.

$P_k$ is 2D tomographic projection, after rotating $\phi_k$ by $R_k$.

$T_k$ is a “filtering” operator due to microscope and detector.

Classical cryo-EM problem: Estimate $\phi = \phi_1 = \phi_2 = \cdots = \phi_n$ (and $R_1, \ldots, R_n$) given $y_1, \ldots, y_n$.

Heterogeneity problem: Estimate $\phi_1, \phi_2, \ldots, \phi_n$ (and $R_1, \ldots, R_n$) given $y_1, \ldots, y_n$.


Toy example


E. coli 50S ribosomal subunit

27,000 particle images provided by Dr. Fred Sigworth, Yale Medical School

3D reconstruction by S, Lanhui Wang, and Jane Zhao

Algorithmic and computational challenges

1 Ab-initio modelling and orientation assignment

2 Heterogeneity – resolving structural variability

3 Denoising and 2D class averaging

4 Particle picking

5 Symmetry detection and reconstruction

6 Motion correction

7 Estimation of contrast transfer function and noise statistics

8 Fast reconstruction and refinement

9 Validation and resolution determination


Principal components analysis (PCA)

Given i.i.d. samples $x_1, \ldots, x_n$ of some $x \in \mathbb{R}^p$, estimate mean $E[x]$ and covariance $\mathrm{Cov}[x]$.

$$\mu_n = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad \Sigma_n = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu_n)(x_k - \mu_n)^T$$

Eigenvectors and eigenvalues of $\Sigma_n$ are used for dimensionality reduction, compression, and denoising.

Useful when most variability is captured by a few leading principal components: low rank approximation.
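A minimal NumPy sketch of the sample mean, sample covariance, and low-rank approximation described on this slide. The synthetic data, the helper name `pca`, and all parameter values are illustrative assumptions, not part of the talk.

```python
import numpy as np

def pca(samples, rank):
    """Return sample mean, sample covariance, and the leading `rank` eigenpairs."""
    n = samples.shape[0]
    mu = samples.mean(axis=0)
    centered = samples - mu
    sigma = centered.T @ centered / n           # 1/n normalization, as on the slide
    evals, evecs = np.linalg.eigh(sigma)        # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:rank]      # indices of the largest eigenvalues
    return mu, sigma, evals[order], evecs[:, order]

# Hypothetical data: a rank-2 signal in dimension p = 20 plus weak white noise.
rng = np.random.default_rng(0)
p, n, r = 20, 5000, 2
basis = np.linalg.qr(rng.standard_normal((p, r)))[0]        # orthonormal p x r
coeffs = rng.standard_normal((n, r)) * np.array([3.0, 2.0])
samples = coeffs @ basis.T + 0.1 * rng.standard_normal((n, p))

mu, sigma, top_evals, top_evecs = pca(samples, r)
# Denoise by projecting the centered samples onto the leading principal components.
denoised = mu + (samples - mu) @ top_evecs @ top_evecs.T
explained = top_evals.sum() / np.trace(sigma)   # fraction of variance captured
```

When `explained` is close to 1, the few leading components capture most of the variability, which is the regime where the low-rank approximation is useful.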


PCA from noisy linearly transformed measurements

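Measure with $A_k : \mathbb{R}^p \to \mathbb{R}^q$, noise covariance $\mathrm{Cov}[e_k] = \sigma^2 I_q$:

$$y_k = A_k x_k + e_k.$$

[Figure: measurements through different operators $A_k$ and $A_l$]

Goal: Estimate $E[x]$ and $\mathrm{Cov}[x]$ from $y_1, \ldots, y_n$.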


Motivation: cryo-electron microscopy

Write image $y_k$ as

$$y_k = T_k P_k \phi_k + e_k, \quad k = 1, \ldots, n$$

where:

$\phi_k$ is a three-dimensional molecular volume,

$P_k$ projects volume into an image and $T_k$ filters it, and

$e_k$ is measurement noise.

Application I: deconvolution in noise

Goal: Estimate unfiltered projection $x_k = P_k \phi_k$ from

$$y_k = T_k x_k + e_k.$$

[Figure panels: $x_k$, $T_k x_k$, $T_k x_k + e_k$]

Filtering $T_k : \mathbb{R}^p \to \mathbb{R}^p$ is singular or ill-conditioned.

Using low-rank $\mathrm{Cov}[x]$, we can invert $T_k$ and reduce effect of noise $e_k$.


Contrast transfer function: Tk

http://spider.wadsworth.org/spider_doc/spider/docs/techs/ctf/ctf.html

CTF suppresses information and inverts contrast

CTF cannot be trivially inverted (zero crossings)

Additional envelope Gaussian decay of high frequencies

Information lost from one defocus group could be recovered from another group that has different zero crossings.


Application II: heterogeneity

How to classify images from different molecular structures?¹

[Figure panels: Class A, Class B]

¹ Images courtesy Joachim Frank (Columbia)

Application II: heterogeneity (cont.)

Goal: Classify images

$$y_k = T_k P_k \phi_k + e_k,$$

according to molecular state $\phi_k$.

Cannot invert since $T_k P_k : \mathbb{R}^p \to \mathbb{R}^q$ and $q < p$.

Volumes $\phi_k$ live in a low-dimensional space, which can be estimated from top eigenvectors of $\mathrm{Cov}[\phi_k]$.

Restrict $T_k P_k$ to this space, invert, cluster and reconstruct².

Aggregating large-scale datasets gives accurate estimates despite high noise.

² Penczek et al. (2009)

Least-squares estimator (mean)

From $y_k = A_k x_k + e_k$, we have

$$E[y_k] = A_k E[x].$$

On the other hand $E[y_k] \approx y_k$, so

$$y_k \approx A_k E[x].$$

Form least-squares estimator for $E[x]$:

$$\mu_n = \arg\min_{\mu} \frac{1}{n} \sum_{k=1}^{n} \| y_k - A_k \mu \|^2.$$
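A minimal sketch of this estimator in NumPy: setting the gradient of the objective to zero gives the linear system $\left(\frac{1}{n}\sum_k A_k^T A_k\right)\mu = \frac{1}{n}\sum_k A_k^T y_k$, which is solved directly here. The Gaussian measurement matrices, the sizes, and the function name `estimate_mean` are assumptions made for illustration only.

```python
import numpy as np

def estimate_mean(A, y):
    """Least-squares mean from y_k = A_k x_k + e_k.

    A has shape (n, q, p) and y has shape (n, q).
    Solves (1/n sum_k A_k^T A_k) mu = 1/n sum_k A_k^T y_k.
    """
    n = A.shape[0]
    lhs = np.einsum("kqi,kqj->ij", A, A) / n    # average of A_k^T A_k
    rhs = np.einsum("kqi,kq->i", A, y) / n      # average of A_k^T y_k
    return np.linalg.solve(lhs, rhs)

# Hypothetical experiment: q = 3 noisy measurements of each p = 8 dimensional sample.
rng = np.random.default_rng(1)
p, q, n, noise_std = 8, 3, 4000, 0.5
true_mean = np.arange(1.0, p + 1)
x = true_mean + rng.standard_normal((n, p))
A = rng.standard_normal((n, q, p))              # assumed random measurement operators
y = np.einsum("kqp,kp->kq", A, x) + noise_std * rng.standard_normal((n, q))

mu_hat = estimate_mean(A, y)
```

No single $y_k$ determines $x_k$ because $q < p$, but the averaged matrix $\frac{1}{n}\sum_k A_k^T A_k$ is invertible here, so the mean is still identifiable from the whole dataset.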


Least-squares estimator (covariance)

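Again $y_k = A_k x_k + e_k$ gives

$$\mathrm{Cov}[y_k] = A_k \mathrm{Cov}[x] A_k^T + \sigma^2 I_q.$$

On the other hand

$$\mathrm{Cov}[y_k] \approx (y_k - A_k \mu_n)(y_k - A_k \mu_n)^T,$$

so

$$(y_k - A_k \mu_n)(y_k - A_k \mu_n)^T \approx A_k \mathrm{Cov}[x] A_k^T + \sigma^2 I_q.$$

Form least-squares estimator for $\mathrm{Cov}[x]$:

$$\Sigma_n = \arg\min_{\Sigma} \frac{1}{n} \sum_{k=1}^{n} \left\| (y_k - A_k \mu_n)(y_k - A_k \mu_n)^T - (A_k \Sigma A_k^T + \sigma^2 I_q) \right\|_F^2.$$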


Normal equations

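Mean estimator $\mu_n$ satisfies

$$\left( \frac{1}{n} \sum_{k=1}^{n} A_k^T A_k \right) \mu_n = \frac{1}{n} \sum_{k=1}^{n} A_k^T y_k.$$

For covariance, $L_n(\Sigma_n) = B_n$, with $L_n : \mathbb{R}^{p \times p} \to \mathbb{R}^{p \times p}$ given by

$$L_n(\Sigma) = \frac{1}{n} \sum_{k=1}^{n} A_k^T A_k \Sigma A_k^T A_k$$

and

$$B_n = \frac{1}{n} \sum_{k=1}^{n} \left[ A_k^T (y_k - A_k \mu_n)(y_k - A_k \mu_n)^T A_k - \sigma^2 A_k^T A_k \right].$$

Resulting linear system in $p^2$ variables can be solved efficiently using the conjugate gradient method.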
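The covariance normal equations can be sketched in NumPy with a hand-written conjugate gradient loop over $p \times p$ matrices, using the Frobenius inner product. This is an illustrative toy under stated assumptions (dense Gaussian $A_k$, a known zero mean, small $p$, no preconditioning), not the implementation used in the talk; the function names are hypothetical.

```python
import numpy as np

def apply_Ln(M, Sigma):
    """L_n(Sigma) = (1/n) sum_k M_k Sigma M_k, where M_k = A_k^T A_k."""
    return (M @ Sigma @ M).mean(axis=0)

def build_system(A, y, mu, noise_var):
    """Return M_k = A_k^T A_k (shape n x p x p) and the right-hand side B_n."""
    M = np.einsum("kqi,kqj->kij", A, A)
    resid = y - A @ mu                               # y_k - A_k mu_n, shape (n, q)
    back = np.einsum("kqi,kq->ki", A, resid)         # A_k^T (y_k - A_k mu_n)
    Bn = back.T @ back / len(A) - noise_var * M.mean(axis=0)
    return M, Bn

def conjugate_gradient(M, Bn, iters=200, tol=1e-10):
    """Solve L_n(X) = B_n; L_n is symmetric positive definite in this example."""
    X = np.zeros_like(Bn)
    R = Bn - apply_Ln(M, X)
    P = R.copy()
    rs = np.sum(R * R)
    for _ in range(iters):
        LP = apply_Ln(M, P)
        alpha = rs / np.sum(P * LP)
        X += alpha * P
        R -= alpha * LP
        rs_new = np.sum(R * R)
        if np.sqrt(rs_new) < tol * np.linalg.norm(Bn):
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return X

# Hypothetical experiment: rank-2 covariance in dimension p = 6, q = 3 measurements each.
rng = np.random.default_rng(2)
p, q, n, noise_var = 6, 3, 20000, 0.25
U = np.linalg.qr(rng.standard_normal((p, 2)))[0]
Sigma_true = U @ np.diag([2.0, 1.0]) @ U.T
x = (rng.standard_normal((n, 2)) * np.sqrt([2.0, 1.0])) @ U.T   # zero-mean samples
A = rng.standard_normal((n, q, p))
y = np.einsum("kqp,kp->kq", A, x) + np.sqrt(noise_var) * rng.standard_normal((n, q))

mu_n = np.zeros(p)   # assumption: the mean is known to be zero, to keep the sketch short
M, Bn = build_system(A, y, mu_n, noise_var)
Sigma_n = conjugate_gradient(M, Bn)
rel_err = np.linalg.norm(Sigma_n - Sigma_true) / np.linalg.norm(Sigma_true)
```

The operator $L_n$ is only ever applied, never formed as a $p^2 \times p^2$ matrix, which is what makes an iterative solver attractive when $p$ is large.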


Estimator properties

Taking $A_k = I_p$ and $\sigma^2 = 0$, recover sample mean and covariance.

Does not rely on a particular distribution of $x$.

With prior information on $x$, can regularize by adding terms to least-squares objective.

Both $\mu_n$ and $\Sigma_n$ are consistent estimators³. For $n \to \infty$,

$$\mu_n \xrightarrow{\text{a.s.}} E[x], \qquad \Sigma_n \xrightarrow{\text{a.s.}} \mathrm{Cov}[x].$$

³ Katsevich, Katsevich, S (2015)
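As a quick check of the first property, substituting $A_k = I_p$ and $\sigma^2 = 0$ (so that $y_k = x_k$) into the normal equations from the previous slide gives:

$$\left( \frac{1}{n} \sum_{k=1}^{n} I_p \right) \mu_n = \frac{1}{n} \sum_{k=1}^{n} x_k \;\Longrightarrow\; \mu_n = \frac{1}{n} \sum_{k=1}^{n} x_k,$$

$$L_n(\Sigma) = \frac{1}{n} \sum_{k=1}^{n} I_p \Sigma I_p = \Sigma, \qquad B_n = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu_n)(x_k - \mu_n)^T,$$

so $L_n(\Sigma_n) = B_n$ reduces to $\Sigma_n = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu_n)(x_k - \mu_n)^T$, the usual sample covariance.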

High-dimensional PCA: Marcenko-Pastur

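For pure Gaussian white noise of unit variance

$$y_k = e_k,$$

we have sample covariance $\frac{1}{n} \sum_{k=1}^{n} y_k y_k^T$.

[Figure: eigenvalue ($\lambda$) distributions of the sample covariance in two regimes. $n \gg p$: central limit theorem. $n \asymp p$: Marcenko-Pastur law.]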
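The two regimes can be reproduced numerically. The sketch below draws pure unit-variance Gaussian noise and compares the extreme eigenvalues of the sample covariance with the Marcenko-Pastur support edges $(1 \pm \sqrt{\gamma})^2$, $\gamma = p/n$. The sizes and the helper name are arbitrary choices for illustration.

```python
import numpy as np

def sample_cov_eigs(n, p, rng):
    """Eigenvalues of (1/n) sum_k y_k y_k^T for y_k = e_k ~ N(0, I_p)."""
    y = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(y.T @ y / n)

rng = np.random.default_rng(3)
p = 100
eigs_many = sample_cov_eigs(100 * p, p, rng)   # n >> p: eigenvalues concentrate near 1
eigs_few = sample_cov_eigs(2 * p, p, rng)      # n comparable to p: gamma = 1/2

gamma = 0.5
mp_lower = (1 - np.sqrt(gamma)) ** 2           # lower edge of the bulk, about 0.086
mp_upper = (1 + np.sqrt(gamma)) ** 2           # upper edge of the bulk, about 2.914
```

Even though the population covariance is the identity in both cases, the eigenvalues in `eigs_few` spread over roughly `[mp_lower, mp_upper]` instead of clustering around 1.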


High-dimensional PCA: spiked covariance model

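Both signal and noise

$$y_k = x_k + e_k.$$

[Figure: eigenvalue ($\lambda$) distributions of the sample covariance in two regimes. $n \gg p$: central limit theorem. $n \asymp p$: spiked covariance model⁴.]

Signal eigenvalues pop out of bulk when $\mathrm{SNR} > \sqrt{p/n}$.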

⁴ Johnstone (2001)
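A small simulation of the pop-out threshold for a rank-one signal in unit-variance white noise. Here SNR is taken to mean the signal variance $\ell$ along its direction divided by the per-coordinate noise variance, which is an assumption about the slide's normalization. For $\ell > \sqrt{\gamma}$ with $\gamma = p/n$, the standard asymptotic location of the top sample eigenvalue is $(1 + \ell)(1 + \gamma/\ell)$; below the threshold it sticks to the bulk edge $(1 + \sqrt{\gamma})^2$.

```python
import numpy as np

def top_eigenvalue(n, p, spike, rng):
    """Largest sample-covariance eigenvalue for y_k = x_k + e_k with a rank-one signal."""
    direction = np.zeros(p)
    direction[0] = 1.0
    x = np.sqrt(spike) * rng.standard_normal((n, 1)) * direction   # signal, variance `spike`
    y = x + rng.standard_normal((n, p))                            # unit-variance white noise
    return np.linalg.eigvalsh(y.T @ y / n)[-1]

rng = np.random.default_rng(4)
n, p = 2000, 1000
gamma = p / n
bulk_edge = (1 + np.sqrt(gamma)) ** 2                 # about 2.914

strong = top_eigenvalue(n, p, spike=3.0, rng=rng)     # 3.0 > sqrt(gamma): pops out
weak = top_eigenvalue(n, p, spike=0.3, rng=rng)       # 0.3 < sqrt(gamma): stays in the bulk
predicted = (1 + 3.0) * (1 + gamma / 3.0)             # asymptotic location, about 4.667
```

The weak spike leaves no visible trace in the spectrum, which is why eigenvalue-based estimation needs care when $n$ and $p$ are comparable.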

High-dimensional PCA: shrinkage

To estimate $\mathrm{Cov}[x]$ from

$$y_k = x_k + e_k,$$

when $n \asymp p$, we apply shrinkage to eigenvalues of sample covariance⁵.

[Figure: eigenvalue ($\lambda$) distributions. Panels: Original, Shrinkage.]

⁵ Donoho, Gavish, Johnstone (2013)

Shrinkage of Bn

In clean case, $y_k = A_k x_k$ and

$$B_n = \frac{1}{n} \sum_{k=1}^{n} A_k^T A_k (x_k - \mu_n)(x_k - \mu_n)^T A_k^T A_k,$$

so

$$E[B_n] = \frac{1}{n} \sum_{k=1}^{n} \mathrm{Cov}[A_k^T A_k x_k],$$

and

$$E[B_n] = \mathrm{Cov}[A_k^T A_k x_k]$$

if the $A_k$ matrices are realizations of random matrices $A_k$.

For large $n$, we then have

$$B_n \approx \mathrm{Cov}[A_k^T A_k x_k].$$

How do we estimate $\mathrm{Cov}[A_k^T A_k x_k]$ given noisy $y_k$?

Shrinkage of Bn (cont.)

Colored noise

$$A_k^T y_k = A_k^T A_k x_k + \overbrace{A_k^T e_k}^{\text{covariance } E[A_k^T A_k]}.$$

Whiten by calculating

$$S = \frac{1}{n} \sum_{k=1}^{n} A_k^T A_k \approx E[A_k^T A_k]$$

and setting

$$z_k = S^{-1/2} A_k^T (y_k - A_k \mu_n).$$

Calculate sample covariance, shrink, and unwhiten

$$B_n = S^{1/2} \rho\left( \frac{1}{n} \sum_{k=1}^{n} z_k z_k^T \right) S^{1/2}.$$
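The whiten, shrink, unwhiten steps can be sketched as follows. The slide does not specify the shrinker $\rho$; the one used here is a hypothetical stand-in that inverts the spiked-model eigenvalue map and zeroes everything inside the Marcenko-Pastur bulk, which is only one of several possible choices. The diagonal filters in the demo and all function names are likewise assumptions.

```python
import numpy as np

def sym_power(S, power):
    """S**power for a symmetric positive definite matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * w**power) @ V.T

def shrink(C, gamma, noise_var):
    """Hypothetical shrinker rho: keep de-biased eigenvalues above the bulk edge, zero the rest."""
    w, V = np.linalg.eigh(C)
    lam = w / noise_var                              # rescale to unit noise variance
    t = lam - 1 - gamma
    above = lam > (1 + np.sqrt(gamma)) ** 2
    ell = np.zeros_like(lam)
    ell[above] = 0.5 * (t[above] + np.sqrt(t[above] ** 2 - 4 * gamma))
    return (V * (noise_var * ell)) @ V.T

def shrunk_Bn(A, y, mu, noise_var):
    """B_n = S^{1/2} rho((1/n) sum_k z_k z_k^T) S^{1/2}."""
    n, q, p = A.shape
    S = np.einsum("kqi,kqj->ij", A, A) / n           # S = (1/n) sum_k A_k^T A_k
    back = np.einsum("kqi,kq->ki", A, y - A @ mu)    # A_k^T (y_k - A_k mu_n)
    z = back @ sym_power(S, -0.5)                    # rows are z_k^T (S^{-1/2} is symmetric)
    C = z.T @ z / n                                  # sample covariance of whitened data
    S_half = sym_power(S, 0.5)
    return S_half @ shrink(C, p / n, noise_var) @ S_half

# Hypothetical demo: diagonal filters A_k, rank-one signal, unit-variance noise.
rng = np.random.default_rng(5)
n, p, noise_var = 600, 30, 1.0
idx = np.arange(p)
A = np.zeros((n, p, p))
A[:, idx, idx] = rng.uniform(0.5, 1.5, (n, p))
u = np.ones(p) / np.sqrt(p)
x = np.sqrt(20.0) * rng.standard_normal((n, 1)) * u
y = np.einsum("kqp,kp->kq", A, x) + rng.standard_normal((n, p))

Bn_hat = shrunk_Bn(A, y, np.zeros(p), noise_var)
kept = int((np.linalg.eigvalsh(Bn_hat) > 1e-8).sum())   # number of retained components
```

By construction the shrunk $B_n$ is positive semidefinite and low rank, whereas subtracting $\sigma^2 S$ from the unshrunk sample covariance generally leaves a full-rank matrix with negative eigenvalues.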


Deconvolution in noise

Recall noisy, filtered image

$$y_k = T_k x_k + e_k$$

given clean projection $x_k$.

Using covariance estimation method, estimate $\mathrm{Cov}[x_k]$.

Construct classical Wiener filter and invert $T_k$ to estimate $x_k$ from $y_k$:

$$\hat{x}_k = \Sigma_n T_k^T (T_k \Sigma_n T_k^T + \sigma^2 I_q)^{-1} (y_k - T_k \mu_n) + \mu_n.$$
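A minimal sketch of this Wiener estimate in NumPy. The tiny example uses a singular diagonal filter and a rank-one covariance chosen by hand, purely as an assumption for illustration, to show how the covariance lets the estimate fill in a coordinate that the filter removes entirely.

```python
import numpy as np

def wiener_estimate(y, T, Sigma, mu, noise_var):
    """Sigma T^T (T Sigma T^T + noise_var I)^{-1} (y - T mu) + mu."""
    q = T.shape[0]
    innovation = np.linalg.solve(T @ Sigma @ T.T + noise_var * np.eye(q), y - T @ mu)
    return Sigma @ T.T @ innovation + mu

# Hypothetical example: the filter T zeroes the first coordinate, so T is singular.
u = np.full(4, 0.5)                       # unit vector spanning the signal subspace
Sigma = 4.0 * np.outer(u, u)              # rank-one covariance (assumed known here)
mu = np.zeros(4)
T = np.diag([0.0, 1.0, 1.0, 1.0])
x_true = 2.0 * u                          # a signal inside the subspace
y = T @ x_true                            # noiseless observation, for clarity

x_hat = wiener_estimate(y, T, Sigma, mu, noise_var=1e-4)
```

Inverting `T` directly is impossible here, but because the signal lives in a one-dimensional subspace, the three observed coordinates determine the missing one.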


Effect of shrinkage on simulated data

[Figure: log10(Relative MSE) versus number of images (×10^4, from 0 to 10), comparing no shrinkage and shrinkage at SNR = 1/20, 1/40, 1/60, and 1/100]

Applying shrinkage to right-hand side $B_n$ in $L_n(\Sigma_n) = B_n$ results in significant increase in accuracy for the covariance estimation.


Experimental results: TRPV1 (EMPIAR-10005)

Dataset of 35645 projection images, 256-by-256 pixels. Liao et al., Nature 2013; Bhamre, Zhang, S, JSB 2016

[Figure panels: Input, Filtered, Closest projection]


Experimental data - TRPV1

[Figure panels: Raw, Closest projection, TWF, CWF]

K2 direct electron detector

35645 motion corrected, picked particle images of 256×256 pixels


Experimental data - 80S ribosome

[Figure panels: Raw, Closest projection, TWF, CWF]

FALCON II 4k×4k direct electron detector

105247 motion corrected, picked particle images of 360×360 pixels


Experimental data - IP3R1

[Figure panels: Raw, Closest projection, TWF, CWF]

Gatan 4k×4k CCD

37382 picked particle images of 256×256 pixels


2D covariance estimation — further remarks

Steerable PCA: covariance matrix commutes with in-plane rotation, block diagonalized in Fourier-Bessel basis, computational cost $O(nN^3)$ (Zhao, Shkolnisky, S, IEEE TCI 2016)

Wiener filtering: taking into account that estimated covariance differs from population covariance (S and Wu, SIAM Imaging 2013)

Visualization of underlying particles without class averaging

Outlier detection of picked particles

Improved class averages


Heterogeneity problem

Given images

$$y_k = \overbrace{T_k P_k}^{A_k} x_k + e_k$$

classify $y_k$ according to molecular state $x_k$.

We use proposed method to estimate covariance $\mathrm{Cov}[x_k]$.

Reduce computational and memory costs by writing $L_n$ as a convolution and circulant preconditioning.

Dominant eigenvectors of $\Sigma_n$ define low-dimensional subspace where volumes $x_k$ live. For $C$ classes, rank $\mathrm{Cov}[x_k] = C - 1$.

We can invert $T_k P_k$ to find coordinates in subspace and cluster.
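The last step can be sketched as a per-image least-squares problem restricted to the subspace: writing $x_k \approx \mu + V \alpha_k$, solve for $\alpha_k$ and cluster the coordinates. In this toy version the mean `mu` and the basis `V` are taken as known (in practice they would come from the estimated $\mu_n$ and the dominant eigenvectors of $\Sigma_n$), the operators are dense Gaussian matrices, and clustering for $C = 2$ is a simple sign threshold. All of these are assumptions for illustration.

```python
import numpy as np

def subspace_coordinates(A, y, mu, V):
    """Coordinates alpha_k minimizing ||y_k - A_k (mu + V alpha_k)||^2 for each k."""
    AV = A @ V                                    # A_k restricted to the subspace, (n, q, r)
    resid = y - A @ mu                            # y_k - A_k mu, shape (n, q)
    gram = np.einsum("kqi,kqj->kij", AV, AV)      # (A_k V)^T (A_k V), shape (n, r, r)
    rhs = np.einsum("kqi,kq->ki", AV, resid)      # (A_k V)^T (y_k - A_k mu)
    return np.linalg.solve(gram, rhs[..., None])[..., 0]

# Hypothetical two-class experiment: x_k = mu + d or mu - d, so rank Cov[x] = C - 1 = 1.
rng = np.random.default_rng(6)
p, q, n, noise_std = 12, 5, 500, 0.5
mu = rng.standard_normal(p)
d = rng.standard_normal(p)
d *= 3.0 / np.linalg.norm(d)
labels = rng.integers(0, 2, n)
x = mu + np.where(labels[:, None] == 1, d, -d)
A = rng.standard_normal((n, q, p))                # q < p: no single A_k is invertible
y = np.einsum("kqp,kp->kq", A, x) + noise_std * rng.standard_normal((n, q))

V = (d / np.linalg.norm(d))[:, None]              # assumed: the dominant eigenvector
alpha = subspace_coordinates(A, y, mu, V)[:, 0]
predicted = (alpha > 0).astype(int)               # one-dimensional clustering by sign
accuracy = np.mean(predicted == labels)
```

Although each $A_k$ maps $\mathbb{R}^{12}$ to $\mathbb{R}^{5}$ and cannot be inverted, its restriction to the one-dimensional subspace can, which is what makes classification possible.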


Experimental results: 70S ribosome

Dataset of 10000 images (130-by-130), 2 classes, courtesy Joachim Frank (Columbia University). Downsampled to N = 16.

[Figure panels: Largest 32 eigenvalues; Coordinate histogram]

Clustering accuracy 89% with respect to dataset labeling.

Took 26 min. on a 2.7 GHz, 12-core CPU.


Conclusions

Least-squares approach to covariance estimation from noisy partial measurements is a simple but powerful method to characterize the distribution of the underlying data.

In the high-dimensional regime, eigenvalue shrinkage can significantly improve the performance of the estimator.

Covariance information can be leveraged to great effect in denoising and classification tasks such as those encountered in cryo-EM.


Current and future work

Incorporating constraints on covariance Σn, such as sparsity.

Replacing the least squares cost with a robust cost function.

Theoretical understanding of bulk noise distribution in $\Sigma_n$, not just in $B_n$.

Application to other tasks, such as embryo development, magnetic resonance imaging, and matrix completion.


ASPIRE: Algorithms for Single Particle Reconstruction

Open source toolbox, publicly available: http://spr.math.princeton.edu/


References

A. Singer, H.-T. Wu, “Two-Dimensional tomography from noisy projections taken at unknown random directions”, SIAM Journal on Imaging Sciences, 6 (1), pp. 136–175 (2013).

E. Katsevich, A. Katsevich, A. Singer, “Covariance Matrix Estimation for the Cryo-EM Heterogeneity Problem”, SIAM Journal on Imaging Sciences, 8 (1), pp. 126–185 (2015).

J. Andén, E. Katsevich, A. Singer, “Covariance estimation using conjugate gradient for 3D classification in Cryo-EM”, in IEEE 12th International Symposium on Biomedical Imaging (ISBI 2015), pp. 200–204, 16-19 April 2015.

T. Bhamre, T. Zhang, A. Singer, “Denoising and Covariance Estimation of Single Particle Cryo-EM Images”, Journal of Structural Biology, 195 (1), pp. 72–81 (2016).
