Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton...
Transcript of Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton...
![Page 1: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/1.jpg)
PCA from noisy linearly reduced measurements
Amit Singer
Princeton University
Department of Mathematics and Program in Applied and Computational Mathematics
September 27, 2016
Amit Singer (Princeton University) September 2016 1 / 39
![Page 2: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/2.jpg)
Single particle reconstruction using cryo-EM
Schematic drawing of the imaging process:
The cryo-EM problem:
Amit Singer (Princeton University) September 2016 2 / 39
![Page 3: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/3.jpg)
New detector technology: Exciting times for cryo-EM
www.sciencemag.org SCIENCE VOL 343 28 MARCH 2014 1443
The Resolution Revolution
BIOCHEMISTRY
Werner Kühlbrandt
Advances in detector technology and image
processing are yielding high-resolution
electron cryo-microscopy structures of
biomolecules.
Precise knowledge of the structure of macromolecules in the cell is essen-tial for understanding how they func-
tion. Structures of large macromolecules can now be obtained at near-atomic resolution by averaging thousands of electron microscope images recorded before radiation damage accumulates. This is what Amunts et al. have done in their research article on page 1485 of this issue ( 1), reporting the structure of the large subunit of the mitochondrial ribosome at 3.2 Å resolution by electron cryo-micros-copy (cryo-EM). Together with other recent high-resolution cryo-EM structures ( 2– 4) (see the fi gure), this achievement heralds the beginning of a new era in molecular biology, where structures at near-atomic resolution are no longer the prerogative of x-ray crys-tallography or nuclear magnetic resonance (NMR) spectroscopy.
Ribosomes are ancient, massive protein-RNA complexes that translate the linear genetic code into three-dimensional proteins. Mitochondria—semi-autonomous organelles
A B C
Near-atomic resolution with cryo-EM. (A) The large subunit of the yeast mitochondrial ribosome at 3.2 Å reported by Amunts et al. In the detailed view below, the base pairs of an RNA double helix and a magnesium ion (blue) are clearly resolved. (B) TRPV1 ion channel at 3.4 Å ( 2), with a detailed view of residues lining the ion pore on the four-fold axis of the tetrameric channel. (C) F420-reducing [NiFe] hydrogenase at 3.36 Å ( 3). The detail shows an α helix in the FrhA subunit with resolved side chains. The maps are not drawn to scale.
Amit Singer (Princeton University) September 2016 3 / 39
![Page 4: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/4.jpg)
Cryo-EM in the news...
March 31, 2016
REPORTS
Cite as: Sirohi et al., Science
10.1126/science.aaf5316 (2016).
The 3.8 Å resolution cryo-EM structure of Zika virus Devika Sirohi,1* Zhenguo Chen,1* Lei Sun,1 Thomas Klose,1 Theodore C. Pierson,2 Michael G. Rossmann,1†
Richard J. Kuhn1†
1Markey Center for Structural Biology and Purdue Institute for Inflammation, Immunology and Infectious Disease, Purdue University, West Lafayette, IN 47907, USA. 2Viral Pathogenesis Section, Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA.
Amit Singer (Princeton University) September 2016 4 / 39
![Page 5: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/5.jpg)
Method of the Year 2015
January 2016 Volume 13 No 1
Single-particle cryo-electron microscopy (cryo-EM) is our choice for Method of theYear 2015 for its newfound ability to solve protein structures at near-atomicresolution. Featured is the 2.2-Å cryo-EM structure of β-galactosidase as recentlyreported by Bartesaghi et al. (Science 348, 1147–1151, 2015). Cover design by ErinDewalt.
Special feature starts on p19.
Amit Singer (Princeton University) September 2016 5 / 39
![Page 6: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/6.jpg)
Big “movie" data, publicly available
http://www.ebi.ac.uk/pdbe/emdb/empiar/
Amit Singer (Princeton University) September 2016 6 / 39
![Page 7: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/7.jpg)
Image formation model and inverse problem
Projection yk
Molecule φk
Electronsource
Rk ∈ SO(3)
Projection images yk = Tk Pkφk + ǫk , k = 1, 2, . . . , n.φk : R3
7→ R is the scattering potential of the k ′th molecule.Pk is 2D tomographic projection, after rotating φk by Rk .Tk is a “filtering" operator due to microscope and detector.Classical cryo-EM problem: Estimate φ = φ1 = φ2 = · · · = φn (and R1, . . . ,Rn) giveny1, . . . , yn.Heterogeneity problem: Estimate φ1, φ2, . . . , φn (and R1, . . . ,Rn) given y1, . . . , yn.
Amit Singer (Princeton University) September 2016 7 / 39
![Page 8: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/8.jpg)
Toy example
Amit Singer (Princeton University) September 2016 8 / 39
![Page 9: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/9.jpg)
E. coli 50S ribosomal subunit
27,000 particle images provided by Dr. Fred Sigworth, Yale MedicalSchool
3D reconstruction by S, Lanhui Wang, and Jane ZhaoAmit Singer (Princeton University) September 2016 9 / 39
![Page 10: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/10.jpg)
Algorithmic and computational challenges
1 Ab-initio modelling and orientation assignment
2 Heterogeneity – resolving structural variability
3 Denoising and 2D class averaging
4 Particle picking
5 Symmetry detection and reconstruction
6 Motion correction
7 Estimation of contrast transfer function and noise statistics
8 Fast reconstruction and refinement
9 Validation and resolution determination
Amit Singer (Princeton University) September 2016 10 / 39
![Page 11: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/11.jpg)
Principal components analysis (PCA)
Given i.i.d. samples x1, . . . , xn of some x ∈ Rp, estimate mean
E[x] and covariance Cov[x].
µn =1n
n∑
k=1
xk Σn =1n
n∑
k=1
(xk − µn) (xk − µn)T
Eigenvectors and eigenvalues of Σn are used for dimensionalityreduction, compression, and denoising
Useful when most variability is captured by a few leading principalcomponents: low rank approximation.
Amit Singer (Princeton University) September 2016 11 / 39
![Page 12: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/12.jpg)
PCA from noisy linearly transformed measurements
Measure with Ak : Rp → Rq, noise variance Cov[ek ] = σ2Iq
yk = Akxk + ek .
Ak
Al
Goal: Estimate E[x] and Cov[x] from y1, . . . , yn.
Amit Singer (Princeton University) September 2016 12 / 39
![Page 13: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/13.jpg)
Motivation: cryo-electron microscopy
Write image yk as
yk = TkPkφk + ek , k = 1, . . . ,n
where:
φk is a three-dimensional molecular volume,
Pk projects volume into an image and Tk filters it, and
ek is measurement noise.
Amit Singer (Princeton University) September 2016 13 / 39
![Page 14: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/14.jpg)
Application I: deconvolution in noise
Goal: Estimate unfiltered projection xk = Pkφk from
yk = Tkxk + ek .
Amit Singer (Princeton University) September 2016 14 / 39
![Page 15: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/15.jpg)
Application I: deconvolution in noise
Goal: Estimate unfiltered projection xk = Pkφk from
yk = Tkxk + ek .
xk
Amit Singer (Princeton University) September 2016 14 / 39
![Page 16: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/16.jpg)
Application I: deconvolution in noise
Goal: Estimate unfiltered projection xk = Pkφk from
yk = Tkxk + ek .
xk Tkxk
Amit Singer (Princeton University) September 2016 14 / 39
![Page 17: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/17.jpg)
Application I: deconvolution in noise
Goal: Estimate unfiltered projection xk = Pkφk from
yk = Tkxk + ek .
xk Tkxk Tkxk + ek
Amit Singer (Princeton University) September 2016 14 / 39
![Page 18: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/18.jpg)
Application I: deconvolution in noise
Goal: Estimate unfiltered projection xk = Pkφk from
yk = Tkxk + ek .
xk Tkxk Tkxk + ek
Filtering Tk : Rp → Rp is singular or ill-conditioned.
Amit Singer (Princeton University) September 2016 14 / 39
![Page 19: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/19.jpg)
Application I: deconvolution in noise
Goal: Estimate unfiltered projection xk = Pkφk from
yk = Tkxk + ek .
xk Tkxk Tkxk + ek
Filtering Tk : Rp → Rp is singular or ill-conditioned.
Using low-rank Cov[x], we can invert Tk and reduce effect of noiseek .
Amit Singer (Princeton University) September 2016 14 / 39
![Page 20: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/20.jpg)
Contrast transfer function: Tk
http://spider.wadsworth.org/spider_doc/spider/docs/techs/ctf/ctf.html
CTF suppresses information and inverts contrast
CTF cannot be trivially inverted (zero crossings)
Additional envelope Gaussian decay of high frequencies
Information lost from one defocus group could be recovered from another group that hasdifferent zero crossings.
Amit Singer (Princeton University) September 2016 15 / 39
![Page 21: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/21.jpg)
Application II: heterogeneity
How to classify images from different molecular structures?1
Class A Class B
1Images courtesy Joachim Frank (Columbia)Amit Singer (Princeton University) September 2016 16 / 39
![Page 22: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/22.jpg)
Application: heterogeneity (cont.)
Goal: Classify images
yk = TkPkφk + ek ,
according to molecular state φk .
Cannot invert since TkPk : Rp → Rq and q < p.
Volumes φk live in low-dimensional space – can be estimatedfrom top eigenvectors of Cov[φk ].
Restrict TkPk to this space, invert, cluster and reconstruct2.
Aggregating large-scale datasets give accurate estimates –despite high noise.
2Penczek et al. (2009)Amit Singer (Princeton University) September 2016 17 / 39
![Page 23: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/23.jpg)
Least-squares estimator (mean)
From yk = Akxk + ek , we have
E[yk ] = AkE[x].
On the other handE[yk ] ≈ yk ,
soyk ≈ AkE[x].
Form least-squares estimator for E[x]
µn = arg minµ
1n
n∑
k=1
‖yk − Akµ‖2.
Amit Singer (Princeton University) September 2016 18 / 39
![Page 24: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/24.jpg)
Least-squares estimator (covariance)
Again yk = Akxk + ek gives
Cov[yk ] = Ak Cov[x]ATk + σ2Iq .
On the other hand
Cov[yk ] ≈ (yk − Akµn) (yk − Akµn)T ,
so(yk − Akµn) (yk − Akµn)
T ≈ AkCov[x]ATk + σ2Iq.
Form least-squares estimator for Cov[x]
Σn = arg minΣ
1n
n∑
k=1
∥∥∥(yk − Akµn) (yk − Akµn)
T − (AkΣAHk + σ2Iq)
∥∥∥
2
F.
Amit Singer (Princeton University) September 2016 19 / 39
![Page 25: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/25.jpg)
Normal equations
Mean estimator µn satisfies(
1n
n∑
k=1
ATk Ak
)
µn =1n
n∑
k=1
ATk yk .
For covariance, Ln(Σn) = Bn, with Ln : Rp×p → Rp×p given by
Ln(Σ) =1n
n∑
k=1
ATk AkΣAT
k Ak
and
Bn =1n
n∑
k=1
ATk (yk − Akµn) (yk − Akµn)
T Ak − σ2ATk Ak .
Resulting linear system in p2 variables can be solved efficientlyusing the conjugate gradient method.
Amit Singer (Princeton University) September 2016 20 / 39
![Page 26: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/26.jpg)
Estimator properties
Taking Ak = Ip and σ2 = 0, recover sample mean and covariance.
Does not rely on a particular distribution of x.
With prior information on x, can regularize by adding terms toleast-squares objective.
Both µn and Σn are consistent estimators3. For n → ∞
µna.s.−−→ E[x]
Σna.s.−−→ Cov[x]
3Katsevich, Katsevich, S (2015)Amit Singer (Princeton University) September 2016 21 / 39
![Page 27: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/27.jpg)
High-dimensional PCA: Marcenko-Pastur
For pure Gaussian white noise of unit variance
yk = ek ,
we have sample covariance 1n
∑nk=1 ykyT
k .
n ≫ pCentral limit theorem
λ
n ≍ pMarcenko-Pastur law
λ
Amit Singer (Princeton University) September 2016 22 / 39
![Page 28: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/28.jpg)
High-dimensional PCA: spiked covariance model
Both signal and noiseyk = xk + ek .
n ≫ pCentral limit theorem
λ
n ≍ pSpiked covariance model4
λ
Signal eigenvalues pop out of bulk when SNR >√
p/n.
4Johnstone (2001)Amit Singer (Princeton University) September 2016 23 / 39
![Page 29: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/29.jpg)
High-dimensional PCA: shrinkage
To estimate Cov[x] fromyk = xk + ek ,
when n ≍ p, we apply shrinkage to eigenvalues of sample covariance5
Original
λ
Shrinkage
λ
5Donoho, Gavish, Johnstone (2013)Amit Singer (Princeton University) September 2016 24 / 39
![Page 30: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/30.jpg)
Shrinkage of Bn
In clean case, yk = Akxk and
Bn =1n
n∑
k=1
ATk Ak (xk − µn) (xk − µn)
T ATk Ak ,
Amit Singer (Princeton University) September 2016 25 / 39
![Page 31: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/31.jpg)
Shrinkage of Bn
In clean case, yk = Akxk and
Bn =1n
n∑
k=1
ATk Ak (xk − µn) (xk − µn)
T ATk Ak ,
so
E[Bn] =1n
n∑
k=1
Cov[ATk Akxk ].
Amit Singer (Princeton University) September 2016 25 / 39
![Page 32: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/32.jpg)
Shrinkage of Bn
In clean case, yk = Akxk and
Bn =1n
n∑
k=1
ATk Ak (xk − µn) (xk − µn)
T ATk Ak ,
soE[Bn] = Cov[AT
k Akxk ]
if the Ak matrices are realizations of random matrices Ak .
Amit Singer (Princeton University) September 2016 25 / 39
![Page 33: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/33.jpg)
Shrinkage of Bn
In clean case, yk = Akxk and
Bn =1n
n∑
k=1
ATk Ak (xk − µn) (xk − µn)
T ATk Ak ,
soE[Bn] = Cov[AT
k Akxk ]
if the Ak matrices are realizations of random matrices Ak .
For large n, we then have
Bn ≈ Cov[ATk Akxk ].
Amit Singer (Princeton University) September 2016 25 / 39
![Page 34: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/34.jpg)
Shrinkage of Bn
In clean case, yk = Akxk and
Bn =1n
n∑
k=1
ATk Ak (xk − µn) (xk − µn)
T ATk Ak ,
soE[Bn] = Cov[AT
k Akxk ]
if the Ak matrices are realizations of random matrices Ak .
For large n, we then have
Bn ≈ Cov[ATk Akxk ].
How do we estimate Cov[ATk Akxk ] given noisy yk?
Amit Singer (Princeton University) September 2016 25 / 39
![Page 35: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/35.jpg)
Shrinkage of Bn (cont.)
Colored noise
ATk yk = AT
k Akxk +
covariance E[ATk Ak ]
︷ ︸︸ ︷
ATk ek .
Amit Singer (Princeton University) September 2016 26 / 39
![Page 36: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/36.jpg)
Shrinkage of Bn (cont.)
Colored noise
ATk yk = AT
k Akxk +
covariance E[ATk Ak ]
︷ ︸︸ ︷
ATk ek .
Whiten by calculating
S =1n
n∑
k=1
ATk Ak ≈ E[AT
k Ak ]
and settingzk = S−1/2AT
k (yk − Akµn).
Amit Singer (Princeton University) September 2016 26 / 39
![Page 37: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/37.jpg)
Shrinkage of Bn (cont.)
Colored noise
ATk yk = AT
k Akxk +
covariance E[ATk Ak ]
︷ ︸︸ ︷
ATk ek .
Whiten by calculating
S =1n
n∑
k=1
ATk Ak ≈ E[AT
k Ak ]
and settingzk = S−1/2AT
k (yk − Akµn).
Calculate sample covariance
Amit Singer (Princeton University) September 2016 26 / 39
![Page 38: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/38.jpg)
Shrinkage of Bn (cont.)
Colored noise
ATk yk = AT
k Akxk +
covariance E[ATk Ak ]
︷ ︸︸ ︷
ATk ek .
Whiten by calculating
S =1n
n∑
k=1
ATk Ak ≈ E[AT
k Ak ]
and settingzk = S−1/2AT
k (yk − Akµn).
Calculate sample covariance
n∑
k=1
zkzTk
Amit Singer (Princeton University) September 2016 26 / 39
![Page 39: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/39.jpg)
Shrinkage of Bn (cont.)
Colored noise
ATk yk = AT
k Akxk +
covariance E[ATk Ak ]
︷ ︸︸ ︷
ATk ek .
Whiten by calculating
S =1n
n∑
k=1
ATk Ak ≈ E[AT
k Ak ]
and settingzk = S−1/2AT
k (yk − Akµn).
Calculate sample covariance, shrink
ρ
(n∑
k=1
zkzTk
)
Amit Singer (Princeton University) September 2016 26 / 39
![Page 40: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/40.jpg)
Shrinkage of Bn (cont.)
Colored noise
ATk yk = AT
k Akxk +
covariance E[ATk Ak ]
︷ ︸︸ ︷
ATk ek .
Whiten by calculating
S =1n
n∑
k=1
ATk Ak ≈ E[AT
k Ak ]
and settingzk = S−1/2AT
k (yk − Akµn).
Calculate sample covariance, shrink, and unwhiten
Bn = S1/2ρ
(
1n
n∑
k=1
zk zTk
)
S1/2.
Amit Singer (Princeton University) September 2016 26 / 39
![Page 41: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/41.jpg)
Deconvolution in noise
Recall noisy, filtered, image
yk = Tkxk + ek
given clean projection xk .
Using covariance estimation method, estimate Cov[xk ].
Construct classical Wiener filter and invert Tk – estimate xk fromyk :
ΣnT Tk (TkΣnT T
k + σ2Iq)−1(yk − Tkµn) + µn.
Amit Singer (Princeton University) September 2016 27 / 39
![Page 42: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/42.jpg)
Effect of shrinkage on simulated data
Number of images ×10 40 1 2 3 4 5 6 7 8 9 10
log
10(R
elat
ive
MS
E)
-3
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
No shrinkage, SNR=1/20Shrinkage, SNR=1/20No shrinkage, SNR=1/40Shrinkage, SNR=1/40
No shrinkage, SNR=1/60Shrinkage, SNR=1/60No shrinkage, SNR=1/100Shrinkage, SNR=1/100
Applying shrinkage to right-hand side Bn in Ln(Σn) = Bn results insignificant increase in accuracy for the covariance estimation.
Amit Singer (Princeton University) September 2016 28 / 39
![Page 43: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/43.jpg)
Experimental results: TRPV1 (EMPIAR-10005)
Dataset of 35645 projection images 256-by-256 pixels large.Liao et al., Nature 2013; Bhamre, Zhang, S, JSB 2016
Input
Amit Singer (Princeton University) September 2016 29 / 39
![Page 44: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/44.jpg)
Experimental results: TRPV1 (EMPIAR-10005)
Dataset of 35645 projection images 256-by-256 pixels large.Liao et al., Nature 2013; Bhamre, Zhang, S, JSB 2016
Input Filtered
Amit Singer (Princeton University) September 2016 29 / 39
![Page 45: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/45.jpg)
Experimental results: TRPV1 (EMPIAR-10005)
Dataset of 35645 projection images 256-by-256 pixels large.Liao et al., Nature 2013; Bhamre, Zhang, S, JSB 2016
Input Filtered Closest projection
Amit Singer (Princeton University) September 2016 29 / 39
![Page 46: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/46.jpg)
Experimental data - TRPV1
Raw Closest projection TWF CWF
K2 direct electron detector
35645 motion corrected, picked particle images of 256×256 pixels
Amit Singer (Princeton University) September 2016 30 / 39
![Page 47: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/47.jpg)
Experimental data - 80S ribosome
Raw Closest projection TWF CWF
FALCON II 4k×4k direct electron detector
105247 motion corrected, picked particle images of 360×360pixels
Amit Singer (Princeton University) September 2016 31 / 39
![Page 48: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/48.jpg)
Experimental data -IP3R1
Raw Closest projection TWF CWF
Gatan 4k×4k CCD
37382 picked particle images of 256×256 pixels
Amit Singer (Princeton University) September 2016 32 / 39
![Page 49: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/49.jpg)
2D covariance estimation — further remarks
Steerable PCA: covariance matrix commutes with in-planerotation, block diagonalized in Fourier-Bessel basis, computationalcost O(nN3) (Zhao, Shkolnisky, S, IEEE TCI 2016)
Wiener filtering: taking into account that estimated covariancediffers from population covariance (S and Wu, SIAM Imaging2013)
Visualization of underlying particles without class averaging
Outlier detection of picked particles
Improved class averages
Amit Singer (Princeton University) September 2016 33 / 39
![Page 50: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/50.jpg)
Heterogeneity problem
Given images
yk =
Ak︷ ︸︸ ︷
TkPk xk + ek
classify yk according to molecular state xk .
We use proposed method to estimate covariance Cov[xk ].
Reduce computational and memory costs by writing Ln as aconvolution and circulant preconditioning.
Dominant eigenvectors of Σn define low-dimensional subspacewhere volumes xk live. For C classes, rank Cov[xk ] = C − 1.
We can invert TkPk to find coordinates in subspace and cluster.
Amit Singer (Princeton University) September 2016 34 / 39
![Page 51: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/51.jpg)
Experimental results: 70S ribosome
Dataset of 10000 images (130-by-130), 2 classes, courtesy JoachimFrank (Columbia University). Downsampled to N = 16.
Largest 32 eigenvalues
0
1
2
3
4
5
Coordinate histogram
00
100
200
300
400
Clustering accuracy 89% with respect to dataset labeling.
Took 26 min. on a 2.7 GHz, 12-core CPU.
Amit Singer (Princeton University) September 2016 35 / 39
![Page 52: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/52.jpg)
Conclusions
Least-squares approach to covariance estimation from noisypartial measurements is a simple but powerful method tocharacterize the distribution of the underlying data.
In the high-dimensional regime, eigenvalue shrinkage cansignificantly improve the performance of the estimator.
Covariance information can be leveraged to great effect indenoising and classification tasks such as those encountered incryo-EM.
Amit Singer (Princeton University) September 2016 36 / 39
![Page 53: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/53.jpg)
Current and future work
Incorporating constraints on covariance Σn, such as sparsity.
Replacing the least squares cost with a robust cost function.
Theoretical understanding of bulk noise distribution in Σn, not justin Bn.
Application to other tasks, such as embryo development, magneticresonance imaging, and matrix completion.
Amit Singer (Princeton University) September 2016 37 / 39
![Page 54: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/54.jpg)
ASPIRE: Algorithms for Single Particle Reconstruction
Open source toolbox, publicly available:http://spr.math.princeton.edu/
Amit Singer (Princeton University) September 2016 38 / 39
![Page 55: Amit Singer - Institute for Mathematics and its …...2016/09/06 · Amit Singer (Princeton University) September 2016 10 / 39 Principal components analysis (PCA) Given i.i.d. samples](https://reader033.fdocuments.in/reader033/viewer/2022042312/5eda8cf55f8d0d7f302a4fe7/html5/thumbnails/55.jpg)
References
A. Singer, H.-T. Wu, “Two-Dimensional tomography from noisyprojections taken at unknown random directions", SIAM Journalon Imaging Sciences, 6 (1), pp. 136-175, (2013).
E. Katsevich, A. Katsevich, A. Singer, “Covariance MatrixEstimation for the Cryo-EM Heterogeneity Problem", SIAMJournal on Imaging Sciences, 8 (1), pp. 126–185 (2015).
J. Andén, E. Katsevich, A. Singer, “Covariance estimation usingconjugate gradient for 3D classification in Cryo-EM", in IEEE 12thInternational Symposium on Biomedical Imaging (ISBI 2015), pp.200–204, 16-19 April 2015.
T. Bhamre, T. Zhang, A. Singer, “Denoising and CovarianceEstimation of Single Particle Cryo-EM Images", Journal ofStructural Biology, 195 (1), pp. 72–81 (2016).
Amit Singer (Princeton University) September 2016 39 / 39