EM Structure Archives, Validation and Challenges · 2020-01-17 · EM Structure Archives,...

EM Structure Archives, Validation and Challenges Cathy Lawson EMDataResource & RCSB Rutgers University S2C2 Modeling Course January 17, 2020


Page 1

EM Structure Archives, Validation and Challenges

Cathy Lawson, EMDataResource & RCSB

Rutgers University

S2C2 Modeling Course, January 17, 2020

Page 2

Stanford University/SLAC

Rutgers University

European Bioinformatics Institute

Unified Data Resource for 3DEM

■ Established 2007 under NIGMS Support (R01GM079429)

■ Develop Data Archives for 3DEM (EMDB + PDB)

■ Promote Community Development of Validation and Standards

Page 3

Project Website

https://www.emdataresource.org

Page 4

Growth of EM Archives

Page 5

EMDB maps by year and resolution

Page 6

Finding Cryo-EM Structures

emdataresource.org/search.html

Page 7

EMDR Search

Page 8

EMDR Search Options / Demo

https://www.emdataresource.org

What EMDB entries do you want to search for?

Page 9

Cryo-EM Structure Deposition

EMDB, PDB

Page 10

wwPDB OneDep System

■ Deposition system for X-ray, NMR, and EM Structures

■ EM: Deposit map to EMDB, coordinate model to PDB

■ Validation report is produced

Page 11

File uploads for EM deposition

Page 12

FSC Curve

■ Upload XML format file

■ Create via software package (e.g., Relion, EMAN, cisTEM), or FSC server

PDBe.org/FSC

Page 13

mmCIF Data Dictionary for cryo-EM

Top Level: em_experiment, em_software

Sample Description: em_entity_assembly, em_entity_assembly_molwt, em_entity_assembly_naturalsource, em_entity_assembly_recombinant, em_virus_entity, em_virus_natural_host, em_virus_shell

Data Collection: em_diffraction, em_diffraction_shell, em_diffraction_stats, em_image_recording, em_image_scans, em_imaging, em_imaging_optics

Sample/Specimen Preparation: em_buffer, em_buffer_component, em_crystal_formation, em_embedding, em_sample_support, em_specimen, em_staining, em_vitrification

em_fiducial_markers*, em_focused_ion_beam*, em_grid_pretreatment*, em_high_pressure_freezing*, em_shadowing*, em_support_film*, em_tomography*, em_tomography_specimen*, em_ultramicrotomy*

Image Processing & Reconstruction: em_3d_reconstruction, em_image_processing, em_particle_selection, em_volume_selection, em_ctf_correction

em_2d_crystal_entity, em_3d_crystal_entity, em_helical_entity, em_single_particle_entity

em_euler_angle_assignment*, em_final_classification*, em_start_model*

Structure Analysis: em_3d_fitting, em_3d_fitting_list, em_fsc_curve*

Experimental Data: em_map*, em_structure_factors*, em_layer_lines*

*All categories are collected by the OneDep system. Most categories are archived in both PDB and EMDB; asterisked categories are archived only in EMDB.
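As a rough illustration of how these categories show up in a file, the sketch below lists the em_* category names present in a piece of mmCIF text. It relies only on the general mmCIF convention that data names are written as _category.item. The function name em_categories and the example fragment are hypothetical, and the values in the fragment are made up rather than taken from a real entry.

```python
import re


def em_categories(mmcif_text):
    """Return the sorted em_* category names that appear in mmCIF text."""
    # mmCIF data names have the form _category.item; keep the category part.
    names = re.findall(r"^\s*_(em_[A-Za-z0-9_]+)\.", mmcif_text, flags=re.MULTILINE)
    return sorted(set(names))


# Made-up fragment for illustration only (not a real deposition).
example = """\
data_XXXX
_em_experiment.entry_id              XXXX
_em_experiment.reconstruction_method 'SINGLE PARTICLE'
_em_3d_reconstruction.resolution     3.1
loop_
_em_software.id
_em_software.name
1 RELION
2 PHENIX
"""

print(em_categories(example))
```

A full mmCIF parser would be needed for anything beyond listing category names; the regular expression here ignores quoting and multi-line values.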

Page 14

Cryo-EM Structure Validation

Page 15

Validation Report for EM Structures, version 1 (2016-2019)

■ Map resolution reported by depositor

■ Model geometry statistics

■ No fit-to-map validation

Page 16

Validation Report for EM Structures, version 2 (2020-)

■ New:

  ■ Map and Map+Model Images

  ■ FSC curve plot(s)

  ■ Rotationally averaged power spectrum plot

  ■ Fit-to-Map: Atom inclusion at recommended contour level plot

Page 17

Map Images: EMD-0273

Page 18

FSC Curves: EMD-0273

Blue: FSC curve; Vertical Black Line: reported resolution

Calculated by archive from deposited half-maps

Calculated by depositor
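To make the relationship between an FSC curve and a resolution value concrete, here is a minimal sketch that reads off the resolution at which a curve first drops below a chosen cutoff, using linear interpolation between sampled points. The function name, the curve values, and the cutoffs used in the example are assumptions for illustration; this slide does not state which cutoff the archive or the depositor applied.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold):
    """Resolution (in Å) where the FSC curve first drops below threshold.

    spatial_freqs: increasing, positive spatial frequencies in 1/Å.
    fsc_values: FSC value at each frequency.
    Returns None if the curve never drops below the threshold.
    """
    points = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(points, points[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation between the two bracketing samples.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Made-up curve for illustration only.
freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40]
fsc = [1.00, 0.99, 0.97, 0.90, 0.70, 0.30, 0.10, 0.02]

# 0.5 and 0.143 are commonly used cutoffs; which applies is an assumption here.
print(resolution_at_threshold(freqs, fsc, 0.5))
print(resolution_at_threshold(freqs, fsc, 0.143))
```

Keeping the cutoff as an explicit argument makes it easy to see how much the quoted resolution depends on that choice.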

Page 19

Map + Model Images: EMD-0273, PDB 6UH7

Page 20

Atom Inclusion: EMD-0273, PDB 6UH7
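The idea behind atom inclusion can be sketched as the fraction of model atoms that sit in map density at or above the chosen contour level. The version below is a simplified nearest-voxel sketch under stated assumptions (cubic voxels, axis-aligned grid, no interpolation); the function name, grid layout, and example values are hypothetical, and the archive's actual calculation may differ in detail.

```python
def atom_inclusion(density, origin, voxel_size, atoms, contour_level):
    """Fraction of atoms whose nearest voxel is at or above contour_level.

    density: nested lists indexed as density[z][y][x].
    origin: (x, y, z) position in Å of voxel (0, 0, 0).
    voxel_size: edge length of a cubic voxel in Å.
    atoms: iterable of (x, y, z) coordinates in Å.
    """
    nz, ny, nx = len(density), len(density[0]), len(density[0][0])
    inside = 0
    total = 0
    for x, y, z in atoms:
        total += 1
        i = round((x - origin[0]) / voxel_size)
        j = round((y - origin[1]) / voxel_size)
        k = round((z - origin[2]) / voxel_size)
        # Atoms that fall outside the map box count as not included.
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            if density[k][j][i] >= contour_level:
                inside += 1
    return inside / total if total else 0.0


# Made-up 2 x 2 x 2 map and atom positions for illustration only.
toy_map = [
    [[0.0, 0.9], [0.2, 0.8]],
    [[0.1, 0.7], [0.0, 0.05]],
]
toy_atoms = [(1.0, 0.0, 0.0), (0.1, 0.9, 0.0), (1.2, 0.1, 0.9), (5.0, 5.0, 5.0)]

print(atom_inclusion(toy_map, (0.0, 0.0, 0.0), 1.0, toy_atoms, 0.5))
```

Lowering contour_level in this sketch raises the reported fraction, which is why the contour level has to be stated alongside the value.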

Page 21

Ribosome Example EMD-2914 PDB 5AJ4

Page 22

Ribosome Example EMD-2914 PDB 5AJ4

Lower contour level

rcsb.org/3d-view/molstar/5aj4

Recommended contour level

Page 23

CryoEM Validation Challenges

Page 24

EM Validation Task Force 2010 Recommendations

■ Full FSC curve from independent half-maps

■ Model Stereochemistry (same as X-ray / NMR)

■ Other Metrics: More Research Needed

Henderson et al. (2012) Structure 20, 205-214

http://www.ncbi.nlm.nih.gov/pubmed/22325770

vtf.emdataresource.org

Page 25

Single Particle Cryo-EM in PDB: Release Year vs Resolution

Validation in a Changing Landscape

■ How accurate are the maps and their model interpretations?

■ What criteria are currently being used and are they good enough?


Page 26

2016 Map & Model Challenges

■ Reconstruction and Modelling tasks at 2.5-5 Å

■ Major outcomes:

■ Estimation of map resolution needs to be better standardized across the community

■ Novel model-based methods may be useful for estimating map resolvability

■ Further review of global fit-to-map metrics is needed

Page 27

2019 Model Metrics Challenge

■ Goal: Identify metrics most suitable for evaluating and comparing fit of atomic coordinate models into cryo-EM maps for specimens in the 1.5-4.0 Å reported overall resolution range.

■ We received 63 models from 16 modelling teams

■ 51 of 63 were modelled using ab initio methods

Page 28

Model Compare Pipeline

■ “Laboratory” for evaluating assessments

■ http://model-compare.emdataresource.org

Andriy Kryshtafovych, UC Davis

Page 29

Model Challenge Meeting @ Stanford/SLAC June 2019

■ External Advisors/Assessors: Peter Rosenthal, Paul Emsley, Jane Richardson, Paul Adams, James Fraser, Frank DiMaio, Pavel Afonine, Tom Terwilliger, Mark Herzik

■ Challengers: Soon Wen Hoh, Gunnar Schroeder, Andrea Vaiana, Grzegorz Chojnowski, Daisuke Kihara, Pavel Afonine, Abishek Singharoy, Xiaodi Yu, Liguo Wang, Frank DiMaio, Matt Baker

■ EMDataResource: Andriy Kryshtafovych, Cathy Lawson, Wah Chiu, Greg Pintilie, Helen Berman

Page 30

Fit to Map

Correlation
  Full map density: TEMPy CCC | PHENIX box_CC
  Density within a mask: TEMPy CCC_overlap | Segment Mander’s Overlap; PHENIX CC_peaks | CC_volume | CC_mask
  Density-derived functions: TEMPy Mutual Information (MI) | MI_overlap | Laplacian Filtered
  Density at atom positions: MAPQ Q-score: vs Reference Gaussians (r=0-2 Å)

FSC curve
  Single point: PHENIX Resolution Map-Model FSC = 0.5
  Integration: CCPEM REFMAC5 FSCavg curve area to defined resolution limit

Atom Inclusion: TEMPy Envelope | EMDB Atom Inclusion

Rotamer: EMRinger Z-score protein Cγ-atom paths around χ1

Coordinates Only

Conformation
  Backbone: CaBLAM Cɑ-trace (Cɑ-only virtual dihedrals); CaBLAM Conformation (Cɑ and CO-containing virtual dihedrals); MOLPROBITY Ramachandran
  Sidechain: MOLPROBITY Rotamer

Valence Geometry: PHENIX Bond | Bond angle | Chirality | Planarity | Dihedral

Clashes: MOLPROBITY Clashscore

Energy: PROQ3 energy and predicted features

vs Reference Model

Superposition
  Cɑ Superposition: OPENSTRUCT RMSD-Cɑ
  Distance cutoffs: OPENSTRUCT Global Distance Calculation (GDC) all | sidechain; Global Distance Test (GDT) total score | high accuracy
  Sequence assignment: PHENIX seq match | Cɑ atom position match | overall score

Distances
  Per chain: LDDT Local difference distance test
  All chains: OPENSTRUCT oligomeric LDDT | weighted oligomeric LDDT

Contacts
  Contact area: CAD Contact Area Difference
  Shared contacts: OPENSTRUCT Quaternary Structure (QS) best, global
  Hydrogen bonds: HBPLUS H-bond Precision all | nonlocal | Similarity all | nonlocal

vs Models Consensus*

Superposition
  *Multiple references: DAVIS-QA average of pairwise GDT_TS scores
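Several of the fit-to-map entries above are correlation coefficients between the experimental map and a map computed from the model, evaluated either over the full map or within a mask. The sketch below shows only the generic idea as a mean-subtracted (Pearson) correlation over flattened voxel lists. It is not the TEMPy or PHENIX implementation; those packages differ in how the model map, mask, and normalisation are defined, and the function name and values here are hypothetical.

```python
import math


def cross_correlation(map_a, map_b, mask=None):
    """Pearson correlation between two equal-length lists of voxel values.

    mask: optional list of booleans selecting which voxels to compare.
    Raises ZeroDivisionError if either selection has zero variance.
    """
    if mask is None:
        mask = [True] * len(map_a)
    a = [v for v, m in zip(map_a, mask) if m]
    b = [v for v, m in zip(map_b, mask) if m]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up voxel values: the "model" map is the "experimental" map scaled by 2.
experimental = [0.0, 0.1, 0.8, 1.0, 0.2, 0.0]
from_model = [0.0, 0.2, 1.6, 2.0, 0.4, 0.0]
around_model = [False, True, True, True, True, False]

print(cross_correlation(experimental, from_model))
print(cross_correlation(experimental, from_model, mask=around_model))
```

Because the coefficient is insensitive to overall scale, the two calls give the same value here; with real data the choice of mask usually changes the result.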

Page 31

Global Fit-to-Map Metrics Comparison

Average correlation per target map

Correlation across all four target maps

Page 32

3 Fit-to-Map Metrics yielded Resolution-Sensitive Model Ranking

■ Within single map targets, all fit-to-map metrics were largely equivalent (gave similar model rankings)

■ Across all map targets, only 3 fit-to-map metrics yielded resolution-sensitive (human-intuitive) model rankings:

  ■ Q-score

  ■ EMRinger

  ■ Map-Model FSC @ 0.5
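One way to check whether two metrics "give similar model rankings" is a rank correlation between their scores over the same set of models. The sketch below uses Spearman's rank correlation as an illustrative choice; the slides do not say which statistic the challenge analysis used, and the metric names and scores are made up.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) upward, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(scores_a, scores_b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(scores_a), ranks(scores_b))


# Made-up scores for four models under two hypothetical metrics.
metric_x = [0.62, 0.55, 0.70, 0.48]
metric_y = [0.81, 0.79, 0.90, 0.60]

print(spearman(metric_x, metric_y))
```

A value near 1 means the two metrics order the models the same way even if their numerical scales differ.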

Page 33

Resolution-Sensitive Ranking

APOF 1.8 Å > APOF 2.3 Å > APOF 3.1 Å

Page 34

Resolution-Sensitive Ranking: Q-Score

APOF 1.8 Å

APOF 2.3 Å

APOF 3.1 Å

ADH 2.9 Å

Reference model score Submitted models scores

APOF: 3ajo; ADH: 6nbb

Page 35

Q-score: novel way to estimate map resolution

[Figure: two scatter plots of average Q-score against reported resolution (Å), for maps from EMDB and models from PDB. Panels: Protein (avg. Q-score, protein atoms) and RNA (avg. Q-score, RNA atoms). Fitted lines shown: y = -0.178x + 1.119, r² = 0.901; y = -0.138x + 0.997, r² = 0.897.]

Pintilie et al., Nature Methods, in press
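The linear fits quoted on this slide can be inverted to turn an average Q-score into a rough resolution estimate. The sketch below uses the first quoted line (y = -0.178x + 1.119) as its default; the second line (y = -0.138x + 0.997) can be passed in through the parameters. Which line belongs to the protein panel and which to the RNA panel should be checked against the original figure, and the function names are hypothetical.

```python
def expected_q(resolution, slope=-0.178, intercept=1.119):
    """Average Q-score predicted by a linear fit at a given resolution (Å)."""
    return slope * resolution + intercept


def estimated_resolution(avg_q, slope=-0.178, intercept=1.119):
    """Invert the linear fit: resolution (Å) implied by an average Q-score."""
    return (avg_q - intercept) / slope


# Defaults are the first regression line quoted on the slide.
print(expected_q(3.0))
print(estimated_resolution(0.585))

# The second quoted line, supplied explicitly.
print(estimated_resolution(0.583, slope=-0.138, intercept=0.997))
```

The fits describe a trend across many archive entries, so an estimate from a single map is only approximate.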

Page 36

CaBLAM: proxy for a reference model

■ CaBLAM compares virtual dihedrals (based on backbone carbonyl-O, Cɑ) to PDB statistics

■ Especially valuable when carbonyl-O’s are not obvious (>3 Å)

■ CaBLAM performed similarly to metrics commonly used in CASP competitions

[Figure: cross-correlations between CaBLAM and “vs. Reference Model” metrics scores for all models in the 2019 Model Challenge]
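The geometric building block behind a virtual dihedral is the ordinary four-point dihedral angle, applied to consecutive Cɑ positions (or Cɑ and carbonyl-O positions) instead of bonded atoms. The sketch below computes that angle only. It does not reproduce CaBLAM's specific dihedral definitions or its comparison against PDB statistics, and the coordinates are made up.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 axis for four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central axis b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up positions standing in for four consecutive Cɑ atoms.
ca = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(dihedral(*ca))
```

Because only Cɑ (and optionally carbonyl-O) positions are needed, this kind of measure can still be evaluated when side chains and carbonyl oxygens are poorly resolved in the map.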

Page 37

2019 Model Challenge Major Outcomes

■ Most ab initio methods represented in the challenge performed extremely well

■ Most fit-to-map metrics are fine for optimization against a single experimental map

■ Resolution-sensitive metrics are best for ranking diverse structures in an archive

■ CaBLAM is a valuable new tool for evaluating protein backbone conformation issues

Page 38

EM Structure Validation Servers

Map: | Service/Name | Link
Overall Shape & Hand | Tilt-Pair | pdbe.org/tiltpair
Resolution by FSC | FSC | pdbe.org/FSC
Local Resolution | 3DFSC | 3dfsc.salk.edu
Local Resolution | Scipion | scipion.cnb.csic.es/m/myresmap#

Model: | Service/Name | Link
Stereochemistry, compare with all PDB structures | wwPDB | validate.wwpdb.org
Stereochemistry | Molprobity | molprobity.biochem.duke.edu
Nucleic Acid conformation | DNATCO | dnatco.org

Map/Model Fit: | Service/Name | Link
“backbone bumpiness” | EMRinger | emringer.com (@UCSF)

See also: www.emdataresource.org/validation.html

Page 39

Stanford University/SLAC

Rutgers University

European Bioinformatics Institute

Unified Data Resource for 3DEM

www.emdataresource.org

Page 40

References

■ Lawson CL, Chiu W (2018). Comparing cryo-EM structures (Editorial). J Struct Biol 204, 523-526. doi:10.1016/j.jsb.2018.10.004

■ Patwardhan A, Lawson CL (2016). Databases and Archiving for CryoEM. Methods Enzymol 579, 393-412. doi:10.1016/bs.mie.2016.04.015

■ Lagerstedt I, et al. (2013). Web-based visualisation and analysis of 3D electron-microscopy data from EMDB and PDB. J Struct Biol 184, 173-181. doi:10.1016/j.jsb.2013.09.021

■ Henderson R, et al. (2012). Outcome of the first electron microscopy validation task force meeting. Structure 20, 205-214. doi:10.1016/j.str.2011.12.014