EM Structure Archives, Validation and Challenges
Cathy Lawson
EMDataResource & RCSB
Rutgers University
S2C2 Modeling Course
January 17, 2020
Stanford University/SLAC
Rutgers University European Bioinformatics Institute
Unified Data Resource for 3DEM
■ Established 2007 under NIGMS Support (R01GM079429)
■ Develop Data Archives for 3DEM (EMDB + PDB)
■ Promote Community Development of Validation and Standards
Growth of EM Archives
EMDB maps by year and resolution
Finding Cryo-EM Structures
emdataresource.org/search.html
EMDR Search
EMDR Search Options / Demo
https://www.emdataresource.org
What EMDB entries do you want to search for?
Cryo-EM Structure Deposition
EMDB, PDB
wwPDB OneDep System
■ Deposition system for X-ray, NMR, and EM Structures
■ EM: Deposit map to EMDB, coordinate model to PDB
■ Validation report is produced
File uploads for EM deposition
FSC Curve
■ Upload XML format file
■ Create via software package (e.g., Relion, EMAN, cisTEM), or FSC server: PDBe.org/FSC
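For orientation, the quantity plotted in an FSC curve can be sketched in a few lines of NumPy. This is a minimal illustration of the standard Fourier Shell Correlation definition, not the code used by any of the packages or servers named above; the function name `fsc_curve`, the cubic-map restriction, and the integer shell binning are assumptions made for the example.

```python
import numpy as np

def fsc_curve(half1, half2, voxel_size):
    """Fourier Shell Correlation between two cubic half-maps.

    Returns (spatial_frequency in 1/Angstrom, fsc), one value per shell.
    Illustrative sketch only: no masking, no padding, cubic maps only.
    """
    assert half1.shape == half2.shape and len(set(half1.shape)) == 1
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)

    # Integer shell index (rounded distance from the Fourier origin).
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2
    keep = shell < n_shells  # ignore the corners beyond Nyquist
    idx = shell[keep]

    # Per-shell sums of the cross term and of the two power terms.
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep], minlength=n_shells)
    power1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    power2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)

    denom = np.sqrt(power1 * power2)
    fsc = np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)
    spatial_freq = np.arange(n_shells) / (n * voxel_size)
    return spatial_freq, fsc
```

Two identical maps give an FSC of 1 in every shell, while two independent noise volumes give values scattered around 0.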
mmCIF Data Dictionary for cryo-EM
Top Level: em_experiment, em_software
Sample Description: em_entity_assembly, em_entity_assembly_molwt, em_entity_assembly_naturalsource, em_entity_assembly_recombinant, em_virus_entity, em_virus_natural_host, em_virus_shell
Data Collection: em_diffraction, em_diffraction_shell, em_diffraction_stats, em_image_recording, em_image_scans, em_imaging, em_imaging_optics
Sample/Specimen Preparation: em_buffer, em_buffer_component, em_crystal_formation, em_embedding, em_sample_support, em_specimen, em_staining, em_vitrification
  em_fiducial_markers*, em_focused_ion_beam*, em_grid_pretreatment*, em_high_pressure_freezing*, em_shadowing*, em_support_film*, em_tomography*, em_tomography_specimen*, em_ultramicrotomy*
Image Processing & Reconstruction: em_3d_reconstruction, em_image_processing, em_particle_selection, em_volume_selection, em_ctf_correction
  em_2d_crystal_entity, em_3d_crystal_entity, em_helical_entity, em_single_particle_entity
  em_euler_angle_assignment*, em_final_classification*, em_start_model*
Structure Analysis: em_3d_fitting, em_3d_fitting_list, em_fsc_curve*
Experimental Data: em_map*, em_structure_factors*, em_layer_lines*
*All categories are collected by the OneDep system. Most categories are archived in both PDB and EMDB; asterisked categories are archived only in EMDB.
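As a rough illustration of how these categories appear in a file, the sketch below scans mmCIF text for data names of the form `_em_<category>.<item>` and groups the items by category. The embedded fragment is hypothetical and heavily truncated (the entry ID and all values are made up), and the line-by-line scan is an assumption made for brevity; it is not a full mmCIF parser and would be confused by multi-line text fields.

```python
import re

# Hypothetical, truncated mmCIF fragment; values are illustrative only.
SAMPLE = """\
data_XXXX
_em_experiment.entry_id               XXXX
_em_experiment.reconstruction_method  'SINGLE PARTICLE'
_em_experiment.aggregation_state      PARTICLE
#
loop_
_em_software.id
_em_software.category
_em_software.name
1 RECONSTRUCTION RELION
2 'MODEL FITTING' Coot
#
_em_3d_reconstruction.id          1
_em_3d_reconstruction.resolution  3.2
"""

def em_categories(mmcif_text):
    """Map each em_* category found in the text to its list of item names."""
    categories = {}
    for line in mmcif_text.splitlines():
        match = re.match(r"_(em_\w+)\.(\w+)", line.strip())
        if match:
            categories.setdefault(match.group(1), []).append(match.group(2))
    return categories

if __name__ == "__main__":
    for category, items in em_categories(SAMPLE).items():
        print(category, items)
```

For real entries, an established mmCIF library is a safer choice than a regular expression.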
Cryo-EM Structure Validation
Validation Report for EM Structures, version 1 (2016-2019)
■ Map resolution reported by depositor
■ Model geometry statistics
■ No fit-to-map validation
Validation Report for EM Structures, version 2 (2020-)
■ New:
  ■ Map and Map+Model Images
  ■ FSC curve plot(s)
  ■ Rotationally averaged power spectrum plot
  ■ Fit-to-Map: Atom inclusion at recommended contour level plot
Map Images: EMD-0273
FSC Curves: EMD-0273
Blue: FSC curve; Vertical Black Line: reported resolution
Calculated by archive from deposited half-maps
Calculated by depositor
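To make the relation between an FSC curve and a single resolution value concrete, here is a small sketch that finds where a curve first drops below a chosen threshold and converts that spatial frequency to Ångströms. The default threshold of 0.143 is a commonly used half-map convention and is an assumption here, since the slides do not state which criterion depositors used; the function name and the linear interpolation are likewise illustrative choices.

```python
import numpy as np

def resolution_at_threshold(spatial_freq, fsc, threshold=0.143):
    """Resolution (Angstrom) where the FSC curve first falls below threshold.

    spatial_freq is in 1/Angstrom and must be increasing. Returns None if the
    curve never crosses, or if it already starts below the threshold.
    """
    spatial_freq = np.asarray(spatial_freq, dtype=float)
    fsc = np.asarray(fsc, dtype=float)

    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None

    i = below[0]
    f0, f1 = spatial_freq[i - 1], spatial_freq[i]
    c0, c1 = fsc[i - 1], fsc[i]
    # Linear interpolation between the last point at/above and the first below.
    f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
    if f_cross <= 0:
        return None
    return 1.0 / f_cross
```

The same function can be applied to a map-model FSC curve by passing `threshold=0.5`.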
Map + Model Images: EMD-0273, PDB 6UH7
Atom Inclusion: EMD-0273, PDB 6UH7
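The idea behind atom inclusion can be sketched as the fraction of model atoms that sit in map voxels at or above a contour level. The version below is a simplified illustration and not the archive's implementation: it uses a nearest-voxel lookup, assumes the density array is indexed `[x, y, z]` on a cubic grid (real map files often store axes in a different order), and all parameter names are hypothetical.

```python
import numpy as np

def atom_inclusion(density, coords_xyz, voxel_size, contour, origin_xyz=(0.0, 0.0, 0.0)):
    """Fraction of atoms whose nearest voxel has density >= contour.

    density    : 3D array indexed [x, y, z] (assumption for this sketch)
    coords_xyz : (N, 3) atom positions in Angstrom
    voxel_size : voxel edge length in Angstrom
    origin_xyz : position of voxel (0, 0, 0) in Angstrom
    Atoms that fall outside the map count as not included.
    """
    coords = np.asarray(coords_xyz, dtype=float)
    idx = np.rint((coords - np.asarray(origin_xyz, dtype=float)) / voxel_size).astype(int)

    inside_grid = np.all((idx >= 0) & (idx < np.array(density.shape)), axis=1)
    included = np.zeros(len(coords), dtype=bool)
    ix, iy, iz = idx[inside_grid].T
    included[inside_grid] = density[ix, iy, iz] >= contour
    return float(included.mean())
```

Lowering the contour level can only keep or raise the inclusion fraction, so the same model scores differently at different levels.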
Ribosome Example EMD-2914 PDB 5AJ4
Ribosome Example EMD-2914 PDB 5AJ4
Lower contour level
rcsb.org/3d-view/molstar/5aj4
Recommended contour level
CryoEM Validation Challenges
EM Validation Task Force 2010 Recommendations
■ Full FSC curve from independent half-maps
■ Model Stereochemistry (same as X-ray / NMR)
■ Other Metrics: More Research Needed
Henderson et al. (2012) Structure 20, 205-214
http://www.ncbi.nlm.nih.gov/pubmed/22325770
vtf.emdataresource.org
Single Particle Cryo-EM in PDB: Release Year vs Resolution
Validation in a Changing Landscape
■ How accurate are the maps and their model interpretations?
■ What criteria are currently being used and are they good enough?
2016 Map & Model Challenges
■ Reconstruction and Modelling tasks at 2.5-5 Å
■ Major outcomes:
■ Estimation of map resolution needs to be better standardized across the community
■ Novel model-based methods may be useful for estimating map resolvability
■ Further review of global fit-to-map metrics is needed
2019 Model Metrics Challenge
■ Goal: Identify metrics most suitable for evaluating and comparing fit of atomic coordinate models into cryo-EM maps for specimens in the 1.5-4.0 Å reported overall resolution range.
■ We received 63 models from 16 modelling teams
■ 51 of 63 were modelled using ab initio methods
Model Compare Pipeline
■ “Laboratory” for evaluating assessments
■ http://model-compare.emdataresource.org
Andriy Kryshtafovych, UC Davis
Model Challenge Meeting @ Stanford/SLAC June 2019
■ External Advisors/Assessors: Peter Rosenthal, Paul Emsley, Jane Richardson, Paul Adams, James Fraser, Frank DiMaio, Pavel Afonine, Tom Terwilliger, Mark Herzik
■ Challengers: Soon Wen Hoh, Gunnar Schroeder, Andrea Vaiana, Grzegorz Chojnowski, Daisuke Kihara, Pavel Afonine, Abishek Singharoy, Xiaodi Yu, Liguo Wang, Frank DiMaio, Matt Baker
■ EMDataResource: Andriy Kryshtafovych, Cathy Lawson, Wah Chiu, Greg Pintilie, Helen Berman
Fit to Map
  Correlation
    Full map density: TEMPy CCC | PHENIX box_CC
    Density within a mask: TEMPy CCC_overlap | Segment Manders' Overlap; PHENIX CC_peaks | CC_volume | CC_mask
    Density-derived functions: TEMPy Mutual Information (MI) | MI_overlap | Laplacian Filtered
    Density at atom positions: MAPQ Q-score: vs Reference Gaussians (r=0-2 Å)
  FSC curve
    Single point: PHENIX Resolution Map-Model FSC = 0.5
    Integration: CCPEM REFMAC5 FSCavg, curve area to defined resolution limit
  Atom Inclusion: TEMPy Envelope | EMDB Atom Inclusion
  Rotamer: EMRinger Z-score, protein Cγ-atom paths around χ1
Coordinates Only
  Conformation
    Backbone: CaBLAM Cɑ-trace, Cɑ-only virtual dihedrals; CaBLAM Conformation, Cɑ and CO-containing virtual dihedrals; MOLPROBITY Ramachandran
    Sidechain: MOLPROBITY Rotamer
  Valence Geometry: PHENIX Bond | Bond angle | Chirality | Planarity | Dihedral
  Clashes: MOLPROBITY Clashscore
  Energy: PROQ3 energy and predicted features
vs Reference Model
  Superposition
    Cɑ Superposition: OPENSTRUCT RMSD-Cɑ
    Distance cutoffs: OPENSTRUCT Global Distance Calculation (GDC) all | sidechain; Global Distance Test (GDT) total score | high accuracy
    Sequence assignment: PHENIX seq match | Cɑ atom position match | overall score
  Distances
    Per chain: LDDT Local difference distance test
    All chains: OPENSTRUCT oligomeric LDDT | weighted oligomeric LDDT
  Contacts
    Contact area: CAD Contact Area Difference
    Shared contacts: OPENSTRUCT Quaternary Structure (QS) best, global
    Hydrogen bonds: HBPLUS H-bond Precision all | nonlocal | Similarity all | nonlocal
vs Models Consensus*
  *Multiple references: DAVIS-QA average of pairwise GDT_TS scores
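As one concrete example from the "vs Reference Model" group, the Global Distance Test scores listed above can be sketched as follows. This simplified version assumes the model and reference Cɑ coordinates are already superposed and matched residue by residue; real GDT implementations additionally search over many superpositions to maximize each fraction, which is omitted here. The function name is hypothetical, and the cutoff sets are the standard ones for the total score (1, 2, 4, 8 Å) and the high-accuracy variant (0.5, 1, 2, 4 Å).

```python
import numpy as np

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)   # total score
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)   # high accuracy

def gdt_score(model_ca, reference_ca, cutoffs=GDT_TS_CUTOFFS):
    """Simplified GDT for pre-superposed, residue-matched C-alpha coordinates.

    Returns a percentage: the mean, over the cutoffs, of the fraction of
    C-alpha atoms lying within each cutoff distance of the reference.
    """
    model = np.asarray(model_ca, dtype=float)
    reference = np.asarray(reference_ca, dtype=float)
    distances = np.linalg.norm(model - reference, axis=1)
    fractions = [(distances <= cutoff).mean() for cutoff in cutoffs]
    return 100.0 * float(np.mean(fractions))
```

A model identical to the reference scores 100 under both cutoff sets, and the high-accuracy variant penalizes small deviations more strongly.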
Average correlation per target map
Global Fit-to-Map Metrics Comparison
Correlation across all four target maps
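One way to compare two metrics is to ask whether they rank the same set of models in the same order. The sketch below computes a Spearman rank correlation between two lists of per-model scores; this choice of coefficient is an assumption, since the slides do not state how the correlations in the figure were calculated, and the function names are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for value in np.unique(values):
        tied = values == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(scores_a, scores_b):
    """Spearman rank correlation between two equally long score lists."""
    ranks_a = average_ranks(scores_a)
    ranks_b = average_ranks(scores_b)
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])
```

Two metrics that order the models identically give 1.0 even when their numerical scales differ, which is why a rank-based measure suits a comparison across metrics with different units.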
3 Fit-to-Map Metrics yielded Resolution-Sensitive Model Ranking
■ Within single map targets, all fit-to-map metrics were largely equivalent (gave similar model rankings)
■ Across all map targets, only 3 fit-to-map metrics yielded resolution-sensitive (human-intuitive) model rankings:
  ■ Q-score
  ■ EMRinger
  ■ Map-Model FSC @ 0.5
Resolution-Sensitive Ranking
APOF 1.8 Å > APOF 2.3 Å > APOF 3.1 Å
Resolution-Sensitive Ranking: Q-Score
[Figure panels: APOF 1.8 Å, APOF 2.3 Å, APOF 3.1 Å, ADH 2.9 Å. Legend: reference model score; submitted model scores. Reference models: APOF: 3ajo; ADH: 6nbb]
Q-score: novel way to estimate map resolution
[Figure: two scatter plots, labelled Protein and RNA, of average Q-score (protein atoms; RNA atoms) against reported resolution (Å), for maps from EMDB and models from PDB. Linear fits shown: y = -0.178x + 1.119, r² = 0.901; y = -0.138x + 0.997, r² = 0.897]
Pintilie et al., Nature Methods, in press
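The linear fits on this slide suggest how an average Q-score could be turned into a resolution estimate: evaluate the line to get the Q-score expected at a given resolution, or invert it to get the resolution implied by an observed Q-score. The sketch below does both. The transcript does not make clear which fit belongs to the protein panel and which to the RNA panel, so the two fits are kept under neutral, hypothetical labels.

```python
# (slope, intercept) pairs copied from the slide; the panel assignment
# (protein vs RNA) is not recoverable from the transcript, so the keys
# are neutral placeholders.
FITS = {
    "fit_1": (-0.178, 1.119),
    "fit_2": (-0.138, 0.997),
}

def expected_q_score(resolution, slope, intercept):
    """Average Q-score predicted by the linear fit at a reported resolution (Angstrom)."""
    return slope * resolution + intercept

def resolution_from_q_score(q_score, slope, intercept):
    """Invert the linear fit: resolution (Angstrom) implied by an average Q-score."""
    return (q_score - intercept) / slope

if __name__ == "__main__":
    for name, (slope, intercept) in FITS.items():
        q = expected_q_score(3.0, slope, intercept)
        print(name, round(q, 3), round(resolution_from_q_score(q, slope, intercept), 3))
```

Because the fits were made over the resolution ranges plotted, extrapolating far outside them is unlikely to be meaningful.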
CaBLAM: proxy for a reference model
■ CaBLAM compares virtual dihedrals (based on backbone carbonyl-O, Cɑ) to PDB statistics
■ Especially valuable when carbonyl-O’s are not obvious (>3 Å)
■ CaBLAM performed similarly to metrics commonly used in CASP competitions
[Figure: Cross-correlations between CaBLAM and “vs. Reference Model” metrics scores for all models in the 2019 Model Challenge]
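The virtual dihedrals mentioned above are ordinary dihedral angles measured over non-bonded atoms, for example four consecutive Cɑ positions. The sketch below shows the geometric calculation only; it is not CaBLAM's implementation, it does not include the carbonyl-O based angles or the comparison against PDB statistics, and the function names are hypothetical.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p1 - p0
    b1 = p2 - p1
    b2 = p3 - p2
    n1 = np.cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = np.cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = np.dot(n1, n2)
    y = np.linalg.norm(b1) * np.dot(b0, n2)
    return float(np.degrees(np.arctan2(y, x)))

def ca_virtual_dihedrals(ca_coords):
    """Dihedral over each run of four consecutive C-alpha positions."""
    ca = np.asarray(ca_coords, dtype=float)
    return [dihedral_deg(*ca[i:i + 4]) for i in range(len(ca) - 3)]
```

A chain of N Cɑ atoms yields N - 3 such angles, which is why a Cɑ-only trace still carries usable information about backbone conformation.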
2019 Model Challenge Major Outcomes
■ Most ab initio methods represented in the challenge performed extremely well
■ Most fit-to-map metrics are fine for optimization against a single experimental map
■ Resolution-sensitive metrics are best for ranking diverse structures in an archive
■ CaBLAM is a valuable new tool for evaluating protein backbone conformation issues
EM Structure Validation Servers
Map: | Service/Name | Link
Overall Shape & Hand | Tilt-Pair | pdbe.org/tiltpair
Resolution by FSC | FSC | pdbe.org/FSC
Local Resolution | 3DFSC | 3dfsc.salk.edu
Local Resolution | Scipion | scipion.cnb.csic.es/m/myresmap#
Model: | Service/Name | Link
Stereochemistry, compare with all PDB structures | wwPDB | validate.wwpdb.org
Stereochemistry | Molprobity | molprobity.biochem.duke.edu
Nucleic Acid conformation | DNATCO | dnatco.org
Map/Model Fit: | Service/Name | Link
“backbone bumpiness” | EMRinger | emringer.com (@UCSF)
See also: www.emdataresource.org/validation.html
www.emdataresource.org
References
■ Lawson CL, Chiu W (2018). Comparing cryo-EM structures (Editorial). J Struct Biol 204, 523-526. doi:10.1016/j.jsb.2018.10.004
■ Patwardhan A, Lawson CL (2016). Databases and Archiving for CryoEM. Methods Enzymol 579, 393-412. doi:10.1016/bs.mie.2016.04.015
■ Lagerstedt I, et al. (2013). Web-based visualisation and analysis of 3D electron-microscopy data from EMDB and PDB. J Struct Biol 184, 173-181. doi:10.1016/j.jsb.2013.09.021
■ Henderson R, et al. (2012). Outcome of the first electron microscopy validation task force meeting. Structure 20, 205-214. doi:10.1016/j.str.2011.12.014