BioExcel webinar series #10: "Assessing structure quality in the PDB archive"

Assessing structure quality in the PDB archive. Presenter: Matthew Conroy. Host: Adam Carter. BioExcel Webinar Series, 8 February 2017. bioexcel.eu

Transcript of BioExcel webinar series #10: "Assessing structure quality in the PDB archive"

Page 1: BioExcel webinar series #10: "Assessing structure quality in the PDB archive"

bioexcel.eu


Assessing structure quality in the PDB archive

Presenter: Matthew Conroy

Host: Adam Carter

BioExcel Webinar Series

8 February, 2017

Page 2

This webinar is being recorded

Page 3

BioExcel Overview

• Excellence in Biomolecular Software

- Improve the performance, efficiency and scalability of key codes

• Excellence in Usability

- Devise efficient workflow environments with associated data integration

• Excellence in Consultancy and Training

- Promote best practices and train end users

[Figure: workflow architecture diagram. Components: Portal / Workbench, DMI Gateway, DMI Planner, DMI Optimiser, DMI Validator, DMI Enactor, DMI Executor, DMI Monitor, Repository, Registry, Data Source, Data Delivery Point. Roles: Domain Expert, DMI Expert, DADC Engineer. Arrows: DMI Request, Service Invocation, Data flow, Monitoring flow.]

Page 4

Interest Groups

• Integrative Modeling IG

• Free Energy Calculations IG

• Hybrid methods for biomolecular systems IG

• Biomolecular simulations entry level users IG

• Practical applications for industry IG

• Training

• Workflows

Support platforms

http://bioexcel.eu/contact

Forums, Code Repositories, Chat channel, Video Channel

Page 5

Audience Q&A session

Please use the Questions function in the GoToWebinar application

Any other questions or points to discuss after the live webinar? Join the discussion at http://ask.bioexcel.eu.

Page 6

Today’s Presenter

Matthew Conroy is a Scientific Curator at EMBL-EBI working with the Protein Data Bank in Europe. Before joining PDBe, he was solving structures of proteins by X-ray crystallography, NMR and electron microscopy.

Page 7

Protein Data Bank in Europe

PDBe.org

Assessing structure quality in the PDB archive

Matthew Conroy

Page 8

What is the Protein Data Bank (PDB)?

An archive of experimentally determined 3-dimensional structures of biological macromolecules

Proteins, nucleic acids, sugars

Page 9

wwPDB.org

‘The PDB’ FTP Archive

Page 10

[Figure: ‘The PDB’ FTP Archive, with four ‘Value added data’ labels]

Page 11

From the small… …to the large

Models are interpretations of experimental data

Copper scavenger, <1 kDa

Zika virus, 4 MDa

Page 12

What experimental data are available?

88% X-ray crystallography

Data: Structure factors (mandatory since 2008)

10% solution NMR spectroscopy

Data: Restraints (mandatory since 2008); chemical shifts (mandatory since 2010)

1% Electron microscopy

Data: Map in EMDB (mandatory since 2016)

Page 13

Like in all science:

Some data are better than other data

Some models (interpretations of that data) are better than other models

Different techniques have different strengths

For whatever you need to do, you need to:

• Find the best model

• Be aware of any potential limitations

Page 14

“Structures are not absolute truths – they are models that fit the experimental data and therefore have uncertainty and subjectivity associated with them.”

Whaddaya Know: A Guide to Uncertainty and Subjectivity in Structural Biology

MacKay et al., Trends in Biochemical Sciences, 2017

DOI: 10.1016/j.tibs.2016.11.002

Page 15

Poll

• What do you use PDB data for primarily?

• Template for homology model

• Protein-Protein complex prediction

• Small molecule (drug) docking studies

• Something else

Page 16

Validating PDB Data

All structures in the PDB are models explaining data

PDB archive does not reject structures based on quality

But we do give an indication of that quality

“Essentially, all models are wrong, but some are useful.” George E.P. Box

Page 17

What is the wwPDB doing to indicate structure quality?

Validation Task Forces

Page 18

Method-specific Validation Task Forces

Advise us how best to validate:

• the models

• the experimental data

• fit of model to data

Page 19

Most entries in the PDB archive come with validation reports

PDF documents downloadable from wwPDB sites

Also XML format for machine reading

        Model quality   Data quality*                       Fit of model to data*
X-ray   ✓               ✓                                   ✓
NMR     ✓               ✓                                   ✓
EM      ✓               Not yet, but on EBI’s EMDB pages    Not yet, but on EBI’s EMDB pages

* If data were deposited

Page 20

On wwPDB sites (recalculated annually)

Available at deposition (for authors and referees)

Standalone server: validate.wwpdb.org

Page 21

At PDBe, they’re available from search results

Page 22

And also from each entry page

Page 23

The report starts with a summary and gets more detailed

1. Overall quality at a glance

2. Residue-property plots, which highlight outliers

3. Detailed analysis of any potential issues

Page 24

Start with a summary

• Overall quality at a glance

How does this structure compare to others in the PDB archive?

Several different metrics considered
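Comparing one structure against the rest of the archive amounts to a percentile rank per metric. The sketch below illustrates only that idea: the `percentile_rank` helper and the clashscore values are hypothetical, made up for this example, and it is not the wwPDB implementation. It assumes a metric where lower values are better.

```python
def percentile_rank(value, population, lower_is_better=True):
    """Return the percentage of entries in `population` that score worse than `value`."""
    if not population:
        raise ValueError("population must not be empty")
    if lower_is_better:
        worse = sum(1 for other in population if other > value)
    else:
        worse = sum(1 for other in population if other < value)
    return 100.0 * worse / len(population)


# Hypothetical clashscores for ten archive entries (illustrative values only).
archive_clashscores = [2.0, 4.0, 5.0, 8.0, 10.0, 15.0, 20.0, 30.0, 45.0, 60.0]

# An entry with clashscore 5.0 is better than 7 of the 10 entries.
rank = percentile_rank(5.0, archive_clashscores)
print(f"Percentile rank: {rank:.0f}")
```

A high rank means the entry does better on that metric than most of the comparison set; the same function could be run against a subset of similar entries to get a relative rank.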

Page 25

Poor ranking doesn’t necessarily mean a structure is ‘wrong’

It may mean it’s not as ‘right’ as others

There can be justified reasons for outliers, e.g. strain

Are they supported by data?

Page 26

We use these to rank results at PDBe

Quality

Page 27

Overall quality at a glance, aka ‘Summary Sliders’

Geometric quality assessed using MolProbity

To compare a structure to others in the PDB

X-ray, NMR and EM are all judged the same way

Slide annotations:

• Atoms bumping into each other

• Surprising bond angles

• Compared to all PDB and to similar structures

Page 28

Of course, these only tell you the model is chemically sensible

Good geometry doesn’t mean it’s right!

Page 29

Overall quality at a glance, aka ‘Summary Sliders’

Extra metrics for X-ray structures

How well does the model back-predict the data?

Lower values are better

Residues not in electron density

The real-space R-value (RSR) measures fit between a residue and the data. The RSR Z-score (RSRZ) is a normalisation of this specific to the residue type and a resolution bin. Outliers have RSRZ > 2.
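The RSRZ normalisation can be sketched as an ordinary Z-score. This is a minimal illustration, assuming a reference set of RSR values already restricted to one residue type and one resolution bin. The reference numbers and the `rsrz` and `is_rsrz_outlier` helpers are hypothetical, and the slides do not say how the reference statistics are actually computed.

```python
from statistics import mean, pstdev


def rsrz(rsr, reference_rsr_values):
    """Z-score of one residue's RSR against a reference set of RSR values
    (assumed here to share the residue type and resolution bin)."""
    mu = mean(reference_rsr_values)
    sigma = pstdev(reference_rsr_values)
    if sigma == 0:
        raise ValueError("reference values have zero spread")
    return (rsr - mu) / sigma


def is_rsrz_outlier(rsr, reference_rsr_values, threshold=2.0):
    """Flag residues whose RSRZ exceeds the threshold (> 2 on the slide)."""
    return rsrz(rsr, reference_rsr_values) > threshold


# Hypothetical reference RSR values for one residue type in one resolution bin.
reference = [0.08, 0.10, 0.12, 0.10, 0.10]

for residue_rsr in (0.11, 0.20):
    flag = "outlier" if is_rsrz_outlier(residue_rsr, reference) else "ok"
    print(f"RSR {residue_rsr:.2f}: RSRZ {rsrz(residue_rsr, reference):.1f} ({flag})")
```

Only the high side is flagged, because a large RSR means a poor fit to the density; an unusually low RSR is not a problem.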

Page 30

A look at X-ray data

From the diffraction pattern, a map can be calculated.

This indicates the location of electrons (therefore atoms) in the crystal

Hence ‘electron density map’

Model is built into this map in an iterative process

Page 31

Resolution indicates precision with which atoms can be placed

3.7Å

2.4Å

1.5Å

0.8Å

Page 32

With low-resolution data, models might simply indicate location/orientation of proteins

37 Å EM map of F1Fo ATPase to show role in shaping mitochondrial cristae

PDB entry 4b2q

Page 33

Residue-property plots: per polymer chain

Colour coded according to the number of geometric quality criteria outliers, i.e. model quality

Green = 0

Yellow = 1

Orange = 2

Red = 3 or more

Grey = unmodelled residues

Outliers in these metrics:

• bond length & angle

• chirality/planarity

• too-close contacts

• Ramachandran

• sidechain torsion angle

• RNA sugar pucker

The red dot! For X-ray structures, a red dot above a residue indicates a poor fit to the electron density (i.e. RSRZ outlier)
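The colour scheme above is a simple mapping from the number of geometric criteria a residue fails to a colour, with a separate flag for density fit. A minimal sketch follows; the function names and the example residues are hypothetical, and only the colour thresholds and the RSRZ > 2 cut-off come from the slides.

```python
def residue_colour(outlier_count, modelled=True):
    """Map the number of geometric outlier criteria for a residue to its plot colour."""
    if not modelled:
        return "grey"
    if outlier_count < 0:
        raise ValueError("outlier_count cannot be negative")
    colours = ["green", "yellow", "orange"]
    return colours[outlier_count] if outlier_count < 3 else "red"


def has_red_dot(rsrz_value, threshold=2.0):
    """X-ray only: a red dot marks a residue that is an RSRZ outlier."""
    return rsrz_value is not None and rsrz_value > threshold


# Hypothetical residues: (name, geometric outlier count, modelled?, RSRZ or None)
residues = [
    ("ALA 10", 0, True, 0.4),
    ("ASP 62", 1, True, 2.7),
    ("VAL 48", 3, True, 1.1),
    ("GLY 99", 0, False, None),
]

for name, count, modelled, z in residues:
    dot = " + red dot" if has_red_dot(z) else ""
    print(f"{name}: {residue_colour(count, modelled)}{dot}")
```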

Page 34

Visualising validation at PDBe

Page 35

Page 36

NMR data quality and fit to data

• This is at the moment basic

1. Completeness of resonance assignments: % of atoms for which chemical shift is measured

2. Statistically unusual shifts

3. Random Coil Index

• Do chemical shift and protein conformation agree?

[Figure: plot of likelihood of disorder. Legend: Disordered in model ensemble; Ordered in model ensemble]

Page 37

EM data quality and fit to data

PDBe.org/EMD-8116

Not yet in report, but available at EMDB: pdbe.org/emdb

Page 38

Geometry and fit to data go hand-in-hand

Asp 62 has some outlier bond lengths

But these are not justified by the electron density

Val 48 in entry 3kse is a Ramachandran outlier

But the strained conformation is supported by data

Page 39

An aside on Assemblies

• Only the smallest part of a crystal structure is deposited to the PDB archive

• The whole can be generated by applying symmetry to this

• The part you’re interested in could be:

The file as it is

The file and symmetry

Only part of the file
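Generating "the file and symmetry" means applying a symmetry operator (a rotation plus a translation) to the deposited coordinates and combining the copy with the original. The sketch below assumes Cartesian coordinates and a two-fold rotation about the z axis. The coordinates and the `apply_operator` helper are hypothetical; in a real entry the operators come from the file itself.

```python
def apply_operator(coords, rotation, translation):
    """Apply x' = R.x + t to every (x, y, z) coordinate in `coords`."""
    return [
        tuple(
            sum(rotation[i][j] * atom[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for atom in coords
    ]


# Hypothetical deposited coordinates (two atoms of the smallest repeating part).
deposited = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

# Two-fold rotation about z: (x, y, z) -> (-x, -y, z), with no translation.
two_fold_z = [
    [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
    [0.0, 0.0, 1.0],
]
no_shift = (0.0, 0.0, 0.0)

symmetry_copy = apply_operator(deposited, two_fold_z, no_shift)

# "The file and symmetry": the deposited part plus its symmetry-generated copy.
assembly = deposited + symmetry_copy
print(assembly)
```

Applying the same two-fold operator twice returns the original coordinates, which is a quick sanity check on any operator of this kind.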

Page 40

HIV protease. Entry 2az9

Page 41

Viewing assemblies at PDBe with LiteMol

Page 42

Validation for small molecules

• There are thousands of amino acids and nucleic acid bases in the PDB

• But a small molecule could be unique

• So how can we tell what it should look like?

Summary of issues: X marks an outlier!

Is it the correct handedness?

Does it hit anything?

Is it chemically sensible?

Does it fit the data?

Page 43

Validation for small molecules

1. Geometry

Compares bond lengths and angles with chemically similar fragments

Mogul: a knowledge-based library of molecular geometry derived from the Cambridge Structural Database (CSD)

Page 44

Validation for small molecules

2. Fit to data

RSR: measure of how well a residue fits its local density; > 0.5 means it is worth a second look

LLDF (‘local ligand density fit’): Z-score of ligand RSR relative to nearby polymeric residues; > 2 is flagged as unusual

LLDF: “Is the ligand data quality comparable to that of the binding site?”
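The two ligand checks can be sketched together. This assumes the RSR values of the neighbouring polymer residues are already known; how "nearby" is defined and which standard deviation is used are not stated in the slides, so the population standard deviation, the helper names and the RSR numbers here are all assumptions for illustration.

```python
from statistics import mean, pstdev


def lldf(ligand_rsr, neighbour_rsr_values):
    """Z-score of the ligand RSR relative to RSR values of nearby polymer residues."""
    mu = mean(neighbour_rsr_values)
    sigma = pstdev(neighbour_rsr_values)
    if sigma == 0:
        raise ValueError("neighbour RSR values have zero spread")
    return (ligand_rsr - mu) / sigma


def ligand_flags(ligand_rsr, neighbour_rsr_values):
    """Apply the two thresholds from the slide: RSR > 0.5 and LLDF > 2."""
    return {
        "rsr_worth_second_look": ligand_rsr > 0.5,
        "lldf_unusual": lldf(ligand_rsr, neighbour_rsr_values) > 2.0,
    }


# Hypothetical RSR values for residues lining the binding site.
binding_site_rsr = [0.10, 0.12, 0.14, 0.12, 0.12]

well_fitted = ligand_flags(0.13, binding_site_rsr)
poorly_fitted = ligand_flags(0.30, binding_site_rsr)
print(well_fitted)
print(poorly_fitted)
```

The second ligand shows why both numbers matter: an RSR of 0.30 is under the 0.5 threshold, yet it is far worse than its well-ordered surroundings, so the LLDF flags it.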

Page 45

Difference density: the ‘red and green bits’!

Along with the electron density maps we’ve already seen comes a second map, indicating:

Areas where the model has too many atoms for the data (red)

Areas where the model has too few atoms for the data (green)

Data suggest too few atoms here

Page 46

Is there a hint of a ligand bound to the Haem of this cytochrome P450?

Difference density can indicate modelling errors

Large green ‘blob’: there might be something else here

Page 47

Difference density can indicate modelling errors

Aspirin in red density: presence unsupported by data (also not interacting with protein much!)

Page 48

Viewing ligands at PDBe: each has its own page

Page 49

A summary of validation

It’s not black and white! Structures can’t be assigned ‘good’ or ‘bad’ easily

Look at the validation report

• How many outliers? Are they justified? Are they talked about in the paper?

• Does the model fit the data?

Does the structure adequately explain what is known biologically?

Page 50

A summary of validation

wwPDB validation reports are extensive and a good aid to identifying structure quality

Search results at PDBe rank by quality

They take into account validation metrics and resolution

PDBe pages allow validation to be viewed in 3D

Geometry + data for EM and X-ray

Also a PyMOL plugin to show geometry

Available at pymolwiki.org

Page 51

PDBe.org

proteindatabank

@PDBeurope

[email protected]

Thanks!

Page 52

Audience Q&A session

Please use the Questions function in the GoToWebinar application

Any other questions or points to discuss after the live webinar? Join the discussion at http://ask.bioexcel.eu.

Page 53

Next Webinar

15th February, 2017

15:00 GMT / 16:00 CET

Robust solutions for cryoEM fitting and visualisation of interaction space

Gydo van Zundert with Mikael Trellet and Jörg Schaarschmidt

Find out more, and register at www.bioexcel.eu/webinars