The RCSB Protein Data Bank: Teaching an Old Dog New Tricks

Post on 13-Jan-2016


Swiss-Prot - 20 Year Celebration

www.pdb.org • info@rcsb.org

The RCSB Protein Data Bank: Teaching an Old Dog New Tricks

Philip E. Bourne

pbourne@ucsd.edu


A Tribute

From the guardian of a resource (institution) to all those men and women who make biology possible – may we never take you for granted

Biocurator Perspectives


Agenda

• The old dog
• New tricks
  – Thinking differently about proteins
  – Virtual Communities
    • Internal (wwPDB)
    • External
• What will the resource look like in 2-5 years?


History of the Old Dog

1970s
• Community discussions about how to establish an archive of protein structures
• Cold Spring Harbor meeting in protein crystallography
• PDB established at Brookhaven (October 1971; 7 structures)

1980s
• Number of structures increases as technology improves
• Community discussions about requiring depositions
• IUCr guidelines established
• Number of structures deposited increases

1990s
• Ontology defined
• Structural genomics begins
• PDB moves to RCSB

2000s
• wwPDB formed


Unchanging Core Mission

• Create and maintain a well-curated database of macromolecular structure data derived using experimental methods

that will…
• Facilitate and support scientific research and education

that is…
• Always accessible to a diverse user community worldwide
• Developed in collaboration with that community


Challenges - Scientific

• More complex structures – molecular machines, complexes
• New methods (e.g. EM)
• Lack of a vocabulary to provide reductionism in complex structures
• Partially solved problems in analyzing structures – structure alignments, domain definitions, functional site determination and characterization, pathway relationships, interaction partners
• Integrating microscopic and macroscopic views
• Disease relationships

Growth and Complexity

[Figure: number of released entries by year]

Data Integration

[Diagram: structure entries linked to external resources and related site actions]

Linked resources (legend: primary references, derived references):
• SWISS-PROT/GenBank IDs
• Gene Ontology
• Enzyme Commission
• Source Organism
• OMIM/Disease
• Genomes (NCBI Gene)
• Structural Genomics Targets
• PubMed
• NCBI Taxonomy
• Domains/Families (SCOP, CATH, PFAM)
• Human Proteome & Homology Models

Some Actions:
• Source Organism Browser
• GO Browsers
• Find Structures by GO ID
• Enzyme Browser
• Reactome
• Genome Browser
• SNPs Mapped to Structure
• Find Structures by SP ID
• Disease Browser
• CATH Browser
• SCOP Browser
• PFAM Display
• Abstract Search
• Target Search
• Function Coverage
• Target Selection

NAR 2005, 33: D233-D237


Challenges - Technical

• Sheer numbers
• Efficient visualization
• Improved annotation
• Demands from a more diverse user base
• Centralization versus decentralization
• Web V2


Diverse User Community (180,000 individuals per month) and Diversifying Further

• Structural biologists
• Computational biologists
• Experimental biologists
• Educators
• Students
• Lay public

Agenda

• The old dog
• New tricks
  – Thinking differently about proteins
  – Virtual Communities
    • Internal (wwPDB)
    • External
• What will the resource look like in 2-5 years?

New Tricks – Protein Representation

The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but is it time for a new view? It is not how one protein sees another after all.

Limitations of a Cartesian Viewpoint

• A local viewpoint – does not capture the global properties of the protein

• Limited to a single scale descriptor

• Limits comparative analysis


Protein Kinase A – Open Book View


Superfamily Members – The Same But Different

Protein kinase-like superfamily. Left: RMSD distance matrix. Right: number of violations of the triangle inequality at each pair of proteins.

Alignment Violates the Triangle Inequality

$$|d(i,j) - d(j,k)| \le d(i,k) \le d(i,j) + d(j,k)$$

Many of the features in the distance matrix may be due to “distortions” induced by the failure to satisfy the TI.
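As a rough illustration of counting triangle-inequality violations per pair, here is a minimal sketch in plain Python. The function name, the tolerance, and the toy matrix are assumptions made for this example and are not taken from the talk. It checks only the upper bound d(i,j) ≤ d(i,k) + d(k,j) for every third protein k, which is one reasonable reading of "violations at each pair".

```python
from itertools import combinations


def count_triangle_violations(d, tol=1e-9):
    """For each pair (i, j), count the third points k for which
    d[i][j] > d[i][k] + d[k][j], i.e. the triangle inequality fails."""
    n = len(d)
    counts = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        for k in range(n):
            if k in (i, j):
                continue
            if d[i][j] > d[i][k] + d[k][j] + tol:
                counts[i][j] += 1
                counts[j][i] += 1  # keep the count matrix symmetric
    return counts


# Toy symmetric "distance" matrix (hypothetical values, not real RMSDs).
# Pair (0, 2) is farther apart than the detour through point 1 allows.
toy = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
violations = count_triangle_violations(toy)
```

For pairwise-alignment RMSD values, a nonzero count at a pair would flag the kind of inconsistency the slide attributes to alignment rather than to the RMSD measure itself.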


An Alternative Approach: Multipolar Representation

• Roots in spherical harmonics
• Parameter space and boundary conditions can be a variety of properties
• Order of the multipoles defines the granularity of the descriptors
• Bottom line – interpreted as shape descriptors

Gramada & Bourne 2006 BMC Bioinformatics 7:242

Results – Protein Kinase Like Superfamily Alignment

Scheeff & Bourne 2005 PLoS Comp. Biol., 1(5) e49

Clear distinction between families.

Some clustering is seen within the TPKs that resembles various groups, even though there is little shape discrimination at this level.


Possibilities – Structure Based Phylogenetic Analysis

Scheeff & Bourne Multipoles


New Tricks – Protein Motion

Ordered Structures

Disordered Structures

Structures exist in a spectrum from order to disorder

Obtaining Protein Dynamic Information

Protein Structures Treated as a 3-D Elastic Network

Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181.


Gaussian Network Model

• Each Cα atom is a node in the network.

• Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å).

• Decompose protein fluctuation into a summation of different modes.
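The three bullets above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used in the talk: the function names and the toy five-node chain are hypothetical, and the fluctuations are in relative units because the physical prefactor (temperature and spring constant) is omitted.

```python
import numpy as np


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity (Kirchhoff) matrix: -1 for node pairs within the cutoff,
    number of contacts on the diagonal."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    gamma = -(dist <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma


def gnm_modes(coords, cutoff=7.0):
    """Eigen-decompose the Kirchhoff matrix and drop the zero mode(s)."""
    eigvals, eigvecs = np.linalg.eigh(kirchhoff_matrix(coords, cutoff))
    keep = eigvals > 1e-8
    return eigvals[keep], eigvecs[:, keep]


def per_mode_fluctuations(eigvals, eigvecs):
    """Entry [i, k] is the contribution of mode k to the mean-square
    fluctuation of node i (relative units): u_k[i]**2 / lambda_k."""
    return eigvecs ** 2 / eigvals


# Toy input: five nodes on a line, 3.8 Å apart (roughly a Cα-Cα spacing),
# so only sequence neighbours fall inside the 7 Å cutoff.
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
eigvals, eigvecs = gnm_modes(chain)
per_mode = per_mode_fluctuations(eigvals, eigvecs)
msf = per_mode.sum(axis=1)  # total fluctuation = sum over modes
```

Summing over all non-zero modes recovers the diagonal of the pseudo-inverse of the Kirchhoff matrix; keeping only the slowest few columns of `per_mode` isolates the large-scale collective motions.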


Functional Flexibility Score

• Utilize correlated movements to help define regional flexibility with functional importance.

Functionally Flexible Score

For each residue:

1. Find Maximum and Minimum Correlation.

2. Use to scale normalized fluctuation to determine functional importance.

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
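The slide names the two steps but does not give the formula, so the following is only one possible reading, written as a hedged sketch: each residue's fluctuation is normalized by the largest fluctuation and then scaled by the spread between its maximum and minimum correlation with other residues. The function name, the scaling rule, and the toy numbers are assumptions for illustration; the published definition is in Gu, Gribskov & Bourne (2006) and may differ.

```python
def functional_flexibility_scores(fluctuations, correlations):
    """Illustrative score per residue (assumed form, not the published one):
    normalized fluctuation * (max correlation - min correlation)."""
    peak = max(fluctuations)
    scores = []
    for i, row in enumerate(correlations):
        # Ignore the trivial self-correlation on the diagonal.
        others = [c for j, c in enumerate(row) if j != i]
        spread = max(others) - min(others)
        scores.append((fluctuations[i] / peak) * spread)
    return scores


# Hypothetical three-residue example.
fluct = [2.0, 1.0, 4.0]
corr = [
    [1.0, 0.5, -0.5],
    [0.5, 1.0, 0.0],
    [-0.5, 0.0, 1.0],
]
scores = functional_flexibility_scores(fluct, corr)
```

In practice the fluctuations and the correlation matrix would come from a Gaussian Network Model calculation rather than being typed in by hand.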

Identifying FFRs in HIV Protease

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Other Examples BPTI and Calmodulin

Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Side Note: Gaussian Network Model vs Molecular Dynamics

• GNM is relatively coarse-grained

• GNM is fast to compute vs MD
  – Can look over larger time scales
  – Suitable for high throughput


An Active Research Program Around the Resource is Good for the Resource

Agenda

• The old dog
• New tricks
  – Thinking differently about proteins
  – Virtual Communities
    • Internal (wwPDB)
    • External
• What will the resource look like in 2-5 years?

Virtual Communities - Internal

wwPDB: Single worldwide archive of macromolecular structural data

• Ensures that the PDB remains a single & uniform archive publicly available to the worldwide community
• 3 founding members: RCSB PDB, PDBj, MSD-EBI

wwPDB Activities

• Collaborative projects
  – Remediation
    • taxonomy, ligands, literature
  – Single data processing system

Agenda

• The old dog
• New tricks
  – Thinking differently about proteins
  – Virtual Communities
    • Internal (wwPDB)
    • External (modeling, other…)
• What will the resource look like in 2-5 years?

Virtual Communities - External

Consider the PDB a gathering point through which a virtual and real community interacts with each other around a common interest

Real and virtual examples:

• PDB-in-a-CAVE
• NJ Science Olympiad
• Science Expo
• Traveling art exhibit for lay audiences
• Website Tutorials/Feedback
• Molecule of the Month


Virtual Communities - Modelers

• Recommendations of Workshop
  – PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules
  – A central, publicly available archive (or technical equivalent thereof) or portal should be established for models
  – It was unanimously agreed that methods for assessing model quality are essential

Structure 2006 (to be published)

Agenda

• The old dog
• New tricks
  – Thinking differently about proteins
  – Virtual Communities
    • Internal (wwPDB)
    • External
• What will the resource look like in 2-5 years?

What Will the Resource Look Like in the Next 2-5 Years?

• Upwards of 75,000 structures
• Consensus (and different) views at the micro and macro scale – domains, SNPs, gene structure, cell localization, pathways, interactions, post-translational modification…
• Community annotation (cf. Wikipedia)
• Distributed subsets - External Reference Files (XML)
• MyPDB
• PDB-in-a-box
• Specialized visualization tools (mbt.sdsc.edu)

Is a database really different than a biological journal?

The Knowledge and Data Cycle

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

PLoS Comp Biol 2005 1(3) e34

Now assigning DOIs to structures


Acknowledgements

The RCSB PDB

Jenny Gu: Protein Motions

NIH, NSF, DOE

Apostol Gramada: Multipole Analysis

A Protein is More than the Union of its Parts

• Breaking the protein into parts changes the object of the comparison

• This is interpreted in many cases to imply that the rmsd measure is inadequate.

• The reality is that it is the aligning of structures that breaks the triangle inequality, not the measure per se. The reason for the failure is that we are effectively comparing different objects than we say we are.

From Røgen & Fain (2003), PNAS 100:119-124


An Alternative Approach: Multipolar Representation

Roots in Spherical Harmonics

• Parameterization + boundary conditions

• Scalar potential

• Charge distribution (i.e. structure) $\leftrightarrow \{q_{lm}^{\mathrm{out}};\ M_{lm}^{\mathrm{in}};\ q_{lm}^{i};\ M_{lm}^{i}\}$

Gramada & Bourne 2006 BMC Bioinformatics 7:242


Spatial distribution of a scalar quantity

• “Out” Multipoles

$$q_{lm} = \sum_{i=1}^{N} r_i^{l}\, Y^{*}_{lm}(\theta_i, \phi_i); \quad l = 0, \ldots, \infty; \quad m = -l, \ldots, l$$

For a given rank l, they form a (2l+1)-dimensional vector under 3D rotations:

$$\mathbf{q}_l = \{q_{l,m}\}_{m=-l,\ldots,l}$$

Vector algebra applies => metric properties

Gramada & Bourne 2006 BMC Bioinformatics 7:242
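To make the "out" multipole definition concrete, here is a minimal standard-library sketch that evaluates q_lm for a set of points with unit weights and checks that the length of the rank-l vector is unchanged by a rotation. All function names and the toy coordinates are assumptions made for this example. It uses the common physics convention for Y_lm (Condon-Shortley phase) and is not the code from Gramada & Bourne.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """Associated Legendre P_l^m(x) for 0 <= m <= l (Condon-Shortley phase)."""
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt(max(0.0, (1.0 - x) * (1.0 + x)))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    for ll in range(m + 2, l + 1):  # upward recurrence in l
        pmm, pmmp1 = pmmp1, (x * (2 * ll - 1) * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def spherical_harmonic(l, m, theta, phi):
    """Y_lm(theta, phi) with theta the polar angle and phi the azimuth."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return y if m >= 0 else (-1) ** am * y.conjugate()


def out_multipoles(points, l):
    """q_lm = sum_i r_i^l * conj(Y_lm(theta_i, phi_i)) for m = -l..l."""
    q = []
    for m in range(-l, l + 1):
        total = 0j
        for x, y, z in points:
            r = math.sqrt(x * x + y * y + z * z)
            theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
            phi = math.atan2(y, x)
            total += r ** l * spherical_harmonic(l, m, theta, phi).conjugate()
        q.append(total)
    return q


def rank_norm(q):
    """Euclidean length of the (2l+1)-component multipole vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in q))


def rotate_x(points, angle):
    """Rotate points about the x-axis (helper for the invariance check)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x, c * y - s * z, s * y + c * z) for x, y, z in points]


# Hypothetical coordinates standing in for a structure.
pts = [(1.0, 0.5, -0.3), (-0.7, 1.2, 0.4), (0.2, -0.9, 1.1)]
rotated = rotate_x(pts, 0.7)
norms = [rank_norm(out_multipoles(pts, l)) for l in range(4)]
norms_rotated = [rank_norm(out_multipoles(rotated, l)) for l in range(4)]
```

The two lists of norms agree to rounding error, which is the practical meaning of "vector algebra applies": rank-wise lengths and distances between multipole vectors can be compared without first superimposing the structures.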


The multipoles can be interpreted as shape descriptors

In principle, from the entire series of multipoles one can reconstruct the scalar field and therefore the density, i.e. the entire set of Cartesian coordinates, i.e. the structure at a geometric level of detail

The partitioning of the multipole series according to the various representations of the rotation group allows for a multi-scale description of the structure

Gramada & Bourne 2006 BMC Bioinformatics 7:242