Comparative Modeling for Beta Protein Structure Prediction Lenore J. Cowen Tufts University.

Comparative Modeling for Beta Protein Structure Prediction

Lenore J. Cowen

Tufts University

Amino Acids

A protein is composed of a central backbone and a collection of (typically) 50-2000 amino acids (a.k.a. residues).

There are 20 different kinds of amino acids each consisting of up to 18 atoms, e.g.,

Name            3-letter code   1-letter code
Leucine         Leu             L
Alanine         Ala             A
Serine          Ser             S
Glycine         Gly             G
Valine          Val             V
Glutamic acid   Glu             E
Threonine       Thr             T

Protein Structure

Protein sequence: DRVYIHPF

[Figure: chemical structure of the peptide Asp-Arg-Val-Tyr-Ile-His-Pro-Phe (D R V Y I H P F), with each side chain attached to the repeating backbone structure, running from the H3N+ end to the COO- end.]
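The one-letter sequence above can be expanded back into three-letter codes with a simple lookup. The following is a minimal sketch; the table covers only the residues named on these slides, and the function name `to_three_letter` is an assumption made for illustration.

```python
# One-letter to three-letter codes, limited to the residues named on the slides.
ONE_TO_THREE = {
    "D": "Asp", "R": "Arg", "V": "Val", "Y": "Tyr",
    "I": "Ile", "H": "His", "P": "Pro", "F": "Phe",
    "L": "Leu", "A": "Ala", "S": "Ser", "G": "Gly",
    "E": "Glu", "T": "Thr",
}


def to_three_letter(sequence):
    """Expand a one-letter protein sequence into hyphen-joined three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence)


print(to_three_letter("DRVYIHPF"))  # Asp-Arg-Val-Tyr-Ile-His-Pro-Phe
```

A residue missing from the table raises a `KeyError`, which is deliberate for a sketch this small: a full table would list all 20 amino acids.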

Given an amino acid sequence, e.g., MDPNCSCAAAGDSCTCANSCTCLACKCTSCK, how will it fold in 3D?

Protein Folding Problem

The fold is important because it determines the function of the protein.

Note: The pictures I’ve been giving are “cartoons” of the backbone

The Inverse Protein Folding Problem

Instead of taking a sequence and asking what its fold is, take a fold and ask for all the sequences that form that fold.

…VLWIXS….

…SSCILWG…

What do we mean by “that fold”?

SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)

Can we recognize and model all folds that form a beta-trefoil, etc.?

• If they are evolutionarily close enough the answer is YES.

• Use BLAST to recognize homology (similar sequences have similar folds) and align conserved parts of the backbone.

…GVFIIIMGSHGK…
…GVD-LMG-HGR…

Comparative modeling

• Once the backbone of the conserved core is fixed, pack in the sidechains.

• Add loops and unstructured regions.

Can we recognize and model all folds that form a beta-trefoil, etc.?

• But STRUCTURE can be more CONSERVED than sequence: the structures may still align, but we can no longer use BLAST because the sequence similarity is too weak.

…GVFIIIMGSHGK…
…GR--CV-GCAGR…
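To make "sequence similarity is too weak" concrete, here is a minimal sketch that computes plain percent identity over the gap-free columns of an aligned pair. This is not how BLAST scores alignments; it is only a simple illustrative measure. The second string assumes that the long dash in the slide's remote homolog stands for two gap characters.

```python
def percent_identity(a, b, gap="-"):
    """Fraction of gap-free aligned columns whose residues are identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)


query = "GVFIIIMGSHGK"
remote = "GR--CV-GCAGR"  # assumption: the slide's long dash is two gap characters
identity = percent_identity(query, remote)
print(f"{identity:.0%} identity over gap-free columns")
```

Under that assumption only the three glycines line up, which is the kind of weak signal that sequence-only search tends to miss.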

Comparative modeling

• If you CAN find the correct alignment, you can proceed as before.

• Once the backbone of the conserved core is fixed, pack in the sidechains.

• Add loops and unstructured regions.

Approaches to Structural Motif Recognition

• Statistical template/profile methods (Altschul et al. 1990)

• Hidden Markov Models (Eddy, 1998)

• Threading Methods (Jones et al. 1992)

• Combinations of two or more of the above

Our Results

Recognizing the Beta Helix and Beta Trefoil Folds

The Right-handed Parallel Beta-Helix

A processive fold composed of repeated super-secondary units.

Each rung consists of three beta-strands separated by turn regions.

No sequence repeat.

[Figure: Pectate Lyase C (Yoder et al. 1993)]

Biological Importance of Beta Helices

Surface proteins in human infectious disease:

• virulence factors

• adhesins

• toxins

• allergens

Proposed as a model for amyloid fibrils (e.g. Alzheimer’s and Creutzfeldt-Jakob)

Virulence factors in plant pathogens

What was Known

Solved beta-helix structures:

12 structures in PDB in 7 different SCOP families

Pectate Lyase: Pectate Lyase C, Pectate Lyase E, Pectate Lyase

Galacturonase: Polygalacturonase, Polygalacturonase II, Rhamnogalacturonase A

Pectin Lyase: Pectin Lyase A, Pectin Lyase B

Chondroitinase B

Pectin Methylesterase

P.69 Pertactin

P22 Tailspike

BetaWrap Program

[Bradley, Cowen, Menke, King, Berger, PNAS, 2001, 98:26, 14,819-14,824; Cowen, Bradley, Menke, King, Berger (2002), J Comp Biol, 9, 261-276]

Performance:

• On PDB: no false positives & no false negatives. Recognizes beta helices in PDB across SCOP families in cross-validation.

• Recognizes many new potential beta helices when run on larger sequence databases.

• Runs in linear time (~5 min. on SWISS-PROT).

BetaWrap Program

Histogram of protein scores for:

• beta helices not in database (12 proteins)

• non-beta helices in PDB (1346 proteins)

Single Rung of a Beta Helix

3D Pairwise Correlations

Stacking residues in adjacent beta-strands exhibit strong correlations.

Residues in the T2 turn have special correlations (Asparagine ladder, aliphatic stacking).

[Figure: a rung with strands B1, B2, B3 and turn T2 labeled]

Question: how can we find these correlations which are a variable distance apart in sequence?

Finding Candidate Wraps

• Assume we have the correct locations of a single T2 turn (fixed B2 & B3).

• Generate the 5 best-scoring candidates for the next rung.

[Figure: the fixed B2, T2 and B3 of one rung, with a candidate rung below it]

Scoring Candidate Wraps (rung-to-rung)

Rung-to-rung alignment score incorporates:

• Beta sheet pairwise alignment preferences taken from amphipathic beta structures in PDB (w/o beta helices).

• Additional stacking bonuses on internal pairs.

• Distribution on turn lengths.

Scoring Candidate Wraps (5 rungs)

• Iterate out to 5 rungs generating candidate wraps:

• Score each wrap:

- sum the rung-to-rung scores

- B1 correlations filter

- screen for alpha-helical content
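The search described on the last three slides (fix one rung, generate the best candidates for the next rung, iterate out to 5 rungs, sum the rung-to-rung scores) can be sketched as a small beam search. This is only an illustrative sketch, not the BetaWrap implementation: `toy_rung_score` is a hypothetical stand-in for the real rung-to-rung score, the gap bounds `min_gap` and `max_gap` are invented, keeping only `keep` partial wraps per step is one possible reading of the slides, and the B1 filter and alpha-helix screen are omitted.

```python
import heapq


def toy_rung_score(seq, prev_start, next_start, strand_len=5):
    """Hypothetical stand-in score: count identical residues in stacked positions."""
    a = seq[prev_start:prev_start + strand_len]
    b = seq[next_start:next_start + strand_len]
    return sum(1 for x, y in zip(a, b) if x == y)


def best_wraps(seq, first_start, rungs=5, keep=5, min_gap=10, max_gap=30,
               strand_len=5, score=toy_rung_score):
    """Extend a fixed first rung one rung at a time, keeping the `keep` best partial wraps."""
    wraps = [(0.0, [first_start])]  # (summed rung-to-rung score, rung start positions)
    for _ in range(rungs - 1):
        extended = []
        for total, starts in wraps:
            last = starts[-1]
            for gap in range(min_gap, max_gap + 1):
                nxt = last + gap
                if nxt + strand_len > len(seq):
                    break  # larger gaps would run past the end of the sequence
                step = score(seq, last, nxt, strand_len)
                extended.append((total + step, starts + [nxt]))
        if not extended:
            return []  # sequence too short to hold a full wrap
        wraps = heapq.nlargest(keep, extended, key=lambda w: w[0])
    return wraps


# Toy sequence with a 5-residue motif repeating every 12 positions.
demo_seq = ("VNLSG" + "QQQQQQQ") * 6
wraps = best_wraps(demo_seq, first_start=0)
best_total, best_starts = wraps[0]
print(best_total, best_starts)
```

Because the rung positions are searched rather than read off a fixed sequence period, correlated residues can sit a variable distance apart in sequence, which is the point of the question raised earlier.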

Predicted Beta Helices

Features of the 200 top-scoring proteins in the NCBI’s protein sequence database:

• Many proteins of similar function to the known beta-helices; some with similar sequences.

• A significant fraction are characterized as microbial outer membrane or cell-surface proteins.

• Mouse, human, worm and fly sequences significantly underrepresented – only two proteins!

Some Predicted Beta Helices in Human Pathogens

Organism                     Disease
Vibrio cholerae              Cholera
Helicobacter pylori          Ulcers
Plasmodium falciparum        Malaria
Chlamydia trachomatis        Venereal infection
Chlamydophila pneumoniae     Respiratory infection
Listeria monocytogenes       Listeriosis
Trypanosoma brucei           Sleeping sickness
Borrelia burgdorferi         Lyme disease
Leishmania donovani          Leishmaniasis
Bordetella bronchiseptica    Respiratory infection
Trypanosoma cruzi            Chagas disease
Bordetella parapertussis     Whooping cough
Bacillus anthracis           Anthrax
Rickettsia rickettsii        Rocky Mtn. spotted fever
Rickettsia japonica          Oriental spotted fever
Neisseria meningitidis       Meningitis
Legionella pneumophila       Legionnaires' disease

The Beta-Trefoil

The beta-trefoil consists of three leaves around an axis of three-fold symmetry.

[Figure: a single leaf (strands B1, B2, B3, B4; turns T1, T2, T3) repeated x3 to give the entire trefoil (3 leaves), with the cap and barrel marked. Structure: 1BFF (Kitagawa et al. 1991)]

Templates

A leaf template consists of:

• a B1-strand, followed by a T1 turn of length 2 to 17, followed by

• a B2-strand, followed by a T2 turn of length 0 to 11, followed by a B3-strand, followed by

• a T3 turn of length 4 to 20, followed by a B4 strand.

In addition, it is between 26 and 64 residues long.

A trefoil template consists of three leaf templates separated by two T4 turns of length 0 to 16.

[Figure: cap template with strands B1, B2, B3, B4 and turns T1, T2, T3 labeled]
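The length constraints of the leaf and trefoil templates can be written down as a small validity check. This is a minimal sketch: the turn and leaf ranges are the ones stated on the slide, while the `Leaf` class, the function names, and the strand lengths used in the example are assumptions for illustration (the slide gives no strand lengths).

```python
from dataclasses import dataclass

# Inclusive length ranges as stated on the slide.
TURN_RANGES = {"T1": (2, 17), "T2": (0, 11), "T3": (4, 20)}
LEAF_LENGTH_RANGE = (26, 64)
T4_RANGE = (0, 16)


@dataclass
class Leaf:
    """Segment lengths in residues, in sequence order (strand lengths are hypothetical inputs)."""
    b1: int
    t1: int
    b2: int
    t2: int
    b3: int
    t3: int
    b4: int

    def length(self):
        return self.b1 + self.t1 + self.b2 + self.t2 + self.b3 + self.t3 + self.b4


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def leaf_fits_template(leaf):
    """Check the three turn lengths and the overall leaf length."""
    return (in_range(leaf.t1, TURN_RANGES["T1"])
            and in_range(leaf.t2, TURN_RANGES["T2"])
            and in_range(leaf.t3, TURN_RANGES["T3"])
            and in_range(leaf.length(), LEAF_LENGTH_RANGE))


def trefoil_fits_template(leaves, t4_turns):
    """Three valid leaves separated by two T4 turns of allowed length."""
    return (len(leaves) == 3 and len(t4_turns) == 2
            and all(leaf_fits_template(leaf) for leaf in leaves)
            and all(in_range(turn, T4_RANGE) for turn in t4_turns))


example = Leaf(b1=5, t1=4, b2=5, t2=3, b3=5, t3=6, b4=5)
print(example.length(), leaf_fits_template(example))
```

A check like this only prunes the search space; choosing among the placements that pass still depends on the pairwise scoring.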

What Pairs Do We Consider?

In both the barrel and the cap, we consider both directly aligned pairs of residues and pairs of residues one-off from each other.

Different tables are used for pairwise preferences for buried, exposed, and one-off pairs of residues.

[Figure: leaf diagram with strands B1, B2, B3, B4 and turns T1, T2, T3 labeled]

Packing moves earlier in the modeling process

• In order to produce more accurate sequence-structure alignments, we return several possible “wraps” and try to pack sidechains.

• So sidechain packing is used earlier in the comparative modeling process, where it also helps find the correct sequence-structure alignment.

The Packing Function

Top wraps fed to packing function.

• SCWRL (Canutescu, 2003) is better at packing caps than barrels.

• Input to SCWRL:

  • Atomic coordinates of the backbone of cap strand pairs from a member of each trefoil superfamily in the training set.

  • Top 4 wraps of the target sequence onto the trefoil template.

• Return best-scoring wrap with a good packing, if one exists, else reject.
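The selection rule on this slide (take the top 4 wraps, return the best-scoring one that packs well, otherwise reject) can be sketched as follows. The sketch does not call SCWRL: `packs_well` is a hypothetical predicate standing in for "run the packer on this wrap and judge the result", and the dictionary fields `score` and `clashes` are invented for the example.

```python
def pick_wrap(wraps, packs_well, top_n=4):
    """Return the best-scoring wrap among the top `top_n` that packs well, else None."""
    ranked = sorted(wraps, key=lambda wrap: wrap["score"], reverse=True)[:top_n]
    for wrap in ranked:
        if packs_well(wrap):
            return wrap
    return None  # no wrap packs well: reject the sequence


# Hypothetical candidates; "clashes" stands in for the outcome of sidechain packing.
candidates = [
    {"name": "wrap_a", "score": 9.1, "clashes": 2},
    {"name": "wrap_b", "score": 8.7, "clashes": 0},
    {"name": "wrap_c", "score": 7.9, "clashes": 0},
    {"name": "wrap_d", "score": 6.0, "clashes": 1},
    {"name": "wrap_e", "score": 5.5, "clashes": 0},
]
chosen = pick_wrap(candidates, packs_well=lambda wrap: wrap["clashes"] == 0)
print(chosen["name"])  # wrap_b
```

Here the top-scoring wrap is passed over because it does not pack cleanly, so packing acts as a tie-breaker on the alignment rather than only as a final modeling step.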

Example of the Packing Phase

Known Cap:

    LTSKD STILL
    12345 67890

Partial PDB file from actual trefoil:

    ATOM 4340 N   LEU B 196 41.442 …
    ATOM 4341 CA  LEU B 196 40.705 …
    ATOM 4342 C   LEU B 196 40.704 …
    ATOM 4343 O   LEU B 196 41.787 …
    ATOM 4344 CB  LEU B 196 41.441 …
    ATOM 4345 CG  LEU B 196 41.503 …
    ATOM 4346 CD1 LEU B 196 41.902 …
    ATOM 4347 CD2 LEU B 196 40.155 …
    ATOM 4348 H   LEU B 196 42.299 …
    ATOM 4349 N   THR B 197 39.524 …
    ATOM 4350 CA  THR B 197 39.397 …
    ATOM 4351 C   THR B 197 38.506 …
    ATOM 4352 O   THR B 197 37.700 …
    ATOM 4353 CB  THR B 197 38.704 …
    ATOM 4354 OG1 THR B 197 39.307 …
    ATOM 4355 CG2 THR B 197 38.808 …
    ATOM 4356 H   THR B 197 38.752 …
    …

Cap from top wrap:

    LRVYY RILHN
    12345 67890

Predicted cap atomic positions (SCWRL):

    ATOM    1 N   LEU 1 41.442 …
    ATOM    2 CA  LEU 1 40.705 …
    ATOM    3 C   LEU 1 40.704 …
    ATOM    4 O   LEU 1 41.787 …
    ATOM    5 CB  LEU 1 41.412 …
    ATOM    6 CG  LEU 1 40.686 …
    ATOM    7 CD1 LEU 1 39.364 …
    ATOM    8 CD2 LEU 1 41.533 …
    ATOM    9 N   ARG 2 39.524 …
    ATOM   10 CA  ARG 2 39.397 …
    ATOM   11 C   ARG 2 38.506 …
    ATOM   12 O   ARG 2 37.700 …
    ATOM   13 CB  ARG 2 38.788 …
    ATOM   14 CG  ARG 2 39.658 …
    ATOM   15 CD  ARG 2 38.984 …
    ATOM   16 NE  ARG 2 39.799 …
    ATOM   17 CZ  ARG 2 39.404 …
    …

[Figure: B2 and B3 cap strands with residue positions 1 to 10 labeled, shown for the known cap and the predicted cap, with a steric clash marked. Structure: 1ABR (Tahirov et al. 1995)]

Toward Automation

• For each SCOP beta-structural template
  * align all known examples of fold
  * find pairs in conserved core
  * thread onto template (additionally use profiles); find candidate alignments

Pack sidechains for each, determine best structure

Place loops and unstructured regions

Multiple Structure Alignment for Remote Protein Homologs

• We spend the remainder of the talk discussing our new program for multiple structure alignment: MATT

The Multiple Structure Alignment Problem

Input: atomic coordinates for the backbones of m protein structures

Output: A sequence alignment of the protein structures, together with a superimposition of the structures in 3D space.

The Multiple Structure Alignment Problem

Def: the common core of a multiple structure alignment is the set of alignment positions where every structure contributes a residue.
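Read off the sequence-alignment part of the output, the common core is simply the set of columns with no gaps. A minimal sketch, assuming the alignment is given as equal-length strings with `-` for gaps (the function name and the example rows are invented for illustration):

```python
def common_core_columns(alignment, gap="-"):
    """Indices of alignment columns where every structure contributes a residue."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("need at least one row, all of the same aligned length")
    return [index for index, column in enumerate(zip(*alignment))
            if all(residue != gap for residue in column)]


rows = ["GVFII-MG",
        "GV-LIAMG",
        "G-FLI-MG"]
core = common_core_columns(rows)
print(core, len(core))  # [0, 3, 4, 6, 7] 5
```

The core size reported for an alignment is then the length of this list.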

The Multiple Structure Alignment Problem

Geometric criteria:

Good multiple structure alignments MAXIMIZE common core size while MINIMIZING pairwise RMSDs between structures.

Note: even simplified versions are NP-hard (Goldman, Istrail and Papadimitriou, 1999)
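The RMSD half of the geometric criterion can be sketched in a few lines. The sketch assumes the structures are already superimposed and that each is given as a list of (x, y, z) coordinates for the same common-core positions; averaging over all pairs is one reasonable summary, not necessarily the exact statistic used in the benchmarks later in the talk.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def average_pairwise_rmsd(structures):
    """Mean RMSD over all pairs of already-superimposed structures."""
    pairs = list(combinations(structures, 2))
    if not pairs:
        raise ValueError("need at least two structures")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


s1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
s2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # s1 shifted by 1 along z
s3 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # identical to s1
print(rmsd(s1, s2), average_pairwise_rmsd([s1, s2, s3]))
```

The two criteria pull against each other: dropping poorly fitting positions lowers RMSD but shrinks the common core, so an aligner has to trade one off against the other.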

The Multiple Structure Alignment Problem

Discrimination criteria:

Good multiple structure alignments align what is “supposed to be aligned” because it is part of the evolutionarily conserved core.

Approaches to Structure Alignment

• AFP (aligned fragment pair) chaining methods align all short pieces and chain them together using dynamic programming

• Contact map methods look for similarities within distance matrices

• Geometric hashing, secondary structure elements, etc.
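The chaining step of the first approach can be illustrated with a small dynamic program. This is a generic sketch of AFP chaining, not the algorithm of any particular aligner: it assumes each fragment pair arrives with a precomputed score, requires chained fragments to be in order and non-overlapping in both structures, and ignores gap penalties and geometric consistency between fragments.

```python
def chain_fragments(afps):
    """Highest-scoring chain of aligned fragment pairs (AFPs).

    Each AFP is (start_a, start_b, length, score): `length` residues starting
    at start_a in structure A are aligned to `length` residues starting at
    start_b in structure B.
    """
    if not afps:
        return 0.0, []
    afps = sorted(afps)  # by start_a, so valid predecessors come first
    best = [score for (_, _, _, score) in afps]  # best chain score ending at each AFP
    prev = [None] * len(afps)
    for i, (a_i, b_i, _, score_i) in enumerate(afps):
        for j in range(i):
            a_j, b_j, len_j, _ = afps[j]
            follows = a_j + len_j <= a_i and b_j + len_j <= b_i
            if follows and best[j] + score_i > best[i]:
                best[i] = best[j] + score_i
                prev[i] = j
    end = max(range(len(afps)), key=best.__getitem__)
    chain = []
    node = end
    while node is not None:  # walk the back-pointers to recover the chain
        chain.append(afps[node])
        node = prev[node]
    return best[end], chain[::-1]


fragments = [(0, 0, 5, 5.0), (3, 3, 5, 6.0), (5, 5, 5, 5.0),
             (10, 12, 5, 4.0), (8, 2, 5, 9.0)]
total, chain = chain_fragments(fragments)
print(total, chain)
```

In the example the single best fragment (score 9.0) is left out, because it is out of order with the others and a chain of three compatible fragments scores higher.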

Some Popular Structure Aligners

• Dali (Holm 93)

• VAST (Bryant 96)

• LOCK (Singh 97)

• FlexProt (Shatsky et al. 02)

• FATCAT (Ye&Godzik 04)

• LOVOALIGN (Andreani et al. 06)

• CE/CE-MC (Shindyalov 2000)

• SSAP (Orengo&Taylor 96)

• MultiProt (Shatsky&Wolfson 04)

• POSA (Ye&Godzik 05)

• Mustang (Konagurthu et al. 06)

• CBA (Ebert 07)

The Benchmark Datasets

• Globins

• Homstrad
  – 1028 alignments
  – Each alignment contains 2-41 structures
  – 399 sets with > 2 structures

The Benchmark Datasets

Sabmark

Superfamily set:
  – 3645 domains in 426 subsets

Twilight zone set:
  – 1740 domains in 209 subsets

Both sets contain:
  – Between 3 and 25 structures
  – Decoy structures (sequence matches that reside in different SCOP domains)

Matt: Multiple Alignment with Translation and Twists

• Matt is an AFP chaining method that adds flexibility by allowing geometrically impossible bends and breaks.

Other work modeling flexibility

• In structure alignment:
  – Flexprot [Shatsky et al., 2002]
  – Fatcat/POSA [Ye&Godzik, 2004, 2005]

• For other reasons:
  – Molecular docking [Echols et al, 03; Bonvin, 06]
  – Ligand binding [Lemmen et al, 2006]
  – Decoy construction [Singh&Berger, 2006]

Outline of the Matt Algorithm

Results on Sabmark (Superfamily)

Program Name   Avg. Core Size   Avg. RMSD
Multiprot      68.701           1.498
Mustang        104.162          4.146
Matt           104.692          2.639

Results on Sabmark (Twilight Zone)

Program Name   Avg. Core Size   Avg. RMSD
Multiprot      36.54            1.536
Mustang        66.833           5.035
Matt           66.967           2.916

Sabmark Decoy Set

• For each SCOP superfamily, positive examples of the fold, and negative examples that are
  – Random examples from a different superfamily
  – Examples from a different superfamily that are nonetheless good BLAST hits

Toward Automation

• For each SCOP beta-structural template
  * align all known examples of fold
  * find pairs in conserved core
  * thread onto template (additionally use profiles); find candidate alignments

Pack sidechains for each, determine best structure

Place loops and unstructured regions

On the Web

• BetawrapPro for predicting beta-helices and beta-trefoils at: http://betawrappro.csail.mit.edu

• Matt at: http://matt.csail.mit.edu OR http://matt.cs.tufts.edu

Acknowledgements

• Matt Menke

• Andrew McDonnell

• Phil Bradley

• Bonnie Berger

• Jonathan King

• National Science Foundation