BCB 444/544 F07 ISU Dobbs #23 - Protein Tertiary Structure Prediction 10/15/07
BCB 444/544
Lecture 23
Protein Tertiary Structure Prediction
#23_Oct15
Required Reading (before lecture)
Mon Oct 15 - Lecture 23
Protein Tertiary Structure Prediction
• Chp 15 - pp 214 - 230
Wed Oct 17 & Thurs Oct 18 - Lecture 24 & Lab 8 (Terribilini)
RNA Structure/Function & RNA Structure Prediction
• Chp 16 - pp 231 - 242
Fri Oct 19 - Lecture 25
Gene Prediction
• Chp 8 - pp 97 - 112
New Reading & Homework Assignment
ALL: HomeWork #4 (emailed & posted online Sat AM)
Due: Mon Oct 22 by 5 PM (not Fri Oct 19)
Read: Ginalski et al. (2005) Practical Lessons from Protein Structure Prediction, Nucleic Acids Res. 33:1874-91. http://nar.oxfordjournals.org/cgi/content/full/33/6/1874 (PDF posted on website)
• Although somewhat dated, this paper provides a nice overview of protein structure prediction methods and evaluation of predicted structures.
• Your assignment is to write a summary of this paper - for details see HW#4 posted online & sent by email on Sat Oct 13
Seminars Last Week
Dr. Klaus Schulten (Univ of Illinois) - Baker Center Seminar
The Computational Microscope
2:10 PM in E164 Lagomarcino
http://www.bioinformatics.iastate.edu/seminars/abstracts/2007_2008/Klaus_Schulten_Seminar.pdf
• Check out links on Schulten's website (videos, etc)
• http://www.ks.uiuc.edu/~kschulte/
• Great seminar - amazing simulations of dynamics in proteins and large macromolecular assemblies
• Very computationally intensive - very impressive demonstration of power of computation to produce insights not attainable using only experimental approaches
Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Oct 18 Thur - BBMB Seminar 4:10 in 1414 MBB
  • Sachdeve Sidhu (Genentech) Phage peptide and antibody libraries in protein engineering and ligand selection
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  • Lyric Bartholomay (Ent, ISU) TBA
Protein Sequence & Structure: Analysis
• Diamond STING Millennium - Many useful structure analysis tools, including Protein Dossier
  http://trantor.bioc.columbia.edu/SMS/
• SwissProt (UniProt) - Protein knowledgebase
  http://us.expasy.org/sprot
• InterPro - Sequence analysis tools
  http://www.ebi.ac.uk/interpro
Chp 14 - Secondary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 14
Protein Secondary Structure Prediction
• √Secondary Structure Prediction for Globular Proteins
• √Secondary Structure Prediction for Transmembrane Proteins
• √Coiled-Coil Prediction
Where to Find "Actual" Secondary Structure? In the PDB
How Does Predicted Secondary Structure Compare with Actual? (An example)
Predicted - Using 3 methods (from CMD server, Jernigan Group, ISU):
Query  MAATAAEAVASGSGEPREEAGALGPAWDESQLRSYSFPTRPIPRLSQSDPRAEELIENEE
GOR V  CCCCHHHHHHHHCCHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHCCCC
FDM    CCCCCCCCCCCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
CDM    CCCCHHHHHHCCCCCCCEECCCCCCCCCHHHCCCCCCEECCCCCCCCCCHHHHHHHHCCC
Actual - Calculated from PDB coordinates by DSSP or author:
DSSP
Author
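A minimal sketch of how a predicted H/E/C string could be scored against an observed one, using simple per-residue agreement (often reported as Q3). The function name and the toy strings are assumptions for illustration; they are not taken from the CMD server or from the slide.

```python
def q3_agreement(predicted: str, observed: str) -> float:
    """Fraction of positions where two H/E/C state strings agree."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not predicted:
        raise ValueError("state strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Toy strings (hypothetical, not the slide's data)
pred = "CCHHHHCCEE"
obs = "CCHHHCCCEE"
print(q3_agreement(pred, obs))  # one mismatch in ten positions
```

The same function can compare two predictors with each other (for example the FDM and CDM rows) when no observed assignment is at hand.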
Chp 15 - Tertiary Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 15
Protein Tertiary Structure Prediction
• Methods
• Homology Modeling
• Threading and Fold Recognition
• Ab Initio Protein Structural Prediction
• CASP
Structural Genomics - Status & Goal
~ 20,000 "traditional" genes in human genome (recall, this is fewer than earlier estimate of 30,000)
~ 2,000 proteins in a typical cell
> 4.9 million sequences in UniProt (Oct 2007)
> 46,000 protein structures in the PDB (Oct 2007)
Experimental determination of protein structure lags far behind sequence determination!
Goal: Determine structures of "all" protein folds in nature, using combination of experimental structure determination methods (X-ray crystallography, NMR, mass spectrometry) & structure prediction
Structural Genomics Project
TargetDB: Database of Structural Genomics Targets
http://targetdb.pdb.org
Database of Theoretical Structures?
PMDB: Protein Model Database http://mi.caspur.it/PMDB/help.php
also, via NAR's Molecular Biology Database Collection http://www.oxfordjournals.org/nar/database/summary/855
Theoretical structural models (predicted) are no longer accepted by the PDB (since 10/15/06), but it is possible to search for models deposited earlier:
http://www.rcsb.org/pdb/search/searchModels.do
Protein Structure Prediction or Protein Folding Problem
"Major unsolved problem in molecular biology"
In cells:
• spontaneous
• assisted by enzymes
• assisted by chaperones
In vitro: many proteins can fold to their "native" states spontaneously & without assistance, but many do not!
Deciphering the Protein Folding Code
• Protein Structure Prediction or Protein Folding Problem
  Given the amino acid sequence of a protein, predict its 3-dimensional structure (fold)
• Inverse Folding Problem
  Given a protein fold, identify every amino acid sequence that can adopt its 3-dimensional structure
Protein Structure Prediction
Structure is largely determined by sequence
BUT:
• Similar sequences can assume different structures
• Dissimilar sequences can assume similar structures
• Many proteins are multi-functional
2 Major Protein Folding Problems:
1- Determine folding pathway
2- Predict tertiary structure from sequence
Both still largely unsolved problems
Steps in Protein Folding
1- "Collapse" - driving force is burial of hydrophobic aa's (fast - msecs)
2- Molten globule - helices & sheets form, but "loose" (slow - secs)
3- "Final" native folded state - compaction & rearrangement of 2° structures
Native state?
- assumed to be lowest free energy
- may be an ensemble of structures
Protein Dynamics
• Protein in native state is NOT static
• Function of many proteins requires conformational changes, sometimes large, sometimes small
• Globular proteins are inherently "unstable"
(NOT evolved for maximum stability)
• Energy difference between native and denatured state is very small (5-15 kcal/mol)
(this is equivalent to ~ 2 H-bonds!)
• Folding involves changes in both entropy & enthalpy
Difficulty of Tertiary Structure Prediction
Folding or tertiary structure prediction problem can be formulated as a search for minimum energy conformation
• Search space is defined by psi/phi angles of backbone and side-chain rotamers
• Search space is enormous even for small proteins!
• Number of local minima increases exponentially with number of residues
Computationally it is an exceedingly difficult problem!
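To give a feel for "enormous", here is a back-of-the-envelope sketch in the spirit of Levinthal's argument. The numbers are assumptions chosen only for illustration: 3 backbone states per residue and 10^13 conformations sampled per second; neither figure comes from the lecture.

```python
def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of backbone conformations if each residue picks one of a few states."""
    return states_per_residue ** n_residues


def exhaustive_search_years(n_residues: int,
                            states_per_residue: int = 3,
                            rate_per_sec: float = 1e13) -> float:
    """Years needed to visit every conformation once at the assumed sampling rate."""
    seconds = conformation_count(n_residues, states_per_residue) / rate_per_sec
    return seconds / (3600 * 24 * 365)


for n in (10, 50, 100):
    print(n, f"{conformation_count(n):.3e}", f"{exhaustive_search_years(n):.3e}")
```

Even with these generous assumptions the count grows exponentially with chain length, which is why prediction methods search heuristically instead of enumerating.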
Tertiary Structure Prediction Methods
2 (or 3) Major Methods:
1. Comparative Modeling:
• Homology Modeling (easiest!)
• Threading and Fold Recognition (harder)
2. Ab Initio Protein Structural Prediction (really hard)
Comparative Modeling?
Comparative modeling - term is sometimes used interchangeably with homology modeling, but also sometimes used to mean both:
• homology modeling
• threading/fold recognition
Ab Initio Prediction
1. Develop energy function
• bond energy
• bond angle energy
• dihedral angle energy
• van der Waals energy
• electrostatic energy
2. Calculate structure by minimizing energy function
• usually Molecular Dynamics (MD) or Monte Carlo (MC)
Ab initio prediction - impractical for most real (long) proteins
• Computationally? very expensive
• Accuracy? Usually poor for all except short peptides (but much improvement recently!)
Provides both folding pathway & folded structure
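The "minimize an energy function by Monte Carlo" step can be sketched with a Metropolis search. Everything here is a toy: the energy function (a sum of periodic torsion-like terms), the step size, the temperature kT, and all names are assumptions for illustration, not a real force field or any specific ab initio package.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical torsion-like energy; each term is >= 0 and is 0 when cos(3a) = -1."""
    return sum(1.0 + math.cos(3.0 * a) for a in angles)


def metropolis_minimize(energy, x0, steps=5000, step_size=0.3, kT=0.1, seed=0):
    """Metropolis Monte Carlo: perturb one variable, accept downhill moves always
    and uphill moves with probability exp(-dE/kT). Returns the best state seen."""
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    best_x, best_e = list(x), e
    for _ in range(steps):
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e


start = [0.0, 0.0, 0.0, 0.0]
angles, e_min = metropolis_minimize(toy_energy, start)
print(toy_energy(start), e_min)
```

Accepting some uphill moves is what lets the search escape local minima; with kT near zero the procedure reduces to a greedy descent.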
Comparative Modeling
Provides folded structure only
Two types:
1) Homology modeling
2) Threading (fold recognition)
Both rely on availability of experimentally determined structures that are "homologous" or at least structurally very similar to target
Homology Modeling
1. Identify homologous protein sequences (PSI-BLAST)
2. Among available structures (in PDB), choose one with closest sequence to target as template (can combine steps 1 & 2 by using PDB-BLAST)
3. Build model by placing target sequence residues in corresponding positions on homologous structure & refine by "tweaking" modeled structure (energy minimization)
Homology modeling - works "well"
• Computationally? "relatively" inexpensive
• Accuracy? higher sequence identity → better model
• Requires ~30% sequence identity with sequence for which structure is known
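The ~30% rule of thumb can be checked with a small helper once target and template are aligned. This is a sketch under assumptions: identity is computed over aligned, non-gap columns only (other denominators are also in use), and the function names and example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def usable_template(aligned_target: str, aligned_template: str,
                    threshold: float = 30.0) -> bool:
    """Apply the ~30% identity rule of thumb from the slide."""
    return percent_identity(aligned_target, aligned_template) >= threshold


# Hypothetical pre-aligned pair ('-' marks a gap)
print(round(percent_identity("ALKKGF-HFD", "ALRKGFSHYD"), 2))
print(usable_template("ALKKGF-HFD", "ALRKGFSHYD"))
```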
Threading - Fold Recognition
Identify "best" fit between target sequence & template structure
1. Develop energy function
2. Develop template library
3. Align target sequence with each template in library & score
4. Identify top scoring template (1D to 3D alignment)
5. Refine structure as in homology modeling
Threading - works "sometimes"
• Computationally? Can be expensive or cheap, depends on energy function & whether "all atom" or "backbone only" threading is used
• Accuracy? in theory, should not depend on sequence identity (should depend on quality of template library & "luck")
Usually, higher sequence identity to protein of known structure → better model
Threading: the Motivation
• Basic premise:
  The number of unique structural folds in nature is fairly small (probably 2000-3000)
• Statistics from Protein Data Bank (>46,000 structures)
  Prior to Structural Genomics Project, 90% of "new" structures submitted to PDB were similar to existing folds in PDB - suggesting that almost all folds in nature have been identified
• Thus, chances for a protein to have a native-like structural fold in PDB are quite good
• Note: Proteins with similar structural folds could be either homologs or analogs
Steps in Threading
1. Align target sequence with template structures in fold library (usually from the PDB)
2. Calculate energy score to evaluate "goodness of fit" between target sequence & template structure
3. Rank models based on energy scores
[Figure: Target Sequence (ALKKGF…HFDTSE) threaded onto Structure Templates]
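The three steps can be sketched as a generic "score every template, then sort" loop. The scoring function below is a deliberately crude stand-in (gapless, counts hydrophobic residues placed at exposed positions and polar residues placed at buried ones); the hydrophobic set, the 'b'/'e' burial strings, and the template names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

HYDROPHOBIC = set("AVLIMFWC")  # assumed classification; schemes differ


def burial_mismatch(target: str, burial: str) -> float:
    """Toy energy: number of positions where residue type and environment disagree.
    'b' = buried position, 'e' = exposed position; lower is better."""
    if len(target) != len(burial):
        raise ValueError("gapless toy scorer needs equal lengths")
    return float(sum((aa in HYDROPHOBIC) != (env == "b")
                     for aa, env in zip(target, burial)))


def rank_templates(target: str,
                   library: Dict[str, str],
                   score: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score the target against every template and sort by energy (lowest first)."""
    scored = [(name, score(target, template)) for name, template in library.items()]
    return sorted(scored, key=lambda item: item[1])


library = {"t1": "bbeeeb", "t2": "eebbbe", "t3": "bbebeb"}  # hypothetical templates
print(rank_templates("ALKKGF", library, burial_mismatch))
```

A real threading program replaces `burial_mismatch` with an alignment step that allows gaps and with a physically motivated energy function.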
Threading Goal - & Issues
Goal: Find "correct" sequence-structure alignment of a target sequence with its native-like fold in template library (usually derived from PDB)
• Structure database - must be "complete"
  • Can't build a good model if there is no good template in library!
• Sequence-structure alignment algorithm:
  • Bad alignment → Bad score!
• Energy function or Scoring Scheme:
  • Must distinguish correct sequence-fold alignment from incorrect sequence-fold alignments
  • Must distinguish "correct" fold from close decoys
• Prediction reliability assessment - How to determine whether predicted structure is correct? (or even close?)
Threading: Template database
• Build a database of structural templates, e.g., ASTRAL domain library derived from the PDB
• Sometimes, supplement with additional decoys, e.g., generated using ab initio approach such as Rosetta (Baker)
Threading: Energy function
• Two main methods (& combinations of these)
• Structural profile (environmental) physicochemical properties of amino acids
• Contact potential (statistical) based on contact statistics from PDB
famous one: Miyazawa & Jernigan (ISU)
Protein Threading: Typical energy function
How well does a specific residue fit structural environment?
What is "probability" that two specific residues are in contact?
Alignment gap penalty?
Total energy: Ep + Es + Eg
Goal: Find a sequence-structure alignment that minimizes energy function
A Local Example: Rapid Threading Approach for Protein Structure Prediction
Kai-Ming Ho, Physics: Haibo Cao, Yungok Ihm, Zhong Gao, James Morris, Cai-zhuang Wang
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini, Jeff Sander
Cao H, Ihm Y, Wang CZ, Morris JR, Su M, Dobbs D, Ho KM (2004) Three-dimensional threading approach to protein structure recognition. Polymer 45:687-697
Motivations for & Assumptions of Ho Threading Algorithm
Goal: Develop a threading algorithm that:
• Is simple & rapid enough to be used in high throughput applications
• Is relatively "insensitive" to sequence similarity between target protein sequence & sequence of template structure (to enhance detection of remote homologs & structures that are similar due to convergent evolution)
• Can be used to answer questions such as:
  What are predicted structures of all "unassigned" ORFs in Arabidopsis?
  Does Arabidopsis have a protein with structure similar to mammalian Tumor Necrosis Factor (TNF)?
Assumptions:
• Native state of a protein is lowest free energy state
• Hydrophobic interactions drive protein folding
Simplify: Template structure representation
Template structure → contact matrix $C$ ($N \times N$; residues $i, j = 1 \ldots N$):
$$C_{ij} = \begin{cases} 1, & \text{if } r_{ij} \le 6.5\ \text{\AA} \text{ (contact)} \\ 0, & \text{otherwise, including a neighbor in sequence (non-contact)} \end{cases}$$
Yungok Ihm
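A minimal sketch of building such a contact matrix from coordinates. It assumes one representative point per residue (the slide does not say whether this is Cα, Cβ, or a side-chain centroid) and treats only |i - j| = 1 as "neighbors in sequence"; the function name and toy coordinates are hypothetical.

```python
import math


def contact_matrix(coords, cutoff=6.5, min_separation=2):
    """N x N matrix with 1 where two residues are within `cutoff` Angstroms.
    Pairs closer than `min_separation` along the chain (including i == j) stay 0."""
    n = len(coords)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                c[i][j] = c[j][i] = 1
    return c


# Toy 4-residue chain laid out on a square with 3.8 Angstrom sides
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
for row in contact_matrix(coords):
    print(row)
```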
Simplify: Target Sequence Representation
• Miyazawa-Jernigan (MJ) model: inter-residue contact energy M(i,j) is a quasi-chemical approximation based on pair-wise contact statistics extracted from known protein structures in the PDB: symmetric 20 × 20 matrix = 210 independent values ("letters")
• Li-Tang-Wingreen (LTW): factorize the MJ interaction matrix to reduce the number of parameters associated with amino acids from 210 to 20 q values
• Hydrophobic-Polar (HP): represent amino acids as either hydrophobic (H) or polar (P); Dill et al. demonstrated the utility of this simple binary alphabet representation: 2 values
Compare results with 210 vs 20 vs 2 letter representations
How low can we go?
Simplify: Energy Function
• Interaction “counts” only if two hydrophobic amino acid residues are in contact
• At residue level, pair-wise hydrophobic interaction is dominant:
$$E = \sum_{i,j} C_{ij} U_{ij}$$
Cij : contact matrix
Uij = U(residue I, residue J)
MJ: U = Uij
LTW: U = Qi*Qj
HP: U = {1,0}
Yungok Ihm
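The sum E = Σ C_ij U_ij with a factorized U = Q_i * Q_j can be sketched as below. The HP case is shown (Q is 1 for hydrophobic, 0 for polar), so the result is simply the number of hydrophobic-hydrophobic contacts. Assumptions: each pair is counted once (i < j), the hydrophobic set is one of several possible choices, and the contact matrix is a toy; for the LTW case one would pass the 20-letter Q values instead of 0/1.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed H set; the slide does not list one


def hp_vector(sequence: str):
    """HP representation: 1 for hydrophobic residues, 0 for polar ones."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in sequence]


def contact_energy(contacts, q):
    """E = sum over pairs i < j of C_ij * Q_i * Q_j (factorized pair term)."""
    n = len(q)
    if len(contacts) != n:
        raise ValueError("contact matrix and sequence vector differ in length")
    return sum(contacts[i][j] * q[i] * q[j]
               for i in range(n) for j in range(i + 1, n))


# Toy 4-residue template contact matrix (hypothetical)
C = [[0, 0, 1, 1],
     [0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 1, 0, 0]]
print(contact_energy(C, hp_vector("AKLV")))
```

With the slide's U = {1,0} this is a count to be maximized; an attractive-energy convention would flip the sign.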
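Energy calculation: Contact energy
Miyazawa-Jernigan (MJ) matrix: 210 parameters, statistical potential
Li-Tang-Wingreen (LTW): 20 parameters
$$M_{ij} = \tilde{C}_2 \{ (q_i + \alpha)(q_j + \alpha) + \beta \}$$
Contact Energy:
$$E_c = \sum_{i,j=1}^{N} \left( Q_i C_{ij} Q_j + \beta\, C_{ij} \right)$$
with $Q_i$ defined from $q_i$ and $\alpha$; the constants $\alpha$ and $\beta$ have magnitudes 0.6797 and 0.2604 (signs not recoverable from the transcript)
$Q_i$ ~ solubility
$q_i$ ~ hydrophobicity
[Figure: excerpt of the matrix M for residues C, M, F, I, L, V, W, shown alongside the contact matrix C]
Yungok Ihm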
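Summary of Ho Threading Procedure
Template Structure → Contact Matrix $C$ ($N \times N$):
$$C_{ij} = \begin{cases} 1, & \text{if } r_{ij} \le 6.5\ \text{\AA} \\ 0, & \text{otherwise (a neighbor in sequence)} \end{cases}$$
Sequence: AVFMRIHNDIVYNDIANTTQ
Sequence Vector:
$$S = (Q_A, Q_V, Q_F, \ldots, Q_E) = (0.7997,\ 0.9897,\ 1.1197,\ 0.6497)$$
Scoring Function (Contact Energy):
$$E_c = \sum_{i,j=1}^{N} \left( Q_i C_{ij} Q_j + \beta\, C_{ij} \right)$$
Yungok Ihm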
Can complexity be further reduced?
Consider simplifying structure representation, too
ALKKGF…HFDTSE
Sequence – Structure (1D – 3D problem)
Sequence – Contact Matrix (1D – 2D problem)
Sequence – 1D Profile (1D – 1D problem)
Haibo Cao
Examine eigenvectors of contact matrix
Hydrophobic Contacts:
$$\tilde{T} C T \equiv \sum_{i=1}^{N} \lambda_i\, \tilde{T} V_i \tilde{V}_i T = \sum_{i=1}^{N} \lambda_i (\tilde{T} V_i)^2 \cong \lambda_1 (\tilde{T} V_1)^2$$
$$r_i = \frac{\lambda_i (\tilde{T} V_i)^2}{\sum_{k=1}^{N} \lambda_k (\tilde{T} V_k)^2}$$
$C$ : contact matrix
$T$ : protein sequence of the template structure
$\lambda_i$ : i-th eigenvalue of $C$
$V_i$ : i-th eigenvector
$V_1$ : eigenvector with largest eigenvalue
$r_i$ : fraction of hydrophobic contacts from i-th eigenvector
Haibo Cao
Represent contact matrix by its dominant eigenvector (1D profile)
• First eigenvector (with highest eigenvalue) dominates the overlap between sequence and structure
• Higher ranking (rank > 4) eigenvectors are “sequence blind”
Haibo Cao
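One way to obtain the dominant eigenvector of a contact matrix without external libraries is power iteration. This is a sketch, not the method used in the cited work: the iteration is applied to C + I (an assumption made here so that a negative eigenvalue of equal magnitude, which bipartite contact patterns produce, cannot stall convergence), and all names are hypothetical.

```python
import math


def dominant_eigenpair(matrix, iterations=500, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric 0/1 contact matrix,
    found by power iteration on (C + I)."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iterations):
        # Multiply by the shifted matrix C + I, then renormalize
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        w = [x / norm for x in w]
        # Rayleigh quotient with the unshifted matrix gives the eigenvalue of C
        new_lam = sum(w[i] * sum(matrix[i][j] * w[j] for j in range(n))
                      for i in range(n))
        converged = abs(new_lam - lam) < tol
        v, lam = w, new_lam
        if converged:
            break
    return lam, v


C = [[0, 0, 1, 1],
     [0, 0, 0, 1],
     [1, 0, 0, 0],
     [1, 1, 0, 0]]
lam1, profile = dominant_eigenpair(C)
print(lam1, profile)
```

The returned vector plays the role of the 1D profile: residues with many contacts (positions 0 and 3 in the toy matrix) get the largest components.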
Threading Alignment Step - now fast!
Align target sequence vector (1D) with eigenvector profile of template structure (1D)
1D Profile: $P = V_1$
Maximize the overlap $S \cdot P$ between the Sequence (S) and the profile (P), allowing gaps
Calculate contact energy using the alignment: $E_c$
New profile: $P = CP$
Cao et al Polymer 45 (2004)
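Maximizing the overlap S · P with gaps allowed is a 1D-to-1D alignment, which can be sketched with a standard global dynamic-programming recurrence. This is an illustrative stand-in rather than the published algorithm: it uses a single linear gap penalty, the value 0.5 is arbitrary, and the function name is hypothetical.

```python
def max_overlap(s, p, gap=0.5):
    """Best global alignment score between sequence vector s and profile p,
    where aligning s[i] with p[j] earns s[i] * p[j] and each gap costs `gap`."""
    n, m = len(s), len(p)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i
    for j in range(1, m + 1):
        dp[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + s[i - 1] * p[j - 1],  # align pair
                           dp[i - 1][j] - gap,                      # gap in profile
                           dp[i][j - 1] - gap)                      # gap in sequence
    return dp[n][m]


# Toy vectors: the middle sequence element is best skipped
print(max_overlap([1.0, 0.0, 1.0], [1.0, 1.0]))
```

Because both objects are one-dimensional, the cost is O(N·M) per template, which is what makes this alignment step fast compared with threading directly against a 3D structure.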
Parameters for alignment?
• Gap penalty: Insertion/deletion in helices or strands is strongly penalized; smaller penalties for in/dels in loops
  Gap penalties apply to alignment score only, not to energy calculation
• Size penalty: If a target residue and aligned template residue differ in radius by > 0.5 Å and if residue is involved in > 2 contacts, alignment is penalized
  Size penalties apply to alignment score only, not to energy calculation
Loop
Helix
ALKKGFG…HFDTSE
Yungok Ihm
How incorporate secondary structure?
• Predict secondary structure of target sequence (PSIPRED, PROF, JPRED, SAM, GOR V)
N+ = total number of matches between predicted & actual secondary structure of template
N- = total number of mismatches
Ns = total number of residues selected in alignment
“Global fitness” : f = 1 + (N+ - N-) / Ns
Emod = f * Ethreading
Yungok Ihm
How much better is this "fit" than random?
Eshuffled: Shuffled Sequence vs Structure (avg E score for same sequence shuffled (randomized) many times)
Emod: E score modified to reflect fit with predicted 2° structure
Erelative = Emod – Eshuffled
Yungok Ihm
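The two corrections above (the secondary-structure "global fitness" factor and the shuffled-sequence baseline) are simple arithmetic and can be sketched directly from the slide's formulas. Function names and the example numbers are hypothetical; in practice the shuffled energies would come from re-threading randomized copies of the target sequence.

```python
def global_fitness(n_match: int, n_mismatch: int, n_selected: int) -> float:
    """f = 1 + (N+ - N-) / Ns, from predicted vs template secondary structure."""
    if n_selected <= 0:
        raise ValueError("n_selected must be positive")
    return 1 + (n_match - n_mismatch) / n_selected


def relative_energy(e_threading: float, f: float, shuffled_energies) -> float:
    """Erelative = Emod - Eshuffled, with Emod = f * Ethreading and
    Eshuffled the mean energy over shuffled versions of the same sequence."""
    if not shuffled_energies:
        raise ValueError("need at least one shuffled energy")
    e_mod = f * e_threading
    e_shuffled = sum(shuffled_energies) / len(shuffled_energies)
    return e_mod - e_shuffled


f = global_fitness(n_match=80, n_mismatch=20, n_selected=100)
print(f, relative_energy(-10.0, f, [-4.0, -6.0]))
```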
Performance Evaluation? "Blind Test"
CASP5 Competition (CASP7 is most recent)
(Critical Assessment of Protein Structure Prediction)
Given: Amino acid sequence
Goal: Predict 3-D structure (before experimental results published)
Typical Results: (well, actually, our BEST Results):
HO = #1-Ranked CASP5 Prediction for this Target
• Target 174
• PDB ID = 1MG7
Actual Structure
Predicted Structure
T174_1
T174_2
Cao, Ihm, Wang, Dobbs, Ho
Overall Performance in CASP5 Contest
FR Fold Recognition (targets manually assessed by Nick Grishin)
-----------------------------------------------------------
Rank  Z-Score  Ngood  Npred  NgNW  NpNW  Group-name
1     24.26    9.00   12.00  9     12    Ginalski
2     21.64    7.00   12.00  7     12    Skolnick Kolinski
3     19.55    8.00   12.50  9     14    Baker
4     16.88    6.00   10.00  6     10    BIOINFO.PL
5     15.25    7.00   7.00   7     7     Shortle
6     14.56    6.50   11.50  7     13    BAKER-ROBETTA
7     13.49    4.00   11.00  4     11    Brooks
8     11.34    3.00   6.00   3     6     Ho-Kai-Ming
9     10.45    3.00   5.50   3     6     Jones-NewFold
-----------------------------------------------------------
FR NgNW - number of good predictions without weighting for multiple models
FR NpNW - number of total predictions without weighting for multiple models
~8th out of 180 (M. Levitt, Stanford)
CASP - Check it out!
Critical Assessment of Protein Structure Prediction
http://predictioncenter.gc.ucdavis.edu/
• CASP7 contest - 2006:
  • http://www.predictioncenter.org/casp7/Casp7.html
• Provides assessment of automated servers for protein structure prediction (LiveBench, CAFASP, EVA) & URLs for them
• Related contests & resources:
  • Protein Function Prediction (part of CASP)
  • CAPRI = Critical Assessment of Predicted Interactions
  • New: CASPM = CASP for M = Mutant proteins
    • Predict effects of small (point) mutations, e.g., SNPs
Another Convenient List of Links for Protein Prediction Servers
http://en.wikipedia.org/wiki/List_of_protein_structure_prediction_software
Chp 13 - Protein Structure Visualization, Comparison & Classification
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 13
Protein Structure Visualization, Comparison & Classification
• Protein Structural Visualization
• Protein Structure Comparison
• Protein Structure Classification
Protein Structure Comparison Methods
3 Basic Approaches for Aligning Structures (see Xiong textbook for details)
1. Intermolecular
2. Intramolecular
3. Combined
But, very active research area - many recent new methods
3 Popular Methods:
1. DALI = Distance Matrix Alignment of Structures (Holm)
   • FSSP Database
2. SSAP = Sequential Structure Alignment Program (Orengo)
   • CATH Database
3. CE = Combinatorial Extension (Bourne)
   • VAST at NCBI
URLs:
http://en.wikipedia.org/wiki/Structural_alignment_software
Another local example: Combining Structure Prediction, Machine Learning & "Real" (wet-lab) Experiments to Investigate the Lentiviral Rev Protein: A Step Toward New HIV Therapies
Susan Carpenter (Washington State Univ): Wendy Sparks, Yvonne Wannemuehler
Drena Dobbs, GDCB: Jae-Hyung Lee, Michael Terribilini
Kai-Ming Ho, Physics: Yungok Ihm, Haibo Cao, Cai-zhuang Wang
Gloria Culver, BBMB: Laura Dutca
Macromolecular interactions mediated by Rev protein in lentiviruses (HIV & EIAV)
[Figure: Rev pathway between nucleus and cytoplasm. Labels: Provirus, pre-mRNA, Spliceosome, Early: Regulatory Proteins (Tat, Rev), NUCLEAR IMPORT, MULTIMERIZATION, RNA BINDING (protein-RNA), NUCLEAR EXPORT, (protein-protein) interactions, Late: Structural Proteins, Progeny RNA]
Susan Carpenter
Rev is essential for lentiviral replication
• Rev is a small nucleocytoplasmic shuttling protein
(HIV Rev 115 aa; EIAV Rev 165 aa)
• Recognizes a specific binding site on viral RNA:
Rev Responsive Element (RRE)
• Interacts with CRM1 to export incompletely spliced viral RNAs from nucleus to the cytoplasm
• Specific domains of Rev mediate nuclear localization, RNA binding, and nuclear export
• Critical role of Rev in lentiviral replication makes it an attractive target for antiviral (AIDS) therapy
Problem: no high resolution Rev structure!
not even for HIV Rev, despite intense effort ($$)
• Why??
  • Rev aggregates at concentrations needed for NMR or X-ray crystallography
• What about insights from sequence comparisons?
  • "undetectable" sequence similarity among Revs from different lentiviruses (e.g., EIAV vs HIV <10%)
• But:
  • We know that lentiviral Rev proteins are functionally "homologous" - even in highly diverse lentiviruses
Hypothesis: Rev proteins from diverse lentiviruses share structural features critical for function
Approach:
• Computationally model structures of lentiviral Rev proteins
  - using structural threading algorithm (with Ho et al)
• Predict critical residues for RNA-binding, protein interaction
  - using machine learning algorithms (with Honavar et al)
• Test model and predictions
  - using genetic/biochemical approaches (with Carpenter & Culver)
  - using biophysical approaches (with Andreotti & Yu groups)
Initially: focus on EIAV Rev & RRE
Functional domains: EIAV vs HIV Rev
[Figure: domain maps. HIV-1 Rev (positions 1, 116 marked): NLS/RBM, NES; motif RQARRNRRRRWR. EIAV Rev (positions 1, 31, 165 marked; exon 1, exon 2): NES, RBM, NLS, "Folding?"; motifs RRDRW, ERLE, KRRRK]
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif
Predicted EIAV Rev Structure
Yungok Ihm
Comparison of Predicted Rev Structures
[Figure panels: EIAV, HIV, FIV, SIV Dimer, HIV Dimer]
Yungok Ihm
Predicted vs Experimental Structure of N-terminal region of HIV Rev
A: Predicted Structure, HIV Rev N-terminus
B: NMR Structure, HIV Rev N-terminal Peptide (Battiste & Williamson)
C: Overlay - Alignment of Predicted & NMR Structures
Yungok Ihm
Location of functional residues EIAV Rev?
[Figure: predicted EIAV Rev structure with Putative RBM and NES labeled]
NES Leu36, 45, 49: On surface, consistent with role in nuclear export
Leu95 & Leu109: Buried in core, critical hydrophobic contacts for fold?
Yungok Ihm
Mutate hydrophobic residues predicted to be critical for helical packing in core
L65 vs L95 & L109
Single mutants: Leu to Ala, Leu to Asp
Double mutants: Leu to Ala
Single Ala Mutation (L → A)
Single Asp Mutation (L → D)
Negligible effect on Rev activity
Dramatic change in Rev activity?
Insert charged aa in hydrophobic core
Double Ala Mutation (LL → AA)
Reduction in Rev activity?
[Figure: predicted structure with L65, L95, L109 labeled]
Yungok Ihm
Functional Analysis of Rev Structural Mutants in vivo (CAT assay)
[Figure: bar chart of Activity of Rev Structural Mutants (axis ticks 50, 100, 150). Controls: Sham, RI, pcDNA3. Single Mutations: L65 (A, D), L95 (A, D), L109 (A, D). Double Mutations: L65A L95A, L65A L109A, L95A L109A]
Wendy Sparks
Functional domains: EIAV vs HIV Rev
[Figure: domain maps. HIV-1 Rev (positions 1, 116 marked): NLS/RBM, NES; motif RQARRNRRRRWR. EIAV Rev: NES, RBM, NLS, "Folding?"; motifs RRDRW, ERLE, KRRRK]
Green - RNA interaction
Red - Protein interaction
NES - Nuclear Export Signal
NLS - Nuclear Localization Signal
RBM - putative RNA Binding Motif
Putative RNA-binding Motifs & Predicted RNA-binding Residues Mapped onto Predicted EIAV Rev Structure
61         71         81         91
ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR …
++ +++++++ ++++++++++ + +
121        131        141        151        161
HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKRRRKHL
+ ++++ ++ +++ +++++++++++++++
[Figure: predicted EIAV Rev structure with motifs KRRRK, RRDRW, ERLE labeled]
Michael Terribilini
Yungok Ihm
Express & purify MBP-ERev deletion mutants
[Figure: gel of purified proteins. Lanes: Marker (60, 42, 30, 22), MBP, 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165. Schematic of MBP-ERev constructs 1-165, 31-165, 31-145, 57-165, 57-145, 57-124, 125-165, 146-165 against the domain map (positions 1, 31, 57, 125, 146, 165; NES, RBM, NLS, "Folding?")]
Jae-Hyung Lee
MBP-ERev binds specifically to RRE in vitro
[Figure: UV crosslinking with sense and antisense RRE (lanes: BSA, MBP, 1-165, 31-165) and Competition (lanes: No protein, No cold RRE, Cold RRE); band: Undigested 32P-RRE]
Jae-Hyung Lee
EIAV Rev: Binding Predictions vs Experiments
PREDICTED: Structure; Protein binding residues; RNA binding residues
VALIDATED: Protein binding residues; RNA binding residues
++
131        141        151        161
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
++++++++++ ++ +++ ++++++ + ++++++++++++++++++++
61         71         81         91
ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
+++++++++++++++ ++++++++++++++++
41         51
GPLESDQWCRVLRQSLPEEKISSQTCI
++++++++ ++
[Figure: domain map (positions 31, 57, 125, 145, 165; NES, RBM, "FOLD?", NLS/RBM; motifs RRDRW, ERLE, KRRRK) and binding assay lanes MBP, WT, 31-165, 31-145, 57-165, 145-165]
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
Jae-Hyung Lee
Roles of Putative RNA Binding Motifs?
[Figure: EIAV Rev domain map (positions 1, 31, 57, 124, 146, 165; NES, RBD, NLS) showing motifs RRDRW, ERLE, KRRRK and the mutant versions AADAA, AALA, KAAAK, ERDE]
Jae-Hyung Lee
Rev RNA Binding Motifs: Predicted vs Experiment
PREDICTED: Structure; Protein binding residues; RNA binding residues
VALIDATED: Protein binding residues; RNA binding residues
++
131        141        151        161
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
++++++++++ ++ +++ ++++++ + ++++++++++++++++++++
61         71         81         91
ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
+++++++++++++++ ++++++++++++++++
41         51
GPLESDQWCRVLRQSLPEEKISSQTCI
++++++++ ++
[Figure: domain map (positions 31, 57, 125, 145, 165; NES, RBM, "FOLD?", NLS/RBM; motifs RRDRW, ERLE, KRRRK) and binding assay lanes WT, AADAA, AALA, ERDE, KAAAK]
Jae-Hyung Lee
Summary: Predictions vs Experiments
131        141        151        161
QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL
++++++++++ ++ +++ ++++++ + ++++++++++++++++++++
61         71         81         91
ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
+++++++++++++++ ++++++++++++++++
41         51
GPLESDQWCRVLRQSLPEEKISSQTCI
++++++++ ++
[Figure: domain map (positions 31, 57, 125, 145, 165; NES, RBM, FOLD, NLS/RBM; motifs RRDRW, ERLE, KRRRK)]
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
Conclusions & Future Directions
Combination of computational & wet lab approaches revealed that:
• EIAV Rev has a bipartite RNA binding domain
• Two Arg-rich RBMs are critical
  • RRDRW in central region (but not ERLE)
  • KRRRK at C-terminus, overlapping the NLS
• Based on computational modeling, the RBMs are in close proximity within the 3-D structure of protein
• Lentiviral Rev proteins & their cognate RRE binding sites may be more similar in structure than has been appreciated
Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415
Future:
Computational: Use Rev-RRE model system to discover "predictive rules" for protein-RNA recognition
Experimental?
Experimentally determine the structure of Rev-RRE complex !!!
Building "Designer" Zinc Finger DNA-binding Proteins
J Sander, P Zaback, F Fu, J Townsend, R Winfrey, D Wright, K Joung, L Miller, D Dobbs, D Voytas
Wright et al (2006) Nature Protocols
Sander et al (2007) Nucleic Acids Res
Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• Introduction
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
• Ab Initio Approach
• Comparative Approach
• Performance Evaluation