Protein Structure Prediction Graham Wood Charlotte Deane.
![Page 1: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/1.jpg)
Protein Structure Prediction
Graham Wood
Charlotte Deane
![Page 2: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/2.jpg)
The problem - in brief
MVLSEGEWQL
VLHVWAKVEA
DVAGHGQDIL
…
AKYKELCYOG
Databases
Algorithms
Software
+ =
![Page 3: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/3.jpg)
Why is protein structure prediction needed?
• Essential functioning of cells is mediated by proteins
• It is protein structure that leads to protein function
• 3D structure determination is expensive, slow and difficult (by X-ray crystallography or NMR)
• Assists in the engineering of new proteins
![Page 4: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/4.jpg)
Terminology
Target
- the unknown structure you are trying to model
Parent
- a known structure which provides a basis for modelling
![Page 5: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/5.jpg)
The problem - more detail
Configuration space
Energy
EKGPDLYLIPLT
Protein databases
EKGPDLYLIPLT
Biologist Physicist
![Page 6: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/6.jpg)
CASP: Critical Assessment of Structure Prediction
Timeline: Jan-Apr, May, Jun, Jul, Aug, Sept, Oct, Nov, Dec
• Organisers: call for structures; publish seqs on web; expert assessment; 4 day mtg
• Biologists: give sequences to organisers; structure determination; give structures to organisers
• Caspers: predict structure from sequence
![Page 7: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/7.jpg)
Degree of evolutionary conservation
DNA seq → Protein seq → Structure → Function
Less conserved, information poor → more conserved, information rich
DNA seq: ACAGTTACACCGGCTATGTACTATACTTTG
Protein seq: HDSFKLPVMSKFDWEMFKPCGKFLDSGKLG
![Page 8: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/8.jpg)
Three main approaches (in order of current success)
1. Comparative modelling
2. Fold recognition
3. De novo
![Page 9: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/9.jpg)
Comparative modelling
Conserved backbone
Energy
EKGPDLYLIPLT
Target
Close homologues
Variable backbone
Side chains
![Page 10: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/10.jpg)
Comparative modelling (protein building)
1. Prepare the raw materials
2. Build the model (two methods)
3. Check the model
4. Accept or reject the model
![Page 11: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/11.jpg)
C1: Preparing the raw materials
Given target AA sequence: EKGPDLYLIPLT
1. Identify parents (homologues)
2. Structurally align parents
3. Align target to parents
![Page 12: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/12.jpg)
Structurally conserved regions and structurally variable regions
SCR: secondary structure region
SVR: loop region
![Page 13: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/13.jpg)
C2: Building (choice of two methods)
Method 1: Assemble fragments
1. Determine SCRs and build associated backbone
2. Determine SVRs and build rest of backbone
3. Attach and orient side-chains
4. Refine model
Method 2: Use spatial restraints
![Page 14: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/14.jpg)
![Page 15: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/15.jpg)
C2: Building (choice of two methods)
Method 1: Assemble fragments
1. Determine SCRs and build associated backbone
2. Determine SVRs and build rest of backbone
3. Orient side-chains
4. Refine model
Method 2: Use spatial restraints
1. Optimally satisfy spatial restraints
![Page 16: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/16.jpg)
Extrapolation
D T N V A Y C N K D
![Page 17: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/17.jpg)
C3: Test model (C4: then accept or reject)
• Examine the model in the light of all experimental data
• PROCHECK, VERIFY3D, PROSA II, Visual inspection using 3D software, JOY
![Page 18: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/18.jpg)
Problems in comparative modelling
• Aligning the target to the parents
• The packing of secondary structure elements in the core
• The long insertions and deletions in the structurally variable regions
![Page 19: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/19.jpg)
Fold Recognition
?
Target
![Page 20: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/20.jpg)
Fold recognition
Energy
EKGPDLYLIPLT
Target Structurally similar proteins
![Page 21: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/21.jpg)
Fold recognition (protein finding)
1. Obtain library of non-duplicate folds
2. Perform sequence-structure alignment
3. Assess success of alignment
• Biologist – use substitution matrix
• Physicist – use potentials
4. Accept or reject the model
![Page 22: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/22.jpg)
Sequence-structure alignment
1. Construct sequence profile
2. Use profile to score the sequence
Target Parent
BLASTP
OWL MULTAL
Dynamic programming algorithm
Score
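The two steps above (build a profile, then score the target against it with dynamic programming) can be illustrated with a minimal sketch. It assumes a Needleman-Wunsch style global alignment with a linear gap penalty; the function name `align_to_profile`, the gap value and the default score for residues missing from a profile column are all hypothetical choices, not taken from the slides or from any of the named programs.

```python
from typing import Dict, List, Tuple

# One score table per parent position: residue letter -> score (hypothetical layout).
Profile = List[Dict[str, float]]


def align_to_profile(target: str, profile: Profile,
                     gap: float = -2.0,
                     unseen: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Globally align a target sequence to a profile; return score and aligned pairs."""
    n, m = len(target), len(profile)
    # score[i][j] is the best score for aligning target[:i] with profile[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + profile[j - 1].get(target[i - 1], unseen)
            up = score[i - 1][j] + gap      # target residue aligned to a gap
            left = score[i][j - 1] + gap    # profile position aligned to a gap
            best = max(match, up, left)
            score[i][j] = best
            move[i][j] = "diag" if best == match else ("up" if best == up else "left")

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

For a toy profile `[{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]`, the target `"AD"` aligns its two residues to the first and third profile positions, paying one gap penalty for the skipped middle position.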
![Page 23: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/23.jpg)
Amino acid substitutions are constrained by local environments
Different substitution patterns
Environment-specific substitution tables
![Page 24: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/24.jpg)
Definition of local environments
• Main-chain conformation and secondary structure (α-helix, β-strand, coil and positive φ)
• Solvent accessibility (accessible and inaccessible)
• Hydrogen bonds (side-chain to main-chain NH, side-chain to main-chain CO and side-chain to side-chain)
![Page 25: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/25.jpg)
Substitution scores

$$P(b \mid a, E) = \frac{f^{E}_{ab}}{\sum_{c} f^{E}_{ac}}$$

$$S^{E}_{ab} = \mathrm{round}\left(\log\frac{P(b \mid a, E)}{P_b}\right)$$

• $f^{E}_{ab}$: frequency of observing amino acid a in environment E replaced by b
• $P(b \mid a, E)$: probability that amino acid a in environment E is replaced by amino acid b
• $P_b$: background probability of observing amino acid b (match occurring by chance)
• $S^{E}_{ab}$: log odds score scaled to the nearest integer
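As a rough illustration of the log odds calculation described above, the sketch below turns raw substitution counts into integer scores. The function name `substitution_scores`, the dictionary layout and the use of the natural logarithm are assumptions made for this example; real environment-specific tables may use a different log base or an extra scaling factor before rounding.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# (environment, a, b) -> number of times a was observed replaced by b (hypothetical layout).
Counts = Dict[Tuple[str, str, str], int]


def substitution_scores(counts: Counts,
                        background: Dict[str, float]) -> Dict[Tuple[str, str, str], int]:
    """Return rounded log odds scores for each observed (environment, a, b)."""
    # Row totals: for each (environment, a), sum the counts over all replacements c.
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for (env, a, _b), f in counts.items():
        totals[(env, a)] += f

    scores: Dict[Tuple[str, str, str], int] = {}
    for (env, a, b), f in counts.items():
        if f == 0:
            continue  # the log odds is undefined for substitutions never observed
        p = f / totals[(env, a)]                      # probability of b given a and E
        scores[(env, a, b)] = round(math.log(p / background[b]))
    return scores
```

With counts of 80 for A→A and 20 for A→G in a single environment and a background probability of 0.1 for each residue, the conserved match scores higher than the substitution, as expected of a log odds table.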
![Page 26: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/26.jpg)
Scoring with potentials

Energy potential

$$E^{ab}_{k}(s) = RT\log\left(1 + m_{ab}\right) - RT\log\left(1 + m_{ab}\frac{f^{ab}_{k}(s)}{f_{k}(s)}\right)$$

Solvation potential

$$E^{a}(r) = -RT\log\left(\frac{f^{a}(r)}{f(r)}\right)$$
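Both potentials are of the inverse Boltzmann kind: an observed frequency is compared with a reference frequency and the log ratio is read as an energy. The sketch below is one possible reading of the two formulas, not a definitive implementation. The function names, the value of `RT` (about 0.592 kcal/mol near 298 K) and the interpretation of `m_ab` as the number of observations of the residue pair are assumptions.

```python
import math

RT = 0.592  # kcal/mol at roughly 298 K; an assumed unit choice


def solvation_energy(f_a_r: float, f_r: float, rt: float = RT) -> float:
    """Energy of residue type a at burial level r from frequency ratios."""
    return -rt * math.log(f_a_r / f_r)


def pair_energy(f_ab_k_s: float, f_k_s: float, m_ab: float, rt: float = RT) -> float:
    """Pair energy for residues a, b at sequence separation k and distance s.

    m_ab damps the signal when few observations are available (assumed meaning):
    with m_ab = 0 the energy is zero whatever the frequency ratio.
    """
    return rt * math.log(1 + m_ab) - rt * math.log(1 + m_ab * f_ab_k_s / f_k_s)
```

A frequency ratio above one (the pair or environment is seen more often than the reference) gives a negative, favourable energy; a ratio of exactly one gives zero.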
![Page 27: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/27.jpg)
The Novel Fold Problem
?
asdghklprtwecvm
nasetyasdghklprtwecvm
nasety
![Page 28: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/28.jpg)
De novo – new fold methods
Energy
EKGPDLYLIPLT
Segment configurations Sets of local configurations
![Page 29: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/29.jpg)
Defining a “New Fold”
• CATH
– Somewhat objective
• SCOP
– No objective definition
– Tends towards evolutionary relationships
• Ask A. Murzin
![Page 30: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/30.jpg)
New fold approach
• All structure information is in the AA sequence (Anfinsen, Science, 1973)
• Seek “lowest free energy conformation”
• Tactic is to simplify the problem, for example
– Simplified model of protein (one atom per residue)
– Simple or knowledge-based potential function
• Assist in detecting distant homologues
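To make the simplification tactic concrete, here is a minimal sketch of a one-point-per-residue model scored with a simple contact potential. Everything in it is an illustrative assumption: the function name `contact_energy`, the 8 Å contact cutoff, the minimum sequence separation of three residues and the toy pair scores are not taken from the slides.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[float, float, float]


def contact_energy(sequence: str,
                   coords: List[Point],
                   pair_score: Dict[FrozenSet[str], float],
                   cutoff: float = 8.0,
                   min_separation: int = 3) -> float:
    """Sum pair scores over residues whose single points lie within the cutoff."""
    if len(sequence) != len(coords):
        raise ValueError("need exactly one coordinate per residue")
    energy = 0.0
    for i in range(len(coords)):
        # Skip near neighbours along the chain, which are always close in space.
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                key = frozenset((sequence[i], sequence[j]))
                energy += pair_score.get(key, 0.0)
    return energy
```

With a toy score of -1.0 for a leucine-leucine contact, a bent four-residue chain `"LAAL"` whose ends come within the cutoff scores -1.0, while the same chain fully extended scores 0.0.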
![Page 31: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/31.jpg)
New fold recognition (structure discovery)
1. Set up domain and objective function
2. Perform optimisation
3. Check the model
4. Accept or reject the model
![Page 32: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/32.jpg)
De Novo (biologist): ROSETTA (Baker et al.)
Domain of objective function
Each 9-residue segment of the sequence → set of local structures consistent with local sequence
![Page 33: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/33.jpg)
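De Novo (biologist): ROSETTA
Objective function to be maximised

$$P(\text{structure} \mid \text{sequence}) = \frac{P(\text{structure})\,P(\text{sequence} \mid \text{structure})}{P(\text{sequence})}$$

• $P(\text{sequence})$: constant
• $P(\text{structure})$: function of energy
• $P(\text{sequence} \mid \text{structure}) \approx \prod_{i} P(AA_i \mid E_i)$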
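Taking negative logarithms of the Bayes expression turns the maximisation into an energy-like minimisation. The line below is a short derivation sketch, assuming the sequence likelihood factorises over residues into terms $P(AA_i \mid E_i)$; the last term does not depend on the structure and can be dropped when comparing candidate structures.

$$-\log P(\text{structure} \mid \text{sequence}) = -\log P(\text{structure}) - \sum_{i} \log P(AA_i \mid E_i) + \log P(\text{sequence})$$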
![Page 34: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/34.jpg)
De Novo (biologist): ROSETTA
Maximising the probability of the sequence
1. Choose each local conformation and start with a fully extended chain
2. Generate a neighbouring conformation
3. Accept in simulated annealing style, using P(structure|sequence)
4. Do this many times and cluster results – use centre of largest cluster as prediction
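Steps 1 to 3 can be sketched as a simulated annealing loop with fragment replacement moves. This is a toy illustration rather than the ROSETTA implementation: conformations are lists of per-residue state labels standing in for local geometry, `energy` is any user-supplied function (for example a negative log probability), and the name `fragment_anneal`, the cooling schedule and all parameter values are assumptions. The clustering in step 4 is omitted.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple


def fragment_anneal(n_residues: int,
                    fragments: Dict[int, List[Sequence[str]]],
                    energy: Callable[[List[str]], float],
                    steps: int = 2000,
                    t_start: float = 2.0,
                    t_end: float = 0.1,
                    seed: int = 0) -> Tuple[List[str], float]:
    """Anneal a chain by swapping in local fragments; return best state and energy."""
    rng = random.Random(seed)
    conformation = ["E"] * n_residues          # step 1: start fully extended
    current = energy(conformation)
    best, best_energy = list(conformation), current
    starts = sorted(fragments)

    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Step 2: neighbouring conformation = one window replaced by a library fragment.
        start = rng.choice(starts)
        fragment = list(rng.choice(fragments[start]))[: n_residues - start]
        candidate = list(conformation)
        candidate[start:start + len(fragment)] = fragment
        proposed = energy(candidate)
        # Step 3: Metropolis acceptance; downhill always, uphill with Boltzmann probability.
        if proposed <= current or rng.random() < math.exp((current - proposed) / temperature):
            conformation, current = candidate, proposed
            if current < best_energy:
                best, best_energy = list(conformation), current
    return best, best_energy
```

As a usage example, with a six-residue chain, helix (`"HHH"`) and extended (`"EEE"`) fragments available at positions 0 and 3, and an energy that counts non-helical residues, the loop settles on the all-helix state. In a fuller version the loop would be repeated from many seeds and the resulting structures clustered.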
![Page 35: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/35.jpg)
![Page 36: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/36.jpg)
De Novo (physicist): ASTROFOLD (Floudas et al.)
1. Predict α-helices and β-strands
2. Predict β-sheets and disulphide bridges using integer linear programming (ILP)
3. Use deterministic global optimisation, with energy function and constraints to predict tertiary structure
![Page 37: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/37.jpg)
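Testing of prediction servers - LiveBench

| Server | Type | Sensitivity (Easy) | Sensitivity (Hard) | Specificity (All) | Specificity (Hard) | Added Value (Easy) | Added Value (Hard) |
|---|---|---|---|---|---|---|---|
| Pcons2 | Consensus | 6 | 4 | 2 | 2 | 3 | 3 |
| ShotGun on 5 | Consensus | 1 | 2 | 4 | 4 | 7 | 5 |
| ShotGun on 3 | Consensus | 2 | 1 | 1 | 1 | 2 | 2 |
| Shotgun-INBGU | Threading | 3 | 3 | 3 | 3 | 4 | 1 |
| INBGU | Threading | 7 | 5 | 6 | 9 | 5 | 6 |
| Fugue3 | Threading | 14 | 8 | 9 | 8 | 15 | 9 |
| Fugue2 | Threading | 12 | 7 | 8 | 7 | 10 | 8 |
| Fugue1 | Threading | 17 | 14 | 14 | 11 | 16 | 15 |
| mGenTHREADER | Threading | 8 | 11 | 16 | 13 | 6 | 11 |
| GenTHREADER | Threading | 13 | 12 | 17 | 15 | 8 | 13 |
| 3D-PSSM | Threading | 5 | 10 | 12 | 12 | 12 | 10 |
| ORFeus | Sequence | 4 | 6 | 7 | 6 | 1 | 4 |
| FFAS | Sequence | 9 | 9 | 5 | 5 | 9 | 7 |
| Sam-T99 | Sequence | 10 | 15 | 13 | 16 | 11 | 16 |
| Superfamily | Sequence | 15 | 13 | 11 | 10 | 17 | 12 |
| ORF-BLAST | BLAST | 11 | 16 | 10 | 14 | 14 | 14 |
| PDB-BLAST | BLAST | 16 | 17 | 15 | 17 | 13 | 17 |
| BLAST | BLAST | 18 | 18 | 18 | 18 | 18 | 18 |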
![Page 38: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/38.jpg)
Review - comparative modelling
Conserved backbone
Energy
EKGPDLYLIPLT
Target
Close homologues
Variable backbone
Side chains
![Page 39: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/39.jpg)
Review - fold recognition
Energy
EKGPDLYLIPLT
Target Structurally similar proteins
![Page 40: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/40.jpg)
Review - new fold methods
Energy
EKGPDLYLIPLT
Segment configurations Sets of local configurations
![Page 41: Protein Structure Prediction Graham Wood Charlotte Deane.](https://reader033.fdocuments.in/reader033/viewer/2022061612/5697bfca1a28abf838ca93bb/html5/thumbnails/41.jpg)
Summary: Prediction Methods
• Comparative modelling
– There exists a protein with clear homology
– PSI-BLAST
• Fold recognition
– There exists a protein of similar fold (analogy)
– DALI (CATH & SCOP)
• Novel fold methods
– The sequence has a new fold
• Better methods are still needed for it all to be useful!