Structure Prediction


Page 1: Structure Prediction

Structure Prediction

Page 2: Structure Prediction

Tertiary protein structure: protein folding

Three main approaches:

[1] experimental determination (X-ray crystallography, NMR)

[2] Comparative modeling (based on homology)

[3] Ab initio (de novo) prediction (Dr. Ingo Ruczinski at JHSPH)

Page 3: Structure Prediction

Experimental approaches to protein structure

[1] X-ray crystallography
-- Used to determine 80% of structures
-- Requires high protein concentration
-- Requires crystals
-- Able to trace amino acid side chains
-- Earliest structure solved was myoglobin

[2] NMR
-- Magnetic field applied to proteins in solution
-- Largest structures: 350 amino acids (40 kD)
-- Does not require crystallization

Page 4: Structure Prediction

Steps in obtaining a protein structure

Target selection

Obtain, characterize protein

Determine, refine, model the structure

Deposit in database

Page 5: Structure Prediction

X-ray crystallography

http://en.wikipedia.org/wiki/X-ray_diffraction

Sperm Whale Myoglobin

Page 6: Structure Prediction
Page 7: Structure Prediction
Page 8: Structure Prediction

PDB

• April 08, 2008 – 50,000 proteins, 25 new experimentally determined structures each day

[Figure: new PDB structures, divided into new folds and old folds]

Page 9: Structure Prediction

Example 1wey

Page 10: Structure Prediction

Ab initio protein prediction

• Starts with an attempt to derive secondary structure from the amino acid sequence
– Predicting the likelihood that a subsequence will fold into an alpha-helix, beta-sheet, or coil, using physicochemical parameters or HMMs and ANNs
– Able to accurately predict 3/4 of all local structures

Page 11: Structure Prediction

Structure Characteristics

Page 12: Structure Prediction

Beta Sheets

Page 13: Structure Prediction

Ab Initio Prediction

Page 14: Structure Prediction

Secondary structure prediction

Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in helices, β-sheets, and turns.

Proline: occurs at turns, but not in helices.

GOR (Garnier, Osguthorpe, Robson): related algorithm

Modern algorithms: use multiple sequence alignments and achieve a higher success rate (about 70-75%)

Page 279-280
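A minimal sketch of the frequency idea behind Chou-Fasman-style prediction. It is not the published algorithm (which uses nucleation and extension rules and a published propensity table). The training sequences, the state strings, and the window-averaging rule below are all hypothetical and only illustrate how a propensity can be derived from observed frequencies.

```python
from collections import Counter

# Hypothetical toy training data: (sequence, per-residue state) pairs.
# H = helix, E = beta-strand, C = coil/turn. These are NOT real PDB annotations.
TRAINING = [
    ("AEELLKKAGPVVIVYG", "HHHHHHHHCCEEEEEC"),
    ("GPAELLEKMVIVVTGP", "CCHHHHHHHEEEEECC"),
]


def propensities(examples):
    """Propensity of amino acid aa for state st:
    (fraction of residues in st that are aa) / (fraction of all residues that are aa).
    Values above 1 mean aa is over-represented in that state."""
    aa_counts, state_counts, pair_counts = Counter(), Counter(), Counter()
    for seq, states in examples:
        for aa, st in zip(seq, states):
            aa_counts[aa] += 1
            state_counts[st] += 1
            pair_counts[aa, st] += 1
    total = sum(aa_counts.values())
    table = {}
    for (aa, st), n in pair_counts.items():
        freq_in_state = n / state_counts[st]
        freq_overall = aa_counts[aa] / total
        table[aa, st] = freq_in_state / freq_overall
    return table


def predict(seq, table, window=5):
    """Assumed simplification: give each residue the state with the highest
    mean propensity over a window centred on it (unseen pairs count as 0)."""
    half = window // 2
    out = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        best = max(
            "HEC",
            key=lambda st: sum(table.get((aa, st), 0.0) for aa in segment) / len(segment),
        )
        out.append(best)
    return "".join(out)


table = propensities(TRAINING)
print(predict("AELLKGPVIV", table))
```

In this toy data, proline never appears in a helix, so the table has no ("P", "H") entry, which mirrors the proline observation above.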

Page 15: Structure Prediction

Table

Page 16: Structure Prediction
Page 17: Structure Prediction
Page 18: Structure Prediction
Page 19: Structure Prediction

Frequency Domain

Page 20: Structure Prediction

Neural Networks

Page 21: Structure Prediction

Training the Network

• Use PDB entries with validated secondary structures

• Measures of accuracy
– Q3 score: percentage of residues in the protein whose state is correctly predicted (training on it tends toward predicting the most abundant structure)
– You get 50% if you just predict everything to be a coil
– Most methods get around 60% with this metric
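The Q3 score can be sketched in a few lines. The observed and predicted strings below are hypothetical (H = helix, E = strand, C = coil). The observed string is chosen to be half coil so that the "predict everything as coil" baseline reproduces the 50% figure.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical example strings, not taken from a real structure.
observed = "CCHHHHHHCCEEEECCCCCC"
predicted = "CCHHHHCCCCEEEEEECCCC"
all_coil = "C" * len(observed)

print(q3(all_coil, observed))   # 50.0
print(q3(predicted, observed))  # 80.0
```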

Page 22: Structure Prediction

Correlation Coefficient

• How correlated are the predictions for coils, helices, and beta-sheets with the real structures?

• This ignores what we really want to get to
– If the real structure has 3 coils, do we predict 3 coils?

• Segment overlap score (Sov) gives credit for how protein-like the predicted structure is, but it is correlated with Q3
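A sketch of a per-state correlation, assuming the slide refers to the Matthews correlation coefficient commonly used for this purpose. The segment counter illustrates the "3 coils" question and is not the Sov score itself. Function names and example strings are hypothetical.

```python
import math


def state_correlation(predicted, observed, state):
    """Matthews correlation coefficient for one state (e.g. 'H') vs. everything else."""
    tp = fp = tn = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when the coefficient is undefined (a zero marginal).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def count_segments(states, state):
    """Number of maximal runs of `state` in a state string."""
    runs = 0
    prev = None
    for s in states:
        if s == state and prev != state:
            runs += 1
        prev = s
    return runs


# Hypothetical strings: the prediction breaks one helix with a single coil residue.
observed = "CCHHHHHHCCEEEECCCCCC"
fragmented = "CCHHCHHHCCEEEECCCCCC"

print(state_correlation(fragmented, observed, "H"))
print(count_segments(observed, "C"), count_segments(fragmented, "C"))
```

The per-residue correlation stays high here even though the prediction has one coil segment too many, which is the gap a segment-based score tries to close.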

Page 23: Structure Prediction

Artificial Neural Network

Predicts structure at this point

Page 24: Structure Prediction

Danger

• You may train the network on your training set, but it may not generalize to other data

• Perhaps we should train several ANNs and then let them vote on the structure
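The voting idea can be sketched as a per-residue majority vote. The three prediction strings stand in for the outputs of separately trained networks and are hypothetical. How ties are broken is an assumption of this sketch.

```python
from collections import Counter


def jury_vote(predictions):
    """Combine equal-length state strings by majority vote at each residue.
    Ties go to the state that appears first among the voters at that position."""
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


# Hypothetical outputs of three networks for the same five residues.
votes = ["HHHCC", "HHCCC", "HHHCE"]
print(jury_vote(votes))  # HHHCC
```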

Page 25: Structure Prediction

Profile network from HeiDelberg

• A sequence family (alignment) is used as input instead of just the new sequence
• On the first level, a window of length 13 around the residue is used
• The window slides down the sequence, making a prediction for each residue
• The input includes the frequency of amino acids occurring in each position in the multiple alignment (in the example, there are 5 sequences in the multiple alignment)
• The second level takes these predictions from neural networks that are centered on neighboring residues
• The third level does a jury selection
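A sketch of how the first-level input might be assembled: per-column amino acid frequencies from the alignment, then a sliding window of 13 columns per residue. This covers only the frequency part of the input. The alignment, the zero-padding at the sequence ends, and the function names are assumptions, and the network itself is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment):
    """Per-column amino acid frequencies for equal-length aligned sequences.
    Gap characters are not counted, so a gapped column sums to less than 1."""
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        profile.append([column.count(aa) / n for aa in AMINO_ACIDS])
    return profile


def windows(profile, width=13):
    """Yield one flattened input vector per residue: `width` columns centred
    on the residue, zero-padded beyond the ends of the sequence."""
    half = width // 2
    blank = [0.0] * len(AMINO_ACIDS)
    for center in range(len(profile)):
        vec = []
        for pos in range(center - half, center + half + 1):
            vec.extend(profile[pos] if 0 <= pos < len(profile) else blank)
        yield vec


# Hypothetical alignment of 5 sequences.
alignment = ["MKVLAAGI", "MKVLGAGI", "MRVLAAGV", "MKILAAGI", "MKVLASGI"]
profile = column_profile(alignment)
inputs = list(windows(profile))
print(len(inputs), len(inputs[0]))  # 8 260
```

Each residue gets a vector of 13 x 20 = 260 frequencies, so the network sees how conserved each neighbouring position is rather than a single letter.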

Page 26: Structure Prediction

PHD

Predicts 4

Predicts 6

Predicts 5

Page 27: Structure Prediction

Fold recognition (structural profiles)

• Attempts to find the best fit of a raw polypeptide sequence onto a library of known protein folds

• A prediction of the secondary structure of the unknown is made and compared with the secondary structure of each member of the library of folds
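A sketch of the comparison step, assuming the predicted secondary structure and each library fold are represented as state strings. The fold names and strings are hypothetical. Similarity is measured here with Python's `difflib.SequenceMatcher`, a generic string-similarity ratio standing in for whatever alignment score a real fold-recognition method uses.

```python
from difflib import SequenceMatcher

# Hypothetical fold library: name -> secondary structure string (H/E/C).
FOLD_LIBRARY = {
    "all_alpha_bundle": "CHHHHHHHHCCHHHHHHHHCCHHHHHHHC",
    "beta_hairpin_sheet": "CEEEEECCEEEEECCEEEEEC",
    "alpha_beta_sandwich": "CEEEECCHHHHHHHCCEEEECCHHHHHHC",
}


def rank_folds(predicted_ss, library):
    """Return (fold name, similarity) pairs, most similar fold first."""
    scores = {
        name: SequenceMatcher(None, predicted_ss, ss, autojunk=False).ratio()
        for name, ss in library.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical predicted secondary structure for an unknown sequence.
query_ss = "CEEEEECCEEEECCCEEEEEC"
for name, score in rank_folds(query_ss, FOLD_LIBRARY):
    print(f"{name}: {score:.2f}")
```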

Page 28: Structure Prediction

Threading

• Takes the fold recognition process a step further:
– Empirical-energy functions for residue pair interactions are used to mount the unknown onto the putative backbone in the best possible manner
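A toy sketch of threading under strong assumptions: each fold is reduced to a list of contacting positions, the sequence is placed on the backbone without gaps, and the pair potential is a made-up hydrophobic contact rule rather than a real empirical-energy function. All names and values are hypothetical.

```python
HYDROPHOBIC = set("AVILMFWC")


def pair_energy(a, b):
    """Toy contact potential (hypothetical values): lower is better."""
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0  # buried hydrophobic pair, favourable
    if a in HYDROPHOBIC or b in HYDROPHOBIC:
        return 0.5   # hydrophobic residue next to a polar one, unfavourable
    return 0.0


def thread_energy(sequence, contacts):
    """Sum pair energies over the template's contacts, with residue i of the
    sequence mounted on template position i (no gaps)."""
    return sum(
        pair_energy(sequence[i], sequence[j])
        for i, j in contacts
        if i < len(sequence) and j < len(sequence)
    )


def best_fold(sequence, templates):
    """Pick the template with the lowest total energy for this sequence."""
    return min(templates, key=lambda name: thread_energy(sequence, templates[name]))


# Hypothetical contact maps for two folds.
TEMPLATES = {
    "fold_1": [(0, 5), (1, 4), (2, 7)],
    "fold_2": [(0, 3), (2, 6), (4, 7)],
}

query = "LKVEDIAV"
for name, contacts in TEMPLATES.items():
    print(name, thread_energy(query, contacts))
print(best_fold(query, TEMPLATES))  # fold_1
```

A real threading program also searches over alignments with gaps, which is where most of the computational cost lies.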

Page 29: Structure Prediction

Fold recognition by threading

Query sequence

Compatibility scores

Fold 1

Fold 2

Fold 3

Fold N

Page 30: Structure Prediction

CASP

• http://www.predictioncenter.org/casp8/index.cgi

Page 31: Structure Prediction

SCOP

• SCOP: Structural Classification of Proteins.
• http://scop.mrc-lmb.cam.ac.uk/scop/

Page 32: Structure Prediction

CATH

• CATH: Protein Structure Classification
• Class (C), Architecture (A), Topology (T) and Homologous superfamily (H)
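CATH identifiers are written as four dot-separated numbers, one per level. A small sketch of splitting such an identifier into its levels follows. The type and function names are hypothetical, and the identifier used is only an illustration of the format.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    klass: int          # C: class
    architecture: int   # A: architecture
    topology: int       # T: topology
    superfamily: int    # H: homologous superfamily


def parse_cath(code):
    """Split a 'C.A.T.H' identifier into its four numeric levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {len(parts)}: {code!r}")
    return CathCode(*(int(p) for p in parts))


node = parse_cath("3.40.50.300")
print(node.klass, node.architecture, node.topology, node.superfamily)  # 3 40 50 300
```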