Page 1: 1 Protein structure Prediction. 2 Copyright notice Many of the images in this power point presentation are from Bioinformatics and Functional Genomics.

1

Protein Structure Prediction

2

Copyright notice

• Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by John Wiley & Sons, Inc.

• Many slides in this PowerPoint presentation are from slides by Dr. Jonathan Pevsner and others. The copyright belongs to the original authors. Thanks!

3

Levels of Protein Structure

4

Why is protein structure prediction needed?

• 3D structure determination is expensive, slow and difficult (by X-ray crystallography or NMR)

• Assists in the engineering of new proteins

5

Approaches to predicting protein structures

• Ab initio
– Use just first principles: energy, geometry, and kinematics

• Homology (comparative) modeling
– Find the best match in a database of sequences with known 3D structure

• Combinations

• Threading

6

Comparative Modeling

[Workflow diagram: target sequence and known structures (templates) → template(s) selection → sequence alignment → structure modeling → structure evaluation → final structural models]

Known structures (templates)

• Protein Data Bank (PDB): http://www.pdb.org

• Database of templates
– Separate into single chains
– Remove bad structures (models)
– Create BLAST database

7

Comparative Modeling: Template(s) selection

[Workflow diagram repeated from slide 6; this slide covers the template(s) selection step]

• Sequence similarity / fold recognition

• Structure quality (resolution, experimental method)

• Experimental conditions (ligands and cofactors)

8

Comparative Modeling: Sequence Alignment

[Workflow diagram repeated from slide 6; this slide covers the sequence alignment step]

• Key step in homology modeling

• Global alignment is required

• A small error in the alignment can lead to a big error in the model

• Multiple alignments are better than pairwise alignments
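To illustrate what "global alignment" means here, the following is a minimal sketch of Needleman-Wunsch scoring, in which every residue of both sequences must be aligned or gapped. The function name and the match, mismatch and gap values are illustrative assumptions, not parameters from the slides; real modeling pipelines use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score for aligning all of `a` against all of `b`.

    The scoring values are hypothetical defaults chosen for illustration.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalised: this is what makes the alignment global.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    return score[-1][-1]
```

Because end gaps are charged, a short query cannot score well by matching only a small region of the template, which is the behaviour a whole-chain model needs.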

9

Comparative Modeling: Structure Modeling

[Workflow diagram repeated from slide 6; this slide covers the structure modeling step]

• Template-based fragment assembly (SWISS-MODEL)

• Satisfaction of spatial restraints (MODELLER)

10

Comparative Modeling: Structure Evaluation

[Workflow diagram repeated from slide 6; this slide covers the structure evaluation step]

• Errors in template selection or alignment result in bad models

• Iterative cycles of alignment, modeling and evaluation

11

Measuring Protein Structure Similarity

• Need ways to determine if two protein structures are related and to compare predicted models to experimental structures

• A commonly used measure is the root mean square deviation (RMSD) of the Cartesian coordinates of corresponding atoms in two structures after optimal superposition (McLachlan, 1979):

\[ \mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(dx_i^2 + dy_i^2 + dz_i^2\right)} \]

where N is the number of atom pairs and dx_i, dy_i, dz_i are the coordinate differences for pair i.

• Usually use Cα atoms

[Figure: NK-lysin (1nkl), Bacteriocin T102/as48 (1e68) and the T102 best model; RMSD labels 3.6 Å and 2.9 Å]

• Other measures include contact maps and torsion angle RMSDs
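As a worked form of the RMSD definition, here is a minimal sketch that assumes the two coordinate sets are already optimally superposed and list corresponding atoms (for example Cα atoms) in the same order. The superposition step itself is not shown, and the function name is an illustrative choice.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures have already been superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of pair i.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Since the value is in the same units as the coordinates, PDB coordinates in Å give an RMSD in Å.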

12

Comparative modeling

• In general, accuracy of structure prediction depends on the percent amino acid identity shared between target and template.

• For >50% identity, RMSD is often only 1 Å.

13

Comparative modeling

Many web servers offer comparative modeling services. Examples are:

• SWISS-MODEL (ExPASy)

• Predict Protein server (Columbia)

• WHAT IF (CMBI, Netherlands)

14

Ab Initio Methods

• Ab initio: “From the beginning”.

• Assumption 1: All the information about the structure of a protein is contained in its sequence of amino acids.

• Assumption 2: The structure that a (globular) protein folds into is the structure with the lowest free energy.

• Finding native-like conformations requires:
– A scoring function (potential)
– A search strategy

15

Ab initio protein structure prediction

Ab initio prediction can be performed when a protein has no detectable homologs.

Protein folding is modeled based on global free-energy minimum estimates.

16

Ab initio Prediction

• Sampling the global conformation space
– Lattice models / Discrete-state models
– Molecular Dynamics

• Picking native conformations with an energy function
– Solution model: how the protein interacts with water
– Pair interactions between amino acids

• Predicting secondary structure
– Local homology
– Fragment libraries

17

ROSETTA

• ROSETTA is mainly an ab initio structure prediction algorithm, although various parts of it can be used for other purposes as well (such as homology modeling).

• Rationale
– Local structures often fold independently of the full protein
– Can predict large areas of a protein by matching sequence to I-Sites

David Baker

18

Ab initio Prediction – ROSETTA

1. PSI-BLAST – homology search
Discard sequences with >25% homology

2. PHD
For each 3-long and each 9-long sequence fragment, get 25 structure fragments that match “well”

3. Markov-Chain Monte Carlo method
Insert and remove iteratively one short structure fragment at a time
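The fragment-insertion step can be sketched as a Metropolis Monte Carlo loop. This is a toy illustration only, not ROSETTA's actual implementation: the conformation is reduced to a string of hypothetical state letters (H, E, C), `toy_energy` is an invented scoring function, and the fragment list, step count and temperature are arbitrary assumptions.

```python
import math
import random

def toy_energy(conformation):
    """Hypothetical score: number of adjacent positions whose states differ (lower is better)."""
    return sum(1 for x, y in zip(conformation, conformation[1:]) if x != y)

def fragment_monte_carlo(length, fragments, steps=2000, temperature=1.0, seed=0):
    """Repeatedly insert one short fragment at a random position and keep or
    reject the move with the Metropolis criterion. Returns (best_model, best_energy)."""
    rng = random.Random(seed)
    current = [rng.choice("HEC") for _ in range(length)]
    current_e = toy_energy(current)
    best, best_e = current[:], current_e
    for _ in range(steps):
        frag = rng.choice(fragments)
        start = rng.randrange(length - len(frag) + 1)
        trial = current[:start] + list(frag) + current[start + len(frag):]
        trial_e = toy_energy(trial)
        delta = trial_e - current_e
        # Always accept downhill moves; accept uphill moves with probability exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = current[:], current_e
    return "".join(best), best_e
```

A call such as `fragment_monte_carlo(20, ["HHH", "EEE", "CCC", "HHHHHHHHH", "EEEEEEEEE"])` mirrors the 3-long and 9-long fragments on the slide. Accepting occasional uphill moves is what lets the search escape local minima.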

19

Ab initio Prediction

20

Protein Threading

• The goal: find the “correct” sequence-structure alignment between a target sequence and its native-like fold in PDB

• Energy function – knowledge (or statistics) based rather than physics based
– Should be able to distinguish correct structural folds from incorrect structural folds
– Should be able to distinguish correct sequence-fold alignments from incorrect sequence-fold alignments

MTYKLILN …. NGVDGEWTYTE

21

Threading

• Threading is in-between homology-based prediction and molecular modeling

MTYKLILN …. NGVDGEWTYTE

Main difference between homology-based prediction and threading:

Threading uses the structure to compute energy function during alignment

22

Threading – Overview

• Build a structural template database

• Define a sequence–structure energy function

• Apply a threading algorithm to query sequence

• Perform local refinement of secondary structure

• Report best resulting structural model

23

Threading – Template Database

• FSSP, SCOP, CATH

• Remove pairs of proteins with highly similar structures
– Efficiency
– Statistical skew in favor of large families

24

Threading – Energy Function

MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

• Em: how often a residue mutates to the template residue

• Es: how well a residue fits a structural environment

• Ep: how preferable it is to put two particular residues nearby

• Eg: alignment gap penalty

• Ess: compatibility with local secondary structure prediction

Total energy: wm Em + ws Es + wp Ep + wg Eg + wss Ess
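The total energy is a weighted sum of the five terms, which can be written out as a short sketch. The dictionary keys and the numbers in the usage note are illustrative assumptions; the slides do not give weight values.

```python
def total_threading_energy(terms, weights):
    """Weighted sum wm*Em + ws*Es + wp*Ep + wg*Eg + wss*Ess.

    `terms` and `weights` are dicts keyed by "m", "s", "p", "g", "ss".
    """
    keys = ("m", "s", "p", "g", "ss")
    return sum(weights[k] * terms[k] for k in keys)
```

For example, with hypothetical terms `{"m": -2.0, "s": -3.0, "p": -1.5, "g": 4.0, "ss": -0.5}` and all weights set to 1.0 the total is -3.0. Halving the gap weight lowers it to -5.0, which shows how the weights trade off gap placement against residue fit.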

25

Protein Threading -- algorithm

• Threading algorithm – to find a sequence-structure alignment with the minimum energy
– considering only singleton energy and gap penalty
– considering all three energy terms

[Figure: sequence, fold, links]
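For the first case (singleton energy and gap penalty only), each term depends on one alignment column at a time, so the minimum-energy alignment can be found by dynamic programming. The sketch below assumes a hypothetical two-letter environment alphabet (B for buried, E for exposed) and an invented singleton energy; neither comes from the slides. It does not cover pairwise terms, which couple distant columns.

```python
HYDROPHOBIC = set("AVILMFWC")

def singleton_energy(residue, environment):
    """Hypothetical fitness: -1 if hydrophobic-in-buried or polar-in-exposed, else +1."""
    buried = environment == "B"
    return -1.0 if (residue in HYDROPHOBIC) == buried else 1.0

def thread(sequence, environments, gap_penalty=1.5):
    """Minimum total energy of aligning `sequence` to the template positions."""
    n, m = len(sequence), len(environments)
    energy = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        energy[i][0] = i * gap_penalty
    for j in range(1, m + 1):
        energy[0][j] = j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            energy[i][j] = min(
                # Place residue i at template position j.
                energy[i - 1][j - 1] + singleton_energy(sequence[i - 1], environments[j - 1]),
                energy[i - 1][j] + gap_penalty,   # residue i left unaligned
                energy[i][j - 1] + gap_penalty,   # template position j left empty
            )
    return energy[n][m]
```

With these toy values, `thread("VKVK", "BEBE")` places every residue in a matching environment, while `thread("KVKV", "BEBE")` has to pay for two gaps to shift the sequence into register.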

26

Protein Threading -- algorithm

• Iterative procedures, e.g. repeated 3D-profile alignment

• Double dynamic programming

• Integer programming

27

Assessing Prediction Reliability

MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

Score = -1500    Score = -900    Score = -1120    Score = -720

Which one, if any, is the correct structural fold for the target sequence?

The one with the highest score?

28

Assessing Prediction Reliability

Template #1: AATTAATACATTAATATAATAAAATTACTGA

Query sequence: AAAA

Template #2: CGGTAGTACGTAGTGTTTAGTAGCTATGAA

Better template?

Which of these two templates has a better chance of matching the query sequence well after it is randomly reshuffled?

29

Assessing Prediction Reliability

• Different template structures may have different background scores, making direct comparison of threading scores against different templates invalid

• Comparison of threading results should be based on how much a score stands out from its background score distribution rather than on the threading scores directly

30

Assessing Prediction Reliability

Threading 100,000 sequences against a template structure provides baseline information about the background scores of the template

By locating where the threading score of a particular query sequence falls in this background distribution, one can decide how significant the score, and hence the threading result, is!

[Figure: background score distribution, with not significant and significant regions and the E-value]
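One simple way to express "where the score falls in the background" is a z-score or an empirical p-value. The sketch below assumes that lower scores are better and builds the background by reshuffling a sequence; `score_fn` stands in for a real threading scorer and is hypothetical, as are the function names.

```python
import random
import statistics

def shuffled_background(sequence, score_fn, n_shuffles=1000, seed=0):
    """Score `n_shuffles` random reshufflings of `sequence` with `score_fn`."""
    rng = random.Random(seed)
    residues = list(sequence)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        scores.append(score_fn("".join(residues)))
    return scores

def z_score(score, background):
    """Number of standard deviations between `score` and the background mean."""
    stdev = statistics.pstdev(background)
    if stdev == 0:
        raise ValueError("background scores have zero spread")
    return (score - statistics.mean(background)) / stdev

def empirical_p_value(score, background):
    """Fraction of background scores at least as good (as low) as `score`."""
    return sum(1 for s in background if s <= score) / len(background)
```

A strongly negative z-score, or a small empirical p-value, indicates that the query scores far better than random sequences do against the same template.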

31

Assessing Prediction Reliability

MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

Score = -1500, E-value = e-1

Score = -900, E-value = e-21

Score = -1120, E-value = 0.5 e-1

Score = -720, E-value = e-2

If no predictions have significant e-values, a prediction program should indicate that it could not make a prediction!
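The four hits on this slide can be used to sketch the selection rule. Two assumptions are made: "e-1" is read as 10^-1 (and so on), and the cutoff of 1e-10 is borrowed from the "e.g., < e-10" example later in the deck. The template labels A to D are placeholders.

```python
hits = [
    {"template": "A", "score": -1500, "e_value": 1e-1},
    {"template": "B", "score": -900, "e_value": 1e-21},
    {"template": "C", "score": -1120, "e_value": 0.5e-1},
    {"template": "D", "score": -720, "e_value": 1e-2},
]

def select_prediction(hits, e_value_cutoff=1e-10):
    """Return the hit with the smallest E-value below the cutoff, or None."""
    significant = [h for h in hits if h["e_value"] < e_value_cutoff]
    if not significant:
        return None  # report "no prediction" rather than a poorly supported fold
    return min(significant, key=lambda h: h["e_value"])
```

Under these assumptions the selected hit is the one with score -900, not the one with the most extreme raw score of -1500.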

32

Prediction of Protein Structures

• Threading against a template database

• Select the hits with good e-values, e.g., < e-10

• Place the backbone atoms of the template at the corresponding positions of the aligned residues

FMFTAIGEEVVQRSRKIL---DDLVELVK

AVLTRYGQRLIQLYDLLAQIQQKAFDVLS

Unaligned residues will not have 3D coordinates
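The coordinate transfer can be sketched as a walk along the alignment columns. The slide does not say which row is the query, so the usage note assumes the gapped row is the template; the coordinates are placeholders, and the function name is illustrative.

```python
def transfer_backbone(query_aln, template_aln, template_coords):
    """Copy template coordinates onto aligned query residues.

    `query_aln` and `template_aln` are equal-length alignment rows with '-' for gaps.
    `template_coords` has one entry per template residue (gaps excluded).
    Returns one entry per query residue; None marks a residue with no coordinates.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("alignment rows must have the same length")
    model = []
    t_index = 0
    for q, t in zip(query_aln, template_aln):
        coord = None
        if t != "-":
            if q != "-":
                coord = template_coords[t_index]
            t_index += 1
        if q != "-":
            model.append(coord)
    return model
```

With `template_aln = "FMFTAIGEEVVQRSRKIL---DDLVELVK"` and `query_aln = "AVLTRYGQRLIQLYDLLAQIQQKAFDVLS"`, the three query residues opposite the gap come back as None, matching the note that unaligned residues have no 3D coordinates.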

33

Prediction of Protein Structures

• Protein threading can predict only the backbone structure of a protein (side-chains have to be predicted using other methods)

• Typically the lower the e-value, the higher the prediction accuracy

Blue: actual structure

Green: predicted structure

predicted actual

34

Prediction of Protein Structures

• Examples – a few good examples

[Figure: actual and predicted structures shown side by side]

35

Prediction of Protein Structures

• Not so good example

36

Prediction of Protein Structures

• State of the art: ~50% of the soluble proteins in a microbial genome could have a correct fold prediction, and perhaps 50% of these proteins have a good backbone structure prediction

• Functional inference could be made based on
– accurately predicted structures
– correctly identified structural folds

37

Prediction of Protein Structures

• All-atom structures could be predicted through
– prediction of backbone structure
– prediction of sidechain packing
  • Backbone-dependent rotamers
  • Ab initio prediction of sidechains

• State of the art – accurate prediction of side chains remains a challenging problem

38

Structure prediction using additional information

• Some structural information may be available before whole structure is solved

– disulfide bonds
– active sites
– residues identified as buried/exposed
– (partial) secondary structure
– partial NMR data
– inter-residue distances by cross-linking and mass spec
– overall shape derived from cryo-EM
– …

• These data can provide highly useful constraints on threading prediction

39

Structure prediction using additional information

• The basic idea

MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE

Distance or other types of constraints could be derived before the structure is solved, which could help make the structure prediction more accurate

40

Applications

• Many protein structures have been successfully predicted prior to the solution of their experimental structures (and were later verified by the experimental structures)

• Structure predictions of all predicted genes in three microbial genomes: Synechococcus, Prochlorococcus MIT/MED

~60% of predicted genes have structural fold assignments

41

Existing Prediction Programs

• PROSPECT
– https://csbl.bmb.uga.edu/protein_pipeline

• FUGUE
– http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html

• THREADER
– http://bioinf.cs.ucl.ac.uk/threader/

42

CASP: Critical Assessment of Structure Prediction

• A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction, John Moult

• First held in 1994, every 2 years afterwards

• Teams make structure predictions from sequences alone

43

CASP

• Two categories of predictors
– Automated
  • Automatic servers, must complete analysis within 48 hours
  • Shows what is possible through computer analysis alone
– Non-automated
  • Groups spend considerable time and effort on each target
  • Utilize computer techniques and human analysis techniques

44

CAFASP

GOAL

The goal of CAFASP is to evaluate the performance of fully automatic structure prediction servers available to the community. In contrast to the normal CASP procedure, CAFASP aims to answer the question of how well servers do without any intervention of experts, i.e. how well ANY user using only automated methods can predict protein structure. CAFASP assesses the performance of methods without the user intervention allowed in CASP.

45

Performance Evaluation in CAFASP3

Servers (54 in total)    Sum MaxSub score    # correct (30 FR targets)
3ds5 robetta             5.17-5.25           15-17
pmod 3ds3 pmode3         4.21-4.36           13-14
RAPTOR                   3.98                13
shgu                     3.93                13
3dsn                     3.64-3.90           12-13
pcons3                   3.75                12
fugu3 orf_c              3.38-3.67           11-12
…                        …                   …
pdbblast                 0.00                0

(http://ww.cs.bgu.ac.il/~dfischer/CAFASP3, released in December, 2002.)

Servers with name in italic are meta servers

MaxSub score ranges from 0 to 1; therefore, the maximum total score is 30

46

One structure where RAPTOR did best

Red: true structure

Blue: correct part of prediction

Green: wrong part of prediction

• Target size: 144

• Super-imposable size within 5 Å: 118

• RMSD: 1.9 Å

47

Some more results by other programs

48

Some more results by other programs

49

Some more results by other programs

50

Summary of current state of the art

51

Secondary Structure Prediction

• Given a protein sequence a1a2…aN, secondary structure prediction aims at defining the state of each amino acid ai as being either H (helix), E (extended=strand), or O (other) (Some methods have 4 states: H, E, T for turns, and O for other).

52

Measures used to evaluate secondary structure predictions

• Percentage of residues predicted ("PP"): percentage of residues for which a secondary structure prediction was made (residues were assigned secondary structure with nonzero probability). The number is provided for reference.

53

Measures used to evaluate secondary structure predictions

• Qindex: Qindex (Qhelix, Qstrand, Qcoil, Q3) gives percentage of residues predicted correctly as helix(H), strand(E), coil(C) or for all three conformational states.

• Qhelix ("Q_H") • Qstrand("Q_S") • Qcoil("Q_C") • Q3 ("Q3")

54

Qindex

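• For a single conformational state:

\[ Q_i = \frac{\text{number of residues correctly predicted in state } i}{\text{number of residues observed in state } i} \times 100 \]

• where i is either helix, strand or coil.

• For all three states:

\[ Q_3 = \frac{\text{number of residues correctly predicted}}{\text{total number of residues}} \times 100 \]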

55

Limitations of Q3

Amino acid sequence:          ALHEASGPSVILFGSDVTVPPASNAEQAK
Actual secondary structure:   hhhhhooooeeeeoooeeeooooohhhhh
Prediction 1:                 ohhhooooeeeeoooooeeeooohhhhhh   Q3=22/29=76% (useful prediction)
Prediction 2:                 hhhhhoooohhhhooohhhooooohhhhh   Q3=22/29=76% (terrible prediction)

Q3 for random prediction is 33%

Secondary structure assignment in real proteins is uncertain to about 10%; therefore, a “perfect” prediction would have Q3=90%.
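The example on this slide can be checked with a short sketch that computes Q3 and a per-state Q value from two equal-length state strings. The function names are illustrative; the three strings are the ones shown on the slide.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def q_state(predicted, observed, state):
    """Percentage of residues observed in `state` that were predicted in `state`."""
    observed_in_state = sum(o == state for o in observed)
    if observed_in_state == 0:
        raise ValueError("state does not occur in the observed structure")
    correct = sum(p == o == state for p, o in zip(predicted, observed))
    return 100.0 * correct / observed_in_state

actual = "hhhhhooooeeeeoooeeeooooohhhhh"
useful = "ohhhooooeeeeoooooeeeooohhhhhh"
terrible = "hhhhhoooohhhhooohhhooooohhhhh"
```

Both predictions have 22 of 29 residues correct, yet `q_state(terrible, actual, "e")` is 0 because every strand is predicted as helix. Per-state scores expose what the single Q3 number hides.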

56

Early methods for Secondary Structure Prediction

• Chou and Fasman (Chou and Fasman. Prediction of protein conformation. Biochemistry, 13: 211-245, 1974)

• GOR (Garnier, Osguthorpe and Robson. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol., 120:97-120, 1978)

57

Chou and Fasman

• Start by computing amino acid propensities to belong to a given type of secondary structure:

\[ \frac{P(i \mid \text{Helix})}{P(i)}, \qquad \frac{P(i \mid \text{Beta})}{P(i)}, \qquad \frac{P(i \mid \text{Turn})}{P(i)} \]

Propensities > 1 mean that residue type i is likely to be found in the corresponding secondary structure type.
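The propensity ratio can be computed directly from residue counts. The following is a minimal sketch; the function name and the counts in the usage note are made up for illustration and are not taken from any real data set.

```python
def propensities(counts_in_state, counts_total):
    """Return P(i | state) / P(i) for every residue type i in `counts_total`.

    `counts_in_state[i]` is how often residue i occurs in the given structure type;
    `counts_total[i]` is how often it occurs overall.
    """
    n_state = sum(counts_in_state.values())
    n_total = sum(counts_total.values())
    result = {}
    for residue, total in counts_total.items():
        freq_in_state = counts_in_state.get(residue, 0) / n_state
        freq_overall = total / n_total
        result[residue] = freq_in_state / freq_overall
    return result
```

With hypothetical counts `{"A": 60, "G": 20}` in helices and `{"A": 100, "G": 100}` overall, Ala gets a propensity of 1.5 and Gly 0.5: Ala is over-represented in helices relative to its overall frequency.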

58

Chou and Fasman

Amino Acid   α-Helix   β-Sheet   Turn   Favors
Ala          1.29      0.90      0.78   α-helix
Cys          1.11      0.74      0.80   α-helix
Leu          1.30      1.02      0.59   α-helix
Met          1.47      0.97      0.39   α-helix
Glu          1.44      0.75      1.00   α-helix
Gln          1.27      0.80      0.97   α-helix
His          1.22      1.08      0.69   α-helix
Lys          1.23      0.77      0.96   α-helix
Val          0.91      1.49      0.47   β-strand
Ile          0.97      1.45      0.51   β-strand
Phe          1.07      1.32      0.58   β-strand
Tyr          0.72      1.25      1.05   β-strand
Trp          0.99      1.14      0.75   β-strand
Thr          0.82      1.21      1.03   β-strand
Gly          0.56      0.92      1.64   turn
Ser          0.82      0.95      1.33   turn
Asp          1.04      0.72      1.41   turn
Asn          0.90      0.76      1.23   turn
Pro          0.52      0.64      1.91   turn
Arg          0.96      0.99      0.88

59

Chou and Fasman

Predicting helices:
- find nucleation site: 4 out of 6 contiguous residues with P(α)>1
- extension: extend helix in both directions until a set of 4 contiguous residues has an average P(α) < 1 (breaker)
- if average P(α) over whole region is >1, it is predicted to be helical

Predicting strands:
- find nucleation site: 3 out of 5 contiguous residues with P(β)>1
- extension: extend strand in both directions until a set of 4 contiguous residues has an average P(β) < 1 (breaker)
- if average P(β) over whole region is >1, it is predicted to be a strand
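The helix rules can be sketched as follows, using the α-helix column of the propensity table from slide 58. This is one possible reading of the extension rule (grow by one residue while the 4-residue window containing the new residue averages at least 1), not the original Chou and Fasman program. Residue letters outside the 20 standard amino acids are not handled.

```python
# P(alpha) values from the propensity table on slide 58 (one-letter codes).
P_ALPHA = {
    "A": 1.29, "C": 1.11, "L": 1.30, "M": 1.47, "E": 1.44, "Q": 1.27, "H": 1.22,
    "K": 1.23, "V": 0.91, "I": 0.97, "F": 1.07, "Y": 0.72, "W": 0.99, "T": 0.82,
    "G": 0.56, "S": 0.82, "D": 1.04, "N": 0.90, "P": 0.52, "R": 0.96,
}

def mean(values):
    return sum(values) / len(values)

def predict_helices(sequence, p=P_ALPHA):
    """Return a string with 'H' at predicted helix positions and '-' elsewhere."""
    n = len(sequence)
    helix = [False] * n
    for start in range(n - 5):
        # Nucleation: at least 4 of 6 contiguous residues with P(alpha) > 1.
        if sum(p[sequence[k]] > 1 for k in range(start, start + 6)) < 4:
            continue
        left, right = start, start + 5
        # Extension to the right, then to the left, until a 4-residue breaker window.
        while right + 1 < n and mean([p[sequence[k]] for k in range(right - 2, right + 2)]) >= 1:
            right += 1
        while left > 0 and mean([p[sequence[k]] for k in range(left - 1, left + 3)]) >= 1:
            left -= 1
        # Accept the region only if its overall average exceeds 1.
        if mean([p[sequence[k]] for k in range(left, right + 1)]) > 1:
            for k in range(left, right + 1):
                helix[k] = True
    return "".join("H" if h else "-" for h in helix)
```

The strand rules have the same shape with the β-sheet column, a 3-out-of-5 nucleation window, and the same 4-residue breaker test.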

60

Chou and Fasman

Position-specific parameters for turn: each position has distinct amino acid preferences.

Examples:

- At position 2, Pro is highly preferred; Trp is disfavored

- At position 3, Asp, Asn and Gly are preferred

- At position 4, Trp, Gly and Cys are preferred

[Figure: position-specific turn frequencies f(i), f(i+1), f(i+2), f(i+3)]

61

Chou and Fasman

Predicting turns:
- for each tetrapeptide starting at residue i, compute:
  - PTurn (average propensity over all 4 residues)
  - F = f(i)*f(i+1)*f(i+2)*f(i+3)
- if PTurn > Pα and PTurn > Pβ and PTurn > 1 and F > 0.000075, the tetrapeptide is considered a turn.

Chou and Fasman prediction:

http://fasta.bioch.virginia.edu/fasta_www/chofas.htm

62

The GOR method

Position-dependent propensities for helix, sheet or turn are calculated for each amino acid. For each position j in the sequence, eight residues on either side are considered.

A helix propensity table contains information about propensity for residues at 17 positions when the conformation of residue j is helical. The helix propensity tables have 20 x 17 entries. Build similar tables for strands and turns.

GOR simplification: For each state, the score of AAj is calculated as the sum of the position-dependent propensities of all residues around AAj; the state with the highest score is predicted.

GOR can be used at: http://abs.cit.nih.gov/gor/ (current version is GOR IV)
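The GOR summation can be sketched with one sparse table per state, indexed by residue type and by offset from position j (-8 to +8, giving the 17 positions mentioned above). The table layout, the function name and the toy values in the usage note are assumptions for illustration; real GOR tables are derived from known structures.

```python
def gor_predict(sequence, tables, half_window=8):
    """Predict one state per residue by summing position-dependent propensities.

    `tables[state][(residue, offset)]` is the contribution of `residue` found
    `offset` positions away from j; missing entries count as 0.
    """
    prediction = []
    for j in range(len(sequence)):
        totals = {}
        for state, table in tables.items():
            total = 0.0
            for offset in range(-half_window, half_window + 1):
                k = j + offset
                if 0 <= k < len(sequence):  # the window is clipped at the chain ends
                    total += table.get((sequence[k], offset), 0.0)
            totals[state] = total
        # The state with the highest summed propensity wins.
        prediction.append(max(totals, key=totals.get))
    return "".join(prediction)
```

With toy tables in which Ala contributes to H, Val to E and Gly to T at offsets -1, 0 and +1, the sequence "AAAVVVGGG" is predicted as "HHHEEETTT".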

63

Accuracy

• Both Chou and Fasman and GOR have been assessed and their accuracy is estimated to be Q3=60-65%.

(initially, higher scores were reported, but the experiments set to measure Q3 were flawed, as the test cases included proteins used to derive the propensities!)

64

Secondary Structure Prediction

Available servers:

- JPRED: http://www.compbio.dundee.ac.uk/~www-jpred/

- PHD: http://cubic.bioc.columbia.edu/predictprotein/

- PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/

- NNPREDICT: http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html

- Chou and Fasman: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm