Secondary structure prediction 2 cb1 sec2 · 2018. 5. 29. · structure prediction: combination of...

© Burkhard Rost
Title: Secondary structure prediction 2
Short title: cb1_sec2
Lecture: Computational Biology 1 - Protein structure (for Informatics) - TUM summer semester

Page 1: Secondary structure prediction 2

Page 2: Announcements

Videos: YouTube / www.rostlab.org/talks

THANKS:
• Dmitrij Nechaev
• Lothar Richter
• Michael Heinzinger
• Your Name (next)

EXERCISES / Special lectures:
• Mikael Bodén, UQ Brisbane

No lecture:
• 04/26 Security check Rostlab (exercise WILL be)
• 05/01 May Day (also no exercise)
• 05/08 Student representation (SVV) - exercise WILL happen
• 05/10 Ascension Day (also no exercise)
• 05/22 Whitsun holiday (also no exercise)
• 05/31 Corpus Christi (also no exercise)
• 06/21 no lecture (but exercise)

LAST lecture: before Jul 12
Exam: Jul 12, 18:00-20:00 (room TBA)
• Makeup: no makeup (sorry, due to overload)

CONTACT: [email protected]
© Michael Leunig

Page 3: Recap: protein prediction

Page 4: Goal of structure prediction

Epstein & Anfinsen, 1961: sequence uniquely determines structure

• INPUT: sequence
• OUTPUT: 3D structure and function

Page 5: Zones

[Figure: Daylight Zone, Twilight Zone, Midnight Zone; comparison types: sequence - sequence, sequence - profile, profile - profile; sequence similar -> structure similar]

B Rost (1997) Fold Des 2:S19-24
B Rost (1999) Protein Eng 12:85-94

Page 6: Experimental 3D structure for 1 protein: >$100K

PDB = database of proteins of known 3D structure: about 120k in May 2017

Page 7: Comparative modeling predicts 3D structure in silico

pretein seqwence
priteen peqwinse

Query
PDB

Structure (PDB id 4lpk): JM Ostrem et al. & KM Shokat (2013) Nature 503:548-51

Page 8: Good news: comparative modeling reliably predicts structure for over 40 million proteins

At $100k/protein this translates to 40 million × $100k = $4 trillion, i.e. $4×10^12: more than the GDP of England and France!

Page 9: Bad news: for most residues, comparative modeling cannot be applied

Page 10: Notation: protein structure 1D, 2D, 3D

[Figure: protein structure shown in 1D (sequence PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL with strand (E) segments), 2D (residues 1 to 90 on both axes, scale 0 to -5 kcal/mol) and 3D]

Page 11: DSSP

Pauling’s H-bond pattern used in DSSP

L Pauling & RB Corey (1953) PNAS 39:247-252
L Pauling, RB Corey & HR Branson (1951) PNAS 37:205-234
W Kabsch & C Sander (1983) Biopolymers 22:2577-2637

Page 12: Science is communication

questions are often the first step

Page 13: 1D: secondary structure prediction

Page 14: Words

• Secondary structure prediction
• 2ndary structure prediction
• “2D prediction”?

Page 15: Words

• Secondary structure prediction
• 2ndary structure prediction
• 2D prediction

[Figure: protein structure in 1D, 2D, 3D, as on page 10]

Page 16: Secondary structure prediction

DSSP secondary assignment has 8 “states”:

H = Helix
G = 3_10 helix
I = Pi helix
E = Extended (strand)
B = beta-bridge, single strand residue
T = Turn, i.e. one turn of helix
S = bend
“ ” = loop
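The eight DSSP states are usually collapsed to three before predictors are trained or scored. The sketch below shows one such reduction in Python. The grouping used here (H, G, I to helix; E, B to strand; everything else to loop) is an assumption, since the slide does not fix a mapping and other groupings are in use. The name `reduce_dssp` is hypothetical.

```python
# Assumed 8-to-3 reduction; the slide lists the 8 states but not this grouping.
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # alpha, 3_10 and pi helix -> helix
    "E": "E", "B": "E",             # extended strand and beta-bridge -> strand
    "T": "L", "S": "L", " ": "L",   # turn, bend and loop -> other/loop
}

def reduce_dssp(states: str) -> str:
    """Map a string of DSSP 8-state symbols to the three states H, E, L."""
    # Unknown symbols fall back to loop (an assumption of this sketch).
    return "".join(DSSP_TO_3.get(symbol, "L") for symbol in states)

print(reduce_dssp("HHHGGT EEB S"))
```

Treating G and I as helix and B as strand is a design choice that changes the class balance, so any accuracy figure should state which reduction was used.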

Page 17: Local sequence determines secondary structure

Segments observed as helix, strand or loop:

LEDKSPDHNPTGID
AKGKPMDRNFTGRNHPPKDSS
AAQVKDALTK
LEQWGTLAQLRAIWEQELTDFPEFLTMMARQETWLGWLTI
LAVIGVLMKW
FVFLMIEKIYHKLT
DIRVGLTYYIAQ
VNTFVGTFAAVAHAL

W Kabsch & C Sander (1985) Identical pentapeptides with different backbones. Nature 317:207

Page 18: How can penta-peptides occur in 2 states?

Page 19: DSSP

Pauling’s H-bond pattern used in DSSP

Page 20: Helix is local, sheet is not

HELIX (H)
• residues i and i+3
• H-bond: residues i <-> i+4

SHEET (E) (with 3 strands)
• H-bond: residues i <-> i±j, j ∈ [4, L-4]
• Erabutoxin β (3ebx)

Page 21: Simple method to predict sec str

• take known structures
• find longest consecutive runs of motifs that occur ONLY in one of the three states: H (helix), E (strand), O (other)

Page 22: First actual prediction method was much simpler

Page 23: Simple prediction: frequency

First step (Szent-Györgyi)
• Proline breaks a helix (proline bends the main chain)
• Helices span several turns, i.e. >4 residues
-> identify helices/non-helices

Page 24: Simple prediction: frequency

First step (Szent-Györgyi)
• Proline breaks a helix
• Helices span several turns, i.e. >4 residues
-> identify helices/non-helices

from Proline to odds for all ....

Page 25: Simple prediction: frequency

from Proline to odds for all

....,....1....,....2....
QEKSPREVTMKKGDILTLLNSTNK
E..E EEEEEE

AA D E G I K L M N P Q R S T V
E 1 1 3 1 1 1
L 1 1 1 4 1 1 1 1 2 1
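Counting residues per observed state, as in the table above, can be sketched in a few lines of Python. The sequence is the one on the slide. The observed-state string is hypothetical, because the alignment of states to residues did not survive in the transcript. The function names are illustrative as well.

```python
from collections import Counter, defaultdict

SEQ = "QEKSPREVTMKKGDILTLLNSTNK"               # sequence from the slide
OBS_HYPOTHETICAL = "LLLLLLEEEELLLLEEEEELLLLL"  # made-up states (E strand, L loop)

def state_counts(seq: str, obs: str):
    """Count how often each amino acid is observed in each state."""
    if len(seq) != len(obs):
        raise ValueError("sequence and state strings must have equal length")
    counts = defaultdict(Counter)
    for aa, state in zip(seq, obs):
        counts[state][aa] += 1
    return counts

def p_state_given_aa(counts, aa: str, state: str) -> float:
    """Estimate p(state | amino acid) from the counts."""
    total = sum(per_state[aa] for per_state in counts.values())
    return counts[state][aa] / total if total else 0.0

counts = state_counts(SEQ, OBS_HYPOTHETICAL)
print(dict(counts["E"]))
print(p_state_given_aa(counts, "T", "E"))
```

With only 24 residues the estimates are meaningless. Real first-generation methods derived such odds from all proteins of known structure available at the time.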

Page 26: Secondary structure prediction methods

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80

Robson B & Pain RH (1971) Analysis of the Code Relating Sequence to Conformation in Proteins: Possible Implications for the Mechanism of Formation of Helical Regions. J. Mol. Biol. 58:237-259.
Chou PY & Fasman GD (1974) Prediction of protein conformation. Biochemistry 13:211-215.
Garnier J, Osguthorpe DJ & Robson B (1978) Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120:97-120.

Page 27: Sec struc pred: 1st gen

1st generation (1957-1978): single residue odds, e.g. Chou-Fasman / GOR

$p(\mathrm{SEC} \mid \mathrm{AA}_i)$ = probability for observing secondary structure state SEC for amino acid AA at position $i$

$p(\mathrm{SEC} \mid \mathrm{AA}_i) = p(\mathrm{SEC} \mid \mathrm{AA}_j) \quad \forall\, i, j$

Erabutoxin β (3ebx): V32, V36, V51

Page 28: How to assess performance? Problem 1: where to get secondary structure from?

Page 29: DSSP

Pauling’s H-bond pattern used in DSSP

Page 30: PDB = Protein Data Bank

Resource with 3D-coordinates of proteins (RNA & DNA)
www.rcsb.org, e.g. “Molecule of the Month”
2016/05: over 120,000 molecules

[Figure: number of structures in PDB (log scale, 1 to 1,000,000) versus year (1975 to 2020)]

Page 31: Prediction method

• find unique subset from proteins of known structure (PDB)
• convert 3D to 1D (secondary structure) with DSSP

Page 32: Sec struc pred: 1st gen

1st generation (1957-1978): single residue odds, e.g. Chou-Fasman / GOR

$p(\mathrm{SEC} \mid \mathrm{AA}_i)$ = probability for observing secondary structure state SEC for amino acid AA at position $i$

$p(\mathrm{SEC} \mid \mathrm{AA}_i) = p(\mathrm{SEC} \mid \mathrm{AA}_j) \quad \forall\, i, j$

Erabutoxin β (3ebx): V32, V36, V51

Page 33: Sec struc pred: 1st gen

1st generation (1957-1978): single residue odds, e.g. Chou-Fasman / GOR

[Figure: number of structures in PDB (log scale, 1 to 1,000,000) versus year (1975 to 2020)]

Page 34: How to assess performance? Problem 2: how to measure?

Page 35: Assessing performance of secondary structure prediction

    1.,.,.,.,.10,.,.,.,.20,.,.,.,.30,.,.,.,.40,.,.,.,.50
obs EEEE E EEEEEE EEEEEE EEEEEEEEEEE
prd EEHHH EEEE EE HHEE EEEHHH

obs = observed, prd = predicted; H: helix, E: strand, ‘ ’: other

Page 36: Secondary structure prediction accuracy

• Q3: three-state per-residue accuracy

$Q_3 = \dfrac{\text{number of correctly predicted residues in states helix, strand, other}}{\text{number of residues in protein}}$

Schulz GE & Schirmer RH (1979) Prediction of secondary structure from the amino acid sequence. In: (eds). Principles of protein structure. Berlin: Springer-Verlag, pp 108-130.
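The Q3 definition above translates directly into code. This is a minimal Python sketch. The two state strings are hypothetical and only serve to exercise the function. It assumes observed and predicted states use the same three symbols.

```python
def q3(observed: str, predicted: str) -> float:
    """Three-state per-residue accuracy: correctly predicted / all residues."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty state strings of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)

# Hypothetical example (H helix, E strand, L other), not taken from the slide.
obs = "LEEEELLHHHHL"
prd = "LEEHHLLHHHLL"
print(q3(obs, prd))
```

Q3 counts every residue equally, so a method that over-predicts the most frequent state can still score well. That is one reason per-state results are reported separately later in the lecture.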

Page 37: Secondary structure prediction methods

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80
  published: 63% accuracy

Page 38: Secondary structure prediction methods

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80
  published: 63% accuracy
  assessed in 1994: 50-55% accuracy

Page 39: 2nd Generation: how would you improve?

Page 40: Segments instead of isolated residues

Erabutoxin β (3ebx): V32, V36, V51

Page 41: Secondary structure prediction: 1.+2. Generation

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80: 50-55% accuracy (Q3)

segments (2. generation)
• GORIII 1986-92: 55-60% Q3

• Gibrat J-F, Garnier J & Robson B (1987) Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J. Mol. Biol. 198:425-443.
• Biou V, Gibrat JF, Levin JM, Robson B & Garnier J (1988) Secondary structure prediction: combination of three different methods. Prot. Engin. 2:185-191.
• Garnier J & Robson B (1989) The GOR method for predicting secondary structure in proteins. In: Fasman GD (ed). Prediction of protein structure and the principles of protein conformation. New York: Plenum Press, pp 417-465.

Page 42: Sec struc pred: 1st gen

1st generation (1957-1978): single residue odds, e.g. Chou-Fasman / GOR

$p_1(\mathrm{SEC}_i \mid \mathrm{AA}_i)$ = probability for observing secondary structure state SEC for amino acid AA at position $i$

2nd generation (1983-1992): odds for windows, e.g. GORIII

$p(\mathrm{SEC} \mid \mathrm{AA}_i) = \sum_{j=i-w}^{i+w} p_1(\mathrm{SEC}_j \mid \mathrm{AA}_j)$, with $w=3$

Erabutoxin β (3ebx): V32, V36, V51
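The window sum above can be sketched as follows. This is a simplified reading of the formula, assuming that the score for each state at position i is the sum of single-residue scores over positions i-w to i+w, and that the highest-scoring state is predicted. The table `P1` holds made-up values for three amino acids only. Real methods such as GORIII use statistics derived from known structures.

```python
W = 3  # half-window, as on the slide (w=3)

# Hypothetical single-residue scores p1(state | amino acid); illustrative only.
P1 = {
    "A": {"H": 0.5, "E": 0.2, "L": 0.3},
    "V": {"H": 0.2, "E": 0.6, "L": 0.2},
    "G": {"H": 0.1, "E": 0.2, "L": 0.7},
}

def window_scores(seq: str, i: int, w: int = W) -> dict:
    """Sum single-residue scores over the window i-w .. i+w (clipped at the ends)."""
    scores = {"H": 0.0, "E": 0.0, "L": 0.0}
    for j in range(max(0, i - w), min(len(seq), i + w + 1)):
        for state, p in P1[seq[j]].items():
            scores[state] += p
    return scores

def predict(seq: str, w: int = W) -> str:
    """Predict the highest-scoring state for every residue."""
    prediction = []
    for i in range(len(seq)):
        scores = window_scores(seq, i, w)
        prediction.append(max(scores, key=scores.get))
    return "".join(prediction)

print(predict("AAAAAAA"))
print(predict("VVVVVVV"))
```

Because neighbours contribute to each position, isolated single-residue flips are smoothed out, which is the motivation for moving from residues to segments.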

Page 43: Secondary structure prediction: 1.+2. Generation

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80: 50-55% accuracy

segments (2. generation)
• GORIII 1986-92: 55-60% accuracy

problems
• < 100%; they said: 65% max

Page 44: Helix formation is local

residues i and i+3
THYROID hormone receptor (2nll)

Page 45: Secondary structure prediction: 1.+2. Generation

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80: 50-55% accuracy

segments (2. generation)
• GORIII 1986-92: 55-60% accuracy

problems
• < 100%; may be: 65% max
• < 40%; may be: strand non-local

Page 46: β-sheet formation is NOT local

Erabutoxin β (3ebx)

Page 47: Secondary structure prediction: 1.+2. Generation

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80: 50-55% accuracy

segments (2. generation)
• GORIII 1986-92: 55-60% accuracy

problems
• < 100%; may be: 65% max
• < 40%; may be: strand non-local
• short segments

B Rost and C Sander (2000) Methods in Molecular Biology 143: 71-95

Page 48: Problems of secondary structure predictions (before 1994)

SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE
TYP EHHHH EE EEEE EE HHHEE EEEHH

obs EEEE E E E EEEEEE EEEEEE EEEEEEEEEEE
prd EEHHH EE EEEE EE HHEE EEEHHH

Page 49: INSERT: concept of neural networks

Page 50: Simple neural network

[Figure: two input units connected to one output unit through junctions J11 and J12]

$out_0 = in_1 J_{11} + in_2 J_{12}$

$out = \tanh(out_0)$
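The two formulas on this slide amount to a single unit with two inputs. A minimal Python sketch follows. The function name `simple_unit` and the example values are illustrative.

```python
import math

def simple_unit(in1: float, in2: float, j11: float, j12: float) -> float:
    """One output unit: weighted sum of two inputs, squashed by tanh."""
    out0 = in1 * j11 + in2 * j12   # linear step
    return math.tanh(out0)         # non-linear step

print(simple_unit(1.0, 0.0, j11=1.0, j12=1.0))
```

Training means adjusting `j11` and `j12` until the output matches the desired value for the known examples.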

Page 51: Training a neural network 1

Page 52: Training a neural network 2

$\mathrm{Error} = (out_{net} - out_{want})^2$

[Figure: out versus in, both axes from -2 to 2]

Page 53: Training a neural network 3

[Figure: error as a function of the junctions]

Page 54: Training a neural network 4

[Figure: out versus in (both axes from -2 to 2) for several choices of junction values]

Page 55: Neural networks classify points

Page 56: Simple neural network with hidden layer

$out_i = f\Big(\sum_j J^{2}_{ij} \cdot f\Big(\sum_k J^{1}_{jk} \cdot in_k\Big)\Big)$
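The nested formula for a network with one hidden layer can be sketched in plain Python. The sketch assumes `j1[j][k]` connects input k to hidden unit j and `j2[i][j]` connects hidden unit j to output i. It uses tanh for `f`, which is an assumption carried over from the single-unit slide. The weights in the example are arbitrary.

```python
import math

def forward(inputs, j1, j2, f=math.tanh):
    """Compute out_i = f(sum_j J2_ij * f(sum_k J1_jk * in_k))."""
    hidden = [f(sum(w * x for w, x in zip(row, inputs))) for row in j1]
    return [f(sum(w * h for w, h in zip(row, hidden))) for row in j2]

# Arbitrary example: 2 inputs, 2 hidden units, 1 output.
out = forward([1.0, 0.0], j1=[[1.0, 0.0], [0.0, 1.0]], j2=[[1.0, 1.0]])
print(out)
```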

Page 57: Principles of networks: input -> output

two steps:
1. linear: sum over all input × connection, i.e. $\sum_j \mathrm{connection}_{ij} \cdot \mathrm{input}_j$
2. non-linear: sigmoid trigger, i.e., project sum onto 0-1

[Figure: input = 3 adjacent residues in protein sequence (s1, s2, s3; example ACACC); output = secondary structure state of central residue (α or L); sigmoid curve of output from unit (0 to 1.0) versus input to unit (= sum), with decision line J; result: < decision line]

Page 58: Principles of neural networks: error

• output: $out_i = \sum_{j=1}^{N_{in}+1} J_{ij}\, in_j$
  $in_j$: value of input unit j; $out_i$: value of output unit i; $J_{ij}$: connection between input unit j and output unit i

• error: $E = \sum_{i=1}^{N_{out}} (out_i - des_i)^2$
  $out_i$: value of output unit i; $des_i$: secondary structure state observed for central amino acid for output unit i (e.g. for a helix: $des_1=1$, $des_2=0$, $des_3=0$)

• free variables: connections { J }

• goal: representation of set of examples (training set) for which the mapping input->output is known, i.e., the secondary structure state of the central residue has been observed by the network
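The output and error definitions above can be written out as a short Python sketch. It assumes that the extra input unit (index N_in+1) is a constant bias input of 1.0, which the slide does not state explicitly. The connection values are arbitrary.

```python
def outputs(J, inputs):
    """out_i = sum_j J_ij * in_j over N_in + 1 units (last one: assumed bias = 1.0)."""
    x = list(inputs) + [1.0]
    return [sum(w * v for w, v in zip(row, x)) for row in J]

def error(out, des):
    """E = sum_i (out_i - des_i)^2."""
    return sum((o - d) ** 2 for o, d in zip(out, des))

# Arbitrary connections: 3 output units (helix, strand, other), 2 inputs + bias.
J = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0]]
out = outputs(J, [1.0, 0.0])
print(out, error(out, des=[1, 0, 0]))  # desired output encodes a helix
```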

Page 59: Principles of neural networks: training

training = change of connections {J} such that E decreases
simplest procedure:
• gradient descent

$\Delta J_{ij}(t+1) = -\varepsilon\, \dfrac{\partial E(t)}{\partial J_{ij}(t)} + \alpha\, \Delta J_{ij}(t-1)$

where ∂E/∂J is the derivative of the error with respect to the network connection; t is the algorithmic time given by the presentation of one example; ε determines the step width of the change (learning strength, typically some 0.01); α gives the contribution of the momentum term (∆J(t-1), typically some 0.2), which permits uphill moves

[Figure: error as a function of the connections { J }]
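Gradient descent with a momentum term can be illustrated on a one-dimensional toy error. The sketch uses the typical values from the slide (ε = 0.01, α = 0.2) and takes the most recent previous update as the momentum term, which is an assumption about how the time indices are meant. The toy error E(J) = (J - 2)² is hypothetical.

```python
def train(grad, j0: float, eps: float = 0.01, alpha: float = 0.2, steps: int = 2000) -> float:
    """Minimise an error by gradient descent with momentum."""
    j, prev_delta = j0, 0.0
    for _ in range(steps):
        delta = -eps * grad(j) + alpha * prev_delta  # gradient step + momentum
        j += delta
        prev_delta = delta
    return j

# Toy error E(J) = (J - 2)^2 with derivative dE/dJ = 2 (J - 2).
j_final = train(lambda j: 2.0 * (j - 2.0), j0=0.0)
print(j_final)
```

The momentum term keeps part of the previous step, so the connection can keep moving briefly even when the local gradient points the other way.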

Page 60: Effect of over-training: theory

[Figure: y-axis 0 to 100, x-axis training time]

Page 61: Effect of over-training: theory

[Figure: y-axis 0 to 100, x-axis training time; region marked "over-train"]

Page 62: Effect of over-training: theory

[Figure: y-axis 0 to 100, x-axis training time; region marked "over-train"; label: toy problems]

Page 63: what were those two curves?

Page 64: Effect of over-training: theory

[Figure: y-axis 0 to 100, x-axis training time; region marked "over-train"; curves: training set, and cross-training / testing (validation set)]

Page 65: Sketch of simplified cross-validation

Page 66: Sketch of simplified cross-validation

[Figure: data split into TRAIN, cross-TRAIN and TEST]
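The split sketched on this slide can be illustrated in Python. Three disjoint sets are used: one to fit the connections, one ("cross-train") to decide when to stop training, and one that is only touched for the final test. The fractions, the function names and the accuracy values are assumptions for illustration.

```python
import random

def split(items, fractions=(0.6, 0.2), seed=0):
    """Split items into disjoint train, cross-train and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_cross = round(fractions[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_cross],
            items[n_train + n_cross:])

def best_stop(cross_train_accuracy):
    """Index of the training cycle with the highest cross-train accuracy."""
    return max(range(len(cross_train_accuracy)),
               key=cross_train_accuracy.__getitem__)

train_set, cross_set, test_set = split(range(10))
print(len(train_set), len(cross_set), len(test_set))
print(best_stop([0.55, 0.62, 0.66, 0.65, 0.63]))  # hypothetical accuracies
```

For proteins the split should be made per protein (or per family), not per residue. Otherwise, windows from the same protein end up on both sides.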

Page 67: Effect of over-training: practice

[Figure: number of correct classifications per example (0.55 to 0.95) versus number of training cycles (0 to 25); curves: ratio for training set, ratio for testing set]

Page 68: Effect of over-training: practice

[Figure: number of correct classifications per example (0.55 to 0.95) versus number of training cycles (0 to 25); curves: ratio for training set, ratio for testing set; inset: theory curve for toy problems (0 to 100 versus training time)]

Page 69: RETURN: secondary structure prediction

Page 70: Secondary structure predictions of 1. and 2. generation

single residues (1. generation)
• Chou-Fasman, GOR 1957-70/80: 50-55% accuracy

segments (2. generation)
• GORIII 1986-92: 55-60% accuracy

problems
• < 100%; they said: 65% max
• < 40%; they said: strand non-local
• short segments

Page 71: Neural Network for secondary structure

Input alphabet: ACDEFGHIKLMNPQRSTVWY.
Output units: H, E, L

Input window, residue (observed state): D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E)

B Rost (1996) Methods in Enzymology 266: 525-39
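The input coding on this slide, one unit per symbol of the alphabet for each window position, can be sketched as a one-hot encoding. The window of 13 residues is inferred from the 13 residues shown and is an assumption. The function name `encode_window` is hypothetical. Residues outside the 20 standard amino acids would raise an error in this sketch.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus '.' spacer, as on the slide

def encode_window(seq: str, center: int, half_width: int = 6):
    """One-hot encode the 2*half_width+1 residues around `center`.

    Positions beyond the sequence ends are encoded as the spacer '.'.
    """
    vector = []
    for pos in range(center - half_width, center + half_width + 1):
        symbol = seq[pos] if 0 <= pos < len(seq) else "."
        unit = [0] * len(ALPHABET)
        unit[ALPHABET.index(symbol)] = 1
        vector.extend(unit)
    return vector

v = encode_window("DRQGFVPAAYVKK", center=6)  # window from the slide, centred on P
print(len(v), sum(v))
```

The network then maps this vector to three output units (H, E, L) for the central residue.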

Page 72: NN predicts secondary structure

Columns: method, helix, strand, other, overall accuracy
• neural network (unbalanced): overall accuracy 62%

Page 73: NN predicts secondary structure

Columns: method, helix, strand, other, overall accuracy
• neural network (unbalanced): overall accuracy 62%

... and developer believes that application of machine learning is all the intelligence he will ever need...

Page 74: NN sec str: training dynamics

[Figure: performance (0 to 1) for Other, Strand and Helix versus time (1 to 10; 1 step = 20,000 training samples)]

$E^{\mu} = \sum_i \left(o_i^{\mu} - d_i^{\mu}\right)^2$

$\Delta J^{\mu} \propto -\dfrac{\partial E^{\mu}\{J\}}{\partial J}$


NN predicts secondary structure

method: neural network (unbalanced)
overall accuracy: 62%
[Pie charts: helix, strand, other. Full pie: all correctly predicted residues. Comparison: data bank distribution. Comparison: 33:33:33]


Balanced training

normal training:
$E^{\mu} = \sum_i \left(o_i^{\mu} - d_i^{\mu}\right)^2$
$\Delta J^{\mu} \propto -\frac{\partial E^{\mu}\{J\}}{\partial J}$

balanced training:
$E = \sum_{\mu=\alpha,\beta,L} \sum_i \left(o_i^{\mu} - d_i^{\mu}\right)^2$
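One way to read the balanced error is that every training step presents one example from each class (helix, strand, other), regardless of how often the classes occur in the data bank. The sampler below is a hedged sketch of that reading; `balanced_batches` and the toy data are hypothetical, and the original training code may have been organised differently.

```python
import random


def balanced_batches(samples, steps, rng):
    """Yield `steps` batches, each holding one random example per class.

    `samples` is a list of (features, label) pairs.
    """
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    labels = sorted(by_class)
    for _ in range(steps):
        # One pick per class, so a rare class is seen as often as a common one.
        yield [rng.choice(by_class[label]) for label in labels]


# Toy data with a skewed class distribution (values are made up).
toy = [([0.1], "L")] * 50 + [([0.2], "H")] * 30 + [([0.3], "E")] * 20
for batch in balanced_batches(toy, steps=2, rng=random.Random(0)):
    print([label for _, label in batch])
```

Summing the per-example errors of such a batch before updating the connections corresponds to the sum over α, β and L in the balanced error.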

Balanced training: dynamics

[Figure: two panels, unbalanced and balanced; performance (0 to 1) for Other, Strand and Helix over training steps 1 to 10]

unbalanced:
$E^{\mu} = \sum_i \left(o_i^{\mu} - d_i^{\mu}\right)^2$
$\Delta J^{\mu} \propto -\frac{\partial E^{\mu}\{J\}}{\partial J}$

balanced, train:
$E = \sum_{\mu=\alpha,\beta,L} \sum_i \left(o_i^{\mu} - d_i^{\mu}\right)^2$

method (neural network)   overall accuracy
unbalanced                62%
balanced                  60%

[Pie charts: helix, strand, other. Full pie: all correctly predicted residues. Comparison: data bank distribution. Comparison: 33:33:33]

Neural networks DO improve if the developer does something more than dream the machine learning dream...

Secondary structure predictions of 1. and 2. generation

single residues (1. generation)
• Chou-Fasman, GOR: 1957-70/80
• 50-55% accuracy

segments (2. generation)
• GORIII: 1986-92
• 55-60% accuracy

problems
• accuracy < 100% (they said: 65% max)
• strand accuracy < 40% (they said: strand non-local)
• short segments


β-sheet formation is NOT local

Erabutoxin β (3ebx)

Conclusion: not all sound explanations are right!

Secondary structure predictions of 1. and 2. generation

single residues (1. generation)
• Chou-Fasman, GOR: 1957-70/80
• 50-55% accuracy

segments (2. generation)
• GORIII: 1986-92
• 55-60% accuracy

problems
• accuracy < 100% (they said: 65% max)
• strand accuracy < 40% (they said: strand non-local)
• short segments


Bad segment prediction

HHHHHHHHHEEEEE

HHHHEEE

HHHHHHHEEEEE

1st level

2nd level

comparison:observed:

SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD

OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE

TYP EHHHH EE EEEE EE HHHEE EEEHH

Select samples at random

$\Delta J_{ij}(t+1) = -\epsilon \frac{\partial E(t)}{\partial J_{ij}(t)} + \alpha \, \Delta J_{ij}(t-1)$

where ∂E/∂J is the derivative of the error with respect to the network connection; t is the algorithmic time given by the presentation of one example; ε determines the step width of the change (learning strength, typically some 0.01); α gives the contribution of the momentum term (ΔJ(t-1), typically some 0.2), which permits uphill moves.

[Figure: error as a function of the connections {J}]
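The update rule with momentum can be tried on a toy problem. The sketch below uses the typical values quoted on the slide (ε = 0.01, α = 0.2); the function name `momentum_step` and the one-dimensional error E(J) = (J - 3)^2 are hypothetical choices for illustration, not part of the lecture.

```python
def momentum_step(weights, gradients, previous_deltas, epsilon=0.01, alpha=0.2):
    """Apply one update: delta = -epsilon * dE/dJ + alpha * previous delta."""
    deltas = [-epsilon * g + alpha * d for g, d in zip(gradients, previous_deltas)]
    new_weights = [w + d for w, d in zip(weights, deltas)]
    return new_weights, deltas


# Toy error E(J) = (J - 3)^2 with gradient 2 * (J - 3); the minimum is at J = 3.
weights, deltas = [0.0], [0.0]
for _ in range(2000):
    gradients = [2.0 * (weights[0] - 3.0)]
    weights, deltas = momentum_step(weights, gradients, deltas)
print(round(weights[0], 3))
```

Because the previous change is carried over, a step can continue in its old direction even where the local gradient points the other way, which is what allows the occasional uphill move.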

Local correlations in reality

residues i and i+3

Erabutoxin β (3ebx)


How to get those into the prediction?

PHDsec: structure-to-structure network

[Figure: PHDsec structure-to-structure network; inputs for adjacent residues V (E), P (E), A (H); output units H, E, L]

B Rost (1996) Methods Enzymol 266:525-39
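A structure-to-structure network takes predicted states, rather than residues, as its input. The sketch below builds such input windows from first-level outputs; `structure_windows`, the window of three residues and the zero padding at the chain ends are assumptions for illustration, and the output values are made up.

```python
def structure_windows(outputs, half_width=1):
    """Build one input vector per residue from first-level (H, E, L) outputs."""
    padding = (0.0, 0.0, 0.0)  # assumption: zeros beyond the chain ends
    windows = []
    for center in range(len(outputs)):
        window = []
        for pos in range(center - half_width, center + half_width + 1):
            window.extend(outputs[pos] if 0 <= pos < len(outputs) else padding)
        windows.append(window)
    return windows


# Made-up first-level outputs (H, E, L) for the residues V, P, A.
first_level = [(0.1, 0.8, 0.1), (0.2, 0.7, 0.1), (0.6, 0.3, 0.1)]
for vector in structure_windows(first_level):
    print(vector)
```

Since the second level sees the predictions for neighbouring residues, it can learn that isolated single-residue states are unlikely, which is one way to obtain longer and more realistic segments.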


Better segment prediction

HHHHHHHHHEEEEE

HHHHEEE

HHHHHHHEEEEE

1st level

2nd level

comparison:observed:

Better prediction of segment lengths

[Figure, panel A: number of segments (0 to 1200) vs. segment length (0 to 50) for DSSP and PHD, with an inset for segment lengths 25 to 50. Panel B: difference in number of observed - predicted segments (-800 to 800) vs. segment length (0 to 10) for helix, strand and loop]
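Segment-length statistics of this kind can be computed directly from a string of states. The helper names below (`segments`, `length_histogram`) are hypothetical; the first example string is the one shown on the segment-prediction slides, the second is made up.

```python
from collections import Counter
from itertools import groupby


def segments(states):
    """Split a state string into (state, length) runs."""
    return [(state, len(list(run))) for state, run in groupby(states)]


def length_histogram(states, state):
    """Count how many segments of each length exist for one state."""
    return Counter(length for s, length in segments(states) if s == state)


print(segments("HHHHHHHHHEEEEE"))  # [('H', 9), ('E', 5)]
print(length_histogram("HHLLHHHLEELHH", "H"))
```

Running the same functions on observed (DSSP) and predicted strings and subtracting the two histograms gives the observed - predicted difference per segment length.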

Structure-to-structure network: Invented?

N Qian & TJ Sejnowski (1988) Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol. 202:865-884.

PHDsec 1993

[Figure: PHDsec structure-to-structure network; inputs for adjacent residues V (E), P (E), A (H); output units H, E, L]

Other ideas

More output units, e.g. instead of central residue: take central 3
1. 9 output units
2. average output -> 3 units

output back into neural networks:
Gianluca Pollastri, Dariusz Przybylski, B Rost and Pierre Baldi (2002) Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins: Structure, Function, and Bioinformatics 47:228-235.

Other ideas

output back into neural networks:

Gianluca Pollastri, Dariusz Przybylski, B Rost and Pierre Baldi (2002) Proteins 47:228-235: Fig. 1

idea: P Frasconi & M Gori (1996) IEEE Trans Neural netw 7:1521-5

STILL ONLY 60+ε% accuracy.

How to improve beyond that?

How to get more data into it?

Evolution has it!

[Figure: percentage sequence identity (0 to 100) vs. number of residues aligned (0 to 250). Region labels: "Sequence identity implies structural similarity!" and "Don't know region"]

C Sander & R Schneider 1991 Proteins 9:56-68
B Rost 1999 Prot Engin 12:85-94
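Both axes of this plot come from a pairwise alignment: the number of residues aligned and the percentage of identical pairs among them. A minimal sketch, assuming `.` marks a gap and that case is ignored; the function name `percent_identity` is hypothetical and the two short fragments are used only as toy input.

```python
def percent_identity(seq_a, seq_b, gap="."):
    """Return (number of aligned residue pairs, percentage of identical pairs)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0, 0.0
    identical = sum(a.upper() == b.upper() for a, b in pairs)
    return len(pairs), 100.0 * identical / len(pairs)


print(percent_identity("VTLFVALYDY", "VTLFIALYDY"))  # (10, 90.0)
print(percent_identity("..IVVALYDY", "VTLFVALYDY"))  # (8, 75.0)
```

The same percentage means less for a short alignment than for a long one, which is why the plot uses a length-dependent threshold rather than a single cut-off.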

           1                                                   50
fyn_human  VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick  VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human  VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick  VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat  VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa  .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human  ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse  ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse  .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human  ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human  ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human  .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human  .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF

B Rost (1996) Methods Enzymol 266:525-39

SH3 Src-homology 3 domain: one domain of proteins such as Src tyrosine kinase (STK)

Evolution improves prediction

Evolutionary profile implicitly captures the history of an individual protein!

[Figure: fly, chicken, rat, mouse, human]

PHD: Neural network & evolutionary information

Protein:
:GYIY DPEDGDPDDGVNP GTDF:

Alignments:
:GYIY DPAVGDPDNGVEP GTEF:
:GYIY DPEVGDPTQNIPP GTKF:
:GYEY DPAEGDPDNGVKP GTSF:
:GYEY DPAEGDPDNGVKP GTAF:

[Figure: the protein and its alignments are compiled into a profile table with one row per residue and the columns GSAPD NTEKQ CVHIR LMYFW; each row corresponds to the 21*3 bits coding for the profile of one residue. The profile feeds the input layer (s0), connected through J1 to the first or hidden layer (s1) and through J2 to the second or output layer (s2) with the units H, E, L; pick maximal unit => current prediction]

B Rost & C Sander (1993) PNAS 90:7558-62
B Rost (1996) Methods Enzymol 266:525-39
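The profile table can be reproduced in a few lines: for every alignment column, count how often each residue type occurs. This is a sketch only; `alignment_profile` is a hypothetical name, the input is the first four columns of the five sequences on the slide (GYIY three times, GYEY twice), and the coding of counts into input bits is not modelled here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alignment_profile(aligned_sequences):
    """Return one {residue: count} dictionary per alignment column."""
    profile = []
    for column in zip(*aligned_sequences):
        # Gaps and other non-residue symbols are ignored.
        counts = Counter(r.upper() for r in column if r.upper() in AMINO_ACIDS)
        profile.append(dict(counts))
    return profile


family = ["GYIY", "GYIY", "GYIY", "GYEY", "GYEY"]
for position, counts in enumerate(alignment_profile(family), start=1):
    print(position, counts)
```

The third column yields I: 3 and E: 2, matching the row of the profile table that holds a 2 and a 3, while the conserved columns yield a single count of 5.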

Same idea as for regular secondary structure

[Figure: two networks side by side. Single sequence: input window coded over ACDEFGHIKLMNPQRSTVWY., output units H, E, L. Alignment: the protein and its alignments are compiled into a profile table (columns GSAPD NTEKQ CVHIR LMYFW) that feeds the input layer, followed by the hidden layer and the output layer with the units H, E, L; pick maximal unit => current prediction]

B Rost & C Sander (1993) PNAS 90:7558-62
B Rost (1996) Methods Enzymol 266:525-39

From sequence to profile

[Figure: flow chart with the elements sequence data bank, BLAST, MaxHom, filter MaxHom (sequence identity vs. number of residues aligned; marks at 25%, 100% and 80 residues), extract alignment, PHD]

B Rost (1996) Methods Enzymol 266:525-39

PHDsec: more details

first level: sequence-to-structure
second level: structure-to-structure

input local in sequence: local alignment, 13 adjacent residues
:::AAAAA.LLLLIIAAGCCSGVV:::

A    C    L    I    G    S    V    ins  del  cons
100  0    0    0    0    0    0    0    0    1.17
100  0    0    0    0    0    0    33   0    0.42
0    0    100  0    0    0    0    0    33   0.92
0    0    33   66   0    0    0    0    0    0.74
66   0    0    0    33   0    0    0    0    1.17
0    66   0    0    0    33   0    0    0    0.74
0    0    0    33   0    0    66   0    0    0.48

input global in sequence: global statistics of the whole protein (%AA, length, Δ N-term, Δ C-term)
• percentage of each amino acid in protein
• length of protein (≤60, ≤120, ≤240, >240)
• distance: centre, N-term (≤40, ≤30, ≤20, ≤10)
• distance: centre, C-term (≤40, ≤30, ≤20, ≤10)

[Figure: two-level network with input layer, hidden layer and output layer; unit counts shown: 20, 4, 4, 4; 4+1; 21+3; example output: H = 0.5, L = 0.1, E = 0.4]

B Rost (1996) Methods Enzymol 266:525-39

Jury

single network vs. jury decision

[Figure: architecture 1, architecture 2, architecture 3, architecture 4; centre of mass = jury over 1-4]
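A jury decision can be sketched as the average over the outputs of several networks, followed by picking the maximal unit. The name `jury_decision` and the three made-up output sets are hypothetical, and a plain arithmetic mean is assumed as the reading of "centre of mass".

```python
def jury_decision(network_outputs, states="HEL"):
    """Average the outputs of several networks and pick the maximal unit."""
    n = len(network_outputs)
    average = {s: sum(out[s] for out in network_outputs) / n for s in states}
    winner = max(average, key=average.get)
    return winner, average


# Made-up outputs of three differently trained networks for one residue.
votes = [
    {"H": 0.5, "E": 0.4, "L": 0.1},
    {"H": 0.3, "E": 0.6, "L": 0.1},
    {"H": 0.2, "E": 0.6, "L": 0.2},
]
print(jury_decision(votes))
```

In this toy case the first network alone would predict helix, while the jury predicts strand: uncorrelated errors of single networks tend to average out.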


PROFsec: Evolutionary information + more

B Rost (2001) J Struct Biol 134, 204-18

Spectrin homology domain (SH3)

HEADER CYTOSKELETON
COMPND ALPHA SPECTRIN (SH3 DOMAIN)
SOURCE CHICKEN (GALLUS GALLUS) BRAIN
AUTHOR M.NOBLE,R.PAUPTIT,A.MUSACCHIO,M.SARASTE

[Figure labels: 59%, 65%, 72%]

Prediction accuracy varies!

[Figure: histogram of the number of protein chains (0 to 70) vs. per-residue accuracy Q3 (0 to 100); labelled chains: 1spf, 1bct, 1stu, 3ifm, 1psm]

mean Q3 = 72.3%; sigma = 10.5%
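Per-residue accuracy Q3 is the percentage of residues whose three-state prediction matches the observation; the distribution arises from computing it separately for each protein. A small sketch with made-up observed and predicted strings (the function name `q3` and the toy data are illustrative only):

```python
from statistics import mean, pstdev


def q3(observed, predicted):
    """Percentage of residues predicted in the correct state (H, E or L)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Made-up (observed, predicted) pairs standing in for two proteins.
toy_proteins = [("HHHHEEEELL", "HHHEEEEELL"), ("EEEELLLL", "EEHHLLLL")]
per_protein = [q3(obs, pred) for obs, pred in toy_proteins]
print(per_protein, mean(per_protein), pstdev(per_protein))
```

Reporting the spread next to the mean matters because a single protein can lie far from the average.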

Stronger predictions more accurate!

[Figure: histogram of the number of protein chains (0 to 70) vs. per-residue accuracy Q3 (0 to 100); mean Q3 = 72.3%; sigma = 10.5%; labelled chains: 1spf, 1bct, 1stu, 3ifm, 1psm]

[Figure: network (input coded over ACDEFGHIKLMNPQRSTVWY., output units H, E, L) with two example outputs: H=0.5, E=0.4, L=0.1 and H=0.8, E=0.1, L=0.1]

[Figure: Q3 per protein (0 to 100) vs. reliability index averaged over protein (3 to 9); fit: Q3fit = 21 + 8.7 * Q3]
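The two example outputs differ in how clearly one unit wins. A common way to turn that margin into a reliability index from 0 to 9 is to scale the difference between the largest and the second-largest output; the exact formula is not given on the slide, so the definition below (integer part of 10 × difference) is an assumption, and `reliability_index` is a hypothetical name.

```python
import math


def reliability_index(outputs):
    """Map the margin between the two strongest output units to 0..9."""
    ranked = sorted(outputs.values(), reverse=True)
    margin = ranked[0] - ranked[1]
    # The small tolerance guards against floating-point noise (0.5 - 0.4 < 0.1).
    return min(9, math.floor(10 * margin + 1e-9))


print(reliability_index({"H": 0.5, "E": 0.4, "L": 0.1}))  # 1
print(reliability_index({"H": 0.8, "E": 0.1, "L": 0.1}))  # 7
```

Both outputs predict helix, but only the second does so with a large margin; averaging such an index over a protein gives the x-axis of the plot.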

Correct prediction of correctly predicted residues.

[Figure: overall per-residue accuracy (70 to 100) vs. percentage of residues predicted (0 to 100) for PHDsec, PHDacc and PHDhtm; points along the curves are labelled by reliability index, e.g. RI=9, RI=4, RI=0]

False prediction for engineered proteins!

GB1: IgG-binding domain of protein G (CHAMELEON) Kim & Berg, Nature, 366, 267-270, 1993

....,....1....,....2....,....3....,....4....,....5....,..
AA TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK
DSSP EEEEEEE EEEEEEEEE HHHHHHHHHHHHHHHHH EEEEEEE EEEEEEEE
PHD 30 EEEEEE E EEHHHHHHHHHHHHHHEEE EEEEEE EEEEE
PHD no EEEEEE EEEEEHHHHHHHHHHHHHHHH EEEEE EEEEEE

AATAEKVFKQY AWTVEKAFKTF
PHD 30 EEEEEE EEEEEEE HHHHHHHHHEEE EEEE EEEEEE
PHD no EEEEEE EEEEEEHHHHHHHHHHHHHHH EEEEE EEEEEE

EWTYDDATKTF AWTVEKAFKTF
PHD 30 EEEEEE EEE EHHHHHHHHHHHHHHHH EEEEE EEEEEE
PHD no EEEEEE E E EHHHHHHHHHHHHHHHH HHHHHHH EEEEE

AWTVEKAFKTF HHHHH

Proper comparison of methods

Method A=60% Method B=63%

B better?

• use same (meaningful) measure, e.g. both Q3
• same data set: both used 100 proteins, and both used random splits to take one half for testing, ok? No: the data set must contain ALL available proteins!
• split training/testing: random ok? No: must ascertain that there was NO overlap between sets. No overlap means, e.g., that comparative modeling cannot be applied between proteins of the two sets.
• 63-60=3 significant? Whether significant or not depends on distribution and number.

B Rost 1999 Prot Engin 12, 85-94
C Sander & R Schneider 1991 Proteins 9:56-68

DeltaQ3=3%, 100 proteins -> significant?

[Figure: histogram of the number of protein chains (0 to 70) vs. per-residue accuracy Q3 (0 to 100); mean Q3 = 72.3%; sigma = 10.5%; labelled chains: 1spf, 1bct, 1stu, 3ifm, 1psm]

DeltaQ3=3% for 100 proteins is significant!

[Figure: histogram of the number of protein chains (0 to 70) vs. per-residue accuracy Q3 (0 to 100); mean Q3 = 72.3%; sigma = 10.5%; labelled chains: 1spf, 1bct, 1stu, 3ifm, 1psm]

rule-of-thumb: StdErr = sigma / sqrt(number of proteins)

here: StdErr = 10.5 / sqrt(100) = ±1.05

DeltaQ3 = 3 > StdErr = 1.05

-> statistically significant
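The rule of thumb is a one-line computation. The sketch below reproduces the numbers from the slide (sigma = 10.5, 100 proteins, DeltaQ3 = 3); the function names are hypothetical, and comparing the difference against a single standard error is the slide's rule of thumb rather than a formal test.

```python
import math


def standard_error(sigma, n_proteins):
    """Standard error of the mean: sigma / sqrt(number of proteins)."""
    return sigma / math.sqrt(n_proteins)


def exceeds_standard_error(delta, sigma, n_proteins):
    """Rule of thumb: a difference counts if it is larger than the standard error."""
    return abs(delta) > standard_error(sigma, n_proteins)


print(standard_error(10.5, 100))             # 1.05
print(exceeds_standard_error(3, 10.5, 100))  # True
print(exceeds_standard_error(3, 10.5, 10))   # False
```

With only 10 proteins the standard error grows to about 3.3, so the same difference of 3 percentage points would no longer pass.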

Method B 20 years older than A, still better?

Difference statistically significant

-> age makes no difference!

Any other test to do?

(mind you B is 20 years old)

pre-release test: ideally use data added after both methods had been developed

Cross-validation: how to

[Figure: data split into Train (150) and Test (150)]

Table 1
Nhidden  Q3
15       62
30       64
45       63

Conclusion: Q3=64%, best method has 30 hidden units


OK?

Cross-validation: need 3 sets!

[Figure: data split into Train (100), Cross-train (100) and Test (100)]

Table 1
Nhid  cross-train  test
15    62           60
30    64           61
45    63           62

Conclusion: Q3=61%, best method has 30 hidden units
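The logic of the three-set protocol fits in a few lines: choose the number of hidden units on the cross-train set only, then report the test value of that choice. The numbers are those of Table 1; `select_model` is a hypothetical helper name.

```python
# Nhid -> (Q3 on cross-train set, Q3 on test set), taken from Table 1.
results = {15: (62, 60), 30: (64, 61), 45: (63, 62)}


def select_model(results):
    """Pick the setting by cross-train Q3 and report its test Q3."""
    best = max(results, key=lambda n_hidden: results[n_hidden][0])
    return best, results[best][1]


print(select_model(results))  # (30, 61)
```

Picking 45 hidden units because of its test value of 62 would use the test set for a decision, and the reported accuracy would then be an overestimate.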

Lecture plan (CB1 structure: INF)

01: 04/10 Tue: No lecture
02: 04/12 Thu: No lecture
03: 04/17 Tue: No lecture
04: 04/19 Thu: Intro 1: organization of lecture: intro into cells & biology
05: 04/24 Tue: Intro 2: amino acids, protein structure (comparison), domains
06: 04/26 Thu: No lecture
07: 05/01 Tue: SKIP: May Day
08: 05/03 Thu: Alignment 1
09: 05/08 Tue: SKIP: Student Representation (SVV)
10: 05/10 Thu: SKIP: Ascension Day
11: 05/15 Tue: Alignment 2
12: 05/17 Thu: Comparative modeling & exp structure determination & secondary structure assignment
13: 05/22 Tue: SKIP: Whitsun holiday
14: 05/24 Thu: Comparative modeling 2 & 1D: Secondary structure prediction 1
15: 05/29 Tue: 1D: Secondary structure prediction 2 (today)
16: 05/31 Thu: SKIP: Corpus Christi
17: 06/05 Tue: 1D: Secondary structure prediction 3 & Transmembrane structure prediction 1
18: 06/07 Thu: 1D: Transmembrane structure prediction 2 / Solvent accessibility prediction
19: 06/12 Tue: 1D: Transmembrane structure prediction 3 / Solvent accessibility prediction
20: 06/14 Thu: 1D: Disorder prediction
21: 06/19 Tue: 2D prediction / 3D prediction
22: 06/21 Thu: No lecture
23: 06/26 Tue: recap 1
24: 06/28 Thu: recap 2
25: 07/03 Tue: TBA
26: 07/05 Thu: TBA
27: 07/10 Tue: TBA
28: 07/12 Thu: TBA