Finding the Beta Helix Motif By Marcin Mejran. Papers Predicting The β-Helix Fold From Protein...

Posted 19-Dec-2015

Finding the Beta Helix Motif By Marcin Mejran


Papers

Predicting the β-Helix Fold from Protein Sequence Data by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, and Bonnie Berger

Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition by Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan


Secondary Structure

• Beta strand: forms β-sheets

• Alpha helix: stands alone

Beta strands can combine into more complex structures:

• Beta sheets

• Beta helices

Images from: http://www.people.virginia.edu/~rjh9u/prot2ndstruct.html


[Figure: β-sheet]


Second and a Half Structure

• Beta helix

• Beta barrel

• Beta trefoil


β-Helix


β-Helix

• Helix composed of three parallel β-sheets

• Three β-strands per “rung”

• Connecting “loops”

• Not in eukaryotes

• Secreted by various bacteria

• Right- and left-handed forms


β-Helix

• Few solved structures

• 9 SCOP superfamilies

• 14 right-handed (RH) solved structures in the PDB

• Solved structures differ widely

[Figure: rung diagram labeled B1, B2, T2, B3]


β-Helix

• T2 turn: a unique two-residue loop

• β-strands are 3 to 5 residues long

• T1 and T3 vary in size and may contain secondary structures

• β-strands interact between rungs


β-Helix

A good choice from a computational point of view:

• “Nice” structure

• Repeating parallel β-strands

• Rungs have similar structure

• Stacking is predictable

• Well-conserved β-strands across superfamilies


β-Helix

• Long-range interactions: residues close in 3D but not in 1D (sequence)

• “Non-unique” features: the B2-T2-B3 segment

• Unique features not clearly shown in sequence

• Usual methods don’t work

Image from: http://www.cryst.bbk.ac.uk/PPS2/course/section10/all_beta.html


BetaWrap

• “Wraps” sequences around the helix and finds the best “wrap”

• Uses the B2 and B3 strands and the T2 turn; the rest of the rung varies greatly in size

• Decomposes into sub-problems: rungs, finding multiple rungs, and finding B1 by local optimization


Hydrophobic/Charged

• Hydrophobic: dislikes water

• Hydrophilic: likes water

• Charged: on the outside

[Figure: rung diagram labeled B1, B2, T2, B3]

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt


BetaWrap: Rungs

Given a T2 turn, find the next T2 turn.

[Figure: B2, B3, and T2 of one rung, with a candidate next T2 turn]

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt


BetaWrap: Rungs

• More weight given to inward-pointing pairs

• Certain stacked amino acids preferred

• Penalty for highly charged inward residues

• Penalty for rungs with too few or too many residues

[Figure: rung diagram labeled B1, B2, T2, B3]

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt
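The rung-scoring rules listed above (inward pairs weighted more, charged inward residues penalized, rung length penalized) can be sketched as a scoring function plus a search over candidate next T2 positions. All offsets, weights, the toy pair score, and the 18 to 30 residue search window with an ideal length of 22 are hypothetical values chosen for illustration, not BetaWrap's trained parameters.

```python
# Hypothetical offsets (relative to a T2 turn) of B2/B3 residues that stack onto
# the same offsets of the next rung; True marks offsets assumed to point inward.
STACK_OFFSETS = {-3: True, -2: False, -1: True, 2: True, 3: False, 4: True}
HYDROPHOBIC = set("AFILMVWY")
CHARGED = set("DERK")


def pair_score(a: str, b: str) -> float:
    """Toy stacking preference: reward hydrophobic pairs and identical residues."""
    score = 1.0 if (a in HYDROPHOBIC and b in HYDROPHOBIC) else 0.0
    return score + (0.5 if a == b else 0.0)


def rung_score(seq: str, t2: int, next_t2: int, inward_weight: float = 2.0,
               charge_penalty: float = 1.0, ideal_len: int = 22,
               len_penalty: float = 0.1) -> float:
    """Score the rung between the T2 turn at `t2` and a candidate at `next_t2`."""
    total = 0.0
    for off, inward in STACK_OFFSETS.items():
        i, j = t2 + off, next_t2 + off
        if not (0 <= i < len(seq) and 0 <= j < len(seq)):
            continue  # stacked position falls off the sequence
        weight = inward_weight if inward else 1.0
        total += weight * pair_score(seq[i], seq[j])
        if inward and seq[j] in CHARGED:
            total -= charge_penalty  # charged residue pointing into the core
    # Penalize rungs that are too short or too long.
    total -= len_penalty * abs((next_t2 - t2) - ideal_len)
    return total


def best_next_t2(seq: str, t2: int, min_len: int = 18, max_len: int = 30):
    """Return the highest-scoring next T2 position, or None if none fits."""
    candidates = range(t2 + min_len, min(t2 + max_len, len(seq) - 1) + 1)
    return max(candidates, key=lambda c: rung_score(seq, t2, c), default=None)


print(best_next_t2("L" * 80, 10))  # 32: all pairs tie, so the length term decides
```

On a uniform sequence every stacked pair scores the same, so only the length penalty separates the candidates; on real sequences the pair terms dominate.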


BetaWrap: Multiple Rungs

• Find multiple initial B2-T2-B3 segments

• Match a pattern based on hydrophobic residues (which appear on the inside)

Φ – A,F,I,L,M,V,W,Y

– D,E,R,K

X - Any

AFDEMVRKYE FIFDDEAK EDEMVMVFD
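The seed search can be sketched as a sliding window scored against a residue-class template. This is a minimal sketch under stated assumptions, not BetaWrap's actual pattern: the template string, the ASCII letters `H` (standing in for Φ) and `C` (standing in for the D, E, R, K class, whose symbol is missing from this transcript), and the `min_matches` threshold are all hypothetical.

```python
# Residue classes from the slide. "H" stands in for Φ; "C" is a hypothetical
# stand-in for the D, E, R, K class; "X" matches any residue.
HYDROPHOBIC = set("AFILMVWY")
CHARGED = set("DERK")


def matches(cls: str, residue: str) -> bool:
    """Return True if `residue` belongs to template class `cls`."""
    if cls == "H":
        return residue in HYDROPHOBIC
    if cls == "C":
        return residue in CHARGED
    return True  # "X"


def find_seeds(seq: str, template: str, min_matches: int) -> list[tuple[int, int]]:
    """Slide `template` along `seq` and return (start, score) for every window
    whose number of matched constrained (non-X) positions is >= min_matches."""
    seeds = []
    width = len(template)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(matches(c, r) for c, r in zip(template, window) if c != "X")
        if score >= min_matches:
            seeds.append((start, score))
    return seeds


# Hypothetical 8-residue template (B2-T2-B3 size); not the published pattern.
TEMPLATE = "HXHXXCXH"
print(find_seeds("FIFDDEAK", TEMPLATE, min_matches=3))  # [(0, 3)]
```

Allowing a few mismatches (here, 3 of 4 constrained positions) keeps imperfect but plausible rungs as seeds for the later search.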


BetaWrap: Multiple Rungs

• Dynamic programming (DP) is used to find 5 rungs in either direction from the initial positions

• α-helix filtering

• Take the average score of the top 10 remaining wraps

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt
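The multi-rung dynamic program can be sketched as a Viterbi-style search over candidate T2 positions: the best chain of n rungs ending at position q extends the best chain of n - 1 rungs ending at some earlier position p. This is a hypothetical sketch: the `score(p, q)` callback, the 18 to 30 residue step window, and the forward-only direction (the backward direction would run the same search on mirrored positions) are assumptions, and α-helix filtering is not modelled.

```python
from typing import Callable, Sequence


def best_wrap(seed: int, candidates: Sequence[int], n_rungs: int,
              score: Callable[[int, int], float],
              min_len: int = 18, max_len: int = 30) -> tuple[float, list[int]]:
    """Best chain of `n_rungs` rungs going forward from the T2 turn at `seed`.

    best[q] holds (total score, T2 positions) of the best chain of the current
    length ending at q; each round extends every chain by one rung.
    """
    best = {seed: (0.0, [seed])}
    for _ in range(n_rungs):
        extended: dict[int, tuple[float, list[int]]] = {}
        for p, (total, path) in best.items():
            for q in candidates:
                if min_len <= q - p <= max_len:
                    cand = (total + score(p, q), path + [q])
                    if q not in extended or cand[0] > extended[q][0]:
                        extended[q] = cand
        if not extended:
            return float("-inf"), []  # no room for another rung
        best = extended
    return max(best.values(), key=lambda item: item[0])


def average_top(wrap_scores: Sequence[float], top: int = 10) -> float:
    """Average score of the `top` best wraps (the slide uses the top 10)."""
    ranked = sorted(wrap_scores, reverse=True)[:top]
    return sum(ranked) / len(ranked) if ranked else float("-inf")


# Toy score: rungs of exactly 22 residues are best.
print(best_wrap(0, range(200), 3, lambda p, q: -abs(q - p - 22)))  # (0.0, [0, 22, 44, 66])
```

Because each rung score depends only on two consecutive T2 positions, keeping one best chain per end position is enough, which is what makes the DP exact.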


BetaWrap: Completing

• Find B1 positions: the highest-scoring parse, which does not affect the wrap score

• Further filtering on hydrophobic residues in T1 and T2


Training

• Seven-fold cross-validation

• Partitioned based on families

• Scores calculated for:
  α-helix filtering threshold
  B1-score threshold
  Hydrophobic count threshold
  Distribution of unmatched residues between rungs

Image from: http://www.ornl.gov/info/ornlreview/v37_1_04/article_21.shtml
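Partitioning by family means that no family contributes proteins to both the training and the test side of a split, so cross-validation measures prediction across families rather than recall of near-duplicates. A minimal sketch follows; the greedy size balancing is an assumption, since the slide does not say how families were assigned to the seven folds.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def family_folds(families: Dict[str, str], k: int = 7) -> List[List[str]]:
    """Split protein IDs into k folds without splitting any family.

    `families` maps protein ID -> family name. Families are assigned greedily,
    largest first, to whichever fold is currently smallest (an assumption).
    """
    by_family: Dict[str, List[str]] = defaultdict(list)
    for protein, family in families.items():
        by_family[family].append(protein)
    folds: List[List[str]] = [[] for _ in range(k)]
    for family in sorted(by_family, key=lambda f: (-len(by_family[f]), f)):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(sorted(by_family[family]))
    return folds


def cross_validation_splits(families: Dict[str, str],
                            k: int = 7) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) protein lists, one pair per fold."""
    folds = family_folds(families, k)
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, test
```

Each threshold listed above would then be tuned on `train` and evaluated on `test`, once per fold.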


BetaWrap: Results


BetaWrap: Results

• Correctly identifies β-helices

• Correctly separates β-helix proteins from non-β-helix proteins

• Can predict β-helices across families


BetaWrap: Summary

Pros:

• Finds β-helices

• Accurate

Cons:

• Still makes errors in rung placement

• Hard-coded information

• Over-fitting; hard to generalize


Conditional Random Fields (CRFs)

[Figure: graphical models over labels y1 … y6 and observations x1 … x6, one for an HMM and one for a CRF]


Hidden Markov Model

• Set of states

• Transition probabilities

• Emission probabilities

• Given only the sequence of emitted residues, find the sequence of true states

• Generative

[Figure: three states, each with an emission table (residue A: .2, residue B: .8)]


Hidden Markov Model

HMM: maximize

$$P(x, y \mid \theta) = P(y \mid x, \theta)\, P(x \mid \theta)$$

• x: the emitted (given) sequence

• y: the “hidden” (true) state sequence

• P(x, y | θ): joint probability of x and y

• P(y | x, θ): probability of y given x

• P(x | θ): probability of x

This requires assumptions about the distribution of x.


Viterbi Algorithm (HMM)

Find the most likely path, i.e. the most likely sequence of hidden states.

[Figure: trellis of emission probabilities e1(xi), e2(xi), e3(xi) at positions x1 … x4]


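Viterbi Algorithm (HMM)

$$v(i, j) = \max\big(v(i-1, 1)\, t_{1,j}\, e_j(x_i),\; v(i-1, 2)\, t_{2,j}\, e_j(x_i),\; \dots,\; v(i-1, K)\, t_{K,j}\, e_j(x_i)\big) = e_j(x_i) \max_{k} v(i-1, k)\, t_{k,j}$$

where v(i, j) is the probability of the most likely path that ends in state j at position i, t_{k,j} is the transition probability from state k to state j, e_j(x_i) is the probability that state j emits x_i, and K is the number of states.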
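The recurrence translates directly into code. This sketch works in log space, which turns the products into sums and avoids numeric underflow on long sequences; the two-state model and its probabilities are made up for illustration.

```python
import math


def viterbi(obs, states, start, trans, emit):
    """Most likely hidden-state path for `obs` under an HMM (log-space Viterbi).

    start[j] = P(first state = j); trans[k][j] = t_{k,j}; emit[j][x] = e_j(x).
    Returns (path, probability of that path).
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # v[j] holds log v(i, j) for the current position i.
    v = {j: log(start[j]) + log(emit[j][obs[0]]) for j in states}
    back = []  # back[i-1][j] = best previous state for state j at position i
    for x in obs[1:]:
        ptr, new_v = {}, {}
        for j in states:
            k_best = max(states, key=lambda k: v[k] + log(trans[k][j]))
            ptr[j] = k_best
            new_v[j] = v[k_best] + log(trans[k_best][j]) + log(emit[j][x])
        back.append(ptr)
        v = new_v
    # Trace back from the best final state.
    last = max(states, key=lambda j: v[j])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), math.exp(v[last])


start = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit = {"H": {"A": 0.2, "B": 0.8}, "L": {"A": 0.8, "B": 0.2}}
path, prob = viterbi("BBA", ["H", "L"], start, trans, emit)
print(path, round(prob, 5))  # ['H', 'H', 'H'] 0.05184
```

Staying in H for the final A (0.9 × 0.2) beats switching to L (0.1 × 0.8), which is why the sticky transitions win over the locally better emission.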


HMM Disadvantages

• Strong independence assumptions

• Long-range interactions are difficult to model

• Overlapping features are difficult to model


Conditional Random Fields (CRFs)

• Replace transition and emission probabilities with a set of feature functions f(i, j, k)

• Feature functions can be based on all of x, not just one position

• Not generative

[Figure: trellis over positions x1 … x4 and states 1 … 3, with nodes labeled by feature functions f(1,0,1) … f(3,i,4) instead of emission probabilities]


Conditional Random Fields (CRFs)

HMM: maximize

$$P(x, y \mid \theta) = P(y \mid x, \theta)\, P(x \mid \theta)$$

CRF: maximize

$$P(y \mid x, \theta)$$

CRFs make no assumptions about the underlying distribution of x.


Viterbi for CRFs

Same method as for the HMM, with weighted feature-function scores in place of transition and emission probabilities.
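A linear-chain CRF can reuse the HMM Viterbi recursion by adding weighted feature sums instead of multiplying probabilities. In this sketch the labels `E` (strand) and `C` (coil), the three feature functions, and their weights are hypothetical; the hydrophobic-window feature shows that a CRF feature may look at residues other than x_i.

```python
HYDROPHOBIC = set("AFILMVWY")


def crf_viterbi(x, labels, features, weights):
    """Highest-scoring label sequence under a linear-chain CRF.

    Each feature is f(prev_label, label, x, i), with prev_label None at i = 0.
    Each step adds sum_k weight_k * f_k(...) instead of multiplying probabilities.
    """
    def local(prev, cur, i):
        return sum(w * f(prev, cur, x, i) for f, w in zip(features, weights))

    score = {y: local(None, y, 0) for y in labels}
    back = []
    for i in range(1, len(x)):
        ptr, new = {}, {}
        for y in labels:
            best_prev = max(labels, key=lambda p: score[p] + local(p, y, i))
            ptr[y] = best_prev
            new[y] = score[best_prev] + local(best_prev, y, i)
        back.append(ptr)
        score = new
    last = max(labels, key=lambda y: score[y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), score[last]


def f_hydrophobic_window(prev, cur, x, i):
    """Strand label with >= 2 hydrophobic residues in x[i-1..i+1]."""
    window = x[max(0, i - 1): i + 2]
    return float(cur == "E" and sum(r in HYDROPHOBIC for r in window) >= 2)


def f_stay(prev, cur, x, i):
    """Same label as the previous position."""
    return float(prev is not None and prev == cur)


def f_coil_polar(prev, cur, x, i):
    """Coil label on a non-hydrophobic residue."""
    return float(cur == "C" and x[i] not in HYDROPHOBIC)


path, total = crf_viterbi("DDLVIDD", ["E", "C"],
                          [f_hydrophobic_window, f_stay, f_coil_polar],
                          [2.0, 0.5, 1.0])
print("".join(path), total)  # CCEEECC 12.0
```

Unlike the HMM, the local scores are not probabilities and need not sum to one; normalization is handled once, globally, by Z0.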


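Conditional Random Fields (CRFs)

• States should form a chain: then the normalization and the negative log-likelihood (which is convex in the weights) can be computed exactly and efficiently

• Z0 = normalization factor (sum over all possible label sequences)

• λk = feature weights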
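The formula on this slide did not survive extraction. For reference, the standard linear-chain CRF distribution, written with the slide's symbols Z0 and λk, is shown here; treat it as the textbook form rather than a verbatim copy of the slide.

$$P(y \mid x) = \frac{1}{Z_0} \exp\Big(\sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x, i)\Big)$$

$$Z_0 = \sum_{y'} \exp\Big(\sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y'_{i-1}, y'_i, x, i)\Big)$$

Here y_0 is a fixed start label and the sum in Z0 runs over every possible label sequence y'; on a chain it can be computed with the same dynamic programming as Viterbi, replacing max by a sum.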


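Segmented CRFs

• Each state corresponds to a structural segment (a whole secondary structure), not a single residue

• Represented as a graph G

• Nodes represent secondary structures; edges represent interactions between them

• Chains are easier to work with than general graphs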


Segmented CRFs: G = <V, E1, E2>

• E1: edges between neighboring segments

• E2: edges for long-range interactions

• E1 edges can be implied in the model


• Only E2 edges need to be considered explicitly

• However, the graph must also be a chain for E2

• Deterministic state transitions


Beta-Helix CRF


Beta-Helix CRF

• Combined state B23: B2, B3, T2

• Size assumptions:
  B23: 8 residues
  B1: 3 residues
  T1, T3: 1 to 80 residues
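The size assumptions and deterministic transitions can be encoded as a small state machine that checks a proposed segmentation of a helix region. This sketch assumes the rung order B1, T1, B2, T2, B3, T3 implied by the labels in the rung diagrams, so that B23 is followed by T3, then B1, then T1; the function names are hypothetical.

```python
from typing import List, Tuple

# Size assumptions from the slide, as inclusive (min, max) residue counts.
LENGTHS = {"B23": (8, 8), "B1": (3, 3), "T1": (1, 80), "T3": (1, 80)}
# Assumed deterministic successor of each state inside the helix.
NEXT = {"B23": "T3", "T3": "B1", "B1": "T1", "T1": "B23"}


def is_valid_segmentation(segments: List[Tuple[str, int]]) -> bool:
    """Check a helix region given as [(state, length), ...] in sequence order."""
    if not segments:
        return False
    for idx, (state, length) in enumerate(segments):
        if state not in LENGTHS:
            return False
        lo, hi = LENGTHS[state]
        if not lo <= length <= hi:
            return False  # violates a size assumption
        if idx + 1 < len(segments) and segments[idx + 1][0] != NEXT[state]:
            return False  # transition not allowed by the state machine
    return True


def rung_count(segments: List[Tuple[str, int]]) -> int:
    """Number of B23 segments, i.e. rungs that contribute a B2-T2-B3 unit."""
    return sum(1 for state, _ in segments if state == "B23")


helix = [("B23", 8), ("T3", 5), ("B1", 3), ("T1", 4), ("B23", 8)]
print(is_valid_segmentation(helix), rung_count(helix))  # True 2
```

Because every state has exactly one successor, consecutive B23 segments are linked in a simple chain, which is the property the E2 edges rely on.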


Intra-Node Features

Regular Expression Template for B23

FIFDDEAK

Φ – A,F,I,L,M,V,W,Y

– D,E,R,K

X - Any
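An intra-node feature built from the template can be sketched with Python's `re` module: each class symbol becomes a character class, and the feature fires when a whole candidate B23 segment matches. The 8-symbol template and the letter `C` (a stand-in for the D, E, R, K class, whose symbol is missing from the transcript) are hypothetical, not the published template.

```python
import re

# Residue classes from the slide. "C" is a hypothetical stand-in for the
# symbol of the D, E, R, K class.
CLASSES = {"Φ": "[AFILMVWY]", "C": "[DERK]", "X": "."}


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Translate a class template such as 'ΦXΦXXCXX' into a compiled regex."""
    return re.compile("".join(CLASSES[symbol] for symbol in template))


def b23_template_feature(segment: str, pattern: "re.Pattern[str]") -> float:
    """Binary feature: 1.0 if the whole candidate B23 segment matches."""
    return 1.0 if pattern.fullmatch(segment) else 0.0


pattern = template_to_regex("ΦXΦXXCXX")  # hypothetical 8-residue template
print(b23_template_feature("FIFDDEAK", pattern))  # 1.0
```

Using `fullmatch` ties the feature to the fixed 8-residue B23 size; a segment of any other length never fires it.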


Intra-Node Features

• Probabilistic motif profiles for B23 and B1

• Use HMMER to generate profiles from known B23 and B1 sequences
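HMMER builds full profile HMMs with insert and delete states. As a much simpler stand-in that conveys the idea of a probabilistic motif profile, this sketch builds a position-specific log-odds matrix from equal-length aligned segments and scores a candidate against it. The pseudocount, the uniform background, and the example segments are assumptions, and this is not how HMMER itself scores sequences.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position log-odds scores against a uniform background."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one segment, all of equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        profile.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def profile_score(segment: str, profile: List[Dict[str, float]]) -> float:
    """Sum of per-position log-odds; higher means more motif-like."""
    return sum(column[aa] for column, aa in zip(profile, segment))


known = ["FIFDDEAK", "FVFDDEAR", "LIFDNEAK"]  # made-up aligned segments
profile = build_profile(known)
print(profile_score("FIFDDEAK", profile) > profile_score("GGGGGGGG", profile))  # True
```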


Intra-Node Features

• Secondary structure prediction (PSIPRED): helps locate T1 and T3; 76 to 78% accuracy for α-helices and coils

• Segment length for T1 and T3: estimated as a density function
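One way to turn observed T1/T3 lengths into a density function is kernel smoothing over the allowed range of 1 to 80 residues. The Gaussian kernel, the bandwidth, and the use of the log-density as a feature are assumptions for illustration, not the estimator from the paper.

```python
import math
from typing import Dict, Sequence


def length_density(observed: Sequence[int], max_len: int = 80,
                   bandwidth: float = 2.0) -> Dict[int, float]:
    """Discrete distribution over lengths 1..max_len via Gaussian kernel smoothing."""
    weights = {
        n: sum(math.exp(-0.5 * ((n - o) / bandwidth) ** 2) for o in observed)
        for n in range(1, max_len + 1)
    }
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


def length_feature(length: int, density: Dict[int, float]) -> float:
    """Log-density of a segment length; lengths outside 1..max_len get -inf."""
    p = density.get(length, 0.0)
    return math.log(p) if p > 0 else float("-inf")


density = length_density([3, 4, 4, 5, 12])  # made-up T1 lengths
print(length_feature(4, density) > length_feature(30, density))  # True
```

Smoothing gives unseen but nearby lengths a reasonable score instead of zero, which matters when only a few solved structures are available.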


Inter-Node Features

• Side-chain alignment scores: alignment between B23 regions, with more weight given to inward-pointing pairs

[Figure: rung diagram labeled B2, T2, B3]


Inter-Node Features

• Parallel β-sheet alignment scores

• Distance between adjacent B23 segments
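The side-chain alignment between B23 regions and the distance between adjacent B23 segments can both be sketched as functions of a pair of segments joined by an E2 edge. The set of inward positions, the toy pair rewards, the typical spacing of 22 residues, and the scale are hypothetical choices, not values from the paper.

```python
HYDROPHOBIC = set("AFILMVWY")
# Hypothetical indices within an 8-residue B23 segment assumed to point inward.
INWARD = {0, 2, 5, 7}


def side_chain_alignment(b23_a: str, b23_b: str, inward_weight: float = 2.0) -> float:
    """Score two stacked B23 segments position by position (toy pair rewards)."""
    score = 0.0
    for pos, (r1, r2) in enumerate(zip(b23_a, b23_b)):
        weight = inward_weight if pos in INWARD else 1.0
        if r1 in HYDROPHOBIC and r2 in HYDROPHOBIC:
            score += weight  # hydrophobic stacking
        elif r1 == r2:
            score += 0.5 * weight  # identical, non-hydrophobic residues
    return score


def distance_feature(start_a: int, start_b: int, typical: int = 22,
                     scale: float = 5.0) -> float:
    """Penalize adjacent B23 segments whose spacing is far from `typical`."""
    return -abs((start_b - start_a) - typical) / scale


print(side_chain_alignment("FIFDDEAK", "FIFDDEAK"))  # 9.0
print(distance_feature(0, 32))  # -2.0
```

Both functions read two segments that may be far apart in sequence, which is exactly the kind of long-range dependency an HMM cannot express directly.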


SCRF: Results


SCRF: Results


Summary

• Discovered a new β-helix protein: Sf6 gp14

• Detected β-helices in plants, where none were known before

• More robust than BetaWrap


Questions