Finding the Beta Helix Motif By Marcin Mejran. Papers Predicting The -Helix Fold From Protein...

Post on 19-Dec-2015

Finding the Beta Helix Motif By Marcin Mejran

Papers

Predicting the β-Helix Fold From Protein Sequence Data, by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, Bonnie Berger

Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition by Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan

Secondary Structure

Beta Strand
• Forms β-sheets

Alpha Helix
• Stands alone

Can combine into more complex structures:
• Beta sheets
• Beta helixes

Images from: http://www.people.virginia.edu/~rjh9u/prot2ndstruct.html

[Figure label: β-sheet]

Second and a half Structure

• Beta helix
• Beta barrel
• Beta trefoil

β-Helix

• Helix composed of three parallel β-sheets
• Three β-strands per “rung”
• Connecting “loops”
• Not in eukaryotes
• Secreted by various bacteria
• Right- and left-handed

β-Helix

• Few solved structures
• 9 SCOP superfamilies
• 14 RH solved structures in PDB
• Solved structures differ widely

[Figure: one rung of the β-helix, with B1, B2, B3 and T2 labeled]

β-Helix

• T2 turn: unique two-residue loop
• β-strands are 3 to 5 residues long
• T1 and T3 vary in size and may contain secondary structures
• β-strands interact between rungs

β-Helix

• Good choice from a computational point of view
• “Nice” structure
  • Repeating parallel β-strands
  • Rungs have similar structure
  • Stacking is predictable
  • Well-conserved β-strands across superfamilies

β-Helix

• Long-range interactions
  • Residues close in 3D but not in 1D
• “Non-unique” features
  • B2-T2-B3 segment
• Unique features not clearly shown in sequence
• Usual methods don’t work

Image from: http://www.cryst.bbk.ac.uk/PPS2/course/section10/all_beta.html

BetaWrap

• “Wraps” sequences around the helix
• Finds the best “wrap”
• Uses the B2 and B3 strands and the T2 turn
  • The rest of the rung varies greatly in size
• Decomposes into sub-problems
  • Rungs
  • Find multiple rungs
  • Find B1 by local optimization

Hydrophobic/charged

• Hydrophobic: dislikes water
• Hydrophilic: likes water
• Charged: on the outside

[Figure: one rung of the β-helix, with B1, B2, B3 and T2 labeled]

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Rungs

Given a T2 turn, find the next T2 turn

[Figure: a candidate rung, with B2, B3 and T2 labeled]

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Rungs

• More weight given to inward pairs
• Certain stacked amino acids preferred
• Penalty for highly charged inward residues
• Penalizes too few or too many residues

[Figure: one rung of the β-helix, with B1, B2, B3 and T2 labeled]

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

BetaWrap: Multiple Rungs

• Find multiple initial B2-T2-B3 segments
• Match a pattern based on hydrophobic residues (which appear on the inside)
  • Φ – A,F,I,L,M,V,W,Y (hydrophobic)
  • Charged – D,E,R,K
  • X – any residue
• Example: AFDEMVRKYE FIFDDEAK EDEMVMVFD
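The residue classes above can be turned into a simple candidate search. The sketch below is only illustrative: the class letters `h`, `c`, `x` and the 8-residue template (hydrophobic at positions 1 and 3) are assumptions made for this example, not the pattern BetaWrap actually uses.

```python
import re

HYDROPHOBIC = set("AFILMVWY")  # the slide's Φ class
CHARGED = set("DERK")          # the slide's charged class


def encode(seq: str) -> str:
    """Map each residue to 'h' (hydrophobic), 'c' (charged) or 'x' (other)."""
    return "".join(
        "h" if r in HYDROPHOBIC else "c" if r in CHARGED else "x"
        for r in seq.upper()
    )


# Hypothetical template, NOT the published pattern: an 8-residue window
# with hydrophobic residues at positions 1 and 3. The lookahead lets
# overlapping windows be reported.
TEMPLATE = re.compile(r"(?=(h.h.....))")


def candidate_starts(seq: str) -> list:
    """Return 0-based start positions of windows matching the template."""
    return [m.start() for m in TEMPLATE.finditer(encode(seq))]


example = "AFDEMVRKYE" + "FIFDDEAK" + "EDEMVMVFD"
starts = candidate_starts(example)
```

On the slide's example sequence, one of the reported windows starts at the `FIFDDEAK` segment (0-based position 10).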

BetaWrap: Multiple Rungs

• DP is used to find 5 rungs in either direction from the initial positions
• α-helix filtering
• Take the average score of the top 10 remaining wraps

Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt
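The final averaging step can be sketched in a few lines. This assumes that a higher wrap score is better and that filtering has already been applied. The function name `protein_score` is hypothetical.

```python
import heapq


def protein_score(wrap_scores, top_n=10):
    """Average the top_n best wrap scores (assumes higher is better)."""
    top = heapq.nlargest(top_n, wrap_scores)
    if not top:
        raise ValueError("no wraps survived filtering")
    return sum(top) / len(top)
```

If fewer than ten wraps remain, this sketch averages whatever is left, which is a design choice of the example rather than something stated on the slide.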

BetaWrap: Completing

• Find B1 positions
  • Highest scoring parse
  • Does not affect wrap score
• Further filtering on hydrophobic residues in T1 and T2

Training

• Seven-fold cross-validation
  • Partitioned based on families
• Scores calculated for
  • α-helix filtering threshold
  • B1-score threshold
  • Hydrophobic count threshold
  • Distribution of unmatched residues between rungs

Image from: http://www.ornl.gov/info/ornlreview/v37_1_04/article_21.shtml
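Partitioning by family means every protein from the held-out family is kept out of training, so each fold tests prediction across families. Below is a minimal sketch of that split. The protein and family names are made up, and `family_folds` is a hypothetical helper, not part of BetaWrap.

```python
from collections import defaultdict


def family_folds(proteins):
    """Yield (held_out_family, train_ids, test_ids), one fold per family.

    proteins: iterable of (protein_id, family_name) pairs.
    """
    by_family = defaultdict(list)
    for protein_id, family in proteins:
        by_family[family].append(protein_id)
    for held_out in sorted(by_family):
        test_ids = by_family[held_out]
        train_ids = [
            pid
            for family, ids in sorted(by_family.items())
            if family != held_out
            for pid in ids
        ]
        yield held_out, train_ids, test_ids


# Made-up data for illustration only.
toy = [("p1", "famA"), ("p2", "famA"), ("p3", "famB"), ("p4", "famC")]
folds = list(family_folds(toy))
```

With seven families this produces the seven folds mentioned above.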

BetaWrap: Results

• Correctly identifies beta-helixes
• Correctly separates helixes and non-helixes
• Can predict β-helixes across families

BetaWrap: Summary

Pros:
• Finds beta-helixes
• Accurate

Cons:
• Still makes errors
  • Rung placement
• Hard-coded information
  • Over-fitting
  • Hard to generalize

Conditional Random Fields (CRFs)

[Figure: two chain-structured graphical models over hidden states y1 … y6 and observations x1 … x6, one labeled HMM and one labeled CRF]

Hidden Markov Model

• Set of states
• Transition probabilities
• Emission probabilities
• Only given the sequence of emitted residues
• Find the sequence of true states
• Generative

[Figure: three states, each shown with an emission table (Res/Prob: A 0.2, B 0.8)]

Hidden Markov Model

• HMM: maximize P(x,y|θ) = P(y|x,θ)P(x|θ)
  • x: emitted state/given sequence
  • y: “hidden”/true state
  • P(x,y|θ): joint probability of x and y
  • P(y|x,θ): probability of y given x
  • P(x|θ): probability of x
• Need to make assumptions about the distribution of x
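As a small illustration of the joint probability P(x,y|θ), the sketch below multiplies start, transition and emission probabilities along one state path. The two-state model and all of its numbers are assumptions for the example. Only the A 0.2 / B 0.8 emission table echoes the slide.

```python
def joint_probability(states, observations, start, trans, emit):
    """P(x, y | theta) for one hidden path y (states) and sequence x."""
    p = start[states[0]] * emit[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= trans[prev][cur] * emit[cur][obs]
    return p


# Toy parameters (hypothetical).
START = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.1, "s2": 0.9}}
EMIT = {"s1": {"A": 0.2, "B": 0.8}, "s2": {"A": 0.8, "B": 0.2}}

p = joint_probability(["s1", "s1"], "AB", START, TRANS, EMIT)
```

Here `p` is the product 0.5 × 0.2 × 0.9 × 0.8.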

Viterbi Algorithm HMM

Find most likely path/most likely sequence of hidden states

[Figure: Viterbi trellis with one column per observation x1 … x4 and one node per state, labeled e1(xi), e2(xi), e3(xi)]

Viterbi Algorithm HMM

[Figure: the same trellis over x1 … x4, annotated with the recurrence]

For position i, state j and k states in total:

$$v(i,j) = \max\bigl(v(i-1,1)\,t_{1,j}\,e_j(x_i),\; v(i-1,2)\,t_{2,j}\,e_j(x_i),\; \dots,\; v(i-1,k)\,t_{k,j}\,e_j(x_i)\bigr)$$
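The recurrence can be sketched directly in code. This is a minimal, unoptimized version that works with raw probabilities (real implementations usually use logs to avoid underflow). The toy two-state model is an assumption for illustration.

```python
def viterbi(observations, states, start, trans, emit):
    """Return (most likely state path, its joint probability)."""
    # v[i][j]: best probability of any path ending in state j at position i.
    v = [{s: start[s] * emit[s][observations[0]] for s in states}]
    back = []  # back[i][j]: best predecessor of state j at position i + 1
    for obs in observations[1:]:
        column, pointers = {}, {}
        for j in states:
            best = max(states, key=lambda m: v[-1][m] * trans[m][j])
            column[j] = v[-1][best] * trans[best][j] * emit[j][obs]
            pointers[j] = best
        v.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, v[-1][last]


# Toy parameters (hypothetical).
STATES = ("s1", "s2")
START = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.1, "s2": 0.9}}
EMIT = {"s1": {"A": 0.2, "B": 0.8}, "s2": {"A": 0.8, "B": 0.2}}

path, prob = viterbi("BBAA", STATES, START, TRANS, EMIT)
```

For the toy model, the best path switches from `s1` to `s2` once, where the observations change from B to A.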

HMM Disadvantages

• There is a strong independence assumption
• Long-range interactions are difficult to model
• Overlapping features are difficult to model

Conditional Random Fields (CRFs)

• Replace transition and emission probabilities with a set of feature functions f(i,j,k)
• Feature functions are based on all xs, not just one
• Not generative

[Figure: trellis over x1 … x4 with three states per column, nodes labeled with feature functions f(1,0,1) … f(3,i,4) in place of emission probabilities]

Conditional Random Fields (CRFs)

• HMM: maximize P(x,y|θ) = P(y|x,θ)P(x|θ)
• CRF: maximize P(y|x,θ)
• Do not make assumptions about the underlying distribution
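To make the conditional view concrete, the sketch below scores a labeling with weighted feature functions and normalizes by brute force over every possible labeling. That is only feasible for tiny inputs. The two feature functions, the state names `B`/`T` and the weights are all hypothetical. Each feature receives the whole sequence `x`, so it could inspect any position, not just `x[i]`.

```python
import math
from itertools import product

HYDROPHOBIC = set("AFILMVWY")


def strand_likes_hydrophobic(prev, cur, x, i):
    """Fires when a strand label sits on a hydrophobic residue."""
    return 1.0 if cur == "B" and x[i] in HYDROPHOBIC else 0.0


def label_persists(prev, cur, x, i):
    """Fires when the label repeats the previous label."""
    return 1.0 if prev == cur else 0.0


def unnormalized(y, x, features, weights):
    """exp of the weighted feature sum along the chain."""
    total = 0.0
    for i in range(len(x)):
        prev = y[i - 1] if i > 0 else None
        total += sum(w * f(prev, y[i], x, i) for f, w in zip(features, weights))
    return math.exp(total)


def crf_probability(y, x, states, features, weights):
    """P(y | x): score of y divided by the sum over all labelings."""
    z = sum(
        unnormalized(c, x, features, weights)
        for c in product(states, repeat=len(x))
    )
    return unnormalized(tuple(y), x, features, weights) / z


STATES = ("B", "T")
FEATURES = (strand_likes_hydrophobic, label_persists)
WEIGHTS = (1.5, 0.5)
```

No distribution over `x` appears anywhere in the computation. Only labelings of the given sequence are compared.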

Viterbi CRFs

• Same method as for HMM

[Figure: the same trellis over x1 … x4, with nodes labeled by feature functions f(1,0,1) … f(3,i,4)]

Conditional Random Fields (CRFs)

• States should form a chain
• Likelihood function is convex for a chain
• Z0 = normalization factor (a sum over all possible state sequences)
• λk = weights
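For reference, these symbols belong to the standard linear-chain CRF form, sketched below. The argument list of $f_k$ follows the usual textbook convention and is an assumption here. It is not necessarily the slide's own f(i,j,k) indexing.

```latex
P(y \mid x) = \frac{1}{Z_0}
  \exp\Bigl( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(y_{i-1}, y_i, x, i) \Bigr),
\qquad
Z_0 = \sum_{y'} \exp\Bigl( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{i-1}, y'_i, x, i) \Bigr)
```

In this form $Z_0$ sums over every candidate labeling $y'$, so the probabilities of all labelings of a given $x$ add up to one.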

Segmented CRFs

• Each state corresponds to a structure
• Represented as a graph G
  • States (nodes) represent secondary structures
  • Edges represent interactions
• Chains are nicer than graphs

Segmented CRFs

• G = <V, E1, E2>
  • E1: edges between neighbors
  • E2: edges for long-range interactions
• E1 edges can be implied in the model
• Only E2 needs to be explicitly considered
• However
  • Graph needs to be a chain for E2
  • Deterministic state transitions
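One way to picture G = <V, E1, E2> is a list of segment states where neighbor edges are derived from order and only long-range edges are stored. The sketch below is an assumption-laden illustration: the class name, the rung order B1, T1, B23, T3, and the simplified chain check (each node starts at most one E2 edge and ends at most one) are all choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentGraph:
    """G = <V, E1, E2>; E1 is implied by node order, E2 is stored."""
    nodes: list
    long_range: set = field(default_factory=set)

    def neighbor_edges(self):
        """E1: edges between consecutive segments."""
        return [(i, i + 1) for i in range(len(self.nodes) - 1)]

    def add_long_range(self, i, j):
        """E2: an interaction between two non-adjacent segments."""
        if not 0 <= i < j < len(self.nodes):
            raise ValueError("expected 0 <= i < j < number of nodes")
        self.long_range.add((i, j))

    def long_range_is_chain(self):
        """Simplified check: no node starts or ends more than one E2 edge."""
        sources = [i for i, _ in self.long_range]
        targets = [j for _, j in self.long_range]
        return (len(set(sources)) == len(sources)
                and len(set(targets)) == len(targets))


# Three rungs in an assumed order; link B23 segments of adjacent rungs.
graph = SegmentGraph(nodes=["B1", "T1", "B23", "T3"] * 3)
b23 = [i for i, state in enumerate(graph.nodes) if state == "B23"]
for a, b in zip(b23, b23[1:]):
    graph.add_long_range(a, b)
```

Storing only E2 keeps the structure close to a plain sequence, which fits the point that chains are easier to work with than general graphs.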

Beta-Helix CRF

• Combined states
  • B23: B2, B3, T2
• Size assumptions:
  • B23: 8 residues
  • B1: 3 residues
  • T1, T3: 1 to 80 residues
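The size assumptions translate into a simple validity check on a candidate parse. In this sketch a parse is a list of (state, length) pairs. That representation and the name `valid_parse` are hypothetical, while the length limits come from the slide.

```python
# (min_length, max_length) per state, from the size assumptions above.
SIZE_LIMITS = {"B23": (8, 8), "B1": (3, 3), "T1": (1, 80), "T3": (1, 80)}


def valid_parse(segments):
    """True if every (state, length) pair respects the size limits."""
    return all(
        state in SIZE_LIMITS
        and SIZE_LIMITS[state][0] <= length <= SIZE_LIMITS[state][1]
        for state, length in segments
    )
```

Fixing the lengths of B23 and B1 shrinks the search space, leaving only the loop lengths T1 and T3 to vary.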

Intra-Node Features

• Regular expression template for B23
  • Example: FIFDDEAK
  • Φ – A,F,I,L,M,V,W,Y (hydrophobic)
  • Charged – D,E,R,K
  • X – any residue

Intra-Node Features

• Probabilistic motif profiles for B23 and B1
  • Use HMMER to generate profiles from known B23 and B1 sequences

Intra-Node Features

• Secondary structure prediction
  • PSIPRED
  • Helps locate T1 and T3
  • 76 to 78% accuracy for α-helixes and coils
• Segment length for T1 and T3
  • Estimated as a density function

Inter-Node Features

• Side chain alignment scores
  • Alignment between B23 regions
  • More weight given to inward pairs

[Figure: rung with B2, B3 and T2 labeled]

Inter-Node Features

• Parallel beta-sheet alignment scores
• Distance between adjacent B23 segments

SCRF: Results

Summary

• Discovered a new beta-helix protein
  • Sf6 gp14
• Detected beta-helixes in plants
  • None known of before
• More robust than BetaWrap

More robust than BetaWrap

Questions