Probabilistic Methods for Interpreting Electron-Density Maps


Page 1: Probabilistic Methods for Interpreting Electron-Density Maps

Probabilistic Methods for Interpreting Electron-Density Maps

Frank DiMaio

University of Wisconsin – Madison Computer Sciences Department


Page 2: Probabilistic Methods for Interpreting Electron-Density Maps

3D Protein Structure

[Figure labels: backbone, sidechain, C-alpha]

Page 3: Probabilistic Methods for Interpreting Electron-Density Maps

3D Protein Structure

[Figure: amino-acid sequence ALA, LEU, PRO, VAL, ARG, … with unknown residues marked "?"]

Page 4: Probabilistic Methods for Interpreting Electron-Density Maps

High-Throughput Structure Determination

Protein-structure determination is important
  Understanding function of a protein
  Understanding mechanisms
  Targets for drug design

Some proteins produce poor density maps

Interpreting poor electron-density maps is very (human) laborious

I aim to automatically interpret poor-quality electron-density maps

Page 5: Probabilistic Methods for Interpreting Electron-Density Maps

Electron-Density Map Interpretation

……

GIVEN: 3D electron-density map, (linear) amino-acid sequence

Page 6: Probabilistic Methods for Interpreting Electron-Density Maps

Electron-Density Map Interpretation

……

FIND: All-atom Protein Model

Page 7: Probabilistic Methods for Interpreting Electron-Density Maps

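Density Map Resolution

[Figure: resolution axis marked 1.0Å, 2.0Å, 3.0Å, 4.0Å; existing methods placed along it: Morris et al. (2003), Ioerger et al. (2002), Terwilliger (2003); a region of the axis is labeled "My focus"]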

Page 8: Probabilistic Methods for Interpreting Electron-Density Maps

Thesis Contributions

A probabilistic approach to protein-backbone tracing
  DiMaio et al., Intelligent Systems for Molecular Biology (2006)

Improved template matching in electron-density maps
  DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007)

Creating all-atom protein models using particle filtering
  DiMaio et al. (under review)

Pictorial structures for atom-level molecular modeling
  DiMaio et al., Advances in Neural Information Processing Systems (2004)

Improving the efficiency of belief propagation
  DiMaio and Shavlik, IEEE International Conference on Data Mining (2006)

Iterative phase improvement in ACMI

Page 9: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI Overview

Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)
  Independent amino-acid search
  Templates model 5-mer conformational space

Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)
  Protein structural constraints refine local search
  Markov random field (MRF) models pairwise constraints

Phase 3: Sample all-atom models
  Particle filtering samples high-probability structures
  Probabilities from MRF guide particle trajectories

Page 11: Probabilistic Methods for Interpreting Electron-Density Maps

5-mer Lookup

…SAW C VKFEKPADKNGKTE…

ProteinDB

ACMI searches map for each template independently

Spherical-harmonic decomposition allows rapid search of all template rotations

Page 12: Probabilistic Methods for Interpreting Electron-Density Maps

Spherical-Harmonic Decomposition

f (θ,φ)
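For reference, the standard spherical-harmonic expansion of a function on the sphere is written out below. The coefficient symbol $a_{lm}$ and the band limit $B$ are notation introduced here for illustration; the slide itself shows only $f(\theta,\phi)$.

```latex
f(\theta,\phi) \;\approx\; \sum_{l=0}^{B} \sum_{m=-l}^{l} a_{lm}\, Y_l^m(\theta,\phi),
\qquad
a_{lm} \;=\; \int_{0}^{2\pi}\!\!\int_{0}^{\pi} f(\theta,\phi)\,
\overline{Y_l^m(\theta,\phi)}\, \sin\theta \, d\theta \, d\phi
```

Each spherical shell of sampled density would get its own set of coefficients under this scheme.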

Page 13: Probabilistic Methods for Interpreting Electron-Density Maps

5-mer Fast Rotation Search

Template side:
  pentapeptide fragment from PDB (the “template”)
  calculated (expected) density in 5Å sphere
  template-density sampled in spherical shells

Map side:
  electron density map
  sampled region of density in 5Å sphere
  map-region sampled in spherical shells

Page 14: Probabilistic Methods for Interpreting Electron-Density Maps

5-mer Fast Rotation Search

map-region sampled in spherical shells → map-region spherical-harmonic coefficients

template-density sampled in spherical shells → template spherical-harmonic coefficients

fast-rotation function (Navaza 2006, Risbo 1996) → correlation coefficient as function of rotation

Page 15: Probabilistic Methods for Interpreting Electron-Density Maps

Convert Scores to Probabilities

scan density map for fragment → correlation coefficients over density map, $t_i(u_i)$

Bayes’ rule → probability distribution over density map, $P(\text{5-mer at } u_i \mid \mathrm{EDM})$
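A minimal sketch of the score-to-probability conversion, assuming (hypothetically) that correlation scores at true-match and background locations each follow a Gaussian. The function names, Gaussian parameters, and prior below are illustrative choices, not values from the slides.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def match_posterior(score, prior, match=(0.6, 0.15), background=(0.1, 0.15)):
    """Bayes' rule: P(5-mer here | score) from assumed score likelihoods.

    `match` and `background` are hypothetical (mean, std) pairs for the
    correlation score at true and false locations respectively.
    """
    like_match = gaussian_pdf(score, *match)
    like_bg = gaussian_pdf(score, *background)
    evidence = like_match * prior + like_bg * (1.0 - prior)
    return like_match * prior / evidence

# Four grid points with made-up correlation scores; uniform prior over them.
scores = [0.05, 0.12, 0.58, 0.20]
prior = 1.0 / len(scores)
posteriors = [match_posterior(s, prior) for s in scores]
```

With equal standard deviations for both Gaussians, the posterior rises monotonically with the correlation score, so the best-matching grid point receives the highest probability.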

Page 16: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI Overview

Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)
  Independent amino-acid search
  Templates model 5-mer conformational space

Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)
  Protein structural constraints refine local search
  Markov random field (MRF) models pairwise constraints

Phase 3: Sample all-atom models
  Particle filtering samples high-probability structures
  Probabilities from MRF guide particle trajectories

Page 17: Probabilistic Methods for Interpreting Electron-Density Maps

Probabilistic Backbone Model

A trace assigns a position and orientation $u_i = \{x_i, q_i\}$ to each amino acid $i$

The probability of a trace $U = \{u_i\}$ is

$$P(U \mid \mathrm{EDM}) = P(u_1, \ldots, u_N \mid \mathrm{EDM})$$

This full joint probability is intractable to compute

Approximate using a pairwise Markov field

Page 18: Probabilistic Methods for Interpreting Electron-Density Maps

Pairwise Markov-Field Model

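Joint probabilities defined on a graph as product of vertex and edge potentials

$$P(U \mid \mathrm{EDM}) \propto \prod_{\mathrm{AAs}\ i} \psi_i(u_i \mid \mathrm{EDM}) \times \prod_{\mathrm{AAs}\ i,j} \psi_{ij}(u_i, u_j)$$

[Figure: graph over amino-acid nodes GLY, LYS, LEU, SER, ALA]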

Page 19: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI’s Backbone Model

Observational potentials tie the map to the model

[Figure: graph over amino-acid nodes GLY, LYS, LEU, SER, ALA]

Page 20: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI’s Backbone Model

Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in proper orientation

Occupancy constraints ensure nonadjacent amino acids do not occupy the same 3D space

[Figure: graph over amino-acid nodes GLY, LYS, LEU, SER, ALA]

Page 21: Probabilistic Methods for Interpreting Electron-Density Maps

Backbone Model Potential

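$$p(U \mid \mathrm{EDM}) \propto \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\mathrm{AAs}\ i} \psi_i(u_i \mid \mathrm{EDM})$$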

Page 22: Probabilistic Methods for Interpreting Electron-Density Maps

Backbone Model Potential

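Constraints between adjacent amino acids

$$\psi_{adj}(u_i, u_j) = p_x(\lVert x_i - x_j \rVert) \times p_\theta(u_i, u_j)$$

$$p(U \mid \mathrm{EDM}) \propto \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\mathrm{AAs}\ i} \psi_i(u_i \mid \mathrm{EDM})$$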

Page 23: Probabilistic Methods for Interpreting Electron-Density Maps

Backbone Model Potential

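Constraints between all other amino acid pairs

$$\psi_{occ}(u_i, u_j) = \begin{cases} 0 & \text{if } \lVert x_i - x_j \rVert < K \\ 1 & \text{otherwise} \end{cases}$$

$$p(U \mid \mathrm{EDM}) \propto \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\mathrm{AAs}\ i} \psi_i(u_i \mid \mathrm{EDM})$$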

Page 24: Probabilistic Methods for Interpreting Electron-Density Maps

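Backbone Model Potential

Observational (“template-matching”) probabilities

$$\psi_i(u_i \mid \mathrm{EDM}) = \Pr(\text{5mer } s_{i-2} \ldots s_{i+2} \text{ at } u_i)$$

$$p(U \mid \mathrm{EDM}) \propto \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\mathrm{AAs}\ i} \psi_i(u_i \mid \mathrm{EDM})$$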
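The three kinds of potential can be combined into an unnormalized score for a candidate trace. The sketch below is illustrative only: it models the adjacency term as a Gaussian on Cα distance around 3.8Å and omits the orientation term, and the values of `SIGMA` and `K_CLASH` are assumptions rather than ACMI's actual parameters. Each unordered pair is counted once.

```python
import math

CA_DISTANCE = 3.8  # Å, adjacent C-alpha spacing quoted in the slides
SIGMA = 0.3        # hypothetical spread of the distance term
K_CLASH = 3.0      # hypothetical occupancy threshold K, in Å

def psi_adj(xi, xj):
    """Distance part of the adjacency potential (orientation term omitted)."""
    d = math.dist(xi, xj)
    return math.exp(-0.5 * ((d - CA_DISTANCE) / SIGMA) ** 2)

def psi_occ(xi, xj):
    """Occupancy potential: 0 if two residues overlap, 1 otherwise."""
    return 0.0 if math.dist(xi, xj) < K_CLASH else 1.0

def trace_potential(positions, obs_probs):
    """Unnormalized product of observation, adjacency and occupancy terms."""
    score = 1.0
    n = len(positions)
    for i in range(n):
        score *= obs_probs[i]  # psi_i(u_i | EDM), supplied by the caller
        for j in range(i + 1, n):
            if j - i == 1:
                score *= psi_adj(positions[i], positions[j])
            else:
                score *= psi_occ(positions[i], positions[j])
    return score

# A straight three-residue chain versus one that folds back onto itself.
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
clashing = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.0, 0.0, 0.0)]
obs = [0.5, 0.5, 0.5]
good_score = trace_potential(straight, obs)
bad_score = trace_potential(clashing, obs)
```

The clashing layout scores zero because residues 1 and 3 violate the occupancy constraint, regardless of how well each residue matches the map locally.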

Page 25: Probabilistic Methods for Interpreting Electron-Density Maps

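Inferring Backbone Locations

Want to find backbone layout that maximizes

$$\prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\mathrm{AAs}\ i} \psi_i(u_i \mid \mathrm{EDM})$$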

Page 26: Probabilistic Methods for Interpreting Electron-Density Maps

Inferring Backbone Locations

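Want to find backbone layout that maximizes

$$\prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|=1}} \psi_{adj}(u_i, u_j) \times \prod_{\substack{\mathrm{AAs}\ i,j \\ |i-j|>1}} \psi_{occ}(u_i, u_j) \times \prod_{\mathrm{AAs}\ i} \psi_i(u_i \mid \mathrm{EDM})$$

Exact methods are intractable

Use belief propagation (Pearl 1988) to approximate marginal distributions

$$p_i(u_i \mid \mathrm{EDM}) = \sum_{u_k,\ k \neq i} p(U \mid \mathrm{EDM})$$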

Page 27: Probabilistic Methods for Interpreting Electron-Density Maps

Belief Propagation Example

[Figure: nodes LYS31 and LEU32 with current beliefs $\hat{p}_{LYS31}$ and $\hat{p}_{LEU32}$; message $m_{LYS31 \to LEU32}$ is passed]

Page 28: Probabilistic Methods for Interpreting Electron-Density Maps

Belief Propagation Example

[Figure: nodes LYS31 and LEU32 with current beliefs $\hat{p}_{LYS31}$ and $\hat{p}_{LEU32}$; message $m_{LEU32 \to LYS31}$ is passed]
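To make the message exchange concrete, here is a small sum-product sketch on a chain with discrete states. It is a generic textbook version, not ACMI's implementation: the function name `chain_bp`, the two-state grid, the potential values, and the use of one symmetric edge potential for every edge are all assumptions for illustration.

```python
def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def chain_bp(node_pot, edge_pot, iterations=10):
    """Sum-product BP on a chain; edge_pot must be symmetric.

    node_pot[i][a] is the vertex potential of node i in state a;
    edge_pot[a][b] is shared by every pair of neighbouring nodes.
    """
    n, s = len(node_pot), len(node_pot[0])
    # msgs[(i, j)] is the message from node i to neighbour j, over j's states.
    msgs = {}
    for i in range(n - 1):
        msgs[(i, i + 1)] = [1.0] * s
        msgs[(i + 1, i)] = [1.0] * s
    for _ in range(iterations):
        updated = {}
        for (i, j) in msgs:
            # Node potential times all messages into i except the one from j.
            pre = list(node_pot[i])
            for (k, t), m in msgs.items():
                if t == i and k != j:
                    pre = [p * v for p, v in zip(pre, m)]
            out = [sum(pre[a] * edge_pot[a][b] for a in range(s)) for b in range(s)]
            updated[(i, j)] = normalize(out)
        msgs = updated
    beliefs = []
    for i in range(n):
        belief = list(node_pot[i])
        for (k, t), m in msgs.items():
            if t == i:
                belief = [p * v for p, v in zip(belief, m)]
        beliefs.append(normalize(belief))
    return beliefs

# Two neighbouring residues, two candidate grid locations each.
beliefs = chain_bp(
    node_pot=[[0.9, 0.1], [0.5, 0.5]],  # LYS31 is confident, LEU32 is not
    edge_pot=[[0.8, 0.2], [0.2, 0.8]],  # neighbours prefer matching states
)
```

After message passing, the uncertain node inherits a preference from its confident neighbour. On a chain (a tree) these beliefs equal the exact marginals; on ACMI's fully connected graph, BP only approximates them.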

Page 29: Probabilistic Methods for Interpreting Electron-Density Maps

Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006)

Naïve implementation: $O(N^2 G^2)$
  N = the number of amino acids in the protein
  G = number of points in the discretized density map

$O(G^2)$ computation for each message passed
  $O(G \log G)$ as Fourier-space multiplication

$O(N^2)$ messages computed and stored
  Approximate the $(N-3)$ occupancy messages with 1 message
  $O(N)$ messages using a message accumulator

Improved implementation: $O(N G \log G)$
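The Fourier-space speedup follows from the convolution theorem. The derivation below is a sketch under one assumption made here: the edge potential depends only on the offset $x_j - x_i$ (orientation is ignored). The symbol $h_i$, standing for the product of node $i$'s potential and its other incoming messages, is introduced for illustration.

```latex
m_{i \to j}(x_j) \;\propto\; \int \psi(x_j - x_i)\, h_i(x_i)\, dx_i
\;=\; (\psi * h_i)(x_j)
\;=\; \mathcal{F}^{-1}\!\Big[\, \mathcal{F}[\psi] \cdot \mathcal{F}[h_i] \,\Big](x_j)
```

Evaluating the integral directly costs $O(G^2)$ on a grid of $G$ points, while the FFT route costs $O(G \log G)$.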

Page 31: Probabilistic Methods for Interpreting Electron-Density Maps

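Occupancy Message Approximation

To pass a message:

$$m^{n}_{i \to j}(u_j) \propto \int_{u_i} \psi_{occ}(u_i, u_j) \times \frac{\hat{p}^{\,n-1}_i(u_i)}{m^{n-1}_{j \to i}(u_i)} \, du_i$$

$\psi_{occ}(u_i, u_j)$: occupancy edge potential

$\hat{p}^{\,n-1}_i(u_i) / m^{n-1}_{j \to i}(u_i)$: product of incoming msgs to i except from j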

Page 32: Probabilistic Methods for Interpreting Electron-Density Maps

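Occupancy Message Approximation

To pass a message:

$$m^{n}_{i \to j}(u_j) \propto \int_{u_i} \psi_{occ}(u_i, u_j) \times \frac{\hat{p}^{\,n-1}_i(u_i)}{m^{n-1}_{j \to i}(u_i)} \, du_i$$

“Weak” potentials between nonadjacent amino acids let us approximate

$$m^{n}_{i}(u_j) \propto \int_{u_i} \psi_{occ}(u_i, u_j) \times \hat{p}^{\,n-1}_i(u_i) \, du_i$$

$\psi_{occ}(u_i, u_j)$: occupancy edge potential

$\hat{p}^{\,n-1}_i(u_i)$: product of all incoming msgs to i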

Page 33: Probabilistic Methods for Interpreting Electron-Density Maps

Occupancy Message Approximation

[Figure: chain of nodes 1 to 6; labels at node 3 involve $\hat{p}_3(x_3)$ and the occupancy messages $m^{occ}_{1 \to 3}$, $m^{occ}_{5 \to 3}$, $m^{occ}_{6 \to 3}$]

Page 34: Probabilistic Methods for Interpreting Electron-Density Maps

Occupancy Message Approximation

[Figure: chain of nodes 1 to 6; the three labels at node 3 now show only $\hat{p}_3(x_3)$]

Page 35: Probabilistic Methods for Interpreting Electron-Density Maps

Occupancy Message Approximation

Send outgoing occupancy message product to a central accumulator

$$ACC(x) = \prod_{\mathrm{AAs}\ i} m_i(x)$$

[Figure: chain of nodes 1 to 6, each sending its message to the accumulator ACC]

Page 36: Probabilistic Methods for Interpreting Electron-Density Maps

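Occupancy Message Approximation

Then, each node’s incoming message product is computed in constant time

$$\hat{p}_3 \propto \frac{ACC}{m_2 \times m_3 \times m_4} \times m_{2 \to 3} \times m_{4 \to 3}$$

[Figure: chain of nodes 1 to 6 with central accumulator ACC]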
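A minimal sketch of the accumulator idea, assuming every message is stored as a list of strictly positive values over the same grid. The function names are hypothetical, and the adjacency messages $m_{2 \to 3}$ and $m_{4 \to 3}$ would be multiplied in separately.

```python
def build_accumulator(outgoing):
    """Pointwise product of every node's single outgoing occupancy message."""
    acc = [1.0] * len(outgoing[0])
    for msg in outgoing:
        acc = [a * m for a, m in zip(acc, msg)]
    return acc

def incoming_occupancy_product(acc, outgoing, i):
    """Product of occupancy messages reaching node i.

    Divides the node's own message and those of its chain neighbours out of
    the accumulator, so the cost per node does not grow with chain length.
    Assumes all message values are non-zero.
    """
    result = list(acc)
    for k in (i - 1, i, i + 1):
        if 0 <= k < len(outgoing):
            result = [r / m for r, m in zip(result, outgoing[k])]
    return result

# Five nodes, three grid points each (made-up message values).
outgoing = [
    [0.9, 0.5, 0.2],
    [0.8, 0.6, 0.3],
    [0.7, 0.4, 0.9],
    [0.6, 0.9, 0.5],
    [0.5, 0.3, 0.8],
]
acc = build_accumulator(outgoing)
into_middle = incoming_occupancy_product(acc, outgoing, 2)
```

For the middle node only the two end nodes are nonadjacent, so `into_middle` equals the pointwise product of the first and last messages.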

Page 37: Probabilistic Methods for Interpreting Electron-Density Maps

BP Output

After some number of iterations, BP gives probability distributions over Cα locations

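[Figure: sequence … ALA LEU PRO VAL ARG … with a marginal distribution shown for each residue, e.g. $\hat{p}_{LEU}(x_{LEU})$ and $\hat{p}_{VAL}(x_{VAL})$]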

Page 38: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI’s Backbone Trace

Independently choose Cα locations that maximize approximate marginal distribution

$$b^{*}_i = \arg\max_{x_i} \hat{p}_i(x_i)$$

Page 39: Probabilistic Methods for Interpreting Electron-Density Maps

Example: 1XRI

[Figure: 1XRI model colored by prob(AA at location), from LOW to HIGH; scale labels 0.1 and 0.9]

0.9009Å RMSd, 93% complete

3.3Å resolution density map, 39° mean phase error

Page 40: Probabilistic Methods for Interpreting Electron-Density Maps

Testset Density Maps (raw data)

[Scatter plot: density-map mean phase error (deg., axis 15 to 75) versus density-map resolution (Å, axis 1.0 to 4.0) for the test-set maps]

Page 41: Probabilistic Methods for Interpreting Electron-Density Maps

Experimental Accuracy

[Bar chart: ACMI, ARP/wARP, Textal, and Resolve compared on two measures (axis 0 to 100): % backbone correctly placed (% Cα’s located within 2Å of some Cα) and % amino acids correctly identified (within 2Å of the correct Cα)]

Page 42: Probabilistic Methods for Interpreting Electron-Density Maps

Experimental Accuracy on a Per-Protein Basis

[Three scatter plots, one point per protein, both axes 0 to 100: ACMI % Cα’s located versus ARP/wARP % Cα’s located, versus Resolve % Cα’s located, and versus Textal % Cα’s located]

Page 43: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI Overview

Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)
  Independent amino-acid search
  Templates model 5-mer conformational space

Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)
  Protein structural constraints refine local search
  Markov random field (MRF) models pairwise constraints

Phase 3: Sample all-atom models
  Particle filtering samples high-probability structures
  Probabilities from MRF guide particle trajectories

Page 44: Probabilistic Methods for Interpreting Electron-Density Maps

Problems with ACMI

Biologists want location of all atoms

All Cα’s lie on a discrete grid

Maximum-marginal backbone model may be physically unrealistic

Ignoring a lot of information

Multiple models may better represent conformational variation within crystal

[Figure: three candidate structures with Probability=0.4, Probability=0.35, and Probability=0.25, shown alongside the maximum-marginal structure]

Page 45: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI with Particle Filtering (ACMI-PF)

Idea: Represent protein using a set of static 3D all-atom protein models

Page 46: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Filtering Overview (Doucet et al. 2000)

Given some Markov process $x_{1:K} \in X$ with observations $y_{1:K} \in Y$

Particle Filtering approximates some posterior probability distribution over X using a set of N weighted point estimates

$$p(x_{1:K} \mid y_{1:K}) \approx \sum_{i=1}^{N} wt^{(i)}_K \, \delta\!\left(x_{1:K} - x^{(i)}_{1:K}\right)$$

Page 47: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Filtering Overview

Markov process gives recursive formulation

$$p(x_{1:k} \mid y_{1:k}) \propto p(y_k \mid x_k)\, p(x_k \mid x_{k-1})\, p(x_{1:k-1} \mid y_{1:k-1})$$

Use importance function $q(x_k \mid x_{0:k-1}, y_k)$ to grow particles

Recursive weight update:

$$wt^{(i)}_k \propto wt^{(i)}_{k-1} \, \frac{p(y_k \mid x^{(i)}_k)\; p(x^{(i)}_k \mid x^{(i)}_{k-1})}{q(x^{(i)}_k \mid x^{(i)}_{k-1}, y_k)}$$
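The recursive weight update can be sketched as one generic sequential-importance-sampling step. This is a textbook illustration, not ACMI-PF's code: `sis_step` and its callable arguments are hypothetical names, and the toy 1D densities are invented for the example.

```python
import math
import random

def sis_step(particles, weights, y, propose, proposal_pdf, transition_pdf, likelihood):
    """Advance every particle once and return (particles, normalized weights)."""
    new_particles, raw = [], []
    for x_prev, w_prev in zip(particles, weights):
        x_new = propose(x_prev, y)  # draw from q(x_k | x_{k-1}, y_k)
        ratio = (likelihood(y, x_new) * transition_pdf(x_new, x_prev)
                 / proposal_pdf(x_new, x_prev, y))
        new_particles.append(x_new)
        raw.append(w_prev * ratio)
    total = sum(raw)
    return new_particles, [w / total for w in raw]

def gauss(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Toy 1D random walk observed with noise (all densities are made up).
rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(5)]
weights = [0.2] * 5
particles, weights = sis_step(
    particles, weights, y=1.0,
    propose=lambda x_prev, y: rng.gauss(x_prev, 0.5),
    proposal_pdf=lambda x, x_prev, y: gauss(x, x_prev, 0.5),
    transition_pdf=lambda x, x_prev: gauss(x, x_prev, 0.5),
    likelihood=lambda y, x: gauss(y, x, 1.0),
)
```

In this toy run the proposal equals the transition model, so the ratio reduces to the likelihood alone; a proposal that also looks at the observation changes the weights but not the target distribution.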

Page 48: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Filtering for Protein Structures

Particle refers to one specific 3D layout of some subsequence of the protein

At each iteration advance particle’s trajectory by placing an additional amino-acid’s atoms

Page 49: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Filtering for Protein Structures

Alternate extending chain left and right

Page 50: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Filtering for Protein Structures

Alternate extending chain left and right

An iteration alternately places
  Cα position $b_{k+1}$ given $b_k$
  All sidechain atoms $s_k$ given $b_{k-1:k+1}$

[Figure: backbone positions $b_{k-1}$, $b_k$, $b_{k+1}$ with sidechain $s_k$ attached at $b_k$]

Page 51: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Filtering for Protein Structures

Key idea: Use the conditional distribution $p(b_k \mid b^i_{k-1}, \text{Map})$ to advance particle trajectories

Construct this conditional distribution from BP’s marginal distributions

[Figure: backbone positions $b_{k-1}$, $b_k$, $b_{k+1}$ with sidechain $s_k$ attached at $b_k$]

Page 52: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Filtering for Protein Structures

Algorithm
  place “seeds” $b^i_k$ for each particle $i = 1 \ldots N$
  while amino acids remain
    place $b^i_{k+1}$ / $b^i_{j-1}$ given $b^i_{j:k}$ for each $i = 1 \ldots N$
    place $s^i_k$ given $b^i_{k-1:k+1}$ for each $i = 1 \ldots N$
    optionally resample N particles
  end while

[Figure: chain … $b_{k-1}$, $b_k$, $b_{k+1}$ … with sidechain $s_k$]

Page 53: Probabilistic Methods for Interpreting Electron-Density Maps

Backbone Step (for particle i )

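place $b^i_{k+1}$ given $b^i_k$ for each $i = 1 \ldots N$

(1) Sample $L$ $b_{k+1}$’s from the $b_{k-1}$–$b_k$–$b_{k+1}$ pseudoangle distribution

[Figure: candidate positions $b^{1 \ldots L}_{k+1}$ extending from $b_{k-1}$ and $b_k$]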

Page 54: Probabilistic Methods for Interpreting Electron-Density Maps

Backbone Step (for particle i )

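place $b^i_{k+1}$ given $b^i_k$ for each $i = 1 \ldots N$

(2) Weight each sample by its ACMI-computed approximate marginal

[Figure: candidates $b^{1 \ldots L}_{k+1}$ extending from $b_{k-1}$ and $b_k$, weighted by $\hat{p}_{k+1}(b^1_{k+1})$, $\hat{p}_{k+1}(b^2_{k+1})$, …, $\hat{p}_{k+1}(b^L_{k+1})$]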

Page 55: Probabilistic Methods for Interpreting Electron-Density Maps

Backbone Step (for particle i )

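place $b^i_{k+1}$ given $b^i_k$ for each $i = 1 \ldots N$

(3) Select $b_{k+1}$ with probability proportional to sample weight

[Figure: candidates $b^{1 \ldots L}_{k+1}$ extending from $b_{k-1}$ and $b_k$, weighted by $\hat{p}_{k+1}(b^1_{k+1})$, $\hat{p}_{k+1}(b^2_{k+1})$, …, $\hat{p}_{k+1}(b^L_{k+1})$]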

Page 56: Probabilistic Methods for Interpreting Electron-Density Maps

Backbone Step (for particle i )

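place $b^i_{k+1}$ given $b^i_k$ for each $i = 1 \ldots N$

(4) Update particle weight as sum of sample weights

$$wt_{k+1} \propto wt_k \times \sum_{l=1}^{L} \hat{p}_{k+1}\!\left(b^{l}_{k+1}\right)$$

[Figure: backbone positions $b_{k-1}$, $b_k$, and the selected $b_{k+1}$]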
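The four backbone sub-steps can be sketched for a single particle as follows. The callables are hypothetical stand-ins: `sample_candidate` represents a draw from the pseudoangle distribution and `marginal` a lookup into the approximate marginal for residue k+1; neither reflects ACMI-PF's real interfaces.

```python
import random

def backbone_step(weight, sample_candidate, marginal, L, rng):
    """Extend one particle by one C-alpha; returns (chosen position, new weight)."""
    # (1) draw L candidate positions for b_{k+1}
    candidates = [sample_candidate(rng) for _ in range(L)]
    # (2) weight each candidate by its approximate marginal probability
    sample_weights = [marginal(c) for c in candidates]
    total = sum(sample_weights)
    if total == 0.0:
        return None, 0.0  # no plausible extension: the particle dies
    # (3) select one candidate in proportion to its weight
    chosen = rng.choices(candidates, weights=sample_weights, k=1)[0]
    # (4) scale the particle weight by the sum of sample weights
    return chosen, weight * total

# Made-up marginal over three grid points.
grid_marginal = {(1, 0, 0): 0.6, (0, 1, 0): 0.3, (0, 0, 1): 0.1}
rng = random.Random(0)
chosen, new_weight = backbone_step(
    weight=1.0,
    sample_candidate=lambda r: r.choice(list(grid_marginal)),
    marginal=lambda b: grid_marginal.get(b, 0.0),
    L=4,
    rng=rng,
)
```

The sidechain step has the same sample, weight, select, update shape, with candidate conformations scored against the density map instead of the marginal.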

Page 57: Probabilistic Methods for Interpreting Electron-Density Maps

Sidechain Step (for particle i )

place $s^i_k$ given $b^i_{k-1:k+1}$ for each $i = 1 \ldots N$

(1) Sample $s_k$ from a database of sidechain conformations

[Figure: sidechain conformations drawn from the Protein Data Bank]

Page 58: Probabilistic Methods for Interpreting Electron-Density Maps

Sidechain Step (for particle i )

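place $s^i_k$ given $b^i_{k-1:k+1}$ for each $i = 1 \ldots N$

(2) For each sidechain conformation, compute probability of density map given the sidechain

[Figure: candidate conformations scored as $p_k(\mathrm{EDM} \mid s^1_k)$, $p_k(\mathrm{EDM} \mid s^2_k)$, $p_k(\mathrm{EDM} \mid s^3_k)$]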

Page 59: Probabilistic Methods for Interpreting Electron-Density Maps

Sidechain Step (for particle i )

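place $s^i_k$ given $b^i_{k-1:k+1}$ for each $i = 1 \ldots N$

(3) Select sidechain conformation from this weighted distribution

[Figure: candidate conformations scored as $p_k(\mathrm{EDM} \mid s^1_k)$, $p_k(\mathrm{EDM} \mid s^2_k)$, $p_k(\mathrm{EDM} \mid s^3_k)$]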

Page 60: Probabilistic Methods for Interpreting Electron-Density Maps

Sidechain Step (for particle i )

place $s^i_k$ given $b^i_{k-1:k+1}$ for each $i = 1 \ldots N$

(4) Update particle weight as sum of sample weights

$$wt_{k+1} \propto wt_k \times \sum_{m=1}^{M} p\!\left(\mathrm{EDM} \mid s^{m}_k\right)$$

Page 61: Probabilistic Methods for Interpreting Electron-Density Maps

Particle Resampling

[Figure: five particles with weights 0.1, 0.1, 0.4, 0.3, 0.1 are resampled into five particles with equal weights (wt = 0.2 each)]
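A minimal sketch of the resampling step, using plain multinomial resampling. This particular scheme is an assumption, since the slides do not say which one ACMI-PF uses. Particles are drawn with replacement in proportion to their weights, then all weights are reset to 1/N.

```python
import random

def resample(particles, weights, rng):
    """Multinomial resampling: returns N particles with uniform weights."""
    n = len(particles)
    chosen = rng.choices(particles, weights=weights, k=n)
    return chosen, [1.0 / n] * n

# The five weights from the slide; particle names are placeholders.
particles = ["A", "B", "C", "D", "E"]
weights = [0.1, 0.1, 0.4, 0.3, 0.1]
new_particles, new_weights = resample(particles, weights, random.Random(0))
```

High-weight particles tend to be duplicated and low-weight ones dropped, which concentrates later sampling effort on the more probable partial structures.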

Page 62: Probabilistic Methods for Interpreting Electron-Density Maps

Amino-Acid Sampling Order

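Begin at some amino acid $k$ with probability

$$P(k) \propto \exp\left\{-\mathrm{entropy}\!\left(\hat{p}_k(b_k)\right)\right\}$$

At each step, with amino acids $j$ through $k$ already placed, extend left (to $j-1$) or right (to $k+1$) with relative probability

$$\frac{P(j-1)}{P(k+1)} = \frac{\exp\left\{-\mathrm{entropy}\!\left(\hat{p}_{j-1}(b_{j-1})\right)\right\}}{\exp\left\{-\mathrm{entropy}\!\left(\hat{p}_{k+1}(b_{k+1})\right)\right\}}$$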
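A sketch of entropy-guided ordering, assuming the exponent is the negative entropy of each marginal so that confident (low-entropy) residues are preferred. The sign is an assumption, because the extracted slide text does not preserve it, and the function names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def start_probabilities(marginals):
    """P(k) proportional to exp(-entropy) of residue k's marginal."""
    scores = [math.exp(-entropy(m)) for m in marginals]
    total = sum(scores)
    return [s / total for s in scores]

def extend_left_probability(marginals, j, k):
    """Chance of growing to j-1 rather than k+1 when residues j..k are placed."""
    left = math.exp(-entropy(marginals[j - 1])) if j > 0 else 0.0
    right = math.exp(-entropy(marginals[k + 1])) if k + 1 < len(marginals) else 0.0
    if left + right == 0.0:
        raise ValueError("chain is already complete")
    return left / (left + right)

# Three residues: one sharply localized, one uniform, one in between.
marginals = [
    [0.98, 0.01, 0.01],
    [1 / 3, 1 / 3, 1 / 3],
    [0.5, 0.25, 0.25],
]
start = start_probabilities(marginals)
go_left = extend_left_probability(marginals, 1, 1)
```

Here the sharply localized first residue is the most likely starting point, and a particle seeded at the middle residue is more likely to grow toward it than toward the vaguer third residue.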

Page 63: Probabilistic Methods for Interpreting Electron-Density Maps

Experimental Methodology

Run ACMI-PF 10 times with 100 particles each
  Return highest-weight particle from each run
  Each run samples amino acids in a different order
  Refine each structure for 10 iterations in Refmac5

Compare 10-structure model to others using $R_{free}$

$$R = \frac{\sum \left|\, F_{obs} - F_{calc} \,\right|}{\sum F_{obs}}$$
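The R factor is a short computation over structure-factor amplitudes. In the sketch below the function name and the sample amplitudes are illustrative; $R_{free}$ is the same quantity evaluated only on reflections held out of refinement.

```python
def r_factor(f_obs, f_calc):
    """Sum of amplitude differences divided by sum of observed amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Made-up amplitudes for two reflections.
perfect = r_factor([10.0, 20.0], [10.0, 20.0])
imperfect = r_factor([10.0, 20.0], [8.0, 23.0])
```

Lower is better: a model that reproduces the observed amplitudes exactly gives R = 0.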

Page 64: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI-PF Versus ACMI-Naïve

[Line plot: refined $R_{free}$ (axis 0.2 to 0.5) versus number of ACMI-PF runs (1 to 10), with series Acmi-PF and Acmi-Naive]

Additionally, ACMI-PF’s models have …
  Fewer gaps (10 vs. 28)
  Lower sidechain RMS error (2.1Å vs. 2.3Å)

Page 65: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI-PF Versus Others

[Three scatter plots, both axes 0.25 to 0.65: ACMI-PF $R_{free}$ versus ARP/wARP $R_{free}$, versus Resolve $R_{free}$, and versus Textal $R_{free}$]

Page 66: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI-PF Example: 2A3Q

1.79Å RMSd, 92% complete

2.3Å resolution, 66° phase err.

Page 67: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI Overview

Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007)
  Independent amino-acid search
  Templates model 5-mer conformational space

Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006)
  Protein structural constraints refine local search
  Markov random field (MRF) models pairwise constraints

Phase 3: Sample all-atom models
  Particle filtering samples high-probability structures
  Probabilities from MRF guide particle trajectories

Phase 4: Iterative phase improvement
  Use particle-filtering models to improve density-map quality
  Rerun entire pipeline on improved density map
  Repeat until convergence

Page 68: Probabilistic Methods for Interpreting Electron-Density Maps

Phase Problem

density map $= f(I, \Phi)$

Intensities $I$: measured by X-ray crystallography

Phases $\Phi$: experimentally estimated (e.g. MAD, MIR)

Page 69: Probabilistic Methods for Interpreting Electron-Density Maps

Density-Map Phasing

[Figure: the same density map rendered at 0°, 30°, 60°, and 75° mean phase error]

Page 70: Probabilistic Methods for Interpreting Electron-Density Maps

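Iterative Phase Improvement

[Diagram: the initial density map is built from $I_{obs}$ and $\Phi_{exp}$; the predicted 3D model yields $I_{calc}$ and $\Phi_{calc}$; $\Phi_{calc}$ is combined with $I_{obs}$ to give the revised density map]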
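A sketch of the recombination at the heart of this loop, assuming each reflection's amplitude is the square root of its observed intensity (up to a scale factor) and that model phases are given in degrees. The function names are hypothetical, and the phase-error measure is a plain unweighted mean, which is a simplification.

```python
import cmath
import math

def recombine(intensities_obs, phases_calc_deg):
    """Structure factors with observed amplitudes and model-derived phases."""
    return [
        cmath.rect(math.sqrt(intensity), math.radians(phase))
        for intensity, phase in zip(intensities_obs, phases_calc_deg)
    ]

def mean_phase_error(phases_a_deg, phases_b_deg):
    """Unweighted mean absolute phase difference, wrapped into [0, 180]."""
    errors = []
    for a, b in zip(phases_a_deg, phases_b_deg):
        diff = abs(a - b) % 360.0
        errors.append(min(diff, 360.0 - diff))
    return sum(errors) / len(errors)

# Made-up reflections: observed intensities paired with calculated phases.
factors = recombine([4.0, 9.0], [90.0, 180.0])
error = mean_phase_error([10.0, 350.0], [350.0, 10.0])
```

An inverse Fourier transform of the recombined structure factors would give the revised map; that step is omitted here.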

Page 71: Probabilistic Methods for Interpreting Electron-Density Maps

ACMI-PF’s Phase Improvement

[Scatter plot: error in ACMI-PF’s phases (deg. mean phase error, axis 0 to 75) versus error in initial phases (deg. mean phase error, axis 0 to 75)]

Page 72: Probabilistic Methods for Interpreting Electron-Density Maps

Two-Iteration ACMI

[Scatter plot: % backbone located in Iteration 2 (axis 50 to 100) versus % backbone located in Iteration 1 (axis 50 to 100)]

Page 73: Probabilistic Methods for Interpreting Electron-Density Maps

Future Work: Many-iteration ACMI

[Two line plots against number of ACMI iterations: average % uninterpreted AAs, and average mean phase error]

Page 74: Probabilistic Methods for Interpreting Electron-Density Maps

Conclusions

ACMI’s three steps construct a set of all-atom protein models from a density map

Novel message approximation allows inference on large, highly-connected models

Resulting protein models are more accurate than those produced by other methods

Page 75: Probabilistic Methods for Interpreting Electron-Density Maps

Ongoing and Future Work

Incorporate additional structural biology background knowledge

Incorporate more complex potential functions

Further work on iterative phase improvement

Generalize my algorithms to other 3D image data

Page 76: Probabilistic Methods for Interpreting Electron-Density Maps

Acknowledgements

Advisor: Jude Shavlik

Committee: George Phillips, Charles Dyer, David Page, Mark Craven

Collaborators: Ameet Soni, Dmitry Kondrashov, Eduard Bitto, Craig Bingman

6th floor MSCers

Center for Eukaryotic Structural Genomics

Funding: UW-Madison Graduate School, NLM 1T15 LM007359, NLM 1R01 LM008796