
Identification of Biomolecular Conformations from Incomplete Torsion Angle Observations by Hidden Markov Models

ALEXANDER FISCHER, SONJA WALDHAUSEN, ILLIA HORENKO, EIKE MEERBACH, CHRISTOF SCHÜTTE

Institute of Mathematics II, Free University Berlin, Arnimallee 2–6, 14195 Berlin, Germany

Received 15 September 2006; Revised 13 December 2006; Accepted 8 January 2007

DOI 10.1002/jcc.20692

Published online 6 August 2007 in Wiley InterScience (www.interscience.wiley.com).

Abstract: We present a novel method for the identification of the most important conformations of a biomolecular system from molecular dynamics or Metropolis Monte Carlo time series by means of Hidden Markov Models (HMMs). We show that identification is possible based on the observation sequences of some essential torsion or backbone angles. In particular, the method still provides good results even if the conformations do have a strong overlap in these angles. To apply HMMs to angular data, we use von Mises output distributions. The performance of the resulting method is illustrated by numerical tests and by application to a hybrid Monte Carlo time series of trialanine and to MD simulation results of a DNA-oligomer.

© 2007 Wiley Periodicals, Inc. J Comput Chem 28: 2453–2464, 2007

Key words: aggregation; angular data; biomolecules; EM algorithm; HMM; metastability; torsion angle; Viterbi algorithm; von Mises distribution

Introduction

The macroscopic dynamics of typical biomolecular systems is mainly characterized by the existence of biomolecular conformations, which can be understood as metastable geometrical large scale structures, i.e., geometries which are persistent for long periods of time. On the longest time scales, biomolecular dynamics is a kind of flipping process between these conformations,1–4 while on closer inspection it exhibits a rich temporal multiscale structure.5

Biophysical research seems to indicate that typical biomolecular systems possess only a few dominant conformations that can be characterized in terms of a small number of essential degrees of freedom.6 In many cases, these essential degrees of freedom can be identified from some torsion or backbone angles of the molecule under consideration. Then, metastable conformations can be characterized in terms of these angles and can thus be identified from the molecular dynamics time series projected onto these angles.

In this article, we discuss the application of Hidden Markov Models (HMMs)7, 8 to the analysis of time series from biomolecular simulations. Numerous applications of HMMs can be found in the fields of speech recognition,7, 8 signal processing,9 and bioinformatics.10 Applications to the analysis of the dynamical behavior of biomolecules are rather rare and mainly concerned with the analysis of experimental data; see refs. 11 and 12, for example. In this article, HMMs are employed to identify biomolecular conformations from observation sequences of a few torsion angles given by some high-dimensional time series. An intriguing feature of HMM analysis is that the set of essential coordinates chosen for analysis need not enable a geometric separation of the important conformations. Instead, the EM13–15 and Viterbi algorithms16 associated with a HMM extract geometric as well as dynamical properties out of the given observation sequences and allow for a separation of overlapping conformations from the dynamics within the sequences.

Most applications of HMMs use discrete or Gaussian distributions for the output observations. To deal with continuous angular data, we employ the so-called von Mises distribution, which is perfectly designed to analyze Gaussian-like distributions of angular data.17 Although this distribution is encountered rarely, it has already been shown that it can easily be included into the HMM framework.18, 19

The conceptual basis of the proposed approach is new: The information about which and how many biomolecular conformations are important for the description of the effective dynamics of the system under consideration is understood as being hidden within the time series. Thus, the available conformations are hidden conformation states in the sense of HMMs. In addition, the number of hidden conformation states is automatically determined via aggregation of hidden states by methods proposed in refs. 20 and 21, i.e., a priori knowledge about the number of biomolecular conformations is not needed.

Correspondence to: Ch. Schütte; e-mail: [email protected]

Contract/grant sponsor: DFG research center “Mathematics for key technologies,” Berlin; contract/grant number: FZT 86

© 2007 Wiley Periodicals, Inc.

Outline

We will proceed as follows: First, we shortly review the Hidden Markov Model and the von Mises distribution. Then, we discuss the associated algorithms that are fundamental for the intended conformational analysis and, in particular, introduce the combination of HMMs with the Perron cluster analysis20, 21 for aggregation of hidden states into conformation states. In the last section, we first illustrate the application of this novel dynamical clustering technique to metastable observation sequences by suitable numerical experiments and finally discuss the applicability of the resulting concept to two realistic molecular dynamics time series.

Model

Hidden Markov Models

A HMM is a stochastic process with hidden and observable states. The hidden process consists of a sequence X1, X2, X3, . . . of random variables taking values in some “state space,” the value of Xt being “the state of the system at time t.” In applications, these states are not observable and are therefore called hidden.

In the HMM context, the sequence of hidden states is assumed to be a Markov process, i.e., it has the Markov property, meaning that the conditional distribution of the “future” Xn+1 given the “past” X1, . . . , Xn depends on the past only through Xn. Since the HMM state space in general is finite, we are thus concerned with a Markov chain, which is characterized by the so-called transition matrix P = (pij), whose entry pij corresponds to the conditional probability of switching from the (hidden) state i to state j. The sum over j has to be one for each i, which means that the transition matrix is a row-stochastic matrix.

Each hidden state causes a specific output that might be either discrete or continuous. This output is distributed according to a certain conditional distribution (conditioned on the hidden state). Thus, realizations of a HMM are concerned with two sequences, an observation sequence and a sequence of hidden states.

In our case, the information initially hidden is in which metastable subset (conformation) the molecular system is at a certain time, while the information on the state of the selected torsion or backbone angles is completely known. A HMM then consists of a Markov chain model for the hidden (metastable) states that encodes with which probability one switches from one hidden state to another, and a conditional probability of observing specific torsion angles if one is in a certain hidden state. To describe the whole system, we need to know the number of hidden states, the transition matrix between them, an initial distribution, and for each state a certain probability distribution for the observation.

Therefore, a HMM formally is defined as a tuple λ = (S, V, P, f, π) where

• S = {s1, s2, . . . , sN} is a set of a finite number N of states,
• V ⊂ Rk is the observation space,
• P = (pij) is the transition matrix, where pij = P(Xt+1 = sj | Xt = si) describes the transition probability from state si to state sj,
• f = (f1, f2, . . . , fN) is a vector of probability density functions (pdfs) on the observation space,
• π = (π1, . . . , πN) is a stochastic vector that describes the initial state distribution, πi = P(X1 = si).

Often, the short notation λ = (P, f, π) is used, since S and V are implicitly included. In particular, S is identified with the index set {1, 2, . . . , N} in the following.
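The tuple λ = (P, f, π) maps naturally onto a small data structure. The following sketch is our own illustration, not code from the paper; the container class, names, and toy two-state numbers are hypothetical:

```python
import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class HMM:
    """Container for lambda = (P, f, pi); the state set S = {1, ..., N} is
    implicit in the indices, and V is the domain of the densities f_i."""
    P: List[List[float]]                # N x N row-stochastic transition matrix
    f: List[Callable[[float], float]]   # one output pdf per hidden state
    pi: List[float]                     # initial distribution over hidden states

    @property
    def N(self) -> int:
        return len(self.pi)

# Hypothetical two-state toy model with uniform circular output densities.
toy = HMM(
    P=[[0.95, 0.05], [0.10, 0.90]],
    f=[lambda theta: 1.0 / (2 * math.pi), lambda theta: 1.0 / (2 * math.pi)],
    pi=[0.5, 0.5],
)
```

Note that each row of P sums to one, as required for a row-stochastic matrix.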

HMMs can be set up for discrete or continuous observations. For continuous observations, the most popular choice is to use (multivariate) normal distributions for the output distributions fk. However, in the case of circular data (like torsion angle positions in a molecular dynamics simulation), the use of normal distributions often induces crucial problems due to periodicity.

von Mises Distribution

For circular data, the normal distribution can be replaced with the von Mises distribution, whose properties are comparable but regarding the sphere17 instead of the real axis. In one dimension, the von Mises distribution M(µ, κ) is given by the following pdf depending on the two parameters µ ∈ [0, 2π] and κ > 0:

  fµ,κ(θ) = e^{κ cos(θ−µ)} / (2π I0(κ)),  0 ≤ θ < 2π,  (1)

where I0(κ) is the modified Bessel function of the first kind and order zero, i.e.,

  I0(κ) = Σ_{r=0}^{∞} (1/(r!)²) (κ/2)^{2r}.  (2)

The parameters characterizing the von Mises distribution are called the mean direction µ and the concentration parameter κ. The distribution is unimodal, i.e., single-peaked, and symmetric about the mean direction. The maximum of the pdf, the so-called mode, is at the mean, while the minimum of the pdf (antimode) is located at µ + π. The ratio of the pdf at the mode to that at the antimode is given by e^{2κ}, so the larger the value of the concentration parameter κ, the more pronounced the concentration of the distributed data around the mode. In contrast, for κ → 0, the von Mises distribution approaches the uniform distribution. For an illustration, see Figure 1, and for more details, see ref. 17.
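Eqs. (1) and (2) translate directly into code. The sketch below is our own illustration (function names are ours); it evaluates the von Mises pdf using a truncated version of the series (2) for I0, with the terms built iteratively so that intermediates do not overflow for larger κ:

```python
import math

def bessel_i0(kappa: float, terms: int = 200) -> float:
    """I0(kappa) via the series in eq. (2), truncated after `terms` terms."""
    term, total = 1.0, 1.0                       # r = 0 term equals 1
    for r in range(1, terms):
        term *= (kappa / 2.0) ** 2 / (r * r)     # t_r = t_{r-1} * (k/2)^2 / r^2
        total += term
    return total

def von_mises_pdf(theta: float, mu: float, kappa: float) -> float:
    """Eq. (1): f(theta) = exp(kappa * cos(theta - mu)) / (2 pi I0(kappa))."""
    return math.exp(kappa * math.cos(theta - mu)) / (2 * math.pi * bessel_i0(kappa))
```

Since I0(κ) cancels in the ratio of pdf values, the mode-to-antimode ratio of this implementation is exactly e^{2κ}, as stated above.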

Maximum Likelihood Principle

In statistics, likelihood works in reverse: whereas probability quantifies belief in the unknown consequences of known causes, likelihood reasons backwards, from observed results to hypothetical models and parameters. The Maximum Likelihood Principle is the standard technique for estimating parameters in a statistical model with respect to observed data. For the context important herein, it can be explained as follows: Let λ denote a set of parameters, e.g., the specification of an HMM as used before, Θ = (θ1, θ2, . . . , θT) an observed sequence of data, and p(Θ|λ) the statistical model, i.e., the probability of the observation sequence, or, depending on the state space, the density of the observation sequence, under the condition that the parameter set λ is given.∗ The likelihood function then is defined as L(λ) = p(Θ|λ), i.e., we consider the observation sequence as being given and ask for the variation of the probability (density function) in terms of the parameters. The maximum likelihood principle then simply states that the optimal parameters are given by the absolute maximum of L. Thus, maximum likelihood estimation is an optimization problem in parameter space.

Figure 1. Different von Mises distributions fµ,κ = fµ,κ(θ), each with mean µ = 0 and different concentration parameters κ, in comparison with a Gaussian distribution with mean 0; here and in the following figures we display angles in degrees in the interval [−180◦, 180◦), simply for the sake of convenience.

Parameter Estimation

As for the mean and variance of a normal distribution, there exist maximum likelihood estimators (MLEs) for the mean and concentration parameter of the von Mises distribution. We shortly review the results collected in ref. 17.

Suppose that we are given T data points on the sphere with angles (θ1, θ2, . . . , θT) in radians, i.e., θt ∈ [0, 2π[. The likelihood function L(µ, κ) for the von Mises distribution is then given by

  L(µ, κ) = Π_{t=1}^{T} fµ,κ(θt) = exp(κ Σ_{t=1}^{T} cos(θt − µ)) / ((2π)^T I0(κ)^T).  (3)

According to the maximum likelihood principle, the optimal parameters (µ, κ) are the ones for which L(µ, κ) is maximal. Maximization of L(µ, κ) gives

  µ = Arctan(y/x)      if x > 0,
      π + Arctan(y/x)  if x < 0,  (4)

with

  x = (1/T) Σ_{t=1}^{T} cos(θt)  and  y = (1/T) Σ_{t=1}^{T} sin(θt),  (5)

and [0, π[ as the range of Arctan.

The formula for the mean estimate µ becomes intuitively clear by observing that the cartesian coordinates x and y are the mean values of the projection of the data, seen as points on the unit sphere, onto the x and y axes, respectively. Thus, eq. (4) tells us that µ is the angle of the cartesian mean (x, y), and that the operation Arctan(·) in eq. (4) means the inverse of the tangent operation in the sense that it gives back the correct angle in polar coordinates of the point (x, y) on the unit circle. Then, setting

  r = (x² + y²)^{1/2},

we get the polar coordinate representation (µ, r) of the cartesian mean (x, y).

The cartesian mean does not necessarily lie on the unit circle. The length r can take values between 0 and 1 and measures the clustering of the data around the mean angle µ. The closer r is to 1, the more concentrated the sample is around the mean.

∗In the following, to avoid linguistic confusion, we refer to p(Θ|λ) as a probability, but always include the meaning of a probability density function.

Beyond this intuitive picture, however, the MLE for κ cannot be given explicitly. Instead, by maximization of L(µ, κ) with respect to κ and making use of properties of the Bessel functions, one can show that the MLE for κ solves

  I1(κ)/I0(κ) = (1/T) Σ_{i=1}^{T} cos(θi − µ),  (6)

where I1(κ) is the modified Bessel function of the first kind and order one. Equation (6) can only be solved numerically, which is not a big obstacle, as the lhs of (6) is monotonically increasing. Our intuitive picture remains valid in the sense that one can show that the rhs of (6) equals r.
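The estimators (4)-(6) can be sketched as follows (our own illustration; names and the bisection bounds are ours): µ is obtained as the angle of the cartesian mean via atan2, and κ by bisection on the monotonically increasing left-hand side of (6), with I0 and I1 evaluated through their power series:

```python
import math

def bessel_i(order: int, kappa: float, terms: int = 400) -> float:
    """I_order(kappa) = sum_r (kappa/2)^(2r+order) / (r! (r+order)!),
    built iteratively to avoid overflowing intermediates."""
    term = (kappa / 2.0) ** order            # r = 0 term (0! = 1! = 1)
    total = term
    for r in range(1, terms):
        term *= (kappa / 2.0) ** 2 / (r * (r + order))
        total += term
    return total

def von_mises_mle(thetas):
    """MLE (mu, kappa) for a sample of angles in radians, following (4)-(6)."""
    T = len(thetas)
    x = sum(math.cos(t) for t in thetas) / T
    y = sum(math.sin(t) for t in thetas) / T
    mu = math.atan2(y, x) % (2 * math.pi)    # angle of the cartesian mean (x, y)
    r = math.hypot(x, y)                     # mean resultant length = rhs of (6)
    lo, hi = 1e-8, 500.0                     # bisection on I1(k)/I0(k) = r
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if bessel_i(1, mid) / bessel_i(0, mid) < r:
            lo = mid
        else:
            hi = mid
    return mu, 0.5 * (lo + hi)
```

Bisection is one simple choice here; any root finder works, since the ratio I1(κ)/I0(κ) increases monotonically from 0 toward 1.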

Multidimensional generalizations of the von Mises distribution are obvious as long as we simply use tensor products of one-dimensional von Mises distributions. This means that the mean direction µ has to be replaced by a vector and the concentration parameter κ simply is replaced by a diagonal matrix. The jth (diagonal) entries are computed from the jth components of the data points θ ∈ Rd according to (4) and (6).

Methods

Application of HMMs typically involves one or all of the three problems (P1) to (P3) stated below. For our specific task of identifying metastable conformations, we have to complement these basic problems by also addressing problems (P4) and (P5):


(P1) Calculation of the probability p(Θ|λ) of a certain observation sequence Θ for a given model λ = (P, f, π).
(P2) Estimation of the best model parameters for a given observation sequence (e.g., torsion angle observations).
(P3) Given the model λ and an observation sequence Θ, finding the most probable hidden state sequence X∗ = (x∗1, x∗2, . . . , x∗T).
(P4) Aggregation of hidden states to conformation states, and clustering of the observation sequence into conformations.
(P5) Avoiding the combinatorial explosion of computational effort for high-dimensional observation sequences.

There are standard algorithms that solve problems (P1) to (P3) efficiently, which we shortly outline here; our presentation follows refs. 7 and 8. Most notably, due to the Markovian structure of the hidden chain and the independence of observations given the hidden states, all algorithms run in time linear in the length of the observation sequence. For dealing with problems (P4) and (P5), no standard algorithms exist. Below, we will thus also outline the novel algorithms suggested herein.

Problem (P1): Forward–Backward Variables

The most straightforward method to calculate the probability p(Θ|λ) of an observation sequence Θ = (θ1, θ2, . . . , θT) given the model λ is to enumerate every possible hidden state sequence X = (x1, x2, . . . , xT) of length T and sum up all probabilities conditioned on the hidden sequence:

  p(Θ|λ) = Σ_{X=(x1,...,xT)} p(X|λ) p(Θ|X, λ).

This method involves on the order of 2T·N^T calculations, therefore becoming infeasible even for small numbers of hidden states N and short observation lengths T.

Fortunately, the effort can be reduced to T·N² by using the so-called forward and backward variables. These divide the observation sequence Θ recursively into partial subsequences: those from time 1 to time t and those from t + 1 up to T. The forward variables are given by

  αt(i) = p(θ1, θ2, . . . , θt, Xt = i | λ),  (7)

denoting the probability of the observation sequence up to time t together with the information that the system is in hidden state i at time t, conditioned on the given model λ. The backward variables are given by

  βt(i) = p(θt+1, θt+2, . . . , θT | Xt = i, λ),  (8)

denoting the probability of the observation sequence from time t + 1 to T, conditioned on the system being in hidden state i at time t and on the model λ. The computation of the probabilities αT(i) for the whole sequence is possible with N²T operations, as recursive formulas can be used:

  α1(i) = πi fi(θ1),  1 ≤ i ≤ N,
  αt+1(j) = [Σ_{i=1}^{N} αt(i) pij] fj(θt+1),  1 ≤ t ≤ T − 1, 1 ≤ j ≤ N.

In the case of von Mises distributed output, the fj denote the probability density functions fµj,κj discussed before.

The backward variables βt(i) can be computed with analogous formulas:

  βT(i) = 1,  1 ≤ i ≤ N,
  βt(i) = Σ_{j=1}^{N} pij fj(θt+1) βt+1(j),  T − 1 ≥ t ≥ 1, 1 ≤ i ≤ N.

From (7) and (8), one can compute the desired probability for all t as

  p(Θ|λ) = Σ_{i=1}^{N} αt(i) βt(i).

Forward and backward variables form the basis of the EM algorithm.
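The two recursions can be sketched in a few lines of Python (our own illustration; for long sequences one would add scaling or log-space arithmetic, which is omitted here for clarity):

```python
def forward_backward(P, f, pi, obs):
    """Forward variables alpha_t(i) and backward variables beta_t(i)
    via the recursions above; O(T N^2) operations in total."""
    N, T = len(pi), len(obs)
    alpha = [[pi[i] * f[i](obs[0]) for i in range(N)]]          # alpha_1
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * P[i][j] for i in range(N))
                      * f[j](obs[t]) for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]                        # beta_T = 1
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(P[i][j] * f[j](obs[t + 1]) * beta[t + 1][j]
                             for j in range(N))
    return alpha, beta
```

A useful consistency check: Σ_i αt(i)βt(i) must give the same value p(Θ|λ) for every t.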

Problem (P2): EM Algorithm

The Expectation-Maximization (EM) algorithm15 is a maximum likelihood approach that iteratively improves an initial parameter set and converges to a local maximum of the likelihood function. In the case of HMMs, this method was already worked out before the general EM formulation and is often called the Baum-Welch algorithm.13, 14

To apply the EM algorithm to a given observation sequence, we have to set up an HMM by a parameter set λ = (P, f, π), assuming a finite number N of hidden states, an output distribution function for each of the hidden states, and the initial distribution for the hidden Markov chain.

EM Steps

There is no known way to analytically determine the model parameters that globally maximize the probability of the given observation sequence. We can, however, estimate a λ that locally maximizes the likelihood L(λ) = p(Θ | λ). The EM algorithm is a learning algorithm: it alternates two steps, the Expectation step and the Maximization step. Starting with some initial model λ(0), the steps iteratively refine the model:

• The Expectation step: In this step, the state occupation probabilities

  γt(i) = p(Xt = i | Θ, λ(k)),

and the joint probabilities

  ξt(i, j) = p(Xt = i, Xt+1 = j | Θ, λ(k)),

are calculated for each time t in the sequence, given the observation Θ and the current model λ(k).

• The Maximization step: This step finds a new model λ(k+1) via a set of re-estimation formulas. The maximization guarantees that L(λ(k+1)) ≥ L(λ(k)).

The two conditional probabilities of the E-step can be calculated efficiently using the forward-backward variables (7)-(8) as

  ξt(i, j) = p(Xt = i, Xt+1 = j, Θ, λ(k)) / p(Θ, λ(k))
           = p(Xt = i, Xt+1 = j, Θ | λ(k)) / p(Θ | λ(k))
           = αt(i) pij fj(θt+1) βt+1(j) / p(Θ | λ(k)).

We omitted the superscript k for the forward and backward variables so as not to obscure the formulas by too many indices.

With these values, the probability to be in hidden state i at time t can be expressed as

  γt(i) = Σ_{j=1}^{N} ξt(i, j).

The M-step consists of re-estimation formulas to obtain an improved model λ(k+1). The estimators for the initial distribution and the transition probabilities of the hidden Markov chain are given by

  πi(k+1) = γ1(i),

  pij(k+1) = Σ_{t=1}^{T−1} ξt(i, j) / Σ_{t=1}^{T−1} γt(i),

using the probabilities computed in the E-step before.

We also need to re-estimate the parameters of the probability density functions fj(k+1) via their maximum likelihood estimators. Hereby, the observations θt used to determine fj(k+1) have to be weighted with the probability γt(j) of being in the hidden state j. To estimate parameters of von Mises output distributions from torsion angle observation sequences θ1, . . . , θT, eqs. (5) and (6) have to be adapted as follows:18, 19

  x(j) = Σ_{t=1}^{T} γt(j) cos(θt) / Σ_{t=1}^{T} γt(j),

and

  y(j) = Σ_{t=1}^{T} γt(j) sin(θt) / Σ_{t=1}^{T} γt(j),

and the mean parameter µj(k+1) of fj(k+1) = f_{µj(k+1),κj(k+1)} is computed from x(j) and y(j) according to (4). Finally, the concentration parameter is computed by solving

  I1(κj(k+1)) / I0(κj(k+1)) = (1 / Σ_{t=1}^{T} γt(j)) Σ_{t=1}^{T} γt(j) cos(θt − µj(k+1)).

Again, we omitted the superscript (k + 1) for x(j), y(j), and γt(j) to reduce the complexity of the stated formulas.

The E- and M-steps are iteratively repeated until a predetermined maximal number of iterations is reached or the improvement of the likelihood becomes smaller than a given limit. As it is a property of the EM algorithm15 that L(λ(k+1)) ≥ L(λ(k)), we iteratively approximate a local maximum λ.
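The Markov-chain part of the M-step is a direct transcription of the re-estimation formulas. A minimal sketch (our own illustration; here gamma[t][i] stores γ_{t+1}(i) and xi[t][i][j] stores ξ_{t+1}(i, j), with Python's zero-based t covering the times 1 to T − 1 used in the formulas):

```python
def m_step_markov(gamma, xi):
    """Re-estimation of pi and P:
    pi_i = gamma_1(i),  p_ij = sum_t xi_t(i,j) / sum_t gamma_t(i),  t = 1..T-1."""
    N = len(gamma[0])
    pi_new = list(gamma[0])
    P_new = [[sum(xi[t][i][j] for t in range(len(xi)))
              / sum(gamma[t][i] for t in range(len(xi)))
              for j in range(N)]
             for i in range(N)]
    return pi_new, P_new
```

The rows of the updated P sum to one automatically whenever γt(i) = Σj ξt(i, j), consistent with the E-step identity above.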

Problem (P3): Viterbi Algorithm

The Viterbi algorithm16 computes, for a given HMM λ and an observation sequence Θ, the most probable hidden path X∗ = (x∗1, . . . , x∗T). This path is called the Viterbi path.

For an efficient computation, we define the highest probability (density) along a single path for the first t observations, ending in the hidden state i at time t,

  δt(i) = max_{x1,x2,...,xt−1} p(x1, x2, . . . , xt = i, θ1, θ2, . . . , θt | λ).

This quantity can be computed recursively by

  δt(i) = max_{1≤j≤N} [δt−1(j) pji] fi(θt).  (9)

In addition, we assign to ψt(i) the argument that maximizes (9), in order to actually retrieve the most likely hidden state sequence. These quantities are calculated for each t and i; afterwards, the most probable hidden state sequence can be retrieved from δ and ψ by backtracking, as shown in the Viterbi algorithm scheme:

(1) Initialization:

  δ1(i) = πi fi(θ1),  1 ≤ i ≤ N,
  ψ1(i) = 0.

(2) Recursion:

  δt(i) = max_{1≤j≤N} [δt−1(j) pji] fi(θt),
  ψt(i) = argmax_{1≤j≤N} [δt−1(j) pji],
  2 ≤ t ≤ T, 1 ≤ i ≤ N.

(3) Backtracking:

  x∗T = argmax_{1≤j≤N} [δT(j)],
  x∗t = ψt+1(x∗t+1),  t = T − 1, T − 2, . . . , 1.


The mapping of the T observation states to N hidden states provided by the Viterbi path X∗ results in a dynamical clustering of the data.
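The three steps above can be sketched as follows (our own illustration; we work in log space so that products of many probabilities do not underflow on long sequences, which the plain scheme would suffer from):

```python
import math

def viterbi(P, f, pi, obs):
    """Most probable hidden path for model (P, f, pi) and observations obs."""
    N, T = len(pi), len(obs)
    log = lambda v: math.log(v) if v > 0 else float('-inf')
    delta = [[log(pi[i]) + log(f[i](obs[0])) for i in range(N)]]   # initialization
    psi = [[0] * N]
    for t in range(1, T):                                          # recursion
        d_row, p_row = [], []
        for i in range(N):
            best_j = max(range(N), key=lambda j: delta[t - 1][j] + log(P[j][i]))
            p_row.append(best_j)
            d_row.append(delta[t - 1][best_j] + log(P[best_j][i]) + log(f[i](obs[t])))
        delta.append(d_row)
        psi.append(p_row)
    path = [max(range(N), key=lambda i: delta[T - 1][i])]          # backtracking
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    return path[::-1]
```

On a toy two-state model whose states prefer different output symbols, the returned path switches state where the observations do.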

Problem (P4): Aggregation into Conformation States

The term “biomolecular conformation” is used with different meanings. We will herein use it in the following sense: A conformation is a kind of “global state” of the molecule, in which the large scale geometric structure is understood to be conserved, while the molecule locally exhibits metastability. Such a conformation is called metastable if a transition from it to other large scale geometric structures is rare. An intuitive picture for the occurrence of metastable conformations is the vicinity of a deep minimum of the (free) energy landscape, which defines the large scale geometry, together with all neighboring relative minima, which contribute to local flexibility.

The problem of finding these metastable conformations can be formulated as the problem of aggregating molecular states into conformation states (i.e., clustering states that belong to the same metastable conformation). The identification of an optimal aggregation from observations of the dynamical behavior, e.g., MD trajectories, is a difficult algorithmic problem.

However, if a suitable trajectory and a box discretization, i.e., a collection of nonoverlapping sets (boxes) B1, . . . , BN covering the whole observable space, are at hand, optimal aggregates can be identified via the spectral properties of a stochastic transition matrix P. This transition matrix P = (pij), containing the overall transition probabilities between all discrete states of the system under consideration, can be obtained by counting transitions in the trajectory between the discrete states:

  pij = #{θt ∈ Bi and θt+1 ∈ Bj} / #{θt ∈ Bi},

where t is meant to vary from 1 to T − 1.

The identification of an optimal aggregation from the spectrum of the transition matrix P is possible, as the number of metastable sets, corresponding to metastable conformations, equals the number of eigenvalues close to one, while the metastable sets themselves can be obtained from the structure of the corresponding eigenvectors.4, 22 This has led to the construction of an aggregation technique called “Perron Cluster Cluster Analysis” (PCCA).20, 21 Of course, the technique depends on extracting a suitable set of observables from a given trajectory and finding an appropriate discretization of the observable space. Both problems are nontrivial, and solutions often depend on a mixture of insight and preliminary analysis; an algorithmic scheme is described in ref. 23.
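Estimating the transition matrix from a box-discretized trajectory is a simple counting exercise. A minimal sketch (our own illustration; `labels[t]` is assumed to hold the box index of θt):

```python
def count_transition_matrix(labels, N):
    """p_ij = #{t : theta_t in B_i and theta_{t+1} in B_j} / #{t : theta_t in B_i},
    with t running from 1 to T-1; rows of unvisited boxes stay zero."""
    counts = [[0] * N for _ in range(N)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        s = sum(row)
        P.append([c / s if s else 0.0 for c in row])
    return P
```

For a 2 × 2 row-stochastic matrix the eigenvalues are 1 and p11 + p22 − 1, so metastability of a two-set system shows up directly as a second eigenvalue close to one.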

Although PCCA can detect the number of (hidden) metastable sets from the spectrum of the transition matrix, HMM methods have the drawback that, for a given observation sequence, one is always confronted with the task of selecting the number of hidden states in advance. On the other hand, PCCA crucially depends on a geometrical separation of the metastable sets in the observable space, as otherwise no appropriate box discretization can be found, while HMMs can identify overlapping distributions. To harvest the benefits of both methods, we use PCCA within the HMM framework as follows: the HMM algorithm is started with some sufficient number of hidden states, say N, that should be greater than the expected number of conformation states (keeping in mind that the algorithmic effort will increase like O(N²T)). After termination of the EM algorithm, the resulting transition matrix P is taken, and the N hidden states are aggregated into M ≤ N conformation states by means of PCCA, i.e., by means of analyzing the spectrum of P. Note that we hereby replace the transition matrix obtained from counting transitions in the trajectory by the transition matrix obtained from the estimated HMM. By aggregating the Viterbi path according to the outcome of the PCCA analysis, we obtain a clustering of the observation states into metastable sets. This novel combination of HMMs with PCCA will be exemplified in the following section.

Problem (P5): High Dimensions and Viterbi-Clustering

In large biomolecular systems, the dimension of each state in the observation sequence may be large, say d. The number of free parameters of each output distribution will typically increase more than linearly with d. A full d-dimensional Gaussian or von Mises distribution, for example, has O(d^2) free parameters in the covariance matrices. However, for increasing dimensions of the parameter space, the likelihood maximization via the EM algorithm converges more and more slowly (and eventually fails to detect the maximum) if the dimensions get too large. To overcome this "curse of dimension," we suggest the following novel algorithmic scheme, for which the name Viterbi-clustering seems appropriate:

First, the high-dimensional state space V of the observation sequence is decomposed into a sequence of low-dimensional subspaces V(1), . . . , V(k). For example, V could be the state space of all torsion angles of the system under consideration, and V(j) the subspace of a single torsion angle. By choice of the V(j), j = 1, . . . , k, and projection onto each one, we get k low-dimensional time series Θ(j) = θ_1^(j), . . . , θ_T^(j) out of the original time series θ_1, . . . , θ_T.

Second, each of these low-dimensional time series is separately analyzed by means of the above HMM procedure, i.e., by approximating the model with the EM algorithm, afterwards obtaining the Viterbi path with the Viterbi algorithm, and finally aggregating with PCCA. This results in k aggregated Viterbi paths X(j) = (x_1^(j), x_2^(j), . . . , x_T^(j)) that represent the conformational dynamics as detected from the information contained in the single time series Θ(j), j = 1, . . . , k. Observe that, using HMMs, we are able to discretize the low-dimensional projections even if the metastable sets overlap in the projected space, as the output distributions of the HMM are allowed to overlap.

Third, these single Viterbi paths can be combined into a global Viterbi path X: at instance t its state is x_t = (x_t^(1), . . . , x_t^(k)). This global Viterbi path can be seen as a time series originating from a system with discrete states from S = S(1) × . . . × S(k), where S(j) denotes the state space of the jth Viterbi path. The combined global Viterbi path has finitely many states, such that we can directly compute the associated transition matrix.

Fourth, the metastable states of this transition matrix are identified, again by means of PCCA. On the basis of these metastable states, the global Viterbi path is aggregated into a clustered global Viterbi path whose resulting discrete states are finally interpreted as the global conformation states of the original full-dimensional time series θ_1, . . . , θ_T.

Journal of Computational Chemistry DOI 10.1002/jcc


Identification of Biomolecular Conformations from Incomplete Torsion Angle Observations 2459

Table 1. Von Mises Parameters and Transition Matrices of the Four HMMs Used for Generation of the Observation Sequences.

Case  (µ1, κ1)      (µ2, κ2)    P
(a)   (−π/2, 1)     (π/2, 3)    [0.95 0.05; 0.02 0.98]
(b)   (−π/2, 1/2)   (π/2, 1)    [0.95 0.05; 0.02 0.98]
(c)   (−0.5, 1)     (0.5, 3)    [0.95 0.05; 0.02 0.98]
(d)   (−π/2, 1)     (π/2, 3)    [0.60 0.40; 0.45 0.55]

Results and Discussion

We first illustrate the fundamental features and performance of the algorithms. Then, we demonstrate how conformation states can be identified from a hybrid Monte Carlo simulation of trialanine and of a specific DNA oligomer by analyzing torsion angle time series.

Illustrative Examples

To generate data sets for illustrative purposes, we computed time series from direct realizations of fully specified HMMs, i.e., for given parameters (P, f, π). On the basis of the observation sequences of such realizations, we tried to re-identify the parameters by application of the EM and Viterbi algorithms. A realization of an observation sequence Θ = θ_1, . . . , θ_T for a given HMM (P, f, π) is easily obtained: one starts in a single hidden state randomly chosen from π, say j, and draws the first output state θ_1 from the associated distribution f_j. Then, the next hidden state is chosen according to the transition probabilities given by the jth row of the transition matrix P. This procedure is iterated T times.
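This generation procedure is easy to sketch (illustrative Python using NumPy's von Mises sampler; the parameters are those of test case (a) in Table 1, while the function name and random seed are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hmm_vonmises(P, mus, kappas, pi, T):
    """Draw a realization of a hidden Markov chain with von Mises
    output distributions: start from the initial distribution pi,
    emit an angle from the current state's von Mises pdf, then jump
    according to the current row of the transition matrix P."""
    n = len(pi)
    hidden = np.empty(T, dtype=int)
    obs = np.empty(T)
    state = rng.choice(n, p=pi)
    for t in range(T):
        hidden[t] = state
        obs[t] = rng.vonmises(mus[state], kappas[state])
        state = rng.choice(n, p=P[state])
    return hidden, obs

# test case (a) of Table 1
P = np.array([[0.95, 0.05],
              [0.02, 0.98]])
hidden, theta = sample_hmm_vonmises(P, mus=[-np.pi / 2, np.pi / 2],
                                    kappas=[1.0, 3.0],
                                    pi=[0.5, 0.5], T=2000)
```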

We considered four parameter sets, in which P is always a 2 × 2 stochastic matrix, i.e., there are two hidden states, and the output pdfs f_{µi,κi}, i = 1, 2, are von Mises pdfs. The parameters of the four test cases are given in Table 1.

These four cases (a)–(d) have been chosen to illustrate the following scenarios (see Fig. 2):

a. In this case, the two von Mises distributions show no significant overlap since the means are well separated and the concentration parameters are large enough. In addition, the transition matrix is clearly metastable, such that the period of time in which the system remains within one hidden state is long. Therefore, this is expected to be an easy case for the identification algorithm.

b. This case differs from the first by smaller concentration parameters broadening the two von Mises distributions, thus introducing a significant overlap.

c. This case differs from the first only by closer means of the von Mises distributions, thus also introducing a significant overlap.

d. The last case differs from the first only by the fact that the transition matrix is no longer metastable.

Figure 2. Von Mises probability density functions f_{µ1,κ1} and f_{µ2,κ2} for each of the four test cases (a)–(d), specified in Table 1. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

The problematic cases for parameter identification by the EM algorithm should be obvious: if there are two (or more) hidden states with overlapping distributions, then, with input of the observation sequence alone, the algorithm may have trouble detecting or reliably identifying the two states.

On the basis of an observation sequence of length T = 2000 for each of the four HMMs, the EM algorithm has been started with the same generic initial parameters, independent of the observation sequence. The EM iteration has been stopped automatically after some number of iteration steps (denoted by M in Table 2) according to a preselected termination criterion based on the increase in the likelihood function. In all four cases, the estimated parameters are good approximations of the original model parameters (see Table 2).
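Within each EM iteration, the M-step for a von Mises output distribution reduces to weighted circular moment estimation: the mean direction is the responsibility-weighted circular mean, and κ follows from inverting the mean resultant length R = I1(κ)/I0(κ). A sketch (our own illustration, using the standard Best-Fisher approximation for the inversion, not the authors' code):

```python
import numpy as np

def vonmises_mstep(theta, gamma):
    """M-step update for one von Mises output distribution, given
    angles theta and EM responsibilities gamma (the posterior weight
    of the hidden state at each time step): the mean direction is the
    weighted circular mean, and kappa is obtained by inverting
    R = I1(kappa)/I0(kappa) with the Best-Fisher approximation."""
    C = np.sum(gamma * np.cos(theta))
    S = np.sum(gamma * np.sin(theta))
    mu = np.arctan2(S, C)
    R = np.sqrt(C**2 + S**2) / np.sum(gamma)   # mean resultant length
    if R < 0.53:
        kappa = 2 * R + R**3 + 5 * R**5 / 6
    elif R < 0.85:
        kappa = -0.4 + 1.39 * R + 0.43 / (1 - R)
    else:
        kappa = 1.0 / (R**3 - 4 * R**2 + 3 * R)
    return mu, kappa

# sanity check: angles drawn around pi/2 with kappa = 3 should be
# re-estimated with mu near pi/2 and kappa near 3
rng = np.random.default_rng(1)
theta = rng.vonmises(np.pi / 2, 3.0, size=5000)
mu, kappa = vonmises_mstep(theta, np.ones_like(theta))
```

With uniform responsibilities this is just maximum likelihood estimation for a single von Mises distribution; in the full EM iteration the gamma weights come from the forward-backward pass.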

To further illustrate the performance of the Viterbi algorithm for our four test cases, we compared for each case the information about the original hidden state sequence with the Viterbi

Table 2. Parameters Estimated (up to Two Digits) by the EM Algorithm After M Steps, Based on the Observation Sequences from the Four Test Cases (a)–(d).

Case  M   (µ1, κ1)      (µ2, κ2)     P
(a)   8   (−1.54, 1)    (1.57, 3.1)  [0.96 0.04; 0.02 0.98]
(b)   26  (−1.58, 0.5)  (1.59, 1.01) [0.95 0.05; 0.02 0.98]
(c)   17  (−0.53, 1)    (0.51, 3.2)  [0.96 0.04; 0.02 0.98]
(d)   13  (−1.54, 1.3)  (1.62, 2.6)  [0.60 0.40; 0.45 0.55]


Figure 3. Observation sequences θ(a) and θ(b) for the test cases (a) and (b). The colors (or greyscales) represent the two hidden states according to the Viterbi path; all instances with mismatches between the original hidden state sequence and the Viterbi path have been marked with circles and diamonds.

path computed by the Viterbi algorithm based on the estimated HMM given in Table 2. Figure 3 illustrates the differences in the hidden state sequences for scenarios (a) and (b). The quality of the Viterbi paths is astonishingly good; most mismatches occur only during transitions between hidden states.

Application to Molecular Data

Time series from molecular dynamics and Metropolis Monte Carlo simulations of biomolecules are often analyzed in terms of some essential torsion angles. We will demonstrate that, even in the case of overlapping marginal distributions, such torsion angle time series possess sufficient characteristics to make an identification of conformational substates by HMMs possible. It should be noted that it is in general a difficult task to extract a meaningful set of collective observables or reaction coordinates from a molecular system; in many cases, system-dependent insight is needed. This difficulty cannot be solved in general by the algorithmic scheme we propose, but the use of HMM methods allows one to extract information even if the chosen observables only indirectly reflect the global changes. For instance, recent research suggests that it is possible to obtain information about water clusters from the torsion angles of a solvated molecule by an HMM-based analysis.24

The use of von Mises output distributions in the application of HMMs to torsion angle time series is motivated by (a) the fact that the von Mises distribution is the simplest peaked distribution for circular data, in the sense that it is completely determined by its first two moments (as the Gaussian distribution is in the planar case), (b) the observation that torsion angle distributions can almost always be interpreted as mixtures of Gaussian-like peaks; since the HMM embedding allows single von Mises distributions to be coupled into mixtures, this seems an appropriate choice, and (c) the observation that HMMs with Gaussian output distributions fail

Figure 4. The trialanine molecule shown in ball-and-stick representation. Typically, the overall structure of trialanine is sufficiently described by the two torsion angles Φ and Ψ, but at higher temperatures one should also take into consideration changes of the peptide bond angle Ω. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

to reproduce certain aspects of the system whenever periodicity is relevant.

In addition, other (nonstationary) output distributions are also possible, as described, e.g., in ref. 25.
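For reference, the von Mises density with mean direction µ and concentration κ reads f(θ) = exp(κ cos(θ − µ)) / (2π I0(κ)), where I0 is the modified Bessel function of order zero. A small sketch illustrating its normalization and periodicity (our own code, using NumPy's built-in i0):

```python
import numpy as np

def vonmises_pdf(theta, mu, kappa):
    """von Mises density on the circle:
    f(theta) = exp(kappa*cos(theta - mu)) / (2*pi*I0(kappa)).
    Unlike a Gaussian on the line it is 2*pi-periodic, so angles
    near -pi and +pi are correctly treated as neighbours."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

theta = np.linspace(-np.pi, np.pi, 361)
f = vonmises_pdf(theta, mu=np.pi / 2, kappa=3.0)
dtheta = theta[1] - theta[0]
total = np.sum(f[:-1]) * dtheta   # rectangle rule over one full period
# total is ~1.0 (normalization over the circle), and by periodicity
# the density takes the same value at -pi and +pi
```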

Trialanine

In the following, we present results from the application of the proposed HMM framework to trialanine, a small peptide composed of three alanine amino acid residues.

For the simulation of trialanine, we employed the Gromos96 vacuum force field,26 in which trialanine is represented by 21 united atoms. The structural and dynamical properties of this molecule are mainly determined by the two central peptide backbone angles Φ and Ψ. In addition, at very high simulation temperatures, the otherwise planar peptide bond angle Ω may also undergo conformational transitions (see Fig. 4).

The time series of 50,000 steps has been generated by means of a Hybrid Monte Carlo (HMC)27 scheme at a temperature of 700 K. That means that at each step momenta were randomly drawn from a Maxwell distribution according to the temperature, and proposal steps were generated by integration in time for 500 fs, using the Verlet integration scheme with a 1 fs time step. This yielded an acceptance rate of about 93%.

Figure 5. Ramachandran plot in Φ and Ψ of the time series obtained from HMC simulation at 700 K. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]


Figure 6. Observation sequences of the three torsion angles Φ, Ψ, and Ω (from top to bottom) of trialanine at 700 K. We observe conformational changes in Ω that are correlated to the dynamical behavior of Ψ and Φ. The three colors (or greyscales) correspond to the aggregated Viterbi path from the HMM analysis of the two-dimensional observation sequence given by Φ and Ψ. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

From this time series, we computed the observation sequences in Φ, Ψ, and Ω. Figure 5 shows the obtained data points projected onto the (Φ, Ψ)-plane, the so-called Ramachandran plot. We observe that the (Φ, Ψ)-plane has been explored rather intensively due to the high temperature of 700 K. Furthermore, we observe transitions between substates of the torsion angle Ω (see Fig. 4) that we do

Figure 7. Ramachandran plot in Φ and Ψ colored according to the three aggregated hidden states as computed by the EM and Viterbi algorithms. Top left: all observation points colored according to the aggregated Viterbi path. Top right, bottom left, and bottom right: output from each of the aggregated hidden states plotted separately. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

not observe in 300 K simulations of this length. As one can see in Fig. 6, these transitions introduce additional features of conformation dynamics. Thus, by raising the simulation temperature, we obtained an intriguing test of whether the novel HMM technique can identify this additional structure based on the observation sequences of the peptide angles Φ and Ψ alone.

Therefore, we used the two-dimensional observation sequence given by Φ and Ψ to estimate an HMM with seven hidden states, each with a two-dimensional von Mises output distribution. After successful termination of the EM algorithm (after 28 iteration steps), we obtained the following transition matrix (Fig. 7):

P =
    | 0.81 0.01 0.01 0.00 0.00 0.17 0.00 |
    | 0.20 0.59 0.01 0.00 0.02 0.16 0.02 |
    | 0.00 0.00 0.77 0.02 0.00 0.00 0.21 |
    | 0.01 0.00 0.19 0.72 0.05 0.00 0.03 |
    | 0.00 0.01 0.01 0.03 0.95 0.00 0.00 |
    | 0.50 0.05 0.00 0.00 0.00 0.45 0.00 |
    | 0.00 0.00 0.45 0.01 0.00 0.00 0.54 |

Now we use the PCCA algorithm20, 21 as explained in the discussion of problem (P4), which yields an aggregation of the hidden states S1, S2, S6 and S3, S4, S7 into two new aggregated hidden states; together with the remaining state S5, this aggregation results in the following 3 × 3 transition matrix

Figure 8. Left: Illustration of the 15-AT B-DNA oligonucleotide in atomic resolution. The attached violet (grey) strings indicate the backbones of the helix (strands). The molecular dynamics simulation referred to herein includes solvent (water and counter-ions), which is not shown here. Right: A single base pair junction (adenine-thymine) attached to a backbone piece. Marked are the six (backbone) torsion angles which we used as observables in the analysis. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]


Figure 9. Time series of the first-strand α, β, γ backbone torsion angles for junction 2 (upper panel). Lower panel: Corresponding aggregated Viterbi paths as derived from the independent HMM analysis of these three torsion angle time series with an eigenvalue threshold of 0.97. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

between the aggregated hidden states:

aggr(P) =
    | 0.99 0.00 0.01 |
    | 0.01 0.95 0.04 |
    | 0.01 0.00 0.99 |

As can be seen from the diagonal, all three aggregated states are metastable in the sense that there is a high probability of staying within a state in the next time step. We thus end up with three hidden states, each being the combination of some of the seven original states. Therefore, we can simply aggregate the seven-state Viterbi path into a new three-state Viterbi path. This path is shown in Figure 6 by coloring all states in the observation sequences of Φ, Ψ, and Ω according to their assignment to one of the aggregated hidden states. We observe that the conformations and the associated jumps between different phases of the simulation perfectly agree with

the rare transitions in Ω, while Figure 7 shows the overlap of the conformations in the (Φ, Ψ)-plane.

Torsion Dynamics of a B-DNA Oligomer in Water

Finally, we consider a time series originating from a long-term molecular dynamics (MD) simulation of a 15-AT B-DNA oligonucleotide with the sequence GT(AT)6C (see Fig. 8). The 100 ns AMBER96 force field simulation with explicit water and potassium ions was conducted in the group of J. Maddocks (EPFL); the computation protocol is described in detail in ref. 28. The MD simulation delivers a time series of the Cartesian coordinates of all atoms (about 23,000 atoms, including solvent). The MD trajectory was sampled every picosecond, which is rather large compared to the fastest time scales (1 fs) of the underlying dynamics, resulting in a time series of 10^5 data points. As observables, we have chosen the physically motivated projections of the original data onto a set of 84 torsion angle coordinates,29 arising from six backbone torsion angles for each of the fourteen base-pair junctions in the sequence (see Fig. 8).

Figure 9 (top panel) shows, as an example, three of the extracted torsion angles of a single base-pair junction. As can be seen, the torsion angles exhibit a metastable behavior with sharp transitions between the metastable states. We started by analyzing each of the 84 one-dimensional torsion angle time series separately, i.e., performing an HMM analysis assuming three hidden states for each of the torsion angles and aggregating afterwards with PCCA. Thus, by application of our HMM-PCCA analysis, we obtain a set of 84 aggregated Viterbi paths describing the conformational changes between the metastable states (for examples, see Fig. 9). As threshold value for the PCCA analysis (i.e., all eigenvalues of the transition matrix above the threshold are supposed to indicate metastability), we used 0.97; we tested our analysis with less strict threshold values and obtained similar results.

The full set of 84 aggregated Viterbi paths is a coarse-grained description of the original time series and encapsulates the dynamical information about the process of the conformational change in the full 84-dimensional torsion angle space. As explained in

Figure 10. The largest twenty eigenvalues of the global transition matrix as a function of the lag time τ along the respective global Viterbi path. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]


Figure 11. Different global Viterbi paths, resulting from the first four, resp. six and eight (top to bottom), eigenvalues of the global transition matrices. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Problem (P5), we extract a global Viterbi path from the 84 aggregated paths by superposition of all paths, i.e., we assign to each different pattern (x_t^(1), . . . , x_t^(84)), 1 ≤ t ≤ 100,000, a different global state. This yields 496 different global states, which is quite impressive, as theoretically there could be about 4 × 10^11 different states.

Figure 12. Flexibility plots of 2 (out of 4) obtained global states for the DNA molecule. The plot is obtained by aligning the molecule with respect to the backbone (blue) and coloring the regions where the backbone is found with high probability. The different states are distinguished by different coloring (red and green, belonging to index 1 and 2, respectively, in Fig. 11). As can be seen, the two shown conformations differ in that the strands depart at the end. Visualization based on the AMIRA software.30 [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Upon this global state space, we set up a transition matrix by counting the transitions between the global states in the given time series. Figure 10 shows the eigenvalues of the transition matrix as a function of the lag time τ used to count transitions, i.e., lag time τ = lΔt means that the transition matrix counts l-step transitions between instances t and t + lΔt along the given time series. Increasing τ means decreasing correlation between successive time steps and therefore a more informative spectrum. As can be seen, the first eigenvalue gap can be identified after the first four dominant eigenvalues for all lag times τ, which indicates the presence of four metastable sets in the 84-dimensional space of torsion angles. However, for a more detailed level of description, a choice of six or eight eigenvalues would also be reasonable.
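The construction of this lag-dependent spectrum can be sketched as follows (illustrative Python on a hypothetical two-state path, not the DNA data; the function names and parameters are ours):

```python
import numpy as np

def lag_spectrum(path, n_states, lags):
    """Eigenvalue moduli (sorted descending) of the row-normalized
    transition matrix estimated from a discrete path, counting
    transitions between instances t and t + lag."""
    spectra = {}
    for lag in lags:
        C = np.zeros((n_states, n_states))
        for a, b in zip(path[:-lag], path[lag:]):
            C[a, b] += 1.0
        T = C / np.maximum(C.sum(axis=1, keepdims=True), 1.0)
        spectra[lag] = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return spectra

# hypothetical metastable two-state chain with second eigenvalue 0.9
rng = np.random.default_rng(2)
P_true = np.array([[0.95, 0.05],
                   [0.05, 0.95]])
path = [0]
for _ in range(4999):
    path.append(rng.choice(2, p=P_true[path[-1]]))
spec = lag_spectrum(np.array(path), n_states=2, lags=[1, 5, 10])
# the Perron eigenvalue stays at 1 for every lag, while the second
# eigenvalue decays roughly like 0.9**lag, sharpening the gap
```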

In Figure 11, we present the aggregated global Viterbi path based upon the assumption of four metastable states, in comparison with the global Viterbi paths based upon the assumption of six and eight metastable sets. It can be seen how the assumption of more metastable states hierarchically resolves the time series. In Figure 12, we visualize two metastable states, as obtained from the analysis assuming four metastable states.

Conclusion

We presented an HMM algorithm with von Mises distributed observations that can be employed to identify metastable conformations from molecular dynamics time series based solely on the observation of some torsion angles. An outstanding feature of the HMM approach is the reliable detection of biomolecular conformations even if the conformations strongly overlap in the chosen set of torsion angles. Since the computational complexity only depends on the number of torsion angles to be analyzed and the number of hidden states in the model, the presented technique is in principle applicable in arbitrarily high dimensions. Yet, for a high number of hidden states, the EM step of the algorithm becomes computationally expensive, and for an increasing number of torsion angles the complexity of the likelihood landscape (i.e., parameter space) is hard to judge. Therefore, an additional scheme for use in high dimensions has been devised. Its performance has been illustrated by application to a DNA oligomer.

References

1. Elber, R.; Karplus, M. Science 1987, 235, 318.
2. Frauenfelder, H.; Steinbach, P. J.; Young, R. D. Chem Soc 1989, 29, 145.
3. Schütte, C.; Huisinga, W. In Handbook of Numerical Analysis, Vol. X; Ciarlet, P. G.; Lions, J.-L., Eds.; North-Holland, 2003; pp. 699–744.
4. Schütte, Ch.; Fischer, A.; Huisinga, W.; Deuflhard, P. J Comput Phys (Special Issue) 1999, 151, 146.
5. Nienhaus, G. U.; Mourant, J. R.; Frauenfelder, H. PNAS 1992, 89, 2902.
6. Amadei, A.; Linssen, A. B. M.; Berendsen, H. J. C. Proteins: Struct Funct Genet 1993, 17, 412.
7. Rabiner, L. R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition; Morgan Kaufmann: San Francisco, CA, 1990; pp. 267–296.
8. Rabiner, L.; Juang, B.-H. Fundamentals of Speech Recognition; Prentice-Hall: Upper Saddle River, NJ, 1993.
9. Elliot, R.; Aggoun, L.; Moore, J. Hidden Markov Models: Estimation and Control; Springer: New York, 1995.
10. Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. J Mol Biol 2001, 3, 567.
11. Fredkin, D. R.; Rice, J. A. Proc Biol Sci 1992, 249, 125.
12. Thayer, K. M.; Beveridge, D. L. PNAS 2002, 99, 8642.
13. Baum, L. E.; Petrie, T.; Soules, G.; Weiss, N. Ann Math Stat 1970, 41, 164.
14. Baum, L. E. Inequalities 1972, 3, 1.
15. Dempster, A. P.; Laird, N. M.; Rubin, D. B. J Roy Stat Soc B 1977, 39, 1.
16. Viterbi, A. J. IEEE Trans Informat Theory 1967, IT-13, 260.
17. Mardia, K. V. Statistics of Directional Data; Academic Press: New York, 1972.
18. Shatkay, H.; Kaelbling, L. P. In ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning; Morgan Kaufmann: San Francisco, CA, 1998; pp. 531–539.
19. Shatkay, H. In Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence (UAI-99); Morgan Kaufmann: San Francisco, CA, 1999; pp. 602–661.
20. Deuflhard, P.; Huisinga, W.; Fischer, A.; Schütte, Ch. Lin Alg Appl 2000, 315, 39.
21. Deuflhard, P.; Weber, M. Robust Perron Cluster Analysis in Conformation Dynamics; ZIB-Report 03-19; Zuse Institute Berlin, 2003.
22. Dellnitz, M.; Junge, O. SIAM J Num Anal 1999, 36, 491.
23. Cordes, F.; Weber, M.; Schmidt-Ehrenberg, J. Metastable Conformations via Successive Perron Cluster Analysis of Dihedrals; ZIB-Report 02-40; Zuse Institute Berlin, 2002.
24. Meerbach, E.; Schütte, C.; Horenko, I.; Schmidt, B. Metastable Conformational Structure and Dynamics: Peptide Between Gas Phase Via Clusters and Aqueous Solution; Number 87 in Series in Chemical Physics; Springer: New York, 2007; pp. 796–806.
25. Meerbach, E.; Dittmer, E.; Horenko, I.; Schütte, Ch. In Computer Simulations in Condensed Matter Systems, Vol. 703; Lecture Notes in Physics; Springer: New York, 2006; pp. 475–497.
26. Lindahl, E.; Hess, B.; van der Spoel, D. J Mol Mod 2001, 7, 306.
27. Brass, A.; Pendleton, B. J.; Chen, Y.; Robson, B. Biopolymers 1993, 33, 1307.
28. Beveridge, D. L.; Barreiro, G.; Byun, K. S.; Case, D. A.; Cheatham, T. E., III; Dixit, S. B.; Giudice, E.; Lankas, F.; Lavery, R.; Maddocks, J. H.; Osman, R.; Seibert, E.; Sklenar, H.; Stoll, G.; Thayer, K. M.; Varnai, P.; Young, M. A. Biophys J 2004, 87, 3799.
29. Saenger, W. Principles of Nucleic Acid Structure; Springer: New York, 1984.
30. Stalling, D.; Westerhoff, M.; Hege, H.-Ch. In The Visualization Handbook; Hansen, C. D.; Johnson, C. R., Eds.; Elsevier, 2005; Chapter 38, pp. 749–767.
