Applications of hidden Markov and related models in ecology, finance and other areas
Roland Langrock
Centre for Research into Ecological and Environmental Modelling & School of Mathematics and Statistics
How did I end up here?
Motivating example and quick overview
HMM machinery
Example applications
 Ecological applications
 Economic applications
 Other applications
Concluding remarks
How did I end up here?
My diploma thesis
First encounter with HMMs in 2008
Walter Zucchini (University of Göttingen)
Motivating example and quick overview
Hidden Markov models for animal movement
[Figure: simulated movement path, plotted as X[,2] against X[,1]]
Pr(St = j | St−1 = j) = 0.95 for j = 1, 2
where St : state at time t
Hidden Markov models for animal movement
[Figure: simulated movement path; state-dependent step length distributions; state-dependent turning angle distributions]
Pr(St = j | St−1 = j) = 0.95 for j = 1, 2,
where St : state at time t
Building blocks of HMMs
The key features of HMMs:
I observations depend on (at least partially) hidden states (with states describing, e.g., behavioural state of an animal, survival status, disease status, sleep state, economic climate)
I serial dependence in states (e.g., persistence in a resting mode)
Hence the building blocks:
I state-dependent distributions of the observations
I serial correlation in states induced by Markov chain
Markov chains
. . . → St−1 → St → St+1 → . . .

I stochastic process St, t = 1, 2, . . .
I N-state Markov chain: St ∈ {1, . . . , N} for all t
& Markov property,
Pr(St+1 = st+1 | St = st, . . . , S1 = s1) = Pr(St+1 = st+1 | St = st)
I state transition probabilities: γij = Pr(St+1 = j |St = i)
I transition probability matrix (t.p.m.):
Γ =
⎛ γ11 · · · γ1N ⎞
⎜  ⋮    ⋱    ⋮  ⎟
⎝ γN1 · · · γNN ⎠
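The t.p.m. also determines the chain's long-run behaviour. As a small illustrative sketch (not from the slides), the stationary distribution δ solving δΓ = δ can be computed numerically; the 2-state t.p.m. below mirrors the 0.95 self-transition probabilities of the simulated movement example.

```python
import numpy as np

# 2-state t.p.m. with the 0.95 self-transition probabilities from the
# movement example (rows sum to 1)
Gamma = np.array([[0.95, 0.05],
                  [0.05, 0.95]])

# The stationary distribution delta solves delta @ Gamma = delta with
# sum(delta) = 1; a standard trick is to solve the linear system
# delta @ (I - Gamma + U) = 1, where U is a matrix of ones.
N = Gamma.shape[0]
A = np.eye(N) - Gamma + np.ones((N, N))
delta = np.linalg.solve(A.T, np.ones(N))

print(delta)  # symmetric t.p.m., so delta = (0.5, 0.5)
```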
State-dependent distributions
I Poisson, geometric, negative binomial (for count data)
I Bernoulli (binary data)
I normal, gamma, Weibull, ... (for continuous observations)
I beta or Dirichlet (compositional data)
I . . .
I combinations of these, e.g. gamma for step lengths and von Mises for turning angles in an animal movement model
HMMs — summary/definition
(hidden)    . . . → St−1 → St → St+1 → . . .
                      ↓     ↓     ↓
(observed)          Xt−1    Xt   Xt+1
I two (discrete-time) stochastic processes, one of them hidden
I hidden state process is an N-state Markov chain
I distribution of observations determined by underlying state
I formally,
Pr(St | St−1, St−2, . . .) = Pr(St | St−1)
Pr(Xt | Xt−1, Xt−2, . . . , St, St−1, . . .) = Pr(Xt | St)
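To make the two-process definition concrete, here is a minimal simulation sketch (all parameter values are illustrative, not from the talk): a persistent 2-state Markov chain, with gamma-distributed observations whose parameters depend on the current state.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 2-state HMM: persistent states (0.95 self-transition, as in
# the movement example) with gamma state-dependent distributions for the
# observations; the shape/scale values are made up.
Gamma = np.array([[0.95, 0.05],
                  [0.05, 0.95]])
delta = np.array([0.5, 0.5])          # initial state distribution
shapes, scales = [1.5, 5.0], [3.0, 8.0]

T = 200
states = np.empty(T, dtype=int)
obs = np.empty(T)
states[0] = rng.choice(2, p=delta)
for t in range(T):
    if t > 0:
        # Markov property: next state depends only on the current one
        states[t] = rng.choice(2, p=Gamma[states[t - 1]])
    # observation depends only on the current state
    obs[t] = rng.gamma(shapes[states[t]], scales[states[t]])
```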
HMM machinery
Fitting an HMM to real data
I maximum likelihood (ML) estimation is an approach for fitting a model to data — key idea:

Parameter values for which the model has a relatively high chance of generating the observed data should be more likely to be correct than parameter values for which it seems very unlikely that the corresponding model has generated the observed data

I for HMMs, ML is used to estimate both the state transition probabilities and the parameters determining the state-dependent distributions
HMMs – likelihood calculation using brute force
LHMM = f(x1, . . . , xT)

     = ∑_{s1=1}^{N} · · · ∑_{sT=1}^{N} f(x1, . . . , xT, s1, . . . , sT)

     = ∑_{s1=1}^{N} · · · ∑_{sT=1}^{N} δs1 ∏_{t=1}^{T} f(xt | st) ∏_{t=2}^{T} γst−1,st

Simple form, but N^T summands; numerical maximization of this expression is thus infeasible.
HMMs – likelihood calculation via forward algorithm
Consider instead the so-called forward probabilities,
αt(j) = f (x1, . . . , xt , st = j).
These can be calculated using an efficient recursive scheme:
α1 = δ P(x1)
αt+1 = αt Γ P(xt+1),

with P(xt) = diag(f(xt | st = 1), . . . , f(xt | st = N)) and t.p.m. Γ.

⇒ LHMM = ∑_{j=1}^{N} αT(j) = δ P(x1) Γ P(x2) · . . . · Γ P(xT) 1
Computational effort linear in T !
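The two routes can be checked against each other on a short series. The sketch below (illustrative parameters, not from the talk) evaluates a 2-state gamma HMM likelihood both by brute-force enumeration of all N^T state sequences and by the matrix-product forward recursion; the two agree up to floating-point error.

```python
import itertools
import math
import numpy as np

def dgamma(x, shape, scale):
    # gamma density f(x) = x^(shape-1) exp(-x/scale) / (Gamma(shape) * scale^shape)
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

# made-up 2-state gamma HMM
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
delta = np.array([0.5, 0.5])
shapes, scales = [2.0, 6.0], [1.0, 2.0]
x = [1.2, 0.7, 9.5, 11.0, 2.3]          # short series, so N^T = 32 is feasible

def P(xt):
    return np.diag([dgamma(xt, shapes[j], scales[j]) for j in range(2)])

# forward recursion: L = delta P(x1) Gamma P(x2) ... Gamma P(xT) 1
phi = delta @ P(x[0])
for xt in x[1:]:
    phi = phi @ Gamma @ P(xt)
L_forward = phi.sum()

# brute force: sum over all state sequences (s1, ..., sT)
L_brute = 0.0
for s in itertools.product(range(2), repeat=len(x)):
    term = delta[s[0]] * dgamma(x[0], shapes[s[0]], scales[s[0]])
    for t in range(1, len(x)):
        term *= Gamma[s[t - 1], s[t]] * dgamma(x[t], shapes[s[t]], scales[s[t]])
    L_brute += term

print(abs(L_forward - L_brute))   # agreement up to machine precision
```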
HMMs – example code
R code for computing the log-likelihood of a gamma HMM:
loglik <- function(x, delta, Gamma, pshape, pscale) {
  llk <- 0
  foo <- delta
  for (t in 1:length(x)) {
    foo <- foo %*% Gamma * dgamma(x[t], pshape, 1/pscale)  # forward step
    llk <- llk + log(sum(foo))                             # add log of scale factor
    foo <- foo/sum(foo)                                    # rescale to avoid underflow
  }
  return(llk)
}
Simple, isn’t it?
Estimation times for a simple gamma HMM
Example times required to numerically maximize LHMM:

            N = 2    N = 3    N = 4
T = 200     0.3s     3s       10s
T = 2000    2s       13s      29s
T = 20000   21s      107s     284s
Computational tractability is one of the reasons for the popularity of HMMs!
Other inferential issues
I uncertainty quantification → bootstrap or Hessian-based
I model selection → criteria such as the AIC
I model checking → pseudo-residuals, simulation-based, ...
I state decoding → Viterbi algorithm
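For state decoding, the Viterbi algorithm finds the globally most likely state sequence by dynamic programming. A toy sketch (made-up 2-state model with Bernoulli-type emissions, not from the talk):

```python
import numpy as np

# Illustrative persistent 2-state HMM with binary observations;
# emit[j, k] = Pr(X_t = k | S_t = j). All values are made up.
Gamma = np.array([[0.8, 0.2],
                  [0.2, 0.8]])
delta = np.array([0.5, 0.5])
emit = np.array([[0.9, 0.1],
                 [0.1, 0.9]])
obs = [0, 0, 1, 1, 1, 0]

T, N = len(obs), 2
log_xi = np.zeros((T, N))            # log Viterbi scores
back = np.zeros((T, N), dtype=int)   # backpointers
log_xi[0] = np.log(delta) + np.log(emit[:, obs[0]])
for t in range(1, T):
    # cand[i, j]: score of being in state i at t-1 and moving to j
    cand = log_xi[t - 1][:, None] + np.log(Gamma)
    back[t] = cand.argmax(axis=0)
    log_xi[t] = cand.max(axis=0) + np.log(emit[:, obs[t]])

# backtrack from the best final state
path = [int(log_xi[-1].argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
path.reverse()
print(path)  # → [0, 0, 1, 1, 1, 0]
```

The decoded sequence tracks the observations here because the emissions are informative; with noisier emissions the sticky t.p.m. would smooth over isolated observations.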
Related model classes
I state-space models −→ can be approximated arbitrarily accurately by HMMs by finely discretizing the state space
I Markov-modulated Poisson processes −→ can be regardedas HMMs (with slightly modified dependence structure)
I Markov-switching regression models −→ these are HMMs(with slightly modified dependence structure)
Ecological applications, Part I: animal movement modelling
Animal movement modelling
I one of the standard movement models (an HMM!):
 I N behavioural states, switching governed by Markov chain
 I e.g., von Mises and gamma state-dependent distributions for turning angles and step lengths, respectively

Figure: Turning angle and step length distributions for an elk in two behavioural states (taken from Morales et al., 2004, Ecology).
Animal movement modelling

I one of the standard movement models (an HMM!):
 I N behavioural states, switching governed by Markov chain
 I e.g., von Mises and gamma state-dependent distributions for turning angles and step lengths, respectively

[Figure: (non-zero) step length distribution (step length in metres) and turning angle distribution (in radians)]

Figure: Turning angle and step length distributions for woodpeckers in two behavioural states. Morales et al., 2004, Ecology
Animal movement modelling

I one of the standard movement models (an HMM!):
 I N behavioural states, switching governed by Markov chain
 I e.g., von Mises and gamma state-dependent distributions for turning angles and step lengths, respectively

[Figure: step length distribution (step length in km) and turning angle distribution (in radians)]

Figure: Turning angle and step length distributions for a bison in two behavioural states. Morales et al., 2004, Ecology
Ecological applications, Part II: capture-recapture
Capture-recapture
I capture-recapture: mark animals at a site of interest, visit the site at regular time intervals and record re-sightings of animals
I aim: usually either abundance or survival rate estimation
I a capture-recapture encounter history such as

  1 1 0 1 0 0 0   (0: not seen; 1: seen alive)

  can be regarded as the outcome of an HMM, with
 I the states corresponding to the animal's survival state, so that

     Γ = ⎛ φ  1 − φ ⎞
         ⎝ 0    1   ⎠ ,

 I a Bernoulli state-dep. distribution for state 1 (the probability of success being the recapture probability)
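Viewed this way, the likelihood of a single encounter history is just an HMM forward recursion. A small sketch with made-up survival probability φ and recapture probability p, conditioning on the animal being alive at its first capture:

```python
import numpy as np

# 2-state HMM for a capture-recapture history (state 1: alive, state 2:
# dead); phi and p are illustrative values, not estimates from the talk.
phi_s, p = 0.8, 0.6
Gamma = np.array([[phi_s, 1 - phi_s],
                  [0.0,   1.0]])
history = [1, 1, 0, 1, 0, 0, 0]     # the encounter history from the slide

def P(y):
    # state-dependent probabilities: a dead animal is never resighted
    alive = p if y == 1 else 1 - p
    dead = 0.0 if y == 1 else 1.0
    return np.diag([alive, dead])

alpha = np.array([1.0, 0.0])        # alive (and seen) at the first occasion
for y in history[1:]:
    alpha = alpha @ Gamma @ P(y)
L = alpha.sum()
print(L)  # ≈ 0.0234
```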
Capture-recapture — example Soay sheep
[Figure: four panels (lambs, yearlings, adults, seniors) plotting survival probability against weight]

Figure: Estimated yearly survival probabilities of Soay sheep as a function of body mass (and 95% pointwise CIs indicated by dashed lines).
Economic applications, Part I: share returns
HMMs for share returns — a simulation
[Figure: simulated time series of share returns, t = 0, . . . , 4000]
I discrete states — is that realistic??
Stochastic volatility (SV) model
I basic SV model for modelling share returns:
yt = βεt exp(gt/2), gt = φgt−1 + σηt
I yt : share return on day t
I gt : unobserved (log-)volatility, ηt ∼ N (0, 1)
I basic model (SVN ): εt ∼ N (0, 1)
I alternative model (SVt): εt ∼ tν
I these capture most of the ‘stylized facts’ attributed to series of share returns
SV model likelihood
L = f(y1, . . . , yT)

  = ∫ · · · ∫ f(y1, . . . , yT, g1, . . . , gT) dgT . . . dg1

  = ∫ · · · ∫ f(g1) f(y1 | g1) ∏_{t=2}^{T} f(gt | gt−1) f(yt | gt) dgT . . . dg1
I evaluation of this expression is infeasible
Numerical integration of the SV model likelihood
I strategy: numerical integration (equivalent to a discretization of the state space)
I this reduces ∫ to ∑:

L ≈ b^T ∑_{i1=1}^{m} · · · ∑_{iT=1}^{m} f(g1 = b*i1) f(y1 | g1 = b*i1)
        × ∏_{t=2}^{T} f(gt = b*it | gt−1 = b*it−1) f(yt | gt = b*it) = Lapprox

(with b the length of the discretization intervals and b*i their midpoints)

I m^T summands render the evaluation infeasible!
I however, the discretization essentially makes this an HMM...
⇒ Lapprox = δP(y1)ΩP(y2)ΩP(y3) · · ·ΩP(yT−1)ΩP(yT )1
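A sketch of this discretization (the essential range of gt, the number of intervals m, the normal εt and all parameter values below are assumptions for illustration): the m interval midpoints b*i act as the HMM states, Ω collects the AR(1) transition densities times the interval length, and the state-dependent density of yt in state i is N(0, β² exp(b*i)).

```python
import numpy as np

def norm_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

phi, sigma, beta = 0.95, 0.2, 0.01      # made-up SV parameters
m = 100
b = np.linspace(-3.0, 3.0, m + 1)       # assumed essential range of g_t
bstar = 0.5 * (b[:-1] + b[1:])          # interval midpoints = HMM states
width = b[1] - b[0]

# Omega[i, j] ~ Pr(g_{t+1} in interval j | g_t = b*_i),
# from g_{t+1} | g_t ~ N(phi * g_t, sigma^2)
Omega = norm_pdf(bstar[None, :], phi * bstar[:, None], sigma) * width
# stationary AR(1) distribution as initial distribution delta
delta = norm_pdf(bstar, 0.0, sigma / np.sqrt(1.0 - phi ** 2)) * width

def loglik_sv(y):
    """Scaled forward recursion for L_approx = delta P(y1) Omega ... Omega P(yT) 1."""
    sd_y = beta * np.exp(bstar / 2.0)   # state-dependent sd of y_t
    alpha = delta * norm_pdf(y[0], 0.0, sd_y)
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()                # rescale to avoid underflow
    for yt in y[1:]:
        alpha = (alpha @ Omega) * norm_pdf(yt, 0.0, sd_y)
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()
    return ll

# quick check on a short series simulated from the SV model itself
rng = np.random.default_rng(3)
g, y = 0.0, []
for _ in range(50):
    g = phi * g + sigma * rng.normal()
    y.append(beta * rng.normal() * np.exp(g / 2.0))
print(loglik_sv(y))
```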
SV models — example application
[Figure: log-return time series for Sony, Merck & Co., Microsoft and the S&P 500, 03 Jan 2000 to 01 Aug 2013]

Figure: Observed time series, partitioned into calibration sample (in black) and validation sample (in grey).
SV models — example application
yt = εt exp(gt/2), gt = φgt−1 + σηt
Table: Estimates of the parameters determining the log-volatility process.
           SVN             SVt
           φ      σ        φ      σ
Sony       0.957  0.249    0.992  0.092
Merck      0.825  0.545    0.992  0.086
Microsoft  0.979  0.239    0.994  0.116
S&P 500    0.991  0.114    0.992  0.104
Economic applications, Part II: energy price modelling using Markov-switching regression
Markov-switching regression
Markov-switching regression model, general formulation:
g(E(Yt | st, x·t)) = β0^(st) + f1^(st)(x1t) + f2^(st)(x2t) + . . . + fP^(st)(xPt)
I g : link function
I x·t : covariate vector at time t
I st : state of underlying Markov chain at time t
I distribution of Yt from exponential family
I the fi^(j)'s: either polynomials or flexibly estimated via a nonparametric approach
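As a minimal concrete instance (identity link, normal response, a single linear covariate; all values made up, not from the talk), the likelihood of such a model has exactly the HMM form and can be evaluated with the forward algorithm:

```python
import numpy as np

# 2-state Markov-switching linear regression:
# Y_t | s_t, x_t ~ N(beta0^(st) + beta1^(st) * x_t, sd^2)
rng = np.random.default_rng(7)
Gamma = np.array([[0.95, 0.05],
                  [0.05, 0.95]])
delta = np.array([0.5, 0.5])
beta0, beta1, sd = np.array([1.0, 4.0]), np.array([0.5, -0.3]), 0.4

# simulate covariate, states and responses from the model
T = 300
x = rng.uniform(0, 10, T)
s = np.empty(T, dtype=int)
s[0] = 0
for t in range(1, T):
    s[t] = rng.choice(2, p=Gamma[s[t - 1]])
y = rng.normal(beta0[s] + beta1[s] * x, sd)

def norm_pdf(v, mean, sdv):
    return np.exp(-0.5 * ((v - mean) / sdv) ** 2) / (sdv * np.sqrt(2.0 * np.pi))

# forward algorithm on the state-dependent regression densities
alpha = delta * norm_pdf(y[0], beta0 + beta1 * x[0], sd)
ll = np.log(alpha.sum())
alpha /= alpha.sum()
for t in range(1, T):
    alpha = (alpha @ Gamma) * norm_pdf(y[t], beta0 + beta1 * x[t], sd)
    ll += np.log(alpha.sum())
    alpha /= alpha.sum()
print(round(ll, 2))
```

The only change relative to the plain HMM likelihood is that the state-dependent densities now depend on the covariate at each time point.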
Markov-switching regression — example
[Figure: two panels: energy prices (cent/kWh) plotted against oil prices (Euro/barrel) with fitted regression functions, and the time series of energy prices with decoded states, 01.03.2002 to 31.10.2008]

Figure: Spanish energy prices: fitted Markov-switching regression model with 2 states and oil price as explanatory variable.
Other applications
Other areas where HMMs are applied
Computer science: e.g. speech recognition
I observed: Fourier transforms of recorded speech
I unobserved: sequence of phonemes
Medicine: e.g. disease progression
I observed: symptoms
I unobserved: disease status
Sociology: longitudinal studies, e.g. on unemployment
I observed: incomplete socio-economic data
I unobserved: actual situation of individuals
Marketing: e.g. customer-brand relationships
I observed: transaction data
I unobserved: general inclination to buy & relationship to brand
Concluding remarks
Concluding remarks
I over the last two decades, HMMs have become increasingly popular as versatile general-purpose models for time series data
I key features: flexibility, mathematical simplicity & computational tractability
I various time series modelling problems can be framed either as a special case or as a simple extension of HMMs
I as a student, you are more likely to encounter classical time series models like AR, MA, ARIMA, VAR, ARCH, GARCH, ... (all of which focus on predictive power)