Applications of hidden Markov and related models in ecology, finance and other areas
Roland Langrock
Centre for Research into Ecological and Environmental Modelling & School of Mathematics and Statistics
How did I end up here?
Motivating example and quick overview
HMM machinery
Example applications
 Ecological applications
 Economic applications
 Other applications
Concluding remarks
How did I end up here?
My diploma thesis
First encounter with HMMs in 2008
Walter Zucchini (University of Göttingen)
Motivating example and quick overview
Hidden Markov models for animal movement
[Figure: simulated movement path, plotted as X[,2] against X[,1]]
Pr(St = j | St−1 = j) = 0.95 for j = 1, 2
where St : state at time t
Hidden Markov models for animal movement
[Figure: simulated movement path; state-dependent step length distributions; state-dependent turning angle distributions]
Pr(St = j | St−1 = j) = 0.95 for j = 1, 2,
where St : state at time t
Building blocks of HMMs
The key features of HMMs:
I observations depend on (at least partially) hidden states (with states describing, e.g., behavioural state of an animal, survival status, disease status, sleep state, economic climate)
I serial dependence in states (e.g., persistence in a resting mode)
Hence the building blocks:
I state-dependent distributions of the observations
I serial correlation in states induced by Markov chain
Markov chains
. . . → St−1 → St → St+1 → . . .

I stochastic process St, t = 1, 2, . . .
I N-state Markov chain: St ∈ {1, . . . , N} for all t
& Markov property,
Pr(St+1 = st+1 | St = st, . . . , S1 = s1) = Pr(St+1 = st+1 | St = st)
I state transition probabilities: γij = Pr(St+1 = j |St = i)
I transition probability matrix (t.p.m.):
Γ =
⎛ γ11 · · · γ1N ⎞
⎜  ⋮    ⋱    ⋮  ⎟
⎝ γN1 · · · γNN ⎠
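The t.p.m. also determines the chain's long-run behaviour. As a small illustrative sketch (not from the slides), the stationary distribution δ solving δΓ = δ can be computed numerically; the 2-state t.p.m. below mirrors the 0.95 self-transition probabilities of the simulated movement example.

```python
import numpy as np

# 2-state t.p.m. with the 0.95 self-transition probabilities from the
# movement example (rows sum to 1)
Gamma = np.array([[0.95, 0.05],
                  [0.05, 0.95]])

# The stationary distribution delta solves delta @ Gamma = delta with
# sum(delta) = 1; a standard trick is to solve the linear system
# delta @ (I - Gamma + U) = 1, where U is a matrix of ones.
N = Gamma.shape[0]
A = np.eye(N) - Gamma + np.ones((N, N))
delta = np.linalg.solve(A.T, np.ones(N))

print(delta)  # symmetric t.p.m., so delta = (0.5, 0.5)
```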
State-dependent distributions
I Poisson, geometric, negative binomial (for count data)
I Bernoulli (binary data)
I normal, gamma, Weibull, ... (for continuous observations)
I beta or Dirichlet (compositional data)
I . . .
I combinations of these, e.g. gamma for step lengths and von Mises for turning angles in an animal movement model
HMMs — summary/definition
(hidden)    . . . → St−1 → St → St+1 → . . .
                      ↓     ↓     ↓
(observed)          Xt−1    Xt   Xt+1
I two (discrete-time) stochastic processes, one of them hidden
I hidden state process is an N-state Markov chain
I distribution of observations determined by underlying state
I formally,
Pr(St | St−1, St−2, . . .) = Pr(St | St−1)
Pr(Xt | Xt−1, Xt−2, . . . , St, St−1, . . .) = Pr(Xt | St)
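To make the two-process definition concrete, here is a minimal simulation sketch (all parameter values are illustrative, not from the talk): a persistent 2-state Markov chain, with gamma-distributed observations whose parameters depend on the current state.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 2-state HMM: persistent states (0.95 self-transition, as in
# the movement example) with gamma state-dependent distributions for the
# observations; the shape/scale values are made up.
Gamma = np.array([[0.95, 0.05],
                  [0.05, 0.95]])
delta = np.array([0.5, 0.5])          # initial state distribution
shapes, scales = [1.5, 5.0], [3.0, 8.0]

T = 200
states = np.empty(T, dtype=int)
obs = np.empty(T)
states[0] = rng.choice(2, p=delta)
for t in range(T):
    if t > 0:
        # Markov property: next state depends only on the current one
        states[t] = rng.choice(2, p=Gamma[states[t - 1]])
    # observation depends only on the current state
    obs[t] = rng.gamma(shapes[states[t]], scales[states[t]])
```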
HMM machinery
Fitting an HMM to real data
I maximum likelihood (ML) estimation is an approach for fitting a model to data — key idea:

Parameter values for which the model has a relatively high chance of generating the observed data should be more likely to be correct than parameter values for which it seems very unlikely that the corresponding model has generated the observed data

I for HMMs, ML is used to estimate both the state transition probabilities and the parameters determining the state-dependent distributions
HMMs – likelihood calculation using brute force
LHMM = f(x1, . . . , xT)

     = ∑_{s1=1}^{N} · · · ∑_{sT=1}^{N} f(x1, . . . , xT, s1, . . . , sT)

     = ∑_{s1=1}^{N} · · · ∑_{sT=1}^{N} δs1 ∏_{t=1}^{T} f(xt | st) ∏_{t=2}^{T} γst−1,st

Simple form, but N^T summands; numerical maximization of this expression is thus infeasible.
HMMs – likelihood calculation via forward algorithm
Consider instead the so-called forward probabilities,
αt(j) = f (x1, . . . , xt , st = j).
These can be calculated using an efficient recursive scheme:
α1 = δ P(x1)
αt+1 = αt Γ P(xt+1),

with P(xt) = diag(f(xt | st = 1), . . . , f(xt | st = N)) and t.p.m. Γ.

⇒ LHMM = ∑_{j=1}^{N} αT(j) = δ P(x1) Γ P(x2) · . . . · Γ P(xT) 1
Computational effort linear in T !
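The two routes can be checked against each other on a short series. The sketch below (illustrative parameters, not from the talk) evaluates a 2-state gamma HMM likelihood both by brute-force enumeration of all N^T state sequences and by the matrix-product forward recursion; the two agree up to floating-point error.

```python
import itertools
import math
import numpy as np

def dgamma(x, shape, scale):
    # gamma density f(x) = x^(shape-1) exp(-x/scale) / (Gamma(shape) * scale^shape)
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

# made-up 2-state gamma HMM
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
delta = np.array([0.5, 0.5])
shapes, scales = [2.0, 6.0], [1.0, 2.0]
x = [1.2, 0.7, 9.5, 11.0, 2.3]          # short series, so N^T = 32 is feasible

def P(xt):
    return np.diag([dgamma(xt, shapes[j], scales[j]) for j in range(2)])

# forward recursion: L = delta P(x1) Gamma P(x2) ... Gamma P(xT) 1
phi = delta @ P(x[0])
for xt in x[1:]:
    phi = phi @ Gamma @ P(xt)
L_forward = phi.sum()

# brute force: sum over all state sequences (s1, ..., sT)
L_brute = 0.0
for s in itertools.product(range(2), repeat=len(x)):
    term = delta[s[0]] * dgamma(x[0], shapes[s[0]], scales[s[0]])
    for t in range(1, len(x)):
        term *= Gamma[s[t - 1], s[t]] * dgamma(x[t], shapes[s[t]], scales[s[t]])
    L_brute += term

print(abs(L_forward - L_brute))   # agreement up to machine precision
```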
HMMs – example code
R code for computing the log-likelihood of a gamma HMM:
loglik <- function(x, delta, Gamma, pshape, pscale) {
  llk <- 0
  foo <- delta
  for (t in 1:length(x)) {
    foo <- foo %*% Gamma * dgamma(x[t], pshape, 1/pscale)  # forward step
    llk <- llk + log(sum(foo))                             # add log of scale factor
    foo <- foo/sum(foo)                                    # rescale to avoid underflow
  }
  return(llk)
}
Simple, isn’t it?
Estimation times for a simple gamma HMM
Example times required to numerically maximize LHMM:

            N = 2    N = 3    N = 4
T = 200     0.3s     3s       10s
T = 2000    2s       13s      29s
T = 20000   21s      107s     284s
Computational tractability is one of the reasons for the popularity of HMMs!
Other inferential issues
I uncertainty quantification → bootstrap or Hessian-based
I model selection → criteria such as the AIC
I model checking → pseudo-residuals, simulation-based, ...
I state decoding → Viterbi algorithm
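For state decoding, the Viterbi algorithm finds the globally most likely state sequence by dynamic programming. A toy sketch (made-up 2-state model with Bernoulli-type emissions, not from the talk):

```python
import numpy as np

# Illustrative persistent 2-state HMM with binary observations;
# emit[j, k] = Pr(X_t = k | S_t = j). All values are made up.
Gamma = np.array([[0.8, 0.2],
                  [0.2, 0.8]])
delta = np.array([0.5, 0.5])
emit = np.array([[0.9, 0.1],
                 [0.1, 0.9]])
obs = [0, 0, 1, 1, 1, 0]

T, N = len(obs), 2
log_xi = np.zeros((T, N))            # log Viterbi scores
back = np.zeros((T, N), dtype=int)   # backpointers
log_xi[0] = np.log(delta) + np.log(emit[:, obs[0]])
for t in range(1, T):
    # cand[i, j]: score of being in state i at t-1 and moving to j
    cand = log_xi[t - 1][:, None] + np.log(Gamma)
    back[t] = cand.argmax(axis=0)
    log_xi[t] = cand.max(axis=0) + np.log(emit[:, obs[t]])

# backtrack from the best final state
path = [int(log_xi[-1].argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
path.reverse()
print(path)  # → [0, 0, 1, 1, 1, 0]
```

The decoded sequence tracks the observations here because the emissions are informative; with noisier emissions the sticky t.p.m. would smooth over isolated observations.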
Related model classes
I state-space models −→ can be approximated arbitrarily accurately by HMMs by finely discretizing the state space
I Markov-modulated Poisson processes −→ can be regardedas HMMs (with slightly modified dependence structure)
I Markov-switching regression models −→ these are HMMs(with slightly modified dependence structure)
Ecological applications, Part I: animal movement modelling
Animal movement modelling
I one of the standard movement models (an HMM!):
 I N behavioural states, switching governed by Markov chain
 I e.g., von Mises and gamma state-dependent distributions for turning angles and step lengths, respectively

Figure: Turning angle and step length distributions for an elk in two behavioural states (taken from Morales et al., 2004, Ecology).
Animal movement modelling

I one of the standard movement models (an HMM!):
 I N behavioural states, switching governed by Markov chain
 I e.g., von Mises and gamma state-dependent distributions for turning angles and step lengths, respectively

[Figure: (non-zero) step length distribution (step length in metres) and turning angle distribution (in radians)]

Figure: Turning angle and step length distributions for woodpeckers in two behavioural states. Morales et al., 2004, Ecology
Animal movement modelling

I one of the standard movement models (an HMM!):
 I N behavioural states, switching governed by Markov chain
 I e.g., von Mises and gamma state-dependent distributions for turning angles and step lengths, respectively

[Figure: step length distribution (step length in km) and turning angle distribution (in radians)]

Figure: Turning angle and step length distributions for a bison in two behavioural states. Morales et al., 2004, Ecology
Ecological applications, Part II: capture-recapture
Capture-recapture
I capture-recapture: mark animals at a site of interest, visit the site at regular time intervals and record re-sightings of animals
I aim: usually either abundance or survival rate estimation
I a capture-recapture encounter history such as

  1 1 0 1 0 0 0   (0: not seen; 1: seen alive)

  can be regarded as the outcome of an HMM, with
 I the states corresponding to the animal's survival state, so that

     Γ = ⎛ φ  1 − φ ⎞
         ⎝ 0    1   ⎠ ,

 I a Bernoulli state-dep. distribution for state 1 (the probability of success being the recapture probability)
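Viewed this way, the likelihood of a single encounter history is just an HMM forward recursion. A small sketch with made-up survival probability φ and recapture probability p, conditioning on the animal being alive at its first capture:

```python
import numpy as np

# 2-state HMM for a capture-recapture history (state 1: alive, state 2:
# dead); phi and p are illustrative values, not estimates from the talk.
phi_s, p = 0.8, 0.6
Gamma = np.array([[phi_s, 1 - phi_s],
                  [0.0,   1.0]])
history = [1, 1, 0, 1, 0, 0, 0]     # the encounter history from the slide

def P(y):
    # state-dependent probabilities: a dead animal is never resighted
    alive = p if y == 1 else 1 - p
    dead = 0.0 if y == 1 else 1.0
    return np.diag([alive, dead])

alpha = np.array([1.0, 0.0])        # alive (and seen) at the first occasion
for y in history[1:]:
    alpha = alpha @ Gamma @ P(y)
L = alpha.sum()
print(L)  # ≈ 0.0234
```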
Capture-recapture — example Soay sheep
[Figure: four panels (lambs, yearlings, adults, seniors) plotting survival probability against weight]

Figure: Estimated yearly survival probabilities of Soay sheep as a function of body mass (and 95% pointwise CIs indicated by dashed lines).
Economic applications, Part I: share returns
HMMs for share returns — a simulation
[Figure: simulated time series of share returns, t = 0, . . . , 4000]
I discrete states — is that realistic??
Stochastic volatility (SV) model
I basic SV model for modelling share returns:
yt = βεt exp(gt/2), gt = φgt−1 + σηt
I yt : share return on day t
I gt : unobserved (log-)volatility, ηt ∼ N (0, 1)
I basic model (SVN ): εt ∼ N (0, 1)
I alternative model (SVt): εt ∼ tν
I these capture most of the ‘stylized facts’ attributed to series of share returns
SV model likelihood
L = f(y1, . . . , yT)

  = ∫ · · · ∫ f(y1, . . . , yT, g1, . . . , gT) dgT . . . dg1

  = ∫ · · · ∫ f(g1) f(y1 | g1) ∏_{t=2}^{T} f(gt | gt−1) f(yt | gt) dgT . . . dg1
I evaluation of this expression is infeasible
Numerical integration of the SV model likelihood
I strategy: numerical integration (equivalent to a discretization of the state space)
I this reduces ∫ to ∑:

L ≈ b^T ∑_{i1=1}^{m} · · · ∑_{iT=1}^{m} f(g1 = b*i1) f(y1 | g1 = b*i1)
        × ∏_{t=2}^{T} f(gt = b*it | gt−1 = b*it−1) f(yt | gt = b*it) = Lapprox

(with b the length of the discretization intervals and b*i their midpoints)

I m^T summands render the evaluation infeasible!
I however, the discretization essentially makes this an HMM...
⇒ Lapprox = δP(y1)ΩP(y2)ΩP(y3) · · ·ΩP(yT−1)ΩP(yT )1
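A sketch of this discretization (the essential range of gt, the number of intervals m, the normal εt and all parameter values below are assumptions for illustration): the m interval midpoints b*i act as the HMM states, Ω collects the AR(1) transition densities times the interval length, and the state-dependent density of yt in state i is N(0, β² exp(b*i)).

```python
import numpy as np

def norm_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

phi, sigma, beta = 0.95, 0.2, 0.01      # made-up SV parameters
m = 100
b = np.linspace(-3.0, 3.0, m + 1)       # assumed essential range of g_t
bstar = 0.5 * (b[:-1] + b[1:])          # interval midpoints = HMM states
width = b[1] - b[0]

# Omega[i, j] ~ Pr(g_{t+1} in interval j | g_t = b*_i),
# from g_{t+1} | g_t ~ N(phi * g_t, sigma^2)
Omega = norm_pdf(bstar[None, :], phi * bstar[:, None], sigma) * width
# stationary AR(1) distribution as initial distribution delta
delta = norm_pdf(bstar, 0.0, sigma / np.sqrt(1.0 - phi ** 2)) * width

def loglik_sv(y):
    """Scaled forward recursion for L_approx = delta P(y1) Omega ... Omega P(yT) 1."""
    sd_y = beta * np.exp(bstar / 2.0)   # state-dependent sd of y_t
    alpha = delta * norm_pdf(y[0], 0.0, sd_y)
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()                # rescale to avoid underflow
    for yt in y[1:]:
        alpha = (alpha @ Omega) * norm_pdf(yt, 0.0, sd_y)
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()
    return ll

# quick check on a short series simulated from the SV model itself
rng = np.random.default_rng(3)
g, y = 0.0, []
for _ in range(50):
    g = phi * g + sigma * rng.normal()
    y.append(beta * rng.normal() * np.exp(g / 2.0))
print(loglik_sv(y))
```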
SV models — example application
[Figure: log-return time series for Sony, Merck & Co., Microsoft and the S&P 500, 03 Jan 2000 to 01 Aug 2013]

Figure: Observed time series, partitioned into calibration sample (in black) and validation sample (in grey).
SV models — example application
yt = εt exp(gt/2), gt = φgt−1 + σηt
Table: Estimates of the parameters determining the log-volatility process.
           SVN             SVt
           φ      σ        φ      σ
Sony       0.957  0.249    0.992  0.092
Merck      0.825  0.545    0.992  0.086
Microsoft  0.979  0.239    0.994  0.116
S&P 500    0.991  0.114    0.992  0.104
Economic applications, Part II: energy price modelling using Markov-switching regression
Markov-switching regression
Markov-switching regression model, general formulation:
g(E(Yt | st, x·t)) = β0^(st) + f1^(st)(x1t) + f2^(st)(x2t) + . . . + fP^(st)(xPt)
I g : link function
I x·t : covariate vector at time t
I st : state of underlying Markov chain at time t
I distribution of Yt from exponential family
I the fi^(j)'s: either polynomials or flexibly estimated via a nonparametric approach
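As a minimal concrete instance (identity link, normal response, a single linear covariate; all values made up, not from the talk), the likelihood of such a model has exactly the HMM form and can be evaluated with the forward algorithm:

```python
import numpy as np

# 2-state Markov-switching linear regression:
# Y_t | s_t, x_t ~ N(beta0^(st) + beta1^(st) * x_t, sd^2)
rng = np.random.default_rng(7)
Gamma = np.array([[0.95, 0.05],
                  [0.05, 0.95]])
delta = np.array([0.5, 0.5])
beta0, beta1, sd = np.array([1.0, 4.0]), np.array([0.5, -0.3]), 0.4

# simulate covariate, states and responses from the model
T = 300
x = rng.uniform(0, 10, T)
s = np.empty(T, dtype=int)
s[0] = 0
for t in range(1, T):
    s[t] = rng.choice(2, p=Gamma[s[t - 1]])
y = rng.normal(beta0[s] + beta1[s] * x, sd)

def norm_pdf(v, mean, sdv):
    return np.exp(-0.5 * ((v - mean) / sdv) ** 2) / (sdv * np.sqrt(2.0 * np.pi))

# forward algorithm on the state-dependent regression densities
alpha = delta * norm_pdf(y[0], beta0 + beta1 * x[0], sd)
ll = np.log(alpha.sum())
alpha /= alpha.sum()
for t in range(1, T):
    alpha = (alpha @ Gamma) * norm_pdf(y[t], beta0 + beta1 * x[t], sd)
    ll += np.log(alpha.sum())
    alpha /= alpha.sum()
print(round(ll, 2))
```

The only change relative to the plain HMM likelihood is that the state-dependent densities now depend on the covariate at each time point.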
Markov-switching regression — example
[Figure: two panels: energy prices (cent/kWh) plotted against oil prices (Euro/barrel) with fitted regression functions, and the time series of energy prices with decoded states, 01.03.2002 to 31.10.2008]

Figure: Spanish energy prices: fitted Markov-switching regression model with 2 states and oil price as explanatory variable.
Other applications
Other areas where HMMs are applied
Computer science: e.g. speech recognition
I observed: Fourier transforms of recorded speech
I unobserved: sequence of phonemes
Medicine: e.g. disease progression
I observed: symptoms
I unobserved: disease status
Sociology: longitudinal studies, e.g. on unemployment
I observed: incomplete socio-economic data
I unobserved: actual situation of individuals
Marketing: e.g. customer-brand relationships
I observed: transaction data
I unobserved: general inclination to buy & relationship to brand
Concluding remarks
Concluding remarks
I over the last two decades, HMMs have become increasingly popular as versatile general-purpose models for time series data
I key features: flexibility, mathematical simplicity & computational tractability
I various time series modelling problems can be framed either as a special case or as a simple extension of HMMs
I as a student, you are more likely to encounter classical time series models like AR, MA, ARIMA, VAR, ARCH, GARCH, ... (all of which focus on predictive power)