Observation-driven models for Poisson counts

Biometrika (2003), 90, 4, pp. 777–790

© 2003 Biometrika Trust

Printed in Great Britain

Observation-driven models for Poisson counts

B RICHARD A. DAVIS

Department of Statistics, Colorado State University, Fort Collins,Colorado 80523, U.S.A.

[email protected]

WILLIAM T. M. DUNSMUIR

Department of Statistics, University of New South Wales, Sydney,New South Wales 2052, Australia

[email protected]

SARAH B. STREETT

Geophysical Statistics Project, National Center for Atmospheric Research,PO Box 3000, Boulder, Colorado 80307-3000, U.S.A.

[email protected]

S

This paper is concerned with a general class of observation-driven models for time seriesof counts whose conditional distributions given past observations and explanatory vari-ables follow a Poisson distribution. These models provide a flexible framework for model-ling a wide range of dependence structures. Conditions for stationarity and ergodicity ofthese processes are established from which the large-sample properties of the maximumlikelihood estimators can be derived. Simulations are provided to give additional insightinto the finite-sample behaviour of the estimators. Finally an application to a regressionmodel for daily counts of asthma presentations at a Sydney hospital is described.

Some key words: Asthma hospitalisation; Observation-driven model; Poisson-valued time series.

1. I

Time series of counts arise in a variety of applications including modelling the effectsof environmental pollution on human health and the impact of policy controls on roaddeaths. Davis et al. (1999) provided a review of models for Poisson time series. There,Cox’s (1981) classification of models into observation- and parameter-driven processes isdescribed, and a new class of models, to which we will refer as generalised linear auto-regressive moving average models, was introduced. In this paper we develop the theoreticalproperties of these models more comprehensively.In Davis et al. (1999) generalised state space models are reviewed. Both parameter-

driven and observation-driven versions of these models have the same observation equa-tion but the state vector for the parameter-driven model evolves independently of the pasthistory of the observation process. In contrast, the state vector depends explicitly on pastobservations for the observation-driven model.

at Belgorod State U

niversity on Decem

ber 23, 2013http://biom

et.oxfordjournals.org/D

ownloaded from

http://biomet.oxfordjournals.org/


778 R. A. D, W. T. M. D S. B. S

An example of a parameter-driven model for a Poisson time series {Yt} has state equation

mt=exp (xT

tb+n

t), (1)

where xTtis a p-dimensional row vector of covariates observed at time t and b is a vector

of regression coefficients. The process {nt} is assumed to be stationary with autocovariance

function specified by a finite number of parameters. For example, one might assume that{nt} is an autoregressive moving average process. Conditional on {n

t} the observations

are independent Poisson random variables with means mt, written as Y

t|nt~Po (m

t).

Parameter-driven models are straightforward in their interpretation of the effects ofcovariates on the observed count process. However, in general, estimation of parametersin parameter-driven models requires considerable computational effort (Chan & Ledolter,1995; Durbin & Koopman, 2000; Jung & Liesenfeld, 2001), as does forecasting the latentprocess.At the present time, and for realistically long and numerous time series arising in

contexts of practical importance, possibly involving many covariates, the computationallyintensive methods required for fitting parameter-driven models are not yet routinely avail-able. The alternative observation-driven models are, however, very easily fitted to datafor quite complex regression models, as shown by the asthma example presented in § 4.Observation-driven models are sometimes referred to as transition models in the longi-

tudinal data analysis literature (Diggle et al., 1994, Ch. 10). Zeger & Qaqish (1988) reviewvarious observation-driven models for count time series. They note that for ease ofinterpretation of the regression effects the marginal mean of Y

tshould be approximated

as

E(Yt)=E(m

t)jexp (xT

tb)

so that the regression coefficients b can be interpreted as the approximate change in themarginal expectation of Y

ton the logarithm scale given a unit change in the regressor

variables xt. This property holds exactly for parameter-driven models in which the latent

process ntis Gaussian; see Davis et al. (2000). A flexible model class should also allow

for both positive and negative serial dependence; this is certainly the case for the parameter-driven models (1). Some of the observation-driven models that are discussed in Zeger &Qaqish (1988) do not satisfy this last property and will admit stationary solutions onlyfor negative serial dependence, a case that is less common in many applications than thatof positive dependence.LetH

t= (Y (t−1), x(t)) be the past of the observed count process and the past and present

of the regressor variables, and assume that the conditional distribution of Yt|Ht~Po (m

t).

A simple and appealing way of building serial dependence into the model is to requirethe log-mean process to depend linearly on scaled deviations of previous observationsfrom their means. To define the mean at time t in terms of the entire history of the observedcount process, the terms n

tin (1) are replaced by

Zt= ∑2

i=1ci AYt−i−mt−iml

t−iB , (2)

where the ciform a sequence of absolutely summable constants and 0<l∏1 is a constant.

Such processes satisfy the approximate relationship for the mean and allow both positiveand negative dependence. Various forms of this model have been considered previously.In particular, for no mean correction or standardisation and a truncated distributed lag

at Belgorod State U

niversity on Decem



ownloaded from



779Models for Poisson counts

structure we have

log (mt)=xTtb+ ∑

q

i=1ciYt−i

. (3)

Model (3) is applied to data in Fahrmeir & Tutz (1994). Zeger & Qaqish (1988) pointout that (3) cannot be stationary unless, at least in the case q=1, c1∏0, thereby excludingthe possibility of positive dependence. This point is also acknowledged by Fahrmeir &Tutz. It might be thought that the difficulties with model (3) could be overcome merelyby subtracting the ‘fixed effects’ component of the mean from the Y

tto arrive at

log (mt)=xTtb+ ∑

p

i=1ci{Yt−i−exp (xT

t−ib)}, (4)

but this will not lead to a stationary process, as can be seen by an extension of theargument presented by Zeger & Qaqish (1988). In an unpublished Oxford Universitytechnical report ‘Generalised linear autoregressions’, N. Shephard extended (4) using amodel for the mean function that is similar to the specification in (2) with l=1 and forthe exponential family of observation distributions. In Shephard’s formulation the fixed-effect terms enter slightly differently and values of l<1 are not considered. Also, existenceor uniqueness of stationary distributions for the state process and approximate normalityof the conditional likelihood estimators are not considered.For a fixed value of l it is generally very easy to fit these observation-driven models

using conditional maximum likelihood. Here conditioning refers to conditioning on someinitial values and not on random effects. Also forecasting of future observations is straight-forward. In fact m@

t+1=exp{xT

t+1b+Z

t+1} could be used as the one-step-ahead forecast

of Yt+1, given a value or a reliable forecast for x

t+1. However, because of the way in which

past observations feed into the mean term, the interpretation of the effect of covariateson the mean response can be difficult.A major objective of this paper is to establish ergodicity of the process {Y

t} for various

values of model parameters including l. These new results, together with a derivation ofbasic properties of this general class of observation-driven models, are given in § 2. In § 3,we consider maximum likelihood estimators for these models, with simulation results thatsupport the use of an approximate normal distribution for inference. In § 4 we fit thesemodels to data consisting of the number of asthma presentations at a hospital in theSydney metropolitan area. Technical results related to the properties of the process aregiven in the Appendix.

2. O-

2·1. T he model based on a general autoregressive moving average

The key feature of the observation-driven models considered here is the introductionof past observations of the count time series into the mean of the current observationthrough a linear filter applied to martingale differences constructed as follows. We assumethat, given the past history F

t−1=s(Y

s, s∏t−1),

Yt|Ft−1~Po (m

t).

It is further assumed that the state process Wt= log (m

t) follows the model

Wt=xTtb+Z

t=xTtb+ ∑

2

i=1ciet−i

,

at Belgorod State U

niversity on Decem



ownloaded from



780 R. A. D, W. T. M. D S. B. S

where, for lµ(0, 1],

et= (Yt−eW

t)e−lW

t.

The infinite moving average in this model can be specified in terms of a finite numberof parameters in many ways. Throughout this discussion we will consider distributed lagstructures that are generated by the linear predictor for autoregressive moving averageprocesses of the form

Ut=w1Ut−1+ . . .+w

pUt−p+et+h1et−1+ . . .+h

qet−1

, (5)

where the polynomials, w(z)=1−w1z− . . .−w

pzp and h(z)=1+h

1z+ . . .+h

qzq, have

all their zeros outside the unit circle. The best linear predictor of Utbased on the infinite

past {Ut−1

, Ut−2

, . . .} is

UCt= ∑2

i=1ciet−i

,

where

∑2

i=1cizi=A1− ∑p

i=1wiziB−1 A1+ ∑q

i=1hiziB−1.

Hence in the specification given above we set Zt=UCt. In the model-fitting stage, Z

tis

computed using the autoregressive moving average recursions (5). For t∏0 set et=0,

Zt=0, and for t>0 define

Zt=w1(Zt−1+et−1

)+ . . .+wp(Zt−p+et−p

)+h1et−1+ . . .+h

qet−q

, Wt=xTtb+Z

t.

2·2. Properties of the model

Under the initial conditions es=0 and Y

s=0 for s∏0,

Fes−1={et: t∏s−1}, F

s−1={Yt: t∏s−1}

generate the same s-fields and the etform a martingale difference sequence for which

E(es|Fes−1

)=0 (s�1).

Hence, the ethave zero mean and variance

E(e2t)=E{E(e2

t|mt)}=E(m1−2l

t) (t�1).

It also follows from the martingale difference property that cov (et, es)=0 for tNs. From

the above properties we have, for any l,

E(Wt)=xTtb, var (W

t)= ∑2

i=1c2iE(m1−2lt−i

),

and, for s=t+h, h>0,

cov (Wt, Wt+h

)= ∑2

i=1cici+h

E(m1−2lt−i

).

If l=0·5, then E(e2t)=1 and the covariances do not depend on time t even if {m

t} is not

strictly stationary.While the process {W

t} has mean xT

tb, the process {m

t} has mean greater than exT

tb. In

at Belgorod State U

niversity on Decem



ownloaded from




the case l=0·5 and under the assumption that Ztis a Gaussian process, we have exactly

that

E(eWt)=exT

tb+Dvar(Z

t)=exT

tb+DW

2i=1c2i

.

In this case the regression coefficients could be interpreted as the amount by which themean of Y

ton the log-scale would change for a unit change in the regressors and the

intercept term can be corrected for bias due to the latent process Zt. Although, for

the observation-driven formulation, Ztis not a Gaussian process the above calculation

suggests an approximate method for correcting the bias of E(mt) as an estimate of exT

tb

so that an approximately unbiased plot of mtmight be generated by

m@t=exp AWC t− 1

2∑2

i=1c@2iB ,

where estimates have been used throughout. The extent to which the joint distribution ofthe sequence {Z

t} differs from a Gaussian process will govern the extent to which the

approximation

E(eWt)jexT

tb+DW

2i=1c2i

holds. If the mean process mtis large then the approximation is likely to be good.

2·3. A simple example

The simplest nontrivial example of the above model for which we can make considerableprogress with the theory of ergodicity and stationarity assumes that p=0, q=1 andxTtb=b. In this simple but illuminating case, the state process reduces to

Wt=b+c(Y

t−1−eW

t−1)e−lW

t−1. (6)

While the state process {Wt} is Markov, the observation process {Y

t} is not, since, for

example, the conditional mean E(Yt|Y (t−1)) depends on the whole past. We now develop

in further detail the properties of this ‘basic model’.The log-mean process {W

t} in (6) is Markov with mean

E(Wt)=E{E(W

t|Wt−1

)}=b,

and variance

var (Wt)=c2E[exp{(1−2l)W

t−1}].

It follows that, for l=0·5, var (Wt)=c2, while, for l=1,

var (Wt)=c2E(e−W

t−1)�c2e−E(W

t−1)=c2e−b.

The state space for the conditional distribution of Wtgiven W

t−1has the following form:

Wt q�b−ce(1−l)Wt−1, if c�0,

∏b−ce(1−l)Wt−1

, if c∏0.

While the range of Wtdoes not depend on the value of W

t−1for l=1, the range does

depend on Wt−1for values of l<1, and this severely complicates the analysis.

Another important property is that the process {Wt} is uniformly ergodic for the case

l=1. Hence, there exist unique stationary distributions for both the log-mean and mean

at Belgorod State U

niversity on Decem



ownloaded from



782 R. A. D, W. T. M. D S. B. S

processes in this case. For 12∏l<1, there exists a stationary distribution, yet the unique-

ness of such a distribution is currently unknown. These results are summarised in thefollowing two propositions, the proofs of which are contained in the Appendix.

P 1. L et Yt~Po (eW

t), where W

t=c(Y

t−1−eW

t−1)e−lW

t−1, for 1

2∏l∏1 and

cN0. T hen the chain {Wt} is bounded in probability, and therefore has a stationary

distribution.

P 2. W hen l=1 and cN0 the process {Wt} satisfies Doeblin’s condition and

is strongly aperiodic. Hence, the process has a unique stationary distribution and isuniformly ergodic.

The result given above for l=1 extends readily to the case

Wt=b+ ∑

q

i=1ci(Yt−i−eW

t−i)e−Wt−i

if we consider the q-variate Markov chain (Wt, Wt−1

, . . . , Wt−q+1

).Proposition 2 is the starting point for establishing the limiting behaviour of maximumlikelihood estimators for these models. If the process {W

t} is started with its stationary

distribution, then the resulting observation process {Yt} will also be stationary and ergodic.

Without these two properties, there is no guarantee that maximum likelihood estimatorswill even be consistent.For the basic model, the conditional mean of the mean process is given by

E(mt|mt−1

)=E[exp{b+c(Yt−1−mt−1

)/mlt−1

}].

Using the moment generating function for the Poisson distribution, we obtain

E(mt|mt−1

)=exp (b) exp{mt−1

(ec/mlt−1−1−cm−l

t−1)}. (7)

If l=0, equation (7) becomes

E(mt|mt−1

)=exp (b) exp{mt−1

(ec−1−c)},

so that if c�0 the conditional means will evolve in an unstable fashion. Thus, whenl=0, E(m

t|mt−1

) will grow without bound whenever mtbecomes positive. In contrast,

for l=1, equation (7) becomes

E(mt|mt−1

)=exp (b−c) exp{mt−1

(ec/mt−1−1)},

which is bounded as mt�2. For other values, 0<l<1

2, the stability properties of the

process are less clear.

3. E

3·1. Maximum likelihood estimation

In this section we briefly discuss the use of conditional maximum likelihood to obtainparameter estimates and suggest a method for computing standard errors. Letd= (bT, tT )T, t= (wT, hT )T and define L

t(d)= log f (y

t|Ft−1

), where the conditional densityf of Y

tgiven F

t−1is Poisson. The loglikelihood can then be written as Wn

t=1Lt(d) which,

if we ignore terms which do not involve the parameters, becomes

L (d)= ∑n

t=1{YtWt(d)−eW

t(d)},

at Belgorod State U

niversity on Decem



ownloaded from




where

log (mt)=W

t(d)=xT

tb+ ∑

2

i=1ci(t)et−i

(d), et(d)= (Y

t−mt)/mlt. (8)

For brevity, we will often suppress the dependence of eton d.

Recursive expressions needed to calculate the first and second derivatives of the loglikeli-hood are straightforwardly derived and are available in a Colorado State Universitytechnical report by the authors. These are readily programmed and form the basis foroptimising the likelihood using a simple procedure such as Newton–Raphson iteration,which we have implemented in S-Plus code. We initialised the Newton–Raphson iterationswith the standard generalised linear model estimates without the autoregressive movingaverage terms together with zero initial values for e

t(t∏0). Convergence in the majority

of cases reported below, as measured by the first derivatives being less than 10−6 inmagnitude, occurred within 10 iterations from these starting conditions.A central limit theorem for the maximum likelihood estimators is currently not available

for the general model. This requires extension of the above propositions pertaining to theexistence and uniqueness of stationary distributions so that suitable ergodic propertiescan hold. Regardless of the technical issues involved, a reasonable choice for estimationof the covariance matrix of the estimators would be

VC=− A∂2L (d@ )∂d ∂dTB−1, (9)

where

∂2L∂d ∂dT

= ∑n

t=1Aetmlt ∂2Wt∂d ∂dT

−mt∂Wt

∂d∂Wt

∂dTB .The remaining expressions needed to calculate these derivatives and the asymptotic resultsfor these estimators in the simple model are given in the previously cited technical reportby the authors. In this case, the asymptotic distribution of the maximum likelihoodestimators is N(0, V −1 ), where

V= limn�2

1

n∑n

t=1eWt(d0)WtW Tt, (10)

with Wt=∂W

t(d0)/∂d. The argument is based on a linearised form of the loglikelihood

which is convex in the parameters and therefore amenable to a functional central limittheorem. Establishing this uses Proposition 2 and a standard martingale central limittheorem.

3·2. Simulation results

To assess the asymptotic properties of the parameter estimators, we simulate from twomodels and compare the results with the theory derived in our technical report. Thesemodels correspond to constant and linear trends, i.e.

Wt=b0+c(Y

t−1−eW

t−1)e−Wt−1

, (11)

Wt=b0+b1t/n+c(Y

t−1−eW

t−1)e−Wt−1

. (12)

at Belgorod State U

niversity on Decem



ownloaded from



784 R. A. D, W. T. M. D S. B. S

Table 1 contains the results for the model given by (11) for two choices of b0and c,

denoted by d1 and d2 , with a sample size of n=250 and N=1000 replications. A burn-inperiod of length 100 was used in the simulation of the realisations. Table 2 contains theresults for the model given by (12) for two combinations of (b0 , b1 , c)= (d1 , d2 , d3 ), whereagain the sample size is 250 and the number of replications is 1000. In both tables m@

d@j

isthe average of the N estimates of d

j, s@d@j

is the sample standard deviation of the N estimatesof dj, sd@j,i

is the estimate of the standard error of dj,ias computed by (9), and m@

s,d@j

is theaverage of the s

d@j,i

, where d= (bT , c)T.

Table 1. Simulation results for the estimation of param-eters in the model given by equation (11), with sample

size n=250 and N=1000 replications

Parametersm@d@j

s@d@j

m@d@j

±1·96s@d@j

/√N m@s,d@j

d1=1·5 1·4978 0·0387 (1·4954, 1·5001) 0·0374d2=0·25 0·2470 0·0582 (0·2434, 0·2506) 0·0583

d1=1·5 1·4990 0·0531 (1·4957, 1·5023) 0·0660d2=0·75 0·7435 0·0386 (0·7411, 0·7459) 0·0318

d1=3·0 3·0001 0·0170 (2·9990, 3·0012) 0·0176d2=0·25 0·2483 0·0618 (0·2445, 0·2521) 0·0613

d1=3·0 3·0000 0·0252 (2·9984, 3·0015) 0·0244d2=0·75 0·7349 0·0392 (0·7325, 0·7374) 0·0404

Table 2. Simulation results for the estimation of param-eters in the model given by equation (12), with sample

size n=250 and N=1000 replications

Parametersm@d@j

s@d@j

m@d@j

±1·96s@d@j

/√N m@s,d@j

d1=1 0·9951 0·1290 (0·9871, 1·0031) 0·1307d2=0·5 0·5044 0·1313 (0·4962, 0·5125) 0·1324d3=0·25 0·2448 0·0593 (0·2411, 0·2485) 0·0586

d1=1 0·9887 0·1609 (0·9788, 0·9987) 0·1669d2=−0·15 −0·1424 0·1746 (−0·1533,−0·1316) 0·1784d3=0·25 0·2476 0·0559 (0·2441, 0·2510) 0·0550

In all cases, the ‘true’ parameter value djis very close to the estimated value, m@

d@j

. Acomparison can also be made to evaluate the accuracy of the estimates of standard error.We estimate V

jjas defined in equation (10) by s@

d@j

and compare its value with the averageof the N estimates of standard error, m@

s,d@j

. Again, these values are very similar, supportingthe theory for maximum likelihood estimators derived in our technical report.Figure 1 contains plots of the estimated densities along with the appropriate normal

density for the first model, with b0=1·5 and c=0·25. The illustrated normal densities

have mean djand standard deviation s@

d@j

. As seen from these plots, which are typical ofother examples of both our models, the estimated and asymptotic densities are in verygood agreement.

at Belgorod State U

niversity on Decem



ownloaded from




(a)

10

8

6

4

2

0

6

4

2

0

(b)

1·40 1·45 1·50 1·55 1·60�β0

0·1 0·2 0·3 0·4�γ

Fig. 1. Comparison of empirical densities for the maximum likelihood estimators, solid lines,along with corresponding limiting normal densities, dashed lines, for the model of equation (11)

with b0=1·5 and c=0·025.

4. A S

4·1. Background

Davis et al. (1999, 2000) presented analyses of several datasets using the observation-driven models of § 2. These include the polio data introduced by Zeger (1988), the UKsudden infant death syndrome series considered by Campbell (1994), and a preliminaryanalysis of a series of daily counts of asthma presentations at the accident and emergencydepartment of Campbelltown Hospital located in the southwest metropolitan area ofSydney, Australia.The previous model for the Campbelltown asthma series included explanatory variables

for a Sunday effect, a Monday effect, an increasing linear trend in time and a seasonalpattern. The latter was modelled using Fourier series terms consisting of cos(2pkt/365)and sin (2pkt/365), for k=1, 2, 3, 4. The remaining serial dependence was modelled by ageneralised linear autoregressive model with nonzero coefficients at lags 1, 3, 7 and 10 forthe autoregressive component and no moving average component. However, some slightoverdispersion remained. This, together with the somewhat complex lag dependence struc-ture, points to the need for additional covariates or possibly a more flexible seasonalpattern. Here we extend that analysis with a more comprehensive model for the seasonaleffects and the pollution series.In addition, the observation-driven model fitted in Davis et al. (1999) did not address

a major practical question for which the data were originally collected, concerning theinfluence of air pollution levels on the number of daily asthma cases. Since meteorologicalconditions affect the pollution process, temperatures can have a direct effect on asthmaoccurrences, and, because the growth of fungal spores and dust mite level can be relatedto humidity and temperature, it is reasonable also to consider inclusion of meteorologicalvariables in the model (Samet et al., 1998). At the time of the analysis in Davis et al.(1999), only partial data were available on relevant pollution series.

4·2. Description of the data

Pollution measurements. The New South Wales Environmental Protection Authorityprovided all available pollution measurements for the Sydney metropolitan region com-

at Belgorod State U

niversity on Decem



ownloaded from



786 R. A. D, W. T. M. D S. B. S

mencing prior to 1 January 1990, the date at which the asthma data commenced, and upto 1999. Unfortunately, for the time period of our data the network coverage was rathersparse for southwest Sydney. After analysing all available records for completeness weselected the observations from the Lidcombe observing site for ozone and NO2 measure-ments. Lidcombe is to the north-east of Campbelltown but was considered to be sufficientlyclose to give representative readings for these pollutants. Unfortunately, the crucial nephol-ometer readings of particulate concentrations were not available at Lidcombe during therelevant time period. The two most complete records were at Rozelle and Kensington,both of which are considerably closer to downtown Sydney and the coastline. Particulatereadings from these two locations were averaged to produce a single series. In addition,two types of pollution series were used, namely daily average and daily maximum readingsbased on hourly measurements.

Meteorological data.Meteorological data were obtained from the Australian Bureau ofMeteorology at Liverpool and were considered to be relevant to the three hospitals con-sidered. We were particularly interested in the effects of moisture on asthma levels. Inexploratory analysis rainfall did not appear to play a major role, whereas humidity didwith an approximate lag of 14 days. Details of the construction and statistical significanceof this variable are provided in Davis et al. (2000).

Seasonal eVects. Figure 1 of Davis et al. (2000) shows evidence of a triple peak seasonalpattern in the time series of hospital admission counts. In Davis et al. (1999, 2000) theseasonal variation was modelled using a harmonic regression with several harmonics forthe annual frequency. This model may not be appropriate because the intensity of seasonalpeaks appears to vary considerably from year to year. In the analysis to follow we proposean alternative representation of the seasonal behaviour that allows us to test the constancyfrom year to year.The timing of the peaks appears to line up with the four terms in the school year with

the break between terms one and two occurring at varying times because of the timingof the Easter vacation. In the revised seasonal model we assume that there is a broadannual seasonal pattern that is the same in all years and is modelled using annual har-monics cos (2pt/365) and sin (2pt/365). To model the peaks, we used a beta densityfunction,

p(x)=q 1

B(a, b)xa−1 (1−x)b−1, xµ[0, 1],

0, x1[0, 1],

with a=2·5 and b=5. These parameters were chosen based on a preliminary data analysiscomparing the shapes of the peaks in all years and at three locations. Let T

ijbe the start

time for the jth school term in year i where time is chosen from t=1, . . . , 1461, the daysin our sample numbered sequentially. Then the peak function for the jth school term inyear i is

Pij(t)=p At−T

ij100 B .

In this formulation there are sixteen such functions, one each for four school terms infour years, each of which enters into the regression model with an individual coefficient.

Other model terms. Additional explanatory variables used in the model included an

at Belgorod State U

niversity on Decem



ownloaded from




overall linear trend over the four-year period and the aforementioned indicator variablesfor Sunday and Monday effects. As explained in Davis et al. (2000) there is a clear needin these data, with a fixed annual seasonal cycle, for serial dependence effects. Of interestis whether or not inclusion of a more flexible seasonal model, based on the school termpeak components, will decrease the serial dependence effects. We investigated a numberof options for specifying the autoregressive and moving average effects in the generalisedlinear autoregressive moving average model. In the final model a moving averagecomponent at lag 7 was all that was required.

4·3. Fitting the model

A number of models were fitted to the data to investigate the effects of the regressionvariables summarised above. Based on preliminary analysis it was clear that several vari-ables could be dropped. We retained the two annual harmonic terms, the Sunday andMonday effects, the linear trend, the same-day minimum temperature, the lagged com-posite humidity variableH

t, and the three same-day pollution variables, namely maximum

ozone, NO2 and particulates. We also included the 16 individual term peak components.In this model the term peak components for the school terms 3 and 4, the latter half ofthe calendar year, were not individually significant at the 5% level and were droppedfrom the model. We then performed a likelihood ratio test for the constancy of the termpeak effects for terms 1 and 2 across all four years. The likelihood ratio statistic was 29·9,which, based on an approximating x2

6distribution under the null hypothesis, gives a p-value

of 0·00004. Accordingly we retained the more flexible representation in which each yearallowed variation in the size of the term peaks. The coefficients for Tmin and Trend werethen not individually significant and were dropped from the model. We next investigatedthe impact of the pollution variables. The coefficients of the same-day values of maximumozone and particulates were not significant, at the 5% level, while that of NO2 wassignificant. None of the one-day lag effects of the three maximum pollution measurementswas significant. The final model is summarised in Table 3.

Table 3. Analysis of Sydney asthma time series

Variable Coef. t-ratio Variable Coef. t-ratio

Intercept 0·583 0·062 9·46 Term 1, 1990 0·200 0·056 3·54Sunday 0·197 0·056 3·53 Term 2, 1990 0·132 0·057 2·31Monday 0·230 0·055 4·20 Term 1, 1991 0·087 0·066 1·32Annual cosine −0·214 0·039 −5·54 Term 2, 1991 0·172 0·057 2·99Annual sine 0·176 0·040 4·35 Term 1, 1992 0·254 0·055 4·66Humidity H

t/20 0·169 0·055 3·09 Term 2, 1992 0·308 0·049 6·31

NO2 max −0·104 0·033 −3·16 Term 1, 1993 0·439 0·050 8·77, lag 7 0·042 0·018 2·32 Term 2, 1993 0·116 0·061 1·91

Coef., coefficient; , standard error; , moving average.

The fitted values from the model are shown in Fig. 2 along with the actual counts. Inthis final model, various test statistics reviewed in Davis et al. (1999, 2000) for the presenceof a latent process and the degree of autocorrelation indicated that there was no need toinclude additional autoregressive or moving average terms.

4·4. Discussion

The expanded and revised model for the Campbelltown asthma series allows for seasonalpatterns to be aligned with the school term dates and to vary in intensity from year to

at Belgorod State U

niversity on Decem



ownloaded from



788 R. A. D, W. T. M. D S. B. S

(a) 1990

6

4

2

0

8

6

4

2

0

10

8

6

4

2

0

12

8

4

0

(b) 1991

(c) 1992

0 100 200 300Day of year

0 100 200 300Day of year

0 100 200 300Day of year

0 100 200 300Day of year

Cou

nts

Cou

nts

Cou

nts

Cou

nts

(d) 1993

Fig. 2. Asthma counts, dotted lines, at Campbelltown Hospital with conditional means, solid lines, for years1990–1993.

year. These differences are highly statistically significant. Virally induced asthma occur-rences might be synchronised in part with the school terms and would not necessarilyoccur with the same intensity in the same terms across different years or across terms inthe same year. The use of a more flexible seasonal model leads to a simplification of thelag dependence structure compared with that in previous analyses; see Davis et al. (1999,Table 4). However, moving average dependence at lag 7 is positive and significant.Inferences about the key pollution and weather variables are adjusted for this serial depen-dence in the above analysis.The same analysis was repeated for data from the Liverpool and Lidcombe hospitals.

Similar results were obtained for these two sites, although at these places none of thepollution variables was statistically significant. Experience with these and other datasetsdemonstrates that for complex regressor sets and quite complex lag-dependence structuresit is feasible to fit the observation-driven models. The use of Newton–Raphson givesreliable and rapid convergence in all cases we have so far encountered.An obvious next step is to allow for estimation of the parameter l; in the simulation

studies we fixed l=1 and in the analysis of the asthma series above we set l=0·5.

A

We gratefully acknowledge the assistance of Louise Helby for providing the daily asthmadata, of Elly Spark of the Australian Bureau of Meteorology and Hiep Duc of the NewSouth Wales Environmental Protection Agency. We thank the editor and referees forsuggestions which have greatly improved the presentation. The research of R. A. Daviswas supported in part by the National Science Foundation.

at Belgorod State U

niversity on Decem



ownloaded from




A

Existence of stationary solutions

In this section, we establish the existence of a stationary solution for the process {Wt} for the

simple model (6). For the special case l=1, we prove that the stationary distribution is uniqueusing techniques fromMeyn & Tweedie (1993). We begin by stating a general result on the existenceof a stationary distribution for a Markov chain.

P A1. If {Xn} is a weak Feller chain with state-space X and if for any e>0 there

exists a compact set C5X such that P(x, Cc )<e, for all xµX, where P(x, Cc ) is the transitionprobability from x to Cc, then {X

n} is bounded in probability and there exists at least one stationary

distribution for the chain.

Proof. Assume that for any e>0 there exists a compact set C5X such that P(x, Cc )<e for allxµX. If Pk(x, . ) denotes the k-step transition probability of the chain starting from state x, thenPk(x, Cc )=∆P(y, Cc )Pk−1(x, dy)<e. Thus, the chain is bounded in probability. In fact, the tightnessof the k-step transition probabilities holds uniformly in x. It follows that the chain is bounded inprobability on average and hence, by Theorem 12.0.1(i) of Meyn & Tweedie (1993), there exists astationary distribution. %

Proof of Proposition 1. First note that the chain {Wt} is weak Feller. Define C)[−c, c]. Then

P(x, Cc )=pr (WtµCc |W

t−1=x)=pr{c(Y

t−1−eW

t−1)e−lW

t−1µ[−c, c]c |W

t−1=x}.

By Markov’s inequality, we obtain

P(x, Cc )∏q(c/c)2e−2lx var (Yt |Wt−1=x) (x�0),

( |c |/c)e−lxE( |Yt−1−ex | |W

t−1=x) (x<0),

∏q(c/c)2e(1−2l)x (x�0),

2( |c |/c)e(1−l)x (x<0),

∏q(c/c)2 (x�0),

2( |c |/c) (x<0).

Thus, given e>0, choose c large enough such that max (2( |c |/c), (c/c)2 )<e. The result follows fromProposition A1. %

We next turn to the proof of the uniqueness of a stationary distribution in the case l=1. Toaccomplish this we shall show that the process {W

t} is aperiodic and satisfies Doeblin’s condition,

namely that there exists a probability measure n with the property that, for some m�1, e>0 andd>0,

n(B)>e[Pm(x, B)�d, (A1)

for every xµX. It then follows from Theorem 16.2.3 of Meyn & Tweedie (1993) that {Wt} has a

unique stationary distribution and is uniformly ergodic.

Proof of Proposition 2. In order to establish Doeblin’s condition we consider two cases.Case 1: c<0. From (6) with l=1, one can see that W

t=b−c+cY

t−1eWt−1∏b−c. Define the

measure n to have unit point mass at {b−c}. In order to verify (A1), it suffices to only considerBorel sets B with b−cµB. Then, for all x∏b−c,

P(x, B)=pr (WtµB |W

t−1=x)�pr (W

t=b−c |W

t−1=x)

=pr (Yt−1=0 |W

t−1=x)=e−ex�e−eb−c,

which yields (A1) with m=1.Case 2: c>0. For c>0, W

thas a lower bound of b−c, and hence the state space for W

tis a

at Belgorod State U

niversity on Decem



ownloaded from



790 R. A. D, W. T. M. D S. B. S

subset of [b−c,2). As in Case 1, we will take the measure n to have unit mass at {b−c}. LetC=[b−c, max (e, b+c)], where e>0. Then, for all xµC and Borel sets B containing b−c,

P(x, B)=pr (WtµB |W

t−1=x)�pr (W

t=b−c |W

t−1=x)

=pr (Yt−1=0 |W

t−1=x)=e−ex�e−emax(e,b+c))d

1,

P2(x, B)�pr (Wt+1=b−c, W

t=b−c |W

t−1=x)�d2

1.

On the other hand, if x1C, then x>max (e, b+c) and we have

P(x, C)=pr (WtµC |W

t−1=x)�pr (b−c∏W

t∏b+c |W

t−1=x)

=pr ( |Wt−b |∏c |W

t−1=x)

�1−c−2 var (Wt|Wt−1=x) (by Chebyshev’s inequality)

=1−c−2c2e−x�1−e−max (e,b+c))d2.

Therefore,

P2(x, B)=pr (WtµB |W

t−2=x)�pr (W

tµB, W

t−1µC |W

t−2=x)

= ∑yµCpr (W

tµB, W

t−1=y |W

t−2=x)= ∑

yµCpr (W

tµB |W

t−1=y) pr (W

t−1=y |W

t−2=x)

�d1pr (W

t−1µC |W

t−2=x)�d

1d2.

Thus, Doeblin’s condition is also satisfied for the case c>0.

For both cases the chain {Wt} is also strongly aperiodic since

P(b−c, b−c)=pr (Wt=b−c |W

t−1=b−c)=pr (Y

t−1=0 |W

t−1=b−c)=e−eb−c>0.

As remarked earlier, we conclude that {Wt} must be uniformly ergodic. %

R

C, M. J. (1994). Time series regression for counts: an investigation into the relationship betweensudden infant death syndrome and environmental temperature. J. R. Statist. Soc. A 157, 191–208.C, K. S. & L, J. (1995). Monte Carlo EM estimation for time series models involving counts.J. Am. Statist. Assoc. 90, 242–52.C, D. R. (1981). Statistical analysis of time series: some recent developments. Scand. J. Statist. 8, 93–115.D, R. A., D,W. T.M. &W, Y. (1999). Modelling time series of count data. In Asymptotics,Nonparametrics, and T ime Series, Ed. S. Ghosh, pp. 63–114. New York: Marcel Dekker.D, R. A., D, W. T. M. & W, Y. (2000). On autocorrelation in a Poisson regression model.Biometrika 87, 491–506.D, P. J., L, K-Y. & Z, S. L. (1994). Analysis of L ongitudinal Data. Oxford: OxfordUniversity Press.D, J. & K, S. J. (2000). Time series analysis of non-Gaussian observations based on state spacemodels from both classical and Bayesian perspectives (with Discussion). J. R. Statist. Soc. B 62, 3–56.F, L. & T, G. (1994). Multivariate Statistical Modeling Based on Generalised L inear Models.New York: Springer-Verlag.J, R. C. & L, R. (2001). Estimating time series models for count data using efficient importancesampling. Allgemeines Statist. Arch. 85, 387–407.

M, S. P. & T, R. L. (1994). Markov Chains and Stochastic Stability. New York: Springer-Verlag.S, J., Z, S., K, J., X, J. & K, L. (1988). Does weather confound or modify theassociation of particulate air pollution with mortality? An analysis of the Philadelphia data, 1973–1980.Envir. Res. A 77, 9–19.Z, S. L. (1988). A regression model for time series of counts. Biometrika 75, 621–9.Z, S. L. & Q, B. (1988). Markov regression models for time series: a quasi-likelihood approach.Biometrics 44, 1019–31.

[Received December 2001. Revised February 2003]

at Belgorod State U

niversity on Decem



ownloaded from



Observation-driven models for Poisson counts

Documents

Transcript of Observation-driven models for Poisson counts