NARIMA models for Network Time Series
Guy Nason. Joint work with M. Knight (York), K. Leeming (Bristol) & M. Nunes (Lancaster)
School of Mathematics, University of Bristol
Copyright 2017: University of Bristol Nason 1
Example 1. Network Time Series: Mumps Data
Weekly cases of mumps disease in UK at county level for 2005.
t = 1, ..., T = 52 weeks. Counties p = 1, ..., P = 47.
Multivariate time series of dimension 52 × 47.
Questions:
- What can we say about the data? Trend? Models?
- Can we forecast early 2006?
Cases of Mumps 2005
Network Time Series
Can augment mumps multivariate series with a network (graph).
Possibilities:
- link with people movement (transport corridors)
- link by weather patterns (wind)
- link by geography
We used a minimal spanning tree augmented by close town links.
Some network links more important than others.
Concept of edge distance ≡ series connection.
Don’t always have distances.
Mumps network: connecting counties
[Figure: map of the mumps network linking county towns: Bristol, Bedford, Reading, Aylesbury, Cambridge, Chester, Middlesbrough, Truro, Carlisle, Derby, Exeter, Dorchester, Durham, Lewes, Chelmsford, Gloucester, Manchester, Winchester, Worcester, Hertford, Kingston upon Hull, Newport, Maidstone, Lancaster, Leicester, Lincoln, London, Liverpool, Norwich, York, Northampton, Morpeth, Nottingham, Oxford, Shrewsbury, Taunton, Sheffield, Stafford, Ipswich, Guildford, Newcastle, Rhayader, Warwick, Birmingham, Chichester, Leeds, Devizes.]
Example 2. Foot and Mouth Epidemic (jittered)
Slide removed for legal reasons.
Models for Network Time Series
Initially focus on simple models.
Want to model dependence between the value of node i at time t and:
- node i at earlier times;
- neighbours of node i at earlier times.
Want to cope with neighbours that drop in/out, or change their neighbourhood ("cow effect") in dynamic networks.
Evolution, not revolution
Notation
Have set of nodes K = {1, ..., K}.
Nodes i, j ∈ K connected by an (undirected) edge are denoted i ↔ j.
Edge set E = {(i, j) : i ↔ j; i, j ∈ K}.
Sometimes have distance set D = {d(i, j) : (i, j) ∈ E}.
Graph is written G = (K, E) or G = (K, E, D).
Neighbourhood set. Let A ⊂ K. Then the neighbourhood set of A is
N(A) = {j ∈ K \ A : j ↔ i, i ∈ A}.
Rth stage neighbours
Define r-th stage neighbours of node i ∈ K by

N^(r)(i) = N{N^(r−1)(i)} \ ∪_{q=1}^{r−1} N^(q)(i),   for r = 2, 3, ...,

where N^(1)(i) = N({i}).

r-th-stage neighbours are all neighbours of (r−1)th-stage neighbours that are not already lower-stage neighbours of node i, or node i itself.
Might be empty!
Also, define N (0)(i) = ∅, the empty set.
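As an illustrative sketch (not the authors' code), the r-th stage neighbours of a node can be computed by breadth-first search, since by the definition above N^(r)(i) is exactly the set of nodes at shortest-path distance r from i:

```python
from collections import deque

def stage_neighbours(edges, i, r):
    """N^(r)(i): nodes at shortest-path distance r from node i (r >= 1);
    returns the empty set for r = 0, matching N^(0)(i) = empty set."""
    adj = {}
    for a, b in edges:                     # build undirected adjacency
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {i: 0}
    queue = deque([i])
    while queue:                           # standard BFS from node i
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == r and v != i}
```

For a path graph 1–2–3–4 with an extra edge 2–5, the stage-2 neighbours of node 1 are {3, 5}, and N^(r)(i) is empty (as the slide notes it might be) once r exceeds the graph's diameter.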
Network Time Series
Consider observations taken at network nodes at times t1, . . . , tT .
Initially, focus on tm = m ∈ N.
Have multivariate time series {X_{i,t}}, t = 1, ..., T; i ∈ K.
A network time series is X = ({X_{i,t}}_{t=1,...,T; i ∈ K(G)}, G).
Sometimes, additionally have values on edges too.
Example: mumps series for neighbouring counties
[Figure: weekly mumps cases (0–50) plotted against time in weeks for Avon and Somerset.]
Cross-correlation analysis of Avon and Somerset
[Figure: sample ACFs for Avon and Somerset, and cross-correlations Avon & Somerset and Somerset & Avon.]

These plots might suggest ARMA structure.
Models for Network Time Series: NARIMA
Suppose X is network time series.
A network autoregressive process of order p and neighbourhood order vector s of length p, denoted NAR(p, s), is given by

X_{i,t} = Σ_{j=1}^{p} { α_j X_{i,t−j} + Σ_{r=0}^{s_j} Σ_{q ∈ N^(r)(i)} β_{j,r,q} X_{q,t−j} } + ε_{i,t},   (1)

where {ε_{i,t}} are a set of mutually uncorrelated random variables with mean zero and variance σ².
Can get elaborate for larger p, and large sets of neighbours.
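A minimal simulation sketch of the NAR(1, [1]) special case of (1) may help fix ideas (illustrative only: equal neighbour weights are an assumption here, and `simulate_nar1` is a hypothetical helper, not from the authors' software):

```python
import random

def simulate_nar1(adj, alpha, beta, T, sigma=1.0, seed=1):
    """Simulate X_{i,t} = alpha*X_{i,t-1}
       + beta * (mean of neighbour values at t-1) + eps_{i,t}.
    adj: dict node -> list of neighbours; returns dict node -> length-T series."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    X = {i: [rng.gauss(0.0, sigma)] for i in nodes}
    for _ in range(T - 1):
        prev = {i: X[i][-1] for i in nodes}   # snapshot of values at t-1
        for i in nodes:
            nbrs = adj[i]
            nbr_term = beta * sum(prev[q] for q in nbrs) / len(nbrs) if nbrs else 0.0
            X[i].append(alpha * prev[i] + nbr_term + rng.gauss(0.0, sigma))
    return X
```

For example, `simulate_nar1({1: [2], 2: [1, 3], 3: [2]}, 0.5, 0.3, 50)` simulates a three-node path network for 50 time points.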
NAR (Integrated) Moving Average processes: NARIMA(p, s; d, q).
Like NAR but:
- extra Σ_{ℓ=1}^{q} η_ℓ ε_{i,t−ℓ} moving average term (extend to cross-correlated);
- or model W_{i,t} = ∇^d X_{i,t}, where ∇ is the time-differencing operator;
- or, as here, use a more general differencing-like operator, D (below).
Remarks on NAR(IMA) processes
- i-th node value at t depends directly on past node i values via α_j;
- also depends on neighbours (and neighbours of neighbours) in the past via β_{j,r,q};
- temporal stationarity (as α, β do not depend on time);
- a kind of spatial homogeneity (as α, β do not depend on i);
- (although exact conditions for stationarity need to be worked out).
Examples
NARMA{p, (0, ..., 0)_p; q} is a model consisting of K regular ARMA(p, q) processes, one for each node.
NAR(p, s) for a fixed network is equivalent to a vector autoregressive (VAR) model of order p with a set of specific constraints on the parameters.
Later we will use NARIMA models after pre-processing to removefirst-order spatial effects.
Interestingly, this also seems to reduce temporal correlation.
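The constrained-VAR view can be made concrete: for NAR(1, [1]) with a single α and β and fixed neighbour weights, the implied VAR(1) transition matrix is Φ = αI + βW, where W holds the neighbour weights. A sketch (equal weights assumed for illustration; `nar_to_var_matrix` is a hypothetical helper):

```python
def nar_to_var_matrix(adj, alpha, beta):
    """Build the K x K VAR(1) matrix Phi = alpha*I + beta*W implied by a
    NAR(1, [1]) model; W uses equal weights over each node's neighbours."""
    nodes = sorted(adj)
    idx = {n: k for k, n in enumerate(nodes)}
    K = len(nodes)
    Phi = [[0.0] * K for _ in range(K)]
    for i in nodes:
        Phi[idx[i]][idx[i]] = alpha            # own-lag coefficient
        for q in adj[i]:                       # neighbour-lag coefficients
            Phi[idx[i]][idx[q]] = beta / len(adj[i])
    return Phi
```

Two free parameters thus generate the whole K × K matrix that an unconstrained VAR(1) would fit with K² parameters.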
Inspirations
VAR: models all variables and all cross-terms at all lags. Often many parameters; dimension reduction required.
SAR, CAR models + network Markov random fields: no time dependence; values at spatial locations influence values at other locations within the same time period.
STCAR models, e.g. Mariella and Tarantino (2010): CAR model with time dependence, but particular parametric forms for model structure. Each time point a separate CAR model, i.e. explicit spatial interactions at fixed time. Designed for many spatial locations and few time points? Identifiable?
Susceptible-infected-recovered (SIR) network models: usually count-based, with three special stochastic processes/rates of infection, recovery, etc.
Network Vector Autoregression (similar model)
Similar model, also called NAR,

X_{i,t} = β₀ + Z_iᵀ γ + β₁ n_i^{−1} Σ_{j=1}^{K} a_{i,j} X_{j,t−1} + β₂ X_{i,t−1} + ε_{i,t},   (2)

where a_{i,j} = 1 iff i ↔ j and 0 otherwise (adjacency matrix), and n_i = Σ_{j≠i} a_{i,j}.

Comparison with (1):
1. β_{j,1,q}(i) = n_i^{−1} a_{i,q}, s_j = 1;
2. only one-stage neighbours (more parameters ...);
3. weights depend strongly on the adjacency matrix;
4. covariate Z_i, but no IMA as in ARIMA.

Zhu, X., Pan, R., Li, G., Liu, Y. and Wang, H. (2017) Network Vector Autoregression, Annals of Statistics, 45, 1096–1123.
The “Cow Effect”
Nodes that appear or disappear, or both, repeatedly.
Tricky to handle in STCAR-type models: need reversible jump steps.
VAR models oblivious: no native concept of ‘neighbourhood’.
E.g. foot and mouth epidemic.
A cow herd begins as a neighbour to other herds. The herd gets moved (isolated, destroyed by control measures, vaccinated). The herd gets sold and moved to be a new neighbour to new herds (reappears).
The multivariate series is not affected by movement, but its place in the topology changes.
The multivariate series goes ragged if herds appear/disappear completely.
NAR(1, [1]) example
To explain key features and concepts.
Model is:

X_{i,t} = α X_{i,t−1} + Σ_{q ∈ N^(1)(i)} β_q X_{q,t−1} + ε_{i,t},   (3)

for i = 1, ..., K. Can drop the j, r subscripts in this simpler model.
Network time series: important modelling choices required.
E.g. include distance information into specification of βq.
Inverse-distance weight model for {βq}
For example, define weights:
w_j(i) = d(i, j)^{−1} / Σ_{k ∈ N(i)} d(i, k)^{−1},   (4)

for j ∈ N(i).

Then parametrise:

β_q = β w_q(i),   (5)

for q ∈ N(i) and β ∈ R.
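Equation (4) as a small helper (a sketch; the distances below are those of the NetTree worked example later in the talk, where the inverse distances happen to sum to 1):

```python
def inv_distance_weights(dists):
    """dists: dict neighbour j -> d(i, j); returns w_j(i) from equation (4)."""
    inv = {j: 1.0 / d for j, d in dists.items()}   # inverse distances
    total = sum(inv.values())                      # normalising constant
    return {j: v / total for j, v in inv.items()}

w = inv_distance_weights({1: 4, 3: 3, 4: 6, 5: 4})
print(w)   # weights 1/4, 1/3, 1/6, 1/4: here equal to the raw inverses
```

By construction the weights sum to one, and points further away get less weight.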
More on weights
Weights can be time-dependent to take account of the cow effect.
This is a specific form of constrained VAR model.
Weight value depends heavily on neighbourhood, e.g.
Two sets of distances: 2,10,10,10,10 and 2,10,2,2,2.
Weights are 2/42 = 1/21 and 10/42 = 5/21 in the first set, or 2/18 = 1/9 and 10/18 = 5/9 in the second.
Points in ‘middle of nowhere’ get less weight.
Fitting the NAR(1, [1]) model
We can fit the model in R by
model1 <- nar(vts=mumpsPcor, net=townnet2)
which fits using least-squares (or ML, or Bayesian, or fiducial).
We use mumps rates (cases normalized by population).
Obtain: α̂ ≈ 0.682 and β̂ ≈ 0.263.
Note: statistics formally equivalent to VAR, conditioned on network.
Also, have a 51 × 47 matrix of residuals, ε̂, which needs to be checked.
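For intuition, least-squares fitting of the NAR(1, [1]) model amounts to stacking the per-node regressions of X_{i,t} on its own lag and a weighted neighbour lag, then solving the 2 × 2 normal equations. The sketch below is an illustrative pure-Python stand-in (with equal neighbour weights assumed), not the `nar` function used above:

```python
def fit_nar1(X, adj):
    """X: dict node -> list of T values; adj: dict node -> neighbour list.
    Returns least-squares (alpha_hat, beta_hat) for the NAR(1, [1]) model."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    T = len(next(iter(X.values())))
    for i, nbrs in adj.items():
        for t in range(1, T):
            x = X[i][t - 1]                               # own lag
            z = sum(X[q][t - 1] for q in nbrs) / len(nbrs) if nbrs else 0.0
            y = X[i][t]
            a11 += x * x; a12 += x * z; a22 += z * z      # normal equations
            b1 += x * y; b2 += z * y
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

On noiseless data generated from known (α, β) this recovers the parameters exactly.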
One view of residuals for model1
[Figure: model1 residuals over time for Bedfordshire, Buckinghamshire, Cambridgeshire and Cheshire.]

Variance not constant over time. So model2: Y_{i,t} = log(1 + X_{i,t}).
Gives α̂ ≈ 0.647, β̂ ≈ 0.330.
Bayes DLM posterior distribution of parameters.
[Figure: contour plot over (alpha, beta) of the Bayesian DLM posterior, computed using the arms function from the dlm package. × is least squares; ♦ is maximum likelihood.]
model2 residuals
[Figure: model2 residuals over time for Bedfordshire, Buckinghamshire, Cambridgeshire and Cheshire.]
model2 cross-acf residuals
[Figure: residual ACFs and cross-correlations for Avon and Somerset under model2.]

Still some AR structure. Model using NAR(2, [1,0]):
α̂₁ ≈ 0.394, α̂₂ ≈ 0.381, β̂ ≈ 0.204.
NAR(2, [1,0]) cross-acf residuals
[Figure: residual ACFs and cross-correlations for Avon and Somerset under NAR(2, [1,0]).]

Much better, nearly white noise.
How about adding further neighbours at lag 2, i.e. NAR(2, [1,1])?
Comparisons
Using BIC model selection:

Model            Mean SS   # Parm   MPE
NAR(2, [2, 0])   0.353     4        38.5
VAR(2, [1, 0])   0.283     316      46.7
Separate AR(2)   0.347     94       52.1

MPE = Mean Prediction Error.
NAR/VAR critique
The VAR situation uses the identical nodes to NAR, in a different way.

VAR:
- i.e. "oracle" pre-specification of nodes to VAR;
- node a, b relationship asymmetric in VAR (A_{a,b} ≠ A_{b,a}, unconstrained).

NAR:
- NAR has far fewer parameters;
- the parameter β is the same for (a, b) as (b, a) ...
- ... and, in fact, for any pair of stage-1 neighbours.

Additional partial symmetry in the treatment of neighbour parameters: between a, b the distance d(a, b) = d(b, a), but the weights per node, w_a(b) ≠ w_b(a), in general.
However, weights are often similar in similarly dense regions.
Trend Removal = “Network Differencing”
In regular time series we often difference to remove trend.
E.g. ∇X_t removes 'linear' trend, ∇²X_t removes 'quadratic' trend, and so on.
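A quick numerical check of that statement (plain Python, for illustration):

```python
def diff(xs):
    """First differences: (del x)_t = x_t - x_{t-1}."""
    return [b - a for a, b in zip(xs, xs[1:])]

linear = [3 + 2 * t for t in range(10)]       # linear trend
quad = [1 + t + t * t for t in range(10)]     # quadratic trend

print(diff(linear))        # constant series: linear trend removed
print(diff(diff(quad)))    # constant series: quadratic trend removed
```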
There is often (multivariate/network) trend in a network time series.
VITAL: over space and time.
(Spatial) Trend Removal by Network Lifting
Use a recent method NetTree which is a kind of lifting transform.
NetTree is an example of ‘lifting one coefficient at a time’.
NetTree is a wavelet transform on a graph.
Turn (almost) every Xi,t into network ‘wavelet’ coefficient di,t .
Few Xi,t get turned into father coefficients (trend summary).
Wavelets are well-known for their ability to detrend (and decorrelate).
See Jansen, Nason and Silverman (2009).
NetTree at ONE time point.
Let value at node i be ci .
Identify network node, i , to turn into lifting (wavelet) coefficient.
Use inter-node distances d(i , j) for j ∈ N (i).
So, points further away have less weight.
Form lifting coefficient: d_i = c_i − Σ_{j ∈ N(i)} w_j(i) c_j.
NetTree 2. Simple Example
[Figure: example network of six labelled nodes with node values and edge distances.]
NetTree 3.
Considering 5 nodes labeled 1, 2, 3, 4 and 5.
Values c1 = 2, c2 = 3, c3 = 7, c4 = 5, c5 = 4.
Want to lift node 2 to form wavelet coefficient.
Inter-node distances: d(2,1) = 4, d(2,3) = 3, d(2,4) = 6, d(2,5) = 4.
Inverses: d(2,1)^{−1} = 1/4, d(2,3)^{−1} = 1/3, d(2,4)^{−1} = 1/6, d(2,5)^{−1} = 1/4.

Σ_{j ∈ N(i)} d(i, j)^{−1} = 1/4 + 1/3 + 1/6 + 1/4 = 1.

So w_j(i) = d(i, j)^{−1} in this case.
NetTree 4: Forming Lifted Coefficient
Neighbour set N (2) = {1,3,4,5}.
Formula is
d₂ = c₂ − Σ_{j ∈ N(2)} w_j(2) c_j
   = 3 − (1/4)·2 − (1/3)·7 − (1/6)·5 − (1/4)·4
   = −5/3.
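The arithmetic above can be checked mechanically with exact fractions (an illustrative script mirroring the slide's numbers):

```python
from fractions import Fraction as F

c = {1: F(2), 2: F(3), 3: F(7), 4: F(5), 5: F(4)}   # node values
d = {1: F(4), 3: F(3), 4: F(6), 5: F(4)}            # d(2, j) for j in N(2)

inv = {j: 1 / dj for j, dj in d.items()}            # inverse distances
total = sum(inv.values())                           # equals 1 here
w = {j: v / total for j, v in inv.items()}          # equation (4) weights
d2 = c[2] - sum(w[j] * c[j] for j in w)             # lifting coefficient
print(d2)   # -5/3, as on the slide
```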
NetTree 5. After Wavelet Coefficient Formed
[Figure: the example network after lifting, with node 2 replaced by the wavelet coefficient −5/3.]
NetTree 6: Update Step and Relinkage
‘Power’ from the removed coefficient is redistributed to neighbours.
This keeps the ‘power’ constant over all locations.
We also need to remove the coefficient and relink the graph.
Clearly, if neighbour values are similar then lifting coefficient is small.
If they are the same then the coefficient is zero.
Lifting achieves good detrending.
NetTree 6. At End of Step
[Figure: the network at the end of the step: node 2 removed and relinked, remaining node values updated to 3.52, 1.59, 6.63 and 4.58.]
NetTree: Whole Algorithm
Start with c_1, ..., c_K: values at each node.
1. Pick the node with smallest 'area'; call it i.
2. Form the lifting coefficient at node i.
Repeat steps 1 and 2 until only a few nodes (typically N_c = 2) are left.
This leaves N_c scaling function, or 'mean', coefficients.
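A much-simplified sketch of that loop (illustrative only: the real NetTree algorithm's 'area' selection, update step and relinkage rules are more involved; here nodes are lifted in label order, the update step is skipped, and a removed node's neighbours are simply linked to each other):

```python
def lift_all(values, dists, keep=2):
    """values: node -> c_i; dists: frozenset({i, j}) -> d(i, j).
    Lift nodes one at a time until `keep` remain.
    Returns (wavelet coefficients, remaining 'mean' values)."""
    values, dists = dict(values), dict(dists)
    details = {}
    while len(values) > keep:
        i = min(values)                      # stand-in for smallest-'area' node
        nbrs = [j for j in values
                if j != i and frozenset((i, j)) in dists]
        if not nbrs:
            break                            # isolated node: nothing to lift
        inv = {j: 1.0 / dists[frozenset((i, j))] for j in nbrs}
        tot = sum(inv.values())
        details[i] = values[i] - sum(inv[j] / tot * values[j] for j in nbrs)
        # relink the removed node's neighbours through i, then drop i
        for a in nbrs:
            for b in nbrs:
                if a != b:
                    dists.setdefault(frozenset((a, b)),
                                     dists[frozenset((i, a))]
                                     + dists[frozenset((i, b))])
        for j in nbrs:
            del dists[frozenset((i, j))]
        del values[i]
    return details, values
```

On a network whose values are all equal, every wavelet coefficient comes out zero, matching the remark above that lifting detrends.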
Mumps on Network: Week 1
[Figure: week 1 mumps counts displayed at each node of the network.]
LIFTING coefficients on Network: Week 1
[Figure: week 1 lifting coefficients displayed at each node of the network.]
Cross-correlations BEFORE lifting
[Figure: ACFs and cross-correlations of Avon and Somerset before lifting.]
Cross-correlations AFTER lifting (EVERY time step)
[Figure: ACFs and cross-correlations of Avon and Somerset after lifting at every time step.]
Cross-correlations after temporal differencing
[Figure: ACFs and cross-correlations of Avon and Somerset after temporal differencing.]
Significant Decorrelation
Do lifting at each time step across the network.
Get new time series:
- wavelet coefficient series: K − N_c series;
- scaling coefficient series: N_c series.
Massive and welcome decorrelation: but is it due to trend?
Can be used to help improve forecasting (see Nunes, Knight and Nason, 2015).
Trend Removal (week 6)
[Figure: two maps for week 6: left, the mumps trend; right, detrended values post lifting; Devon, London, North Yorkshire and Wales marked.]
Trend Removal (week 6)
[Figure: density plot of data/coefficient values. Solid: mumps; dashed: detrended values.]
Benefits of Detrending for Modelling
[Figure: residual ACFs and cross-correlations for Avon and Somerset from the detrended model.]

Residuals from a NAR(1, 0) model fitted to the detrended data.
Similar residuals to the NAR(2, [1,0]) model.
Benefits of Detrending for Modelling
Proper detrending enables simpler stochastic models to be fitted.
We also get very useful information from trend.
Overall discussion
- Huge potential for network time series and models.
- Huge potential to exploit network structure.
- Vast array of theoretical questions.
- Make good use of what we already know.
- Built suite of network models and tools for R.
Acknowledgements
Mumps data kindly supplied by Douglas Harding and Daniela De Angelis of the UK Health Protection Agency.
References
Mariella, L. and Tarantino, M. (2010) Spatial Temporal Conditional Auto-regressiveModel: a New Autoregressive Matrix. Austrian Journal of Statistics, 39, 223–244.
Knight, M.I., Nunes, M.A. and Nason, G.P. (2016) Modelling, detrending and decorrelation of network time series. arXiv:1603.03221v1.
Nunes, M.A., Knight, M.I. and Nason, G.P. (2015) Modelling and prediction of timeseries arising on a graph. in Modeling and Stochastic Learning for Forecasting in HighDimensions, Lecture Notes in Statistics, 217, Antoniadis, A., Poggi, J.-M. and Brossat,X. (eds), 183–192.
Jansen, M., Nason, G.P. & Silverman, B.W. (2009) Multiscale methods for data ongraphs & irregular multidimensional situations. J. Roy. Statist. Soc. Series B, 71,97–126.
Zhu, X., Pan, R., Li, G., Liu, Y. and Wang, H. (2017) Network Vector Autoregression,Annals of Statistics, 45 1096–1123.