NARIMA models for Network Time Series
Guy Nason. Joint work with M. Knight (York), K. Leeming (Bristol) & M. Nunes (Lancaster)
School of Mathematics, University of Bristol
Copyright 2017: University of Bristol Nason 1
Example 1. Network Time Series: Mumps Data
Weekly cases of mumps disease in UK at county level for 2005.
t = 1, ..., T = 52 weeks. Counties p = 1, ..., P = 47.
Multivariate time series of dimension 52 × 47.
Questions:
- What can we say about the data? Trend? Models?
- Can we forecast early 2006?
Cases of Mumps 2005
Network Time Series
Can augment mumps multivariate series with a network (graph).
Possibilities:
- link with people movement (transport corridors)
- link by weather patterns (wind)
- link by geography
We used a minimal spanning tree augmented by close town links.
Some network links more important than others.
Concept of edge distance ≡ series connection.
Don’t always have distances.
Mumps network: connecting counties
[Figure: map of the mumps network linking county towns: Bristol, Bedford, Reading, Aylesbury, Cambridge, Chester, Middlesbrough, Truro, Carlisle, Derby, Exeter, Dorchester, Durham, Lewes, Chelmsford, Gloucester, Manchester, Winchester, Worcester, Hertford, Kingston upon Hull, Newport, Maidstone, Lancaster, Leicester, Lincoln, London, Liverpool, Norwich, York, Northampton, Morpeth, Nottingham, Oxford, Shrewsbury, Taunton, Sheffield, Stafford, Ipswich, Guildford, Newcastle, Rhayader, Warwick, Birmingham, Chichester, Leeds, Devizes.]
Example 2. Foot and Mouth Epidemic (jittered)
Slide removed for legal reasons.
Models for Network Time Series
Initially focus on simple models.
Want to model dependence between the value of node i at time t and:
- node i at earlier times;
- neighbours of node i at earlier times.
Want to cope with neighbours that drop in/out, or change their neighbourhood ("cow effect") in dynamic networks.
Evolution, not revolution
Notation
Have set of nodes K = {1, ..., K}.
Nodes i, j ∈ K connected by an (undirected) edge are denoted i ↔ j.
Edge set E = {(i, j) : i ↔ j; i, j ∈ K}.
Sometimes have distance set D = {d(i, j) : (i, j) ∈ E}.
Graph is written G = (K, E) or G = (K, E, D).
Neighbourhood set. Let A ⊂ K. Then the neighbourhood set of A is
N(A) = {j ∈ K \ A : j ↔ i, i ∈ A}.
Rth stage neighbours
Define r-th stage neighbours of node i ∈ K by

N^(r)(i) = N{N^(r−1)(i)} \ ∪_{q=1}^{r−1} N^(q)(i),   for r = 2, 3, ...,

where N^(1)(i) = N({i}).

r-th-stage neighbours are all neighbours of (r−1)th-stage neighbours that are not already lower-stage neighbours of node i, or node i itself.
Might be empty!
Also, define N (0)(i) = ∅, the empty set.
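As an illustrative sketch (not the authors' code), the r-th stage neighbours of a node can be computed by breadth-first search, since by the definition above N^(r)(i) is exactly the set of nodes at shortest-path distance r from i:

```python
from collections import deque

def stage_neighbours(edges, i, r):
    """N^(r)(i): nodes at shortest-path distance r from node i (r >= 1);
    returns the empty set for r = 0, matching N^(0)(i) = empty set."""
    adj = {}
    for a, b in edges:                     # build undirected adjacency
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {i: 0}
    queue = deque([i])
    while queue:                           # standard BFS from node i
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {v for v, d in dist.items() if d == r and v != i}
```

For a path graph 1–2–3–4 with an extra edge 2–5, the stage-2 neighbours of node 1 are {3, 5}, and N^(r)(i) is empty (as the slide notes it might be) once r exceeds the graph's diameter.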
Network Time Series
Consider observations taken at network nodes at times t1, . . . , tT .
Initially, focus on tm = m ∈ N.
Have multivariate time series {X_{i,t}}, t = 1, ..., T; i ∈ K.
A network time series is X = ({X_{i,t}}_{t=1,...,T; i ∈ K(G)}, G).
Sometimes, additionally have values on edges too.
Example: mumps series for neighbouring counties
[Figure: weekly mumps cases (0–50) plotted against time in weeks for Avon and Somerset.]
Cross-correlation analysis of Avon and Somerset
[Figure: sample ACFs for Avon and Somerset, and cross-correlations Avon & Somerset and Somerset & Avon.]

These plots might suggest ARMA structure.
Models for Network Time Series: NARIMA
Suppose X is network time series.
A network autoregressive process of order p and neighbourhood order vector s of length p, denoted NAR(p, s), is given by

X_{i,t} = Σ_{j=1}^{p} { α_j X_{i,t−j} + Σ_{r=0}^{s_j} Σ_{q ∈ N^(r)(i)} β_{j,r,q} X_{q,t−j} } + ε_{i,t},   (1)

where {ε_{i,t}} are a set of mutually uncorrelated random variables with mean zero and variance σ².
Can get elaborate for larger p, and large sets of neighbours.
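A minimal simulation sketch of the NAR(1, [1]) special case of (1) may help fix ideas (illustrative only: equal neighbour weights are an assumption here, and `simulate_nar1` is a hypothetical helper, not from the authors' software):

```python
import random

def simulate_nar1(adj, alpha, beta, T, sigma=1.0, seed=1):
    """Simulate X_{i,t} = alpha*X_{i,t-1}
       + beta * (mean of neighbour values at t-1) + eps_{i,t}.
    adj: dict node -> list of neighbours; returns dict node -> length-T series."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    X = {i: [rng.gauss(0.0, sigma)] for i in nodes}
    for _ in range(T - 1):
        prev = {i: X[i][-1] for i in nodes}   # snapshot of values at t-1
        for i in nodes:
            nbrs = adj[i]
            nbr_term = beta * sum(prev[q] for q in nbrs) / len(nbrs) if nbrs else 0.0
            X[i].append(alpha * prev[i] + nbr_term + rng.gauss(0.0, sigma))
    return X
```

For example, `simulate_nar1({1: [2], 2: [1, 3], 3: [2]}, 0.5, 0.3, 50)` simulates a three-node path network for 50 time points.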
NAR (Integrated) Moving Average processes: NARIMA(p, s; d, q).
Like NAR but:
- extra Σ_{ℓ=1}^{q} η_ℓ ε_{i,t−ℓ} moving average term (extend to cross-correlated);
- or model W_{i,t} = ∇^d X_{i,t}, where ∇ is the time-differencing operator;
- or, as here, use a more general differencing-like operator, D (below).
Remarks on NAR(IMA) processes
- i-th node value at t depends directly on past node i values via α_j;
- also depends on neighbours (and neighbours of neighbours) in the past via β_{j,r,q};
- temporal stationarity (as α, β do not depend on time);
- a kind of spatial homogeneity (as α, β do not depend on i);
- (although exact conditions for stationarity need to be worked out).
Examples
NARMA{p, (0, ..., 0)_p; q} is a model consisting of K regular ARMA(p, q) processes, one for each node.
NAR(p, s) for a fixed network is equivalent to a vector autoregressive (VAR) model of order p with a set of specific constraints on the parameters.
Later we will use NARIMA models after pre-processing to removefirst-order spatial effects.
Interestingly, this also seems to reduce temporal correlation.
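The constrained-VAR view can be made concrete: for NAR(1, [1]) with a single α and β and fixed neighbour weights, the implied VAR(1) transition matrix is Φ = αI + βW, where W holds the neighbour weights. A sketch (equal weights assumed for illustration; `nar_to_var_matrix` is a hypothetical helper):

```python
def nar_to_var_matrix(adj, alpha, beta):
    """Build the K x K VAR(1) matrix Phi = alpha*I + beta*W implied by a
    NAR(1, [1]) model; W uses equal weights over each node's neighbours."""
    nodes = sorted(adj)
    idx = {n: k for k, n in enumerate(nodes)}
    K = len(nodes)
    Phi = [[0.0] * K for _ in range(K)]
    for i in nodes:
        Phi[idx[i]][idx[i]] = alpha            # own-lag coefficient
        for q in adj[i]:                       # neighbour-lag coefficients
            Phi[idx[i]][idx[q]] = beta / len(adj[i])
    return Phi
```

Two free parameters thus generate the whole K × K matrix that an unconstrained VAR(1) would fit with K² parameters.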
Inspirations
VAR: models all variables and all cross-terms at all lags. Often many parameters; dimension reduction required.
SAR, CAR models + network Markov random fields: no time dependence; values at spatial locations influence values at other locations within the same time period.
STCAR models, e.g. Mariella and Tarantino (2010): CAR model with time dependence, but particular parametric forms for model structure. Each time point a separate CAR model, i.e. explicit spatial interactions at fixed time. Designed for many spatial locations and few time points? Identifiable?
Susceptible-infected-recovered (SIR) network models: usually count-based, with three special stochastic processes/rates of infection, recovery, etc.
Network Vector Autoregression (similar model)
Similar model, also called NAR,

X_{i,t} = β₀ + Z_iᵀ γ + β₁ n_i^{−1} Σ_{j=1}^{K} a_{i,j} X_{j,t−1} + β₂ X_{i,t−1} + ε_{i,t},   (2)

where a_{i,j} = 1 iff i ↔ j and 0 otherwise (adjacency matrix), and n_i = Σ_{j≠i} a_{i,j}.

Comparison with (1):
1. β_{j,1,q}(i) = n_i^{−1} a_{i,q}, s_j = 1;
2. only one-stage neighbours (more parameters ...);
3. weights depend strongly on the adjacency matrix;
4. covariate Z_i, but no IMA as in ARIMA.

Zhu, X., Pan, R., Li, G., Liu, Y. and Wang, H. (2017) Network Vector Autoregression, Annals of Statistics, 45, 1096–1123.
The “Cow Effect”
Nodes that appear or disappear, or both, repeatedly.
Tricky to handle in STCAR-type models: need reversible jump steps.
VAR models oblivious: no native concept of ‘neighbourhood’.
E.g. foot and mouth epidemic.
A cow herd begins as a neighbour to other herds. The herd gets moved (isolated, destroyed by control measures, vaccinated). The herd gets sold and moved to be a new neighbour to new herds (reappears).
The multivariate series is not affected by movement, but its place in the topology changes.
The multivariate series goes ragged if herds appear/disappear completely.
NAR(1, [1]) example
To explain key features and concepts.
Model is:

X_{i,t} = α X_{i,t−1} + Σ_{q ∈ N^(1)(i)} β_q X_{q,t−1} + ε_{i,t},   (3)

for i = 1, ..., K. Can drop the j, r subscripts in this simpler model.
Network time series: important modelling choices required.
E.g. include distance information into specification of βq.
Inverse-distance weight model for {βq}
For example, define weights:
w_j(i) = d(i, j)^{−1} / Σ_{k ∈ N(i)} d(i, k)^{−1},   (4)

for j ∈ N(i).

Then parametrise:

β_q = β w_q(i),   (5)

for q ∈ N(i) and β ∈ R.
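Equation (4) as a small helper (a sketch; the distances below are those of the NetTree worked example later in the talk, where the inverse distances happen to sum to 1):

```python
def inv_distance_weights(dists):
    """dists: dict neighbour j -> d(i, j); returns w_j(i) from equation (4)."""
    inv = {j: 1.0 / d for j, d in dists.items()}   # inverse distances
    total = sum(inv.values())                      # normalising constant
    return {j: v / total for j, v in inv.items()}

w = inv_distance_weights({1: 4, 3: 3, 4: 6, 5: 4})
print(w)   # weights 1/4, 1/3, 1/6, 1/4: here equal to the raw inverses
```

By construction the weights sum to one, and points further away get less weight.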
More on weights
Weights can be time-dependent to take account of the cow effect.
This is a specific form of constrained VAR model.
Weight value depends heavily on neighbourhood, e.g.
Two sets of distances: 2,10,10,10,10 and 2,10,2,2,2.
Weights are 2/42 = 1/21 and 10/42 = 5/21 in the first set, or 2/18 = 1/9 and 10/18 = 5/9 in the second.
Points in ‘middle of nowhere’ get less weight.
Fitting the NAR(1, [1]) model
We can fit the model in R by
model1 <- nar(vts=mumpsPcor, net=townnet2)
which fits using least-squares (or ML, or Bayesian, or fiducial).
We use mumps rates (cases normalized by population).
Obtain: α̂ ≈ 0.682 and β̂ ≈ 0.263.
Note: statistics formally equivalent to VAR, conditioned on network.
Also, have a 51 × 47 matrix of residuals, ε̂, which needs to be checked.
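For intuition, least-squares fitting of the NAR(1, [1]) model amounts to stacking the per-node regressions of X_{i,t} on its own lag and a weighted neighbour lag, then solving the 2 × 2 normal equations. The sketch below is an illustrative pure-Python stand-in (with equal neighbour weights assumed), not the `nar` function used above:

```python
def fit_nar1(X, adj):
    """X: dict node -> list of T values; adj: dict node -> neighbour list.
    Returns least-squares (alpha_hat, beta_hat) for the NAR(1, [1]) model."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    T = len(next(iter(X.values())))
    for i, nbrs in adj.items():
        for t in range(1, T):
            x = X[i][t - 1]                               # own lag
            z = sum(X[q][t - 1] for q in nbrs) / len(nbrs) if nbrs else 0.0
            y = X[i][t]
            a11 += x * x; a12 += x * z; a22 += z * z      # normal equations
            b1 += x * y; b2 += z * y
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

On noiseless data generated from known (α, β) this recovers the parameters exactly.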
One view of residuals for model1
[Figure: model1 residuals over time for Bedfordshire, Buckinghamshire, Cambridgeshire and Cheshire.]

Variance not constant over time. So model2: Y_{i,t} = log(1 + X_{i,t}).
Gives α̂ ≈ 0.647, β̂ ≈ 0.330.
Bayes DLM posterior distribution of parameters.
[Figure: contour plot over (alpha, beta) of the Bayesian DLM posterior, computed using the arms function from the dlm package. × is least squares; ♦ is maximum likelihood.]
model2 residuals
[Figure: model2 residuals over time for Bedfordshire, Buckinghamshire, Cambridgeshire and Cheshire.]
model2 cross-acf residuals
[Figure: residual ACFs and cross-correlations for Avon and Somerset under model2.]

Still some AR structure. Model using NAR(2, [1,0]):
α̂₁ ≈ 0.394, α̂₂ ≈ 0.381, β̂ ≈ 0.204.
NAR(2, [1,0]) cross-acf residuals
[Figure: residual ACFs and cross-correlations for Avon and Somerset under NAR(2, [1,0]).]

Much better, nearly white noise.
How about adding further neighbours at lag 2, i.e. NAR(2, [1,1])?
Comparisons
Using BIC model selection:

Model            Mean SS   # Parm   MPE
NAR(2, [2, 0])   0.353     4        38.5
VAR(2, [1, 0])   0.283     316      46.7
Separate AR(2)   0.347     94       52.1

MPE = Mean Prediction Error.
NAR/VAR critique
The VAR situation uses the identical nodes to NAR, in a different way.

VAR:
- i.e. "oracle" pre-specification of nodes to VAR;
- node a, b relationship asymmetric in VAR (A_{a,b} ≠ A_{b,a}, unconstrained).

NAR:
- NAR has far fewer parameters;
- the parameter β is the same for (a, b) as (b, a) ...
- ... and, in fact, for any pair of stage-1 neighbours.

Additional partial symmetry in the treatment of neighbour parameters: between a, b the distance d(a, b) = d(b, a), but the weights per node, w_a(b) ≠ w_b(a), in general.
However, weights are often similar in similarly dense regions.
Trend Removal = “Network Differencing”
In regular time series we often difference to remove trend.
E.g. ∇X_t removes 'linear' trend, ∇²X_t removes 'quadratic' trend, and so on.
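A quick numerical check of that statement (plain Python, for illustration):

```python
def diff(xs):
    """First differences: (del x)_t = x_t - x_{t-1}."""
    return [b - a for a, b in zip(xs, xs[1:])]

linear = [3 + 2 * t for t in range(10)]       # linear trend
quad = [1 + t + t * t for t in range(10)]     # quadratic trend

print(diff(linear))        # constant series: linear trend removed
print(diff(diff(quad)))    # constant series: quadratic trend removed
```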
There is often (multivariate/network) trend in a network time series.
VITAL: over space and time.
(Spatial) Trend Removal by Network Lifting
Use a recent method NetTree which is a kind of lifting transform.
NetTree is an example of ‘lifting one coefficient at a time’.
NetTree is a wavelet transform on a graph.
Turn (almost) every Xi,t into network ‘wavelet’ coefficient di,t .
Few Xi,t get turned into father coefficients (trend summary).
Wavelets are well-known for their ability to detrend (and decorrelate).
See Jansen, Nason and Silverman (2009).
NetTree at ONE time point.
Let value at node i be ci .
Identify network node, i , to turn into lifting (wavelet) coefficient.
Use inter-node distances d(i , j) for j ∈ N (i).
So, points further away have less weight.
Form lifting coefficient: d_i = c_i − Σ_{j ∈ N(i)} w_j(i) c_j.
NetTree 2. Simple Example
[Figure: example network of six labelled nodes with node values and edge distances.]
NetTree 3.
Considering 5 nodes labeled 1, 2, 3, 4 and 5.
Values c1 = 2, c2 = 3, c3 = 7, c4 = 5, c5 = 4.
Want to lift node 2 to form wavelet coefficient.
Inter-node distances: d(2,1) = 4, d(2,3) = 3, d(2,4) = 6, d(2,5) = 4.
Inverses: d(2,1)^{−1} = 1/4, d(2,3)^{−1} = 1/3, d(2,4)^{−1} = 1/6, d(2,5)^{−1} = 1/4.

Σ_{j ∈ N(i)} d(i, j)^{−1} = 1/4 + 1/3 + 1/6 + 1/4 = 1.

So w_j(i) = d(i, j)^{−1} in this case.
NetTree 4: Forming Lifted Coefficient
Neighbour set N (2) = {1,3,4,5}.
Formula is
d₂ = c₂ − Σ_{j ∈ N(2)} w_j(2) c_j
   = 3 − (1/4)·2 − (1/3)·7 − (1/6)·5 − (1/4)·4
   = −5/3.
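The arithmetic above can be checked mechanically with exact fractions (an illustrative script mirroring the slide's numbers):

```python
from fractions import Fraction as F

c = {1: F(2), 2: F(3), 3: F(7), 4: F(5), 5: F(4)}   # node values
d = {1: F(4), 3: F(3), 4: F(6), 5: F(4)}            # d(2, j) for j in N(2)

inv = {j: 1 / dj for j, dj in d.items()}            # inverse distances
total = sum(inv.values())                           # equals 1 here
w = {j: v / total for j, v in inv.items()}          # equation (4) weights
d2 = c[2] - sum(w[j] * c[j] for j in w)             # lifting coefficient
print(d2)   # -5/3, as on the slide
```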
NetTree 5. After Wavelet Coefficient Formed
[Figure: the example network after lifting, with node 2 replaced by the wavelet coefficient −5/3.]
NetTree 6: Update Step and Relinkage
‘Power’ from the removed coefficient is redistributed to neighbours.
This keeps the ‘power’ constant over all locations.
We also need to remove the coefficient and relink the graph.
Clearly, if neighbour values are similar then lifting coefficient is small.
If they are the same then the coefficient is zero.
Lifting achieves good detrending.
NetTree 6. At End of Step
[Figure: the network at the end of the step: node 2 removed and relinked, remaining node values updated to 3.52, 1.59, 6.63 and 4.58.]
NetTree: Whole Algorithm
Start with c_1, ..., c_K: values at each node.
1. Pick the node with smallest 'area'; call it i.
2. Form the lifting coefficient at node i.
Repeat steps 1 and 2 until only a few nodes (typically N_c = 2) are left.
This leaves N_c scaling function, or 'mean', coefficients.
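A much-simplified sketch of that loop (illustrative only: the real NetTree algorithm's 'area' selection, update step and relinkage rules are more involved; here nodes are lifted in label order, the update step is skipped, and a removed node's neighbours are simply linked to each other):

```python
def lift_all(values, dists, keep=2):
    """values: node -> c_i; dists: frozenset({i, j}) -> d(i, j).
    Lift nodes one at a time until `keep` remain.
    Returns (wavelet coefficients, remaining 'mean' values)."""
    values, dists = dict(values), dict(dists)
    details = {}
    while len(values) > keep:
        i = min(values)                      # stand-in for smallest-'area' node
        nbrs = [j for j in values
                if j != i and frozenset((i, j)) in dists]
        if not nbrs:
            break                            # isolated node: nothing to lift
        inv = {j: 1.0 / dists[frozenset((i, j))] for j in nbrs}
        tot = sum(inv.values())
        details[i] = values[i] - sum(inv[j] / tot * values[j] for j in nbrs)
        # relink the removed node's neighbours through i, then drop i
        for a in nbrs:
            for b in nbrs:
                if a != b:
                    dists.setdefault(frozenset((a, b)),
                                     dists[frozenset((i, a))]
                                     + dists[frozenset((i, b))])
        for j in nbrs:
            del dists[frozenset((i, j))]
        del values[i]
    return details, values
```

On a network whose values are all equal, every wavelet coefficient comes out zero, matching the remark above that lifting detrends.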
Mumps on Network: Week 1
[Figure: week 1 mumps counts displayed at each node of the network.]
LIFTING coefficients on Network: Week 1
[Figure: week 1 lifting coefficients displayed at each node of the network.]
Cross-correlations BEFORE lifting
[Figure: ACFs and cross-correlations of Avon and Somerset before lifting.]
Cross-correlations AFTER lifting (EVERY time step)
[Figure: ACFs and cross-correlations of Avon and Somerset after lifting at every time step.]
Cross-correlations after temporal differencing
[Figure: ACFs and cross-correlations of Avon and Somerset after temporal differencing.]
Significant Decorrelation
Do lifting at each time step across the network.
Get new time series:
- wavelet coefficient series: K − N_c series;
- scaling coefficient series: N_c series.
Massive and welcome decorrelation: but is it due to trend?
Can be used to help improve forecasting (see Nunes, Knight and Nason, 2015).
Trend Removal (week 6)
[Figure: two maps for week 6: left, the mumps trend; right, detrended values post lifting; Devon, London, North Yorkshire and Wales marked.]
Trend Removal (week 6)
[Figure: density plot of data/coefficient values. Solid: mumps; dashed: detrended values.]
Benefits of Detrending for Modelling
[Figure: residual ACFs and cross-correlations for Avon and Somerset from the detrended model.]

Residuals from a NAR(1, 0) model fitted to the detrended data.
Similar residuals to the NAR(2, [1,0]) model.
Benefits of Detrending for Modelling
Proper detrending enables simpler stochastic models to be fitted.
We also get very useful information from trend.
Overall discussion
- Huge potential for network time series and models.
- Huge potential to exploit network structure.
- Vast array of theoretical questions.
- Make good use of what we already know.
- Built suite of network models and tools for R.
Acknowledgements
Mumps data kindly supplied by Douglas Harding and Daniela De Angelis of the UK Health Protection Agency.
References
Mariella, L. and Tarantino, M. (2010) Spatial Temporal Conditional Auto-regressiveModel: a New Autoregressive Matrix. Austrian Journal of Statistics, 39, 223–244.
Knight, M.I., Nunes, M.A. and Nason, G.P. (2016) Modelling, detrending and decorrelation of network time series. arXiv:1603.03221v1.
Nunes, M.A., Knight, M.I. and Nason, G.P. (2015) Modelling and prediction of timeseries arising on a graph. in Modeling and Stochastic Learning for Forecasting in HighDimensions, Lecture Notes in Statistics, 217, Antoniadis, A., Poggi, J.-M. and Brossat,X. (eds), 183–192.
Jansen, M., Nason, G.P. & Silverman, B.W. (2009) Multiscale methods for data ongraphs & irregular multidimensional situations. J. Roy. Statist. Soc. Series B, 71,97–126.
Zhu, X., Pan, R., Li, G., Liu, Y. and Wang, H. (2017) Network Vector Autoregression,Annals of Statistics, 45 1096–1123.