
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Yaguang Li (USC)

Joint work with

Rose Yu, Cyrus Shahabi, Yan Liu

5/30/2018


Introduction

Traffic congestion wastes time, money and energy

– Traffic congestion cost Americans $124 billion+ in direct and indirect losses in 2013.

Accurate traffic forecasting could substantially improve route planning and mitigate traffic congestion.

+ https://www.forbes.com/sites/federicoguerrini/2014/10/14/traffic-congestion-costs-americans-124-billion-a-year-report-says/

* Image from La La Land: http://variety.com/2017/film/awards/la-la-land-hollywood-musical-trend-2017-1201981880/


The Problem – Existing Solutions

7:00 AM

[Figure: map with Route 1 and Route 2 leading to the Destination]

Route 1: Best route according to current traffic conditions

Route 2: Let’s see.


The Problem – Existing Solutions

7:15 AM

Predictive vs. Real-Time Path-Planning

Evolution of traffic over time

[Figure: map with Route 1 and Route 2 leading to the Destination]


The Problem – Existing Solutions

7:30 AM

Stuck in traffic

[Figure: map with Route 1 and Route 2 leading to the Destination]

Traffic forecasting enables better route planning.


Traffic Prediction

Input: road network and the past T′ traffic speed observations at sensors

Output: traffic speed for the next T steps

[Figure: timeline. Input: observations from 7:00 AM to 8:00 AM. Output: predictions for 8:10 AM, 8:20 AM, …, 9:00 AM]
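The input/output setup above can be sketched as a sliding window over a speed matrix. This is only an illustration: the function name `make_windows`, the array layout (time steps by sensors) and the toy values are assumptions, not part of the slides.

```python
import numpy as np

def make_windows(speeds, t_in, t_out):
    """Split a (time, sensors) array into (input, target) window pairs."""
    n = speeds.shape[0] - t_in - t_out + 1
    # Each input holds t_in past observations; each target holds the next t_out steps.
    inputs = np.stack([speeds[i:i + t_in] for i in range(n)])
    targets = np.stack([speeds[i + t_in:i + t_in + t_out] for i in range(n)])
    return inputs, targets

# Toy data: 20 time steps at 3 sensors (values are made up).
speeds = np.arange(60, dtype=float).reshape(20, 3)
x, y = make_windows(speeds, t_in=6, t_out=6)
print(x.shape, y.shape)  # (9, 6, 3) (9, 6, 3)
```

With one reading every 10 minutes, `t_in=6` and `t_out=6` would correspond to one hour of observations and one hour of predictions, as in the timeline on the slide.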


Challenges for Traffic Prediction

Complex Spatial Dependency

Non-linear, non-stationary Temporal Dynamics

[Figure: map with Sensor 1, Sensor 2 and Sensor 3; plot with y-axis Speed (mile/h)]


Related Work

Traffic Prediction without spatial dependency modeling

– Simulation and queuing theory [Drew 1968]

– Kalman Filter: [Okutani et al. TRB’83] [Wang et al. TRB’05]

– ARIMA: [Williams et al. TRB’98] [Pan et al. ICDM’12]

– Support Vector Regression (SVR): [Muller et al. ICANN’97] [Wu et al. ITS’04]

– Gaussian process: [Xie et al. TRB’10] [Zhou et al. SIGMOD’15]

– Recurrent neural networks and deep learning: [Lv et al. ITS’15] [Ma et al. TRC’15] [Li et al. SDM’17]

Model each sensor independently. Fail to capture spatial correlation.


Related Work

Traffic Prediction with spatial dependency modeling

– Vector ARIMA [Williams and Hoel JTE’03] [Chandra et al. ITS’09]

– Spatiotemporal ARIMA [Kamarianakis et al. TRB’03] [Min and Wynter TRC’11]

– k-Nearest Neighbor [Li et al. ITS’12] [Rice et al. ITS’13]

– Latent Space Model [Deng et al. KDD’17]

– Convolutional Neural Network [Ma et al. ITS’17]

Either assume linear temporal dependency or fail to capture the non-Euclidean spatial dependency.


Big picture


Model spatial dependency with proposed diffusion convolution.

Model temporal dependency with augmented recurrent neural network

* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018.


Spatial Dependency Modeling

Model spatial dependency with Convolutional Neural Networks (CNN)

– CNN extracts meaningful spatial patterns using filters.

– State-of-the-art results on image related tasks

[Figure: an Image and a Convolutional Filter sliding over it]

CNN is only applicable to Euclidean grid graphs.

* Y LeCun et al. Gradient-based learning applied to document recognition. Proc. IEEE 1998

+ Image from: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/


Spatial Dependency in Traffic Prediction

Spatial dependency among traffic flow is non-Euclidean and directed

[Figure: road map with Sensor 1, Sensor 2 and Sensor 3]

Close in Euclidean space does not imply similar traffic speed.

$\mathrm{dist}_{net}(v_i \to v_j) \neq \mathrm{dist}_{net}(v_j \to v_i)$


Spatial Dependency Modeling

Model the network of traffic sensors, i.e., loop detectors, as a directed graph

– Graph $\mathcal{G} = (V, A)$

– Vertices $V$: sensors

– Adjacency matrix $A$: weights between vertices

$$A_{ij} = \exp\left(-\frac{\mathrm{dist}_{net}(v_i, v_j)^2}{\sigma^2}\right) \text{ if } \mathrm{dist}_{net}(v_i, v_j) \le \kappa, \text{ otherwise } 0$$

$\mathrm{dist}_{net}(v_i, v_j)$: road network distance from $v_i$ to $v_j$,

$\kappa$: threshold to ensure sparsity, $\sigma^2$: variance of all pairwise road network distances
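The thresholded Gaussian kernel above can be sketched in a few lines of NumPy. The function name `gaussian_adjacency`, the toy distance matrix and the choice to include the zero diagonal when computing the variance are assumptions made for illustration.

```python
import numpy as np

def gaussian_adjacency(dist, kappa):
    """Thresholded Gaussian kernel over road network distances.

    dist[i, j] is the road network distance from v_i to v_j.
    """
    sigma2 = dist.var()                          # variance of all pairwise distances
    weights = np.exp(-np.square(dist) / sigma2)
    # Keep a weight only when the distance is within the threshold kappa.
    return np.where(dist <= kappa, weights, 0.0)

# Toy, asymmetric distances between 3 sensors (made-up values).
dist = np.array([[0.0, 1.0, 4.0],
                 [2.0, 0.0, 1.0],
                 [4.0, 3.0, 0.0]])
adj = gaussian_adjacency(dist, kappa=3.0)
print(adj)
```

Because the distances are directed, `adj[0, 1]` and `adj[1, 0]` differ, and pairs farther apart than `kappa` get weight 0, which keeps the matrix sparse.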


Problem Statement

Graph signal: $X_t \in \mathbb{R}^{|V| \times P}$, observation on $\mathcal{G}$ at time $t$

– $|V|$: number of vertices

– $P$: feature dimension of each vertex.

Problem Statement: Learn a function $g(\cdot)$ to map $T'$ historical graph signals to future $T$ graph signals

$$[X_{t-T'+1}, \ldots, X_t] \xrightarrow{g(\cdot)} [X_{t+1}, \ldots, X_{t+T}]$$


Generalize Convolution to Graph

Convolution as a weighted combination of neighborhood vertices.

$$h_i^{l+1} = \sum_{j \in \mathcal{N}_i} w_{ij}^{l} h_j^{l}$$

$\mathcal{N}_i$: neighbors of vertex $i$

$w_{ij}^{l}$: filter weight of vertex $j$ centered at vertex $i$ in layer $l$

$h_j^{l}$: feature of vertex $j$ in layer $l$

$h_j^{1} = X_{j,:}$, i.e., the input.

[Figure: convolutional filter on graph centered at $v_6$, with vertex features $h_1^l, \ldots, h_{10}^l$, the example filter weight $w_{6,4}^l$, and a Min to Max color scale for the filter weight]

Learning complexity is too high: $O(|V| \cdot |\mathcal{N}|)$
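A minimal sketch of the neighborhood-weighted convolution above, assuming the weights are stored as a dense matrix that is zero outside each neighborhood. The function name `neighborhood_conv` and the toy graph are illustrative only.

```python
import numpy as np

def neighborhood_conv(h, weights):
    """h: (num_vertices, features); weights[i, j] is w_ij, zero when j is not a neighbor of i."""
    # Row i of the product is the sum over j of w_ij * h_j.
    return weights @ h

# Toy directed graph with 4 vertices; mask[i, j] = 1 when j is in the neighborhood of i.
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 1, 1],
                 [1, 0, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
weights = mask * rng.normal(size=mask.shape)
h = rng.normal(size=(4, 2))
h_next = neighborhood_conv(h, weights)

# One free parameter per (vertex, neighbor) pair.
num_params = int(mask.sum())
print(h_next.shape, num_params)  # (4, 2) 8
```

Every (vertex, neighbor) pair carries its own weight, so the parameter count grows with the size of the graph, which is the complexity problem noted on the slide.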



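Diffusion convolution filter: combination of diffusion processes with different steps on the graph.

[Figure: example diffusion filter centered at a vertex, shown as the weighted sum $\theta_0$ (0 Step Diffusion) + $\theta_1$ (1 Step Diffusion) + $\theta_2$ (2 Step Diffusion) + … + $\theta_K$ (K Step Diffusion), with a Min to Max color scale for the filter weight]

$$X_{:,p} \star_{\mathcal{G}} f_\theta = \sum_{k=0}^{K-1} \theta_k \left(D_O^{-1} A\right)^k X_{:,p}$$

$\left(D_O^{-1} A\right)^k$: transition matrices of the diffusion process

Learning complexity: $O(K)$

$\star_{\mathcal{G}}$: diffusion convolution, $D_O$: diagonal out-degree matrix.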
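The diffusion convolution above, a weighted sum of K diffusion steps under the transition matrix D_O^{-1} A, can be sketched as follows. The function name `diffusion_conv` and the toy cycle graph are assumptions, and the sketch assumes every vertex has at least one outgoing edge.

```python
import numpy as np

def diffusion_conv(adj, x, theta):
    """Compute sum_k theta[k] * (D_O^{-1} A)^k x for k = 0..K-1.

    adj: (N, N) weighted adjacency, x: (N,) one feature column, theta: (K,).
    """
    out_degree = adj.sum(axis=1)
    transition = adj / out_degree[:, None]     # D_O^{-1} A, each row sums to 1
    out = np.zeros_like(x, dtype=float)
    diffused = x.astype(float)                 # 0-step diffusion is the signal itself
    for theta_k in theta:
        out += theta_k * diffused
        diffused = transition @ diffused       # one more diffusion step
    return out

# Toy directed cycle 0 -> 1 -> 2 -> 0 with a signal on vertex 0.
adj = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [1.0, 0.0, 0.0]])
x = np.array([1.0, 0.0, 0.0])
out = diffusion_conv(adj, x, theta=np.array([0.5, 0.3, 0.2]))
print(out)
```

The filter has only K parameters regardless of graph size, and the loop reuses the previous diffusion step instead of forming matrix powers.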



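Diffusion convolution filter: combination of diffusion processes with different steps on the graph.

[Figure: example diffusion filter centered at a vertex, shown as the weighted sum $\theta_0$ (0 Step Diffusion) + $\theta_1$ (1 Step Diffusion) + $\theta_2$ (2 Step Diffusion) + … + $\theta_K$ (K Step Diffusion), with a Min to Max color scale for the weight]

$$X_{:,p} \star_{\mathcal{G}} f_\theta = \sum_{k=0}^{K-1} \left( \theta_{k,1} \left(D_O^{-1} A\right)^k + \theta_{k,2} \left(D_I^{-1} A^\top\right)^k \right) X_{:,p}$$

Dual directional diffusion to model upstream and downstream separately

$\star_{\mathcal{G}}$: diffusion convolution, $D_O$: diagonal out-degree matrix, $D_I$: diagonal in-degree matrix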
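A sketch of the dual directional version, with one set of weights for diffusion along the edges and one for diffusion against them. The helper names `random_walk` and `dual_diffusion_conv` and the toy graph are assumptions, and every vertex is assumed to have both incoming and outgoing edges.

```python
import numpy as np

def random_walk(adj):
    """Row-normalize adj, i.e. D^{-1} adj where D holds the row sums."""
    return adj / adj.sum(axis=1, keepdims=True)

def dual_diffusion_conv(adj, x, theta):
    """theta: (K, 2); column 0 weights D_O^{-1} A, column 1 weights D_I^{-1} A^T."""
    forward = random_walk(adj)       # D_O^{-1} A
    backward = random_walk(adj.T)    # D_I^{-1} A^T, row sums of A^T are in-degrees
    out = np.zeros_like(x, dtype=float)
    fwd, bwd = x.astype(float), x.astype(float)
    for theta_fwd, theta_bwd in theta:
        out += theta_fwd * fwd + theta_bwd * bwd
        fwd, bwd = forward @ fwd, backward @ bwd
    return out

# Toy directed cycle 0 -> 1 -> 2 -> 0 with a signal on vertex 0.
adj = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [1.0, 0.0, 0.0]])
x = np.array([1.0, 0.0, 0.0])
only_forward = dual_diffusion_conv(adj, x, np.array([[0.0, 0.0], [1.0, 0.0]]))
only_backward = dual_diffusion_conv(adj, x, np.array([[0.0, 0.0], [0.0, 1.0]]))
print(only_forward, only_backward)
```

On a directed graph the two one-step diffusions reach different vertices, which is why the two directions get separate parameters.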


Advantage of Diffusion Convolution

Efficient

– Learning complexity: $O(K)$

– Time complexity: $O(K|E|)$, $|E|$: number of edges

Expressive

– Many popular convolution operations, including the ChebNet [Defferrard et al., NIPS ’16], can be represented using the diffusion convolution [Li et al. ICLR ’18].

$$X_{:,p} \star_{\mathcal{G}} f_\theta = \sum_{k=0}^{K-1} \left( \theta_{k,1} \left(D_O^{-1} A\right)^k + \theta_{k,2} \left(D_I^{-1} A^\top\right)^k \right) X_{:,p}$$

$\star_{\mathcal{G}}$: diffusion convolution, $D_O$: diagonal out-degree matrix, $D_I$: diagonal in-degree matrix

+ Defferrard, M., Bresson, X., Vandergheynst, P., Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, NIPS, 2016

* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR, 2018


Big picture


Model spatial dependency with proposed diffusion convolution

Model temporal dependency with augmented recurrent neural network



Model Temporal Dynamics using Recurrent Neural Networks

Recurrent Neural Networks (RNN)

– Non-linear, non-stationary auto-regression

– State-of-the-art performance in sequence modeling

Popular examples of RNN

– Long Short-Term Memory unit (LSTM)

– Gated Recurrent Unit (GRU)

[Figure: an RNN with input $x$ and hidden state $h$, unrolled over time into cells with inputs $x_1, x_2, x_3$ and hidden states $h_1, h_2, h_3$]

DCGRU = GRU + Diffusion Convolution
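One way to read "GRU + Diffusion Convolution" is a GRU cell whose matrix multiplications are replaced by diffusion convolutions. The sketch below follows the standard GRU gate structure under that assumption; the slide does not spell out the gate equations, and all names (`dcgru_step`, `params`, the toy sizes) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diffusion_conv(supports, x, weight, bias, k_steps):
    """Diffuse x (N, F) for k_steps along each transition matrix, then mix features."""
    feats = []
    for transition in supports:
        diffused = x
        for _ in range(k_steps):
            feats.append(diffused)            # k-step diffusion of every feature
            diffused = transition @ diffused
    return np.concatenate(feats, axis=1) @ weight + bias

def dcgru_step(supports, x, h, params, k_steps):
    """One recurrent step: GRU gating with diffusion convolution as the linear map."""
    xh = np.concatenate([x, h], axis=1)
    r = sigmoid(diffusion_conv(supports, xh, params["w_r"], params["b_r"], k_steps))
    u = sigmoid(diffusion_conv(supports, xh, params["w_u"], params["b_u"], k_steps))
    xrh = np.concatenate([x, r * h], axis=1)
    c = np.tanh(diffusion_conv(supports, xrh, params["w_c"], params["b_c"], k_steps))
    return u * h + (1.0 - u) * c

# Toy setup: 3 sensors on a directed cycle, 1 input feature, 4 hidden units.
adj = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
supports = [adj / adj.sum(axis=1, keepdims=True),        # D_O^{-1} A
            adj.T / adj.T.sum(axis=1, keepdims=True)]    # D_I^{-1} A^T
n, f_in, hidden, k_steps = 3, 1, 4, 2
rng = np.random.default_rng(0)
rows = len(supports) * k_steps * (f_in + hidden)
params = {}
for gate in ("r", "u", "c"):
    params["w_" + gate] = rng.normal(scale=0.1, size=(rows, hidden))
    params["b_" + gate] = np.zeros(hidden)

h = np.zeros((n, hidden))
for x in rng.normal(size=(5, n, f_in)):   # 5 time steps of made-up observations
    h = dcgru_step(supports, x, h, params, k_steps)
print(h.shape)  # (3, 4)
```

The hidden state keeps one row per sensor, so spatial mixing happens only through the transition matrices while the gates handle the temporal update.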


Model Temporal Dynamics using Recurrent Neural Network

Multi-step ahead prediction with RNN

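[Figure: DCGRU cells read the observations $x_1, x_2, x_3$ up to the current time, then output the predictions $\hat{x}_4, \hat{x}_5, \hat{x}_6$, with $\hat{x}_4$ and $\hat{x}_5$ fed back as inputs. Legend: $\hat{x}$ is a model prediction, $x$ is an observation or ground truth.]

Previous model output is fed into the network.

Error Propagation

Teach the model to deal with its own error.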
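The feedback loop above can be illustrated with a toy rollout. The `step` function here is a deliberately biased stand-in for a trained recurrent cell, not a DCGRU; it is assumed only to show how a small one-step error compounds when predictions are fed back.

```python
import numpy as np

def rollout(step, history, horizon):
    """Predict `horizon` steps by feeding each prediction back as the next input."""
    state = None
    for x in history:                      # consume the observations
        pred, state = step(x, state)
    preds = [pred]
    for _ in range(horizon - 1):
        pred, state = step(preds[-1], state)   # input is the model's own output
        preds.append(pred)
    return np.array(preds)

def biased_step(x, state):
    """Hypothetical one-step model that overestimates the next value by 10%."""
    return 1.1 * x, state

history = [10.0, 10.0, 10.0]               # the true speed stays at 10
preds = rollout(biased_step, history, horizon=3)
errors = np.abs(preds - 10.0)
print(preds, errors)
```

Each step inherits the error of the one before it, so the error grows with the forecasting horizon.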


Improve Multi-step ahead Forecasting

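Traffic prediction as a sequence to sequence learning problem

– Encoder-decoder framework

[Figure: the encoder, a chain of DCGRU cells, reads $x_1, x_2, x_3$ up to the current time. The decoder, also a chain of DCGRU cells, takes <GO>, $x_4, x_5$ as inputs and outputs $\hat{x}_4, \hat{x}_5, \hat{x}_6$, which are compared with the ground truth $x_4, x_5, x_6$ to give the errors $\delta_4, \delta_5, \delta_6$. Legend: $\hat{x}$ is a model prediction, $x$ is an observation or ground truth.]

$x_1, x_2, x_3 \to x_4$ becomes $x_1, x_2, x_3 \to x_4, x_5, x_6$

Backprop errors from multiple steps.

Ground truth becomes unavailable in testing.

* Sutskever et al. Sequence to sequence learning with neural networks, NIPS 2014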



Improve multi-step ahead forecasting with scheduled sampling

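[Figure: the encoder DCGRU cells read $x_1, x_2, x_3$ up to the current time. The decoder DCGRU cells output $\hat{x}_4, \hat{x}_5, \hat{x}_6$; the decoder inputs are <GO>, then either $x_4$ or $\hat{x}_4$, then either $x_5$ or $\hat{x}_5$. Legend: $\hat{x}$ is a model prediction, $x$ is an observation or ground truth.]

Scheduled sampling: Choose to use the previous ground truth or model prediction by flipping a coin

* Bengio, Samy, et al. Scheduled sampling for sequence prediction with recurrent neural networks. NIPS 2015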
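The coin flip can be sketched as a decoder loop used during training. The names `decode_with_scheduled_sampling`, `toy_step` and `epsilon` (the probability of feeding the ground truth) are assumptions, and `toy_step` is a stand-in for a decoder cell rather than a DCGRU.

```python
import numpy as np

def decode_with_scheduled_sampling(step, state, targets, epsilon, rng, go=0.0):
    """Run the decoder over len(targets) steps during training."""
    preds = []
    inp = go                                   # the <GO> symbol starts decoding
    for target in targets:
        pred, state = step(inp, state)
        preds.append(pred)
        # Flip a coin: feed the ground truth with probability epsilon,
        # otherwise feed the model's own prediction.
        inp = target if rng.random() < epsilon else pred
    return np.array(preds)

def toy_step(x, state):
    """Hypothetical decoder cell used only for illustration."""
    return 0.5 * x + 1.0, state

rng = np.random.default_rng(0)
targets = [10.0, 10.0, 10.0]
always_truth = decode_with_scheduled_sampling(toy_step, None, targets, 1.0, rng)
always_model = decode_with_scheduled_sampling(toy_step, None, targets, 0.0, rng)
print(always_truth, always_model)
```

With `epsilon = 1.0` the decoder behaves as in plain sequence to sequence training; with `epsilon = 0.0` it sees only its own outputs, which matches the test-time setting.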



Improve multi-step ahead forecasting with scheduled sampling

– Curriculum learning: gradually enables the model to deal with its own error.

[Figure: a DCGRU cell takes either $x_t$ or $\hat{x}_t$ as input and outputs $\hat{x}_{t+1}$. Plot: probability (y-axis) against # iteration (x-axis). Legend: $\hat{x}$ is a model prediction, $x$ is an observation or ground truth.]

Easy: Only feed ground truth

Hard: Only feed model prediction

* Bengio, Samy, et al. Scheduled sampling for sequence prediction with recurrent neural networks. NIPS 2015
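A schedule that moves from "easy" to "hard" can be sketched with the inverse sigmoid decay described in the scheduled sampling paper. The slide only shows a probability curve, so the functional form, the name `truth_probability` and the value of `tau` below are assumptions.

```python
import math

def truth_probability(iteration, tau):
    """Probability of feeding the ground truth at a given training iteration."""
    return tau / (tau + math.exp(iteration / tau))

tau = 1000.0                                   # hypothetical decay speed
schedule = [truth_probability(i, tau) for i in (0, 5000, 10000, 20000)]
print(schedule)
```

Early in training the probability is close to 1, so the model is fed mostly ground truth; it then decays toward 0, so the model is fed mostly its own predictions.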


Diffusion Convolutional Recurrent Neural Network

Diffusion Convolutional Recurrent Neural Network (DCRNN)

– Model spatial dependency with diffusion convolution

– Sequence to sequence learning with encoder-decoder framework

– Improve multi-step ahead forecasting with scheduled sampling

* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018


Experiments

Datasets

METR-LA:

– 207 traffic sensors in Los Angeles

– 4 months in 2012

– 6.5M observations

PEMS-BAY:

– 345 traffic sensors in Bay Area

– 6 months in 2017

– 17M observations


Experiments

Baselines

– Historical Average (HA)

– Autoregressive Integrated Moving Average (ARIMA)

– Support Vector Regression (SVR)

– Vector Auto-Regression (VAR)

– Feed forward Neural network (FNN)

– Fully connected LSTM with Sequence to Sequence framework (FC-LSTM)

Task

– Multi-step ahead traffic speed forecasting


Experimental Results

DCRNN achieves the best performance for all forecasting horizons for both datasets

[Figure: bar charts of Mean Absolute Error (MAE) by forecasting horizon (15 Min, 30 Min, 1 Hour) on METR-LA and PEMS-BAY, comparing HA, ARIMA, VAR, SVR, FNN, FC-LSTM and DCRNN]
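The metric in these charts, MAE at each forecasting horizon, can be computed as below. The array layout (samples, horizons, sensors), the function name `mae_per_horizon` and the toy numbers are assumptions; how missing sensor readings are handled is not covered by the slides and is left out here.

```python
import numpy as np

def mae_per_horizon(pred, truth):
    """pred, truth: (samples, horizons, sensors) -> MAE for each horizon."""
    return np.abs(pred - truth).mean(axis=(0, 2))

# Toy example: 2 samples, 3 horizons, 2 sensors, with errors of 1, 2 and 3 mile/h.
truth = np.zeros((2, 3, 2))
pred = np.zeros((2, 3, 2))
pred[:, 0, :] = 1.0
pred[:, 1, :] = 2.0
pred[:, 2, :] = -3.0
mae = mae_per_horizon(pred, truth)
print(mae)
```

Averaging over samples and sensors but not over horizons gives one number per horizon, matching the 15 Min, 30 Min and 1 Hour groups in the charts.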


Effects of Spatiotemporal Dependency Modeling

w/o temporal: remove the sequence to sequence learning.

w/o spatial: remove the diffusion convolution.

[Figure: Mean Absolute Error (MAE) on METR-LA for DCRNN w/o Temporal, DCRNN w/o Spatial and DCRNN]

Removing either spatial or temporal modeling leads to significantly worse results.


Example: Prediction Results

DCRNN is more likely to accurately predict abrupt changes in the traffic speed than the best baseline method.

[Figure: Example Prediction Results on METR-LA, y-axis in mile/h]


Example: Filter Visualization

Visualization of learned filters

– Filters are localized around the center.

– Weights diffuse alongside the road network.

[Figure: learned filters centered at different vertices on METR-LA, with the center marked and a color scale from Min through 0 to Max]


Summary


Propose diffusion convolution to model the spatial dependency of traffic flow.

Propose Diffusion Convolutional Recurrent Neural Network (DCRNN) that captures both spatial and temporal dependencies.

DCRNN obtains consistent improvement over state-of-the-art baseline methods.

https://github.com/liyaguang/DCRNN

https://arxiv.org/abs/1707.01926

ICLR 2018


Thank You!

Q & A
