Yaguang Li (USC), Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
Yaguang Li
Joint work with
Rose Yu, Cyrus Shahabi, Yan Liu
5/30/2018
Introduction
Traffic congestion wastes time, money, and energy.
– Traffic congestion cost Americans more than $124 billion in direct and indirect losses in 2013.
Accurate traffic forecasting could substantially improve route planning and mitigate traffic congestion.
+ https://www.forbes.com/sites/federicoguerrini/2014/10/14/traffic-congestion-costs-americans-124-billion-a-year-report-says/
* Image from La La Land: http://variety.com/2017/film/awards/la-la-land-hollywood-musical-trend-2017-1201981880/
The Problem – Existing Solutions
7:00 AM
[Figure: map showing Route 1 and Route 2 to the destination]
Route 1: best route according to current traffic conditions.
Route 2: let's see.
The Problem – Existing Solutions: Predictive vs. Real-Time Path Planning
7:15 AM
Evolution of traffic over time
[Figure: map showing Route 1 and Route 2 to the destination]
The Problem – Existing Solutions
7:30 AM
Stuck in traffic
[Figure: map showing Route 1 and Route 2 to the destination]
Traffic forecasting enables better route planning.
Traffic Prediction
Input: the road network and the traffic speeds observed at sensors over the past T′ steps
Output: traffic speed for the next T steps
[Figure: observations from 7:00 AM to 8:00 AM form the input; predictions for 8:10 AM, 8:20 AM, …, 9:00 AM form the output]
Challenges for Traffic Prediction
Complex spatial dependency
Non-linear, non-stationary temporal dynamics
[Figure: traffic speed (mile/h) over time at Sensors 1, 2, and 3]
Related Work
Traffic prediction without spatial dependency modeling
– Simulation and queuing theory [Drew 1968]
– Kalman Filter: [Okutani et al. TRB’83] [Wang et al. TRB’05]
– ARIMA: [Williams et al. TRB’98] [Pan et al. ICDM’12]
– Support Vector Regression (SVR): [Muller et al. ICANN’97] [Wu et al. ITS’04]
– Gaussian process [Xie et al. TRB’10] [Zhou et al. SIGMOD’15]
– Recurrent neural networks and deep learning: [Lv et al. ITS’15] [Ma et al. TRC’15] [Li et al. SDM’17]
Model each sensor independently, and therefore fail to capture spatial correlation.
Related Work
Traffic prediction with spatial dependency modeling
– Vector ARIMA [Williams and Hoel JTE’03] [Chandra et al. ITS’09]
– Spatiotemporal ARIMA [Kamarianakis et al. TRB’03] [Min and Wynter TRC’11]
– k-Nearest Neighbor [Li et al. ITS’12] [Rice et al. ITS’13]
– Latent Space Model [Deng et al. KDD’17]
– Convolutional Neural Network [Ma et al. ITS’17]
These methods either assume linear temporal dependency or fail to capture the non-Euclidean spatial dependency.
Big picture
Model spatial dependency with proposed diffusion convolution.
Model temporal dependency with an augmented recurrent neural network.
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018.
Spatial Dependency Modeling
Model spatial dependency with Convolutional Neural Networks (CNN)
– CNNs extract meaningful spatial patterns using filters.
– State-of-the-art results on image-related tasks
[Figure: a convolutional filter applied to an image]
* Y LeCun et al. Gradient-based learning applied to document recognition. Proc. IEEE 1998
+ Image from: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
CNNs are only applicable to Euclidean, grid-structured data such as images.
Spatial Dependency in Traffic Prediction
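Spatial dependency among traffic flow is non-Euclidean and directed.
[Figure: Sensors 1, 2, and 3 on the road network, annotated "Close in Euclidean space" and "Similar traffic speed"]
$$\mathrm{dist}_{net}(v_i \rightarrow v_j) \neq \mathrm{dist}_{net}(v_j \rightarrow v_i)$$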
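A toy example makes the asymmetry concrete. The sketch below assumes a hypothetical three-sensor network with one-way segments and made-up lengths, and uses Dijkstra's algorithm (chosen here only for illustration) to compute $\mathrm{dist}_{net}$; the helper `road_distances` is hypothetical.

```python
import heapq

def road_distances(edges, source):
    """Shortest directed road-network distances from `source` (Dijkstra).

    `edges` maps a vertex to a list of (neighbor, length) pairs; every edge is one-way.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, length in edges.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical one-way segments (miles): v1 -> v2 is direct,
# but getting from v2 back to v1 requires a detour through v3.
edges = {
    "v1": [("v2", 0.5)],
    "v2": [("v3", 2.0)],
    "v3": [("v1", 1.5)],
}
d_12 = road_distances(edges, "v1")["v2"]
d_21 = road_distances(edges, "v2")["v1"]
print(d_12, d_21)  # 0.5 3.5
```

The two sensors are one segment apart in one direction and a 3.5-mile detour apart in the other, which a symmetric (Euclidean) distance cannot express.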
Spatial Dependency Modeling
Model the network of traffic sensors, i.e., loop detectors, as a directed graph
– Graph $\mathcal{G} = (\mathcal{V}, A)$
– Vertices $\mathcal{V}$: sensors
– Adjacency matrix $A$: edge weights between vertices
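$$A_{ij} = \exp\left(-\frac{\mathrm{dist}_{net}(v_i, v_j)^2}{\sigma^2}\right) \text{ if } \mathrm{dist}_{net}(v_i, v_j) \le \kappa, \text{ and } A_{ij} = 0 \text{ otherwise}$$
$\mathrm{dist}_{net}(v_i, v_j)$: road network distance from $v_i$ to $v_j$
$\kappa$: threshold to ensure sparsity; $\sigma^2$: variance of all pairwise road network distances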
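The thresholded Gaussian kernel can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the directed road-network distances are already available as a matrix (with `np.inf` for unreachable pairs) and that $\sigma^2$ is taken over all finite entries; `gaussian_kernel_adjacency` and the example distances are hypothetical, not the authors' code or data.

```python
import numpy as np

def gaussian_kernel_adjacency(dist, kappa):
    """A_ij = exp(-dist_ij^2 / sigma^2) if dist_ij <= kappa, else 0.

    dist: (|V|, |V|) directed road-network distances, np.inf where v_j is unreachable from v_i.
    """
    # Assumption: sigma^2 is the variance of all finite entries (diagonal zeros included).
    sigma2 = dist[np.isfinite(dist)].var()
    A = np.exp(-np.square(dist) / sigma2)
    A[dist > kappa] = 0.0  # sparsify: drop pairs farther apart than kappa
    return A

# Hypothetical directed distances (miles) between three sensors.
dist = np.array([[0.0, 0.5, np.inf],
                 [3.5, 0.0, 2.0],
                 [1.5, 2.0, 0.0]])
A = gaussian_kernel_adjacency(dist, kappa=2.5)
```

Because the distances are directed, the resulting $A$ is generally not symmetric: here $A_{12} > 0$ while $A_{21} = 0$.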
Problem Statement
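Graph signal: $X_t \in \mathbb{R}^{|\mathcal{V}| \times P}$, the observation on $\mathcal{G}$ at time $t$
– $|\mathcal{V}|$: number of vertices
– $P$: feature dimension of each vertex
Problem Statement: Learn a function $g(\cdot)$ that maps $T'$ historical graph signals to the future $T$ graph signals
$$[X_{t-T'+1}, \ldots, X_t] \xrightarrow{g(\cdot)} [X_{t+1}, \ldots, X_{t+T}]$$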
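To make the input and output shapes concrete, here is a minimal sketch of how (history, future) training pairs could be cut from a recorded sequence of graph signals, assuming the signals are stacked into a NumPy array of shape (num_steps, |V|, P). `make_windows` and the window lengths are illustrative assumptions.

```python
import numpy as np

def make_windows(series, T_in, T_out):
    """Split a (num_steps, |V|, P) array into (input, target) pairs.

    Each input holds T' = T_in consecutive signals X_{t-T'+1}, ..., X_t;
    the matching target holds the next T = T_out signals X_{t+1}, ..., X_{t+T}.
    """
    xs, ys = [], []
    for t in range(T_in - 1, len(series) - T_out):
        xs.append(series[t - T_in + 1 : t + 1])
        ys.append(series[t + 1 : t + T_out + 1])
    return np.stack(xs), np.stack(ys)

# Hypothetical recording: 20 time steps, 3 sensors, 1 feature (speed).
series = np.arange(20 * 3 * 1, dtype=float).reshape(20, 3, 1)
X, Y = make_windows(series, T_in=4, T_out=2)
print(X.shape, Y.shape)  # (15, 4, 3, 1) (15, 2, 3, 1)
```

$g(\cdot)$ is then trained to map each `X[i]` to the corresponding `Y[i]`.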
Generalize Convolution to Graph
Convolution as a weighted combination of neighboring vertices
$$h_i^{l+1} = \sum_{j \in \mathcal{N}_i} w_{ij}^{l} h_j^{l}$$
$\mathcal{N}_i$: neighbors of vertex $i$
$w_{ij}^{l}$: filter weight of vertex $j$ in the filter centered at vertex $i$, layer $l$
$h_j^{l}$: feature of vertex $j$ in layer $l$
$h_j^{1} = X_{j,:}$, i.e., the input
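[Figure: convolutional filter on a graph centered at $v_6$, with vertex features $h_1^l, \ldots, h_{10}^l$, filter weights such as $w_{6,4}^l$, and a color scale for filter weight]
Learning complexity is too high: $O(|\mathcal{V}| \cdot |\mathcal{N}|)$, since every vertex has its own filter with one weight per neighbor.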
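A direct transcription of this per-vertex filter, as a minimal NumPy sketch with hypothetical helper names, shows where the cost comes from: there is one free weight for every (center, neighbor) pair, that is, one per nonzero entry of the adjacency pattern.

```python
import numpy as np

def naive_graph_conv(H, A, W):
    """h_i^{l+1} = sum_{j in N_i} w_ij^l h_j^l, with a separate weight for each (i, j).

    H: (|V|, F) vertex features in layer l.
    A: (|V|, |V|) adjacency; A[i, j] != 0 means vertex j is in N_i.
    W: (|V|, |V|) learnable weights; only entries inside the neighborhoods are used.
    """
    mask = (A != 0).astype(H.dtype)
    return (W * mask) @ H

def num_free_parameters(A):
    # One weight per (center vertex, neighbor) pair: O(|V| * |N|).
    return int(np.count_nonzero(A))

A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
W = np.arange(1.0, 10.0).reshape(3, 3)
H = np.array([[1.0], [2.0], [3.0]])
out = naive_graph_conv(H, A, W)
```

The weight count grows with the size of the road network, which is what diffusion convolution avoids by sharing a few parameters across all vertices.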
Generalize Convolution to Graph
Diffusion convolution filter: combination of diffusion processes with different steps on the graph.
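[Figure: example diffusion filter centered at a vertex, expressed as $\theta_0 \cdot$ (0-step diffusion) $+ \theta_1 \cdot$ (1-step diffusion) $+ \theta_2 \cdot$ (2-step diffusion) $+ \ldots + \theta_{K-1} \cdot$ ((K−1)-step diffusion); color scale for filter weight]
$$X_{:,p} \star_{\mathcal{G}} f_{\theta} = \sum_{k=0}^{K-1} \theta_k \left(D_O^{-1} A\right)^k X_{:,p}$$
$(D_O^{-1} A)^k$: transition matrices of the diffusion process; $X_{:,p}$: feature $p$ of every vertex
Learning complexity: $O(K)$
$\star_{\mathcal{G}}$: diffusion convolution; $D_O$: diagonal out-degree matrix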
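The single-direction diffusion convolution can be sketched as follows, assuming a dense adjacency matrix in which every vertex has a positive out-degree (otherwise $D_O^{-1}$ is undefined). `diffusion_conv` is a hypothetical name, and explicit matrix powers are formed only for readability.

```python
import numpy as np

def diffusion_conv(x, A, theta):
    """X_{:,p} *_G f_theta = sum_{k=0}^{K-1} theta_k (D_O^{-1} A)^k X_{:,p}.

    x: (|V|,) one feature column X_{:,p}; A: (|V|, |V|) weighted adjacency;
    theta: (K,) filter parameters, one per diffusion step.
    """
    out_degree = A.sum(axis=1)              # diagonal of D_O
    P = A / out_degree[:, None]             # transition matrix D_O^{-1} A
    out = np.zeros_like(x, dtype=float)
    for k, theta_k in enumerate(theta):
        out += theta_k * (np.linalg.matrix_power(P, k) @ x)
    return out

A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])  # hypothetical directed 3-cycle
x = np.array([1.0, 2.0, 3.0])
y = diffusion_conv(x, A, theta=[0.5, 0.25, 0.25])
```

The filter has `len(theta)` = K parameters regardless of how many vertices the graph has, which is the $O(K)$ learning complexity stated above.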
Generalize Convolution to Graph
Diffusion convolution filter: combination of diffusion processes with different steps on the graph.
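[Figure: example diffusion filter centered at a vertex, expressed as $\theta_0 \cdot$ (0-step diffusion) $+ \theta_1 \cdot$ (1-step diffusion) $+ \theta_2 \cdot$ (2-step diffusion) $+ \ldots + \theta_{K-1} \cdot$ ((K−1)-step diffusion); color scale for weight]
$$X_{:,p} \star_{\mathcal{G}} f_{\theta} = \sum_{k=0}^{K-1} \left(\theta_{k,1} \left(D_O^{-1} A\right)^k + \theta_{k,2} \left(D_I^{-1} A^{\top}\right)^k\right) X_{:,p}$$
Bidirectional diffusion models upstream and downstream traffic separately.
$\star_{\mathcal{G}}$: diffusion convolution; $D_O$: diagonal out-degree matrix; $D_I$: diagonal in-degree matrix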
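The bidirectional version adds a second random walk on the reversed graph. The sketch below uses a hypothetical helper and dense matrices, assumes every vertex has positive in- and out-degree, and applies the powers by repeated matrix-vector products so that $(D^{-1}A)^k$ is never formed explicitly.

```python
import numpy as np

def bidirectional_diffusion_conv(x, A, theta_fwd, theta_bwd):
    """sum_k [theta_{k,1} (D_O^{-1} A)^k + theta_{k,2} (D_I^{-1} A^T)^k] X_{:,p}.

    theta_fwd: (K,) parameters for the walk along edge direction (theta_{k,1}).
    theta_bwd: (K,) parameters for the walk against edge direction (theta_{k,2}).
    """
    P_fwd = A / A.sum(axis=1, keepdims=True)   # D_O^{-1} A: out-degree = row sums
    P_bwd = A.T / A.sum(axis=0)[:, None]       # D_I^{-1} A^T: in-degree = column sums
    out = np.zeros(len(x))
    z_fwd = np.asarray(x, dtype=float)         # holds (D_O^{-1} A)^k x
    z_bwd = np.asarray(x, dtype=float)         # holds (D_I^{-1} A^T)^k x
    for t_fwd, t_bwd in zip(theta_fwd, theta_bwd):
        out += t_fwd * z_fwd + t_bwd * z_bwd
        z_fwd = P_fwd @ z_fwd
        z_bwd = P_bwd @ z_bwd
    return out

A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])  # hypothetical directed 3-cycle
x = np.array([1.0, 2.0, 3.0])
y = bidirectional_diffusion_conv(x, A, theta_fwd=[0.0, 1.0], theta_bwd=[0.0, 1.0])
```

On this cycle, one forward step reads each vertex's successor and one backward step reads its predecessor, so the two directions contribute different information.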
Advantage of Diffusion Convolution
Efficient
– Learning complexity: $O(K)$
– Time complexity: $O(K|\mathcal{E}|)$, where $|\mathcal{E}|$ is the number of edges
Expressive
– Many popular convolution operations, including ChebNet [Defferrard et al. NIPS’16], can be represented using the diffusion convolution [Li et al. ICLR’18].
$$X_{:,p} \star_{\mathcal{G}} f_{\theta} = \sum_{k=0}^{K-1} \left(\theta_{k,1} \left(D_O^{-1} A\right)^k + \theta_{k,2} \left(D_I^{-1} A^{\top}\right)^k\right) X_{:,p}$$
+ Defferrard, M., Bresson, X., Vandergheynst, P., Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, NIPS, 2016
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR, 2018
$\star_{\mathcal{G}}$: diffusion convolution; $D_O$: diagonal out-degree matrix; $D_I$: diagonal in-degree matrix
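The $O(K|\mathcal{E}|)$ time bound follows from computing $(D_O^{-1}A)^k X_{:,p}$ recursively: each extra diffusion step is one pass over the edges. Below is a minimal NumPy sketch, assuming the graph is stored as an edge list (`src`, `dst`, `weight`) and every vertex has at least one outgoing edge; the function names and the example graph are hypothetical. The reverse-direction term can reuse the same code with `src` and `dst` swapped.

```python
import numpy as np

def random_walk_step(z, src, dst, weight, out_degree):
    """One application of D_O^{-1} A to z using only the edge list: O(|E|) work."""
    agg = np.zeros(len(out_degree))
    np.add.at(agg, src, weight * z[dst])   # sum_j A_ij z_j over edges i -> j
    return agg / out_degree                # divide row i by its out-degree

def sparse_diffusion_conv(x, src, dst, weight, theta):
    """sum_{k=0}^{K-1} theta_k (D_O^{-1} A)^k x, computed recursively in O(K|E|)."""
    z = np.asarray(x, dtype=float)
    out_degree = np.zeros(len(z))
    np.add.at(out_degree, src, weight)     # diagonal of D_O, one pass over edges
    out = theta[0] * z                     # k = 0 term (identity)
    for theta_k in theta[1:]:
        z = random_walk_step(z, src, dst, weight, out_degree)
        out = out + theta_k * z
    return out

# Hypothetical graph with 3 vertices and 4 weighted directed edges.
src = np.array([0, 0, 1, 2])
dst = np.array([1, 2, 2, 0])
weight = np.array([1.0, 3.0, 2.0, 1.0])
x = np.array([1.0, 2.0, 4.0])
y = sparse_diffusion_conv(x, src, dst, weight, theta=[0.2, 0.5, 0.3])
```

The dense $|\mathcal{V}| \times |\mathcal{V}|$ matrix never appears, so the cost scales with the number of road segments rather than the square of the number of sensors.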
Big picture
Model spatial dependency with proposed diffusion convolution.
Model temporal dependency with an augmented recurrent neural network.
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018.
Model Temporal Dynamics using Recurrent Neural Networks
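Recurrent Neural Networks (RNN)
– Non-linear, non-stationary auto-regression
– State-of-the-art performance in sequence modeling
Popular examples of RNNs
– Long Short-Term Memory unit (LSTM)
– Gated Recurrent Unit (GRU)
[Figure: an RNN cell with input $x$ and hidden state $h$, unrolled over inputs $x_1, x_2, x_3$ into hidden states $h_1, h_2, h_3$]
DCGRU = GRU + diffusion convolution: the matrix multiplications inside the GRU are replaced with diffusion convolutions.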
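A DCGRU cell can be sketched by taking the standard GRU update and replacing each matrix multiplication with a bidirectional diffusion convolution over node features. This is a minimal NumPy forward pass under stated assumptions: weights are random and untrained, the gates follow the usual GRU form (reset gate $r$, update gate $u$, candidate $C$), and the way diffused features are stacked before mixing is one plausible choice, not the reference implementation. `DCGRUCell` and `transition` are hypothetical names.

```python
import numpy as np

def transition(A):
    """Row-normalize A (D^{-1} A); rows with zero degree stay zero."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DCGRUCell:
    """GRU whose matrix multiplications are replaced by diffusion convolutions.

    Inputs and states have shape (|V|, features); weights are random (illustration only).
    """

    def __init__(self, A, in_dim, hidden_dim, K=2, seed=0):
        A = np.asarray(A, dtype=float)
        self.supports = [transition(A), transition(A.T)]  # D_O^{-1} A and D_I^{-1} A^T
        self.K = K
        rng = np.random.default_rng(seed)
        fan_in = len(self.supports) * K * (in_dim + hidden_dim)
        self.W = {g: rng.normal(0.0, 0.1, (fan_in, hidden_dim)) for g in "ruc"}
        self.b = {g: np.zeros(hidden_dim) for g in "ruc"}

    def dconv(self, Z, gate):
        # Stack S^k Z for each support S and k = 0..K-1, then mix features with W.
        # (The k = 0 term appears once per support; kept simple on purpose.)
        feats = []
        for S in self.supports:
            Zk = Z
            for _ in range(self.K):
                feats.append(Zk)
                Zk = S @ Zk
        return np.concatenate(feats, axis=1) @ self.W[gate] + self.b[gate]

    def __call__(self, X_t, H_prev):
        XH = np.concatenate([X_t, H_prev], axis=1)
        r = sigmoid(self.dconv(XH, "r"))   # reset gate
        u = sigmoid(self.dconv(XH, "u"))   # update gate
        C = np.tanh(self.dconv(np.concatenate([X_t, r * H_prev], axis=1), "c"))
        return u * H_prev + (1.0 - u) * C  # new hidden state H_t

A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])  # hypothetical 3-sensor graph
cell = DCGRUCell(A, in_dim=1, hidden_dim=4, K=2)
H = np.zeros((3, 4))
for X_t in np.random.default_rng(1).normal(size=(5, 3, 1)):
    H = cell(X_t, H)
```

Each gate has `2 * K * (in_dim + hidden_dim) * hidden_dim` weights, a count that does not depend on the number of sensors.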
Model Temporal Dynamics using Recurrent Neural Network
Multi-step ahead prediction with an RNN
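[Figure: DCGRU cells read observations $x_1, x_2, x_3$ up to the current time and then output predictions $\hat{x}_4, \hat{x}_5, \hat{x}_6$, with each prediction fed back as the next input; $\hat{x}$: model prediction, $x$: observation or ground truth]
The previous model output is fed into the network, so errors propagate from one step to the next.
Teach the model to deal with its own errors.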
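Why feeding predictions back causes trouble can be seen with a deliberately tiny toy, assuming a one-sensor world where speed follows $x_{t+1} = 0.9\,x_t$ and a slightly wrong model uses 0.85 instead. This illustrates error propagation in general, not DCGRU; `rollout_errors` is a hypothetical helper.

```python
def rollout_errors(a_true, a_model, x0, horizon):
    """Compare errors when the model is fed ground truth vs. its own predictions.

    Toy AR(1) world: x_{t+1} = a_true * x_t; the model predicts a_model * input.
    """
    truth = [x0 * a_true ** h for h in range(horizon + 1)]
    one_step, fed_back = [], []
    x_hat = x0
    for h in range(1, horizon + 1):
        # Input is the previous ground truth: error does not accumulate.
        one_step.append(abs(a_model * truth[h - 1] - truth[h]))
        # Input is the model's own previous output: error compounds.
        x_hat = a_model * x_hat
        fed_back.append(abs(x_hat - truth[h]))
    return one_step, fed_back

one_step, fed_back = rollout_errors(a_true=0.9, a_model=0.85, x0=60.0, horizon=3)
```

Both errors are equal at the first step, but when predictions are fed back the error grows with the horizon while the ground-truth-fed error does not.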
Improve Multi-step ahead Forecasting
Traffic prediction as a sequence-to-sequence learning problem
– Encoder-decoder framework
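[Figure: an encoder of DCGRU cells reads $x_1, x_2, x_3$ up to the current time; a decoder of DCGRU cells receives <GO>, $x_4$, $x_5$ as inputs and outputs $\hat{x}_4, \hat{x}_5, \hat{x}_6$, which are compared with the ground truth $x_4, x_5, x_6$ to give errors $\delta_4, \delta_5, \delta_6$; $\hat{x}$: model prediction, $x$: observation or ground truth]
Learn $x_1, x_2, x_3 \rightarrow x_4, x_5, x_6$ rather than only $x_1, x_2, x_3 \rightarrow x_4$, and backpropagate errors from multiple steps.
* Sutskever et al. Sequence to sequence learning with neural networks, NIPS 2014
Ground truth becomes unavailable at test time.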
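The encoder-decoder idea, and the train/test mismatch it creates, can be sketched with a toy recurrent cell standing in for DCGRU. Everything here is an assumption for illustration: a single sensor, random untrained weights, a zero vector as the <GO> input, and the hypothetical helpers `make_cell` and `seq2seq`.

```python
import numpy as np

def make_cell(seed, in_dim=1, hidden=8):
    # Toy tanh RNN cell standing in for DCGRU (random, untrained weights).
    rng = np.random.default_rng(seed)
    W_x = rng.normal(0.0, 0.3, (in_dim, hidden))
    W_h = rng.normal(0.0, 0.3, (hidden, hidden))
    return lambda x, h: np.tanh(x @ W_x + h @ W_h)

HIDDEN = 8
encoder, decoder = make_cell(0, hidden=HIDDEN), make_cell(1, hidden=HIDDEN)
W_out = np.random.default_rng(2).normal(0.0, 0.3, (HIDDEN, 1))

def seq2seq(history, horizon, targets=None):
    """Encode T' past values, then decode `horizon` future values.

    Training (targets given): the decoder is fed the previous ground truth.
    Testing (targets=None): it is fed its own previous prediction instead.
    """
    h = np.zeros((1, HIDDEN))
    for x in history:                      # encoder reads x_1 .. x_T'
        h = encoder(np.array([[x]]), h)
    inp = np.zeros((1, 1))                 # <GO> input
    preds = []
    for step in range(horizon):            # decoder emits x_hat_{t+1} .. x_hat_{t+T}
        h = decoder(inp, h)
        y = h @ W_out
        preds.append(y.item())
        inp = np.array([[targets[step]]]) if targets is not None else y
    return np.array(preds)

history, targets = [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]
train_preds = seq2seq(history, 3, targets)   # ground truth fed during training
test_preds = seq2seq(history, 3)             # ground truth unavailable at test time
loss = np.mean(np.abs(train_preds - np.array(targets)))  # error over all steps
```

In a framework with automatic differentiation, `loss` would be backpropagated through every decoder step; the first prediction is identical in both modes, and the two modes diverge from the second step on.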
Improve Multi-step ahead Forecasting
Improve multi-step ahead forecasting with scheduled sampling
* Bengio, Samy, et al. Scheduled sampling for sequence prediction with recurrent neural networks. NIPS 2015
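[Figure: encoder DCGRU cells read $x_1, x_2, x_3$ up to the current time; the decoder starts from <GO> and, at each following step, is fed either the ground truth ($x_4$, $x_5$) or the model prediction ($\hat{x}_4$, $\hat{x}_5$), producing $\hat{x}_4, \hat{x}_5, \hat{x}_6$; $\hat{x}$: model prediction, $x$: observation or ground truth]
Scheduled sampling: at each decoder step, flip a coin to choose whether to feed the previous ground truth or the previous model prediction.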
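Here is a minimal sketch of the coin flip inside a decoding loop, assuming a hypothetical decoder cell `step_fn(inp, state) -> (prediction, state)` and a scalar signal. The coin is flipped independently at each step, after the previous prediction is known; `choose_input` and `decode` are hypothetical names.

```python
import random

def choose_input(x_prev, x_hat_prev, p_truth, rng):
    # Coin flip: ground truth with probability p_truth, else the model's own prediction.
    return x_prev if rng.random() < p_truth else x_hat_prev

def decode(step_fn, state, targets, p_truth, rng):
    """Run the decoder for len(targets) steps with scheduled sampling."""
    inp, preds = 0.0, []                  # 0.0 plays the role of <GO>
    for x in targets:
        y_hat, state = step_fn(inp, state)
        preds.append(y_hat)
        inp = choose_input(x, y_hat, p_truth, rng)
    return preds

step_fn = lambda inp, s: (inp + 1.0, s)  # toy decoder: predicts input + 1
preds = decode(step_fn, None, [10.0, 20.0, 30.0], p_truth=0.5, rng=random.Random(0))
```

With `p_truth=1.0` this reduces to always feeding the ground truth, and with `p_truth=0.0` to always feeding the model's own predictions, as at test time.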
Improve Multi-step ahead Forecasting
Improve multi-step ahead forecasting with scheduled sampling
– Curriculum learning: gradually enables the model to deal with its own errors.
* Bengio, Samy, et al. Scheduled sampling for sequence prediction with recurrent neural networks. NIPS 2015
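[Figure: the DCGRU input at step $t$ is either the ground truth $x_t$ or the model prediction $\hat{x}_t$, and the cell outputs $\hat{x}_{t+1}$; a plot shows the sampling probability changing with the number of training iterations; $\hat{x}$: model prediction, $x$: observation or ground truth]
Easy: only feed the ground truth.
Hard: only feed the model prediction.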
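The curriculum needs a schedule for the probability of feeding the ground truth. One option proposed by Bengio et al. is inverse sigmoid decay, sketched below; the value of `tau` is a hypothetical choice, and whether a given implementation uses exactly this schedule is an assumption here.

```python
import math

def teacher_forcing_prob(iteration, tau):
    """Inverse sigmoid decay: eps_i = tau / (tau + exp(i / tau)).

    Close to 1 early in training (mostly ground truth: easy) and decays toward 0
    (mostly model predictions: hard). A larger tau gives a slower decay.
    """
    # Cap the exponent so math.exp cannot overflow for very long runs.
    return tau / (tau + math.exp(min(iteration / tau, 700.0)))

# Hypothetical tau; probabilities at every 5000th iteration.
schedule = [teacher_forcing_prob(i, tau=2000) for i in range(0, 40001, 5000)]
```

The returned value would be passed as the coin's probability of choosing the ground truth at each training iteration.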
Diffusion Convolutional Recurrent Neural Network
Diffusion Convolutional Recurrent Neural Network (DCRNN)
– Model spatial dependency with diffusion convolution
– Sequence to sequence learning with encoder-decoder framework
– Improve multi-step ahead forecasting with scheduled sampling
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018
Experiments
Datasets
METR-LA:
– 207 traffic sensors in Los Angeles
– 4 months in 2012
– 6.5M observations
PEMS-BAY:
– 325 traffic sensors in the Bay Area
– 6 months in 2017
– 17M observations
Experiments
Baselines
– Historical Average (HA)
– Autoregressive Integrated Moving Average (ARIMA)
– Support Vector Regression (SVR)
– Vector Auto-Regression (VAR)
– Feed-forward neural network (FNN)
– Fully connected LSTM with a sequence-to-sequence framework (FC-LSTM)
Task
– Multi-step ahead traffic speed forecasting
Experimental Results
DCRNN achieves the best performance for all forecasting horizons on both datasets.
[Figure: mean absolute error (MAE) at the 15 min, 30 min, and 1 hour horizons on METR-LA and PEMS-BAY for HA, ARIMA, VAR, SVR, FNN, FC-LSTM, and DCRNN]
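The per-horizon MAE in these charts can be computed as below. This is a sketch under assumptions: predictions and targets share a (num_samples, T, |V|) layout, and missing sensor readings are encoded as 0 and excluded from the average. `mae_per_horizon` is a hypothetical helper, and which horizon indices correspond to 15 min, 30 min, and 1 hour depends on the sampling interval.

```python
import numpy as np

def mae_per_horizon(y_true, y_pred, null_value=0.0):
    """Mean absolute error for each forecasting horizon, ignoring missing readings.

    y_true, y_pred: (num_samples, T, |V|) speeds.
    Assumption: entries of y_true equal to `null_value` mark missing data.
    """
    mask = y_true != null_value                  # True where a reading exists
    err = np.abs(y_pred - y_true) * mask         # zero out missing entries
    return err.sum(axis=(0, 2)) / mask.sum(axis=(0, 2))

# Hypothetical speeds: 2 samples, 3 horizons, 2 sensors (0.0 = missing).
y_true = np.array([[[60.0, 0.0], [50.0, 55.0], [40.0, 45.0]],
                   [[62.0, 58.0], [0.0, 0.0], [30.0, 35.0]]])
y_pred = y_true + np.array([1.0, 2.0, 3.0])[None, :, None]
mae = mae_per_horizon(y_true, y_pred)
```

Here the predictions are off by 1, 2, and 3 mile/h at the three horizons, so `mae` recovers exactly those values even though some readings are missing.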
Effects of Spatiotemporal Dependency Modeling
w/o temporal: remove sequence-to-sequence learning.
w/o spatial: remove the diffusion convolution.
[Figure: MAE on METR-LA at the 15 min, 30 min, and 1 hour horizons for DCRNN w/o Temporal, DCRNN w/o Spatial, and DCRNN]
Removing either spatial or temporal modeling leads to significantly worse results.
Example: Prediction Results
DCRNN is more likely to accurately predict abrupt changes in the traffic speed than the best baseline method.
[Figure: example prediction results on METR-LA; speed in mile/h]
Example: Filter Visualization
Visualization of learned filters
– Filters are localized around the center.
– Weights diffuse along the road network.
[Figure: learned filters centered at different vertices on METR-LA; color scale from Min to Max]
Summary
Propose diffusion convolution to model the spatial dependency of traffic flow.
Propose Diffusion Convolutional Recurrent Neural Network (DCRNN) that captures both spatial and temporal dependencies.
DCRNN obtains consistent improvement over state-of-the-art baseline methods.
https://github.com/liyaguang/DCRNN
https://arxiv.org/abs/1707.01926
ICLR 2018
* Li, Yaguang et al. Diffusion Convolutional Recurrent Neural Network: Data-driven Traffic Forecasting, ICLR 2018.
Thank You!
Q & A