arXiv:2002.02374v1 [stat.ML] 6 Feb 2020
Macroscopic Traffic Flow Modeling with Physics Regularized Gaussian Process: A New Insight into Machine Learning Applications
Yun Yuan (a), Xianfeng Terry Yang* (a), Zhao Zhang (a), Shandian Zhe (b)
(a) Department of Civil & Environmental Engineering, University of Utah, Salt Lake City, UT 84112, USA
(b) School of Computing, University of Utah, Salt Lake City, UT 84112, USA
Abstract
Despite the recent widespread use of machine learning (ML) techniques in traffic flow modeling, these data-driven approaches often lose accuracy when the dataset is small or noisy. To address this issue, this study presents a new modeling framework, named physics regularized machine learning (PRML), which encodes classical traffic flow models (referred to as physical models) into the ML architecture to regularize the ML training process. More specifically, a stochastic physics regularized Gaussian process (PRGP) model is developed, and a Bayesian inference algorithm is used to estimate the mean and kernel of the PRGP. A physical regularizer based on macroscopic traffic flow models is also developed to augment the estimation via a shadow GP, and an enhanced latent force model is used to encode physical knowledge into stochastic processes. Based on the posterior regularization inference framework, an efficient stochastic optimization algorithm is also developed to maximize the evidence lower bound of the system likelihood. To demonstrate the effectiveness of the proposed model, this paper conducts empirical studies on a real-world dataset collected from a stretch of the I-15 freeway in Utah. Results show that the new PRGP model outperforms comparable existing methods, such as calibrated pure physical models and pure machine learning methods, in estimation precision and input robustness.
Keywords: macroscopic traffic flow model, physics regularized machine learning, multivariate Gaussian
process, posterior regularization inference
1. Introduction
Traffic state (i.e., flow, speed, and density) estimation (TSE) is the precursor of a variety of advanced traffic operation tasks and plays a key role in traffic management. Early studies found macroscopic traffic dynamics to be similar to hydrodynamics. By borrowing concepts from fluid mechanics, flow, speed, and density were defined, and their relationship, named the fundamental diagram, was discovered. Based on these definitions, macroscopic traffic flow models were developed from the conservation law and momentum equations, and a family of kinematic wave models was formulated (Seo et al., 2017). However, most models, derived under ideal theoretical conditions, require great effort in parameter calibration and have difficulty handling the noisy and fluctuating data collected by traffic sensors.

Email address: [email protected] (Xianfeng Terry Yang*)
Preprint submitted to Transportation Research Part B, February 7, 2020
To capture measurement errors, stochastic traffic flow models were then developed to investigate and explain a variety of observed traffic phenomena; these models are also better suited for real-time traffic state estimation and forecasting (Jabari et al., 2014). Since the prominent deterministic models and their higher-order extensions are ill-posed, researchers developed stochastic traffic flow models in two categories. The first category uses stochastic extensions (Gazis and Knapp, 1971; Szeto and Gazis, 1972; Gazis and Liu, 2003; Wang and Papageorgiou, 2005; Wang et al., 2007), which add Gaussian noise terms to the model equations and use real-world data to quantify the noise. However, Jabari and Liu (2012) pointed out that such simply-noised models may (i) produce negative sample paths and (ii) produce mean dynamics that do not coincide with the original deterministic dynamics due to nonlinearity. The second category includes stochastic traffic models such as Boltzmann-based models (Prigogine and Herman, 1971; Paveri-Fontana, 1975), Markovian queuing network approaches (Davis and Kang, 1994; Kang, 1995; Di et al., 2010; Osorio et al., 2011; Jabari and Liu, 2012), and cellular automaton based models (Nagel and Schreckenberg, 1992; Gray and Griffeath, 2001; Sopasakis and Katsoulakis, 2006; Sopasakis, 2012). Models in the second category do not share the concerns of those in the first. However, they may lose analytical tractability (Jabari and Liu, 2013), defined as the ability to obtain a mathematical solution such as a closed-form expression, and they are much more similar to data-driven approaches than to classical analytical models.
In view of the increasing data availability, many data-driven methods have been developed because they do not require explicit theoretical assumptions and have a remarkably low computational cost in the testing phase. In the literature, data-driven approaches include autoregressive integrated moving average (Zhong et al., 2004), Bayesian networks (Ni and Leonard, 2005), kernel regression (Yin et al., 2012), fuzzy c-means clustering (Tang et al., 2015), k-nearest neighbors (Tak et al., 2016), stochastic principal component analysis (Li et al., 2013; Tan et al., 2014), Tucker decomposition (Tan et al., 2013), deep learning (Duan et al., 2016; Polson and Sokolov, 2017b; Wu et al., 2018), and Bayesian particle filters (Polson and Sokolov, 2017a). However, due to their data-driven nature, these machine learning (ML) models fundamentally suffer in three scenarios: (i) training data are scarce and insufficient to reveal the complexity of the system, (ii) training data are noisy and include much incorrect or misleading information, and (iii) test data are far from the training examples, i.e., extrapolation. In these scenarios, which are unfortunately very common in the real world, their performance can drop dramatically, producing large and/or biased estimation errors. Fig. 1a shows an example of applying a pure ML method to a dataset that contains flawed data: its biased estimate (dashed line) diverges from that of the ML method trained on accurate data only (solid line). Moreover, ML models are developed as "black boxes", so their results are hard for researchers to interpret.
In summary, classical traffic flow models can effectively characterize the underlying mechanisms (i.e., physical processes of traffic) of transportation systems; however, they are usually developed with strong assumptions, require great effort in parameter calibration, and fall short of capturing data uncertainties. On the other hand, the performance of pure data-driven approaches such as ML models depends heavily on data quality, and their results are usually hard to interpret. Recognizing these limitations, this research develops an innovative approach, named physics regularized machine learning (PRML), to fill the gap between classical traffic flow (physical) models and ML methods. The main contributions of this study are as follows. Compared with physical models, PRML can (1) use the ML portion to capture estimation uncertainties that are beyond the capability of closed-form expressions; and (2) eliminate the separate effort of calibrating model parameters through a sequential learning process. Different from pure ML models, PRML is (1) more robust on noisy or flawed datasets, as valuable knowledge from physical models helps regularize the fitting process (see Fig. 1b); and (2) more explainable in terms of its estimation accuracy. With this modeling framework, this research is expected to bring new insight into ML applications in transportation and to bridge research on classical traffic flow models and more recent data-driven approaches.

[Figure 1 plots. Panels: (a) ML with flawed data; (b) PRML with flawed data. Axes: x, y. Legend: flawed data, accurate data, ML (a) or PRML (b) with all data, ML without flawed data.]
Figure 1: Comparison between pure ML and the proposed PRML
More specifically, this study develops a physics regularized Gaussian process (PRGP) method for TSE by integrating three macroscopic traffic flow models with a Gaussian process (GP), implementing a shadow GP to regularize the original GP, and incorporating enhanced latent force models (LFM) (Raissi et al., 2017) to encode traffic flow model knowledge. To learn the GPs from data efficiently, this study also proposes an inference algorithm under the posterior regularization inference framework. To justify the effectiveness of the proposed methods, numerical experiments with field data are conducted on an I-15 freeway segment in Utah, and the performance of the PRGP models is compared with that of both classical traffic flow models and pure ML models. To further investigate the robustness of PRGP, synthesized noise is also added to the training set, and the results show that PRGP is much more resilient to noisy/flawed data.
The remainder of this paper is organized as follows. Section 2 reviews existing studies on TSE modeling and estimation methods, as well as Gaussian processes and inference methods. In Section 3, the integrated GP and enhanced LFM for encoding physics knowledge into Bayesian statistics, together with the posterior regularized inference algorithm, are derived. In Section 4, a case study on real-world data from the interstate freeway I-15 is conducted to justify the proposed methods. The conclusion section summarizes the critical findings and future research directions.
2. Literature Review
2.1. Macroscopic traffic flow models
To effectively control traffic flows, TSE has been recognized in the literature as a fundamental task of freeway traffic management. TSE refers to estimating the complete traffic state from limited traffic measurements collected by stationary sensors. The key variables of macroscopic traffic flow models, i.e., traffic flow, speed, and density, are used together with the fundamental diagram to approximate the continuous traffic state. Deterministic traffic flow models usually consist of a conservation law equation and a fundamental relationship (Seo et al., 2017). To formalize these ideas, the key concepts, including cumulative flow, flow, density, and speed, are defined as follows.
Definition 1. The cumulative flow N(t, x) is defined as the number of vehicles that passed the position x
by the time t.
Definition 2. The flow q, density ρ, and speed v are defined in Eqs. 1-3.
$$ q(t, x) = \partial_t N(t, x) \tag{1} $$
$$ \rho(t, x) = -\partial_x N(t, x) \tag{2} $$
$$ v(t, x) = \frac{q(t, x)}{\rho(t, x)} \tag{3} $$
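As a small illustration of Definitions 1-2 and Eq. 3, the sketch below recovers flow, density, and speed from cumulative counts sampled on a regular (t, x) grid, with finite differences (`numpy.gradient`) standing in for the partial derivatives. The helper name `states_from_cumulative_flow`, the grid, and the steady-traffic test case N(t, x) = q0·t − ρ0·x are illustrative assumptions, not part of the original method.

```python
import numpy as np


def states_from_cumulative_flow(N, dt, dx):
    """Recover q, rho, v from cumulative flow N[i, j] = N(t_i, x_j) on a regular grid.

    Hypothetical helper: finite differences approximate the partial derivatives.
    """
    q = np.gradient(N, dt, axis=0)       # Eq. 1: q = dN/dt (time along axis 0)
    rho = -np.gradient(N, dx, axis=1)    # Eq. 2: rho = -dN/dx (space along axis 1)
    # Eq. 3: v = q / rho, left as NaN wherever the density is not positive
    v = np.divide(q, rho, out=np.full_like(q, np.nan), where=rho > 0)
    return q, rho, v


# Steady, uniform traffic: q0 = 1800 veh/h and rho0 = 30 veh/km, so v should be 60 km/h.
t = np.linspace(0.0, 1.0, 11)    # h
x = np.linspace(0.0, 2.0, 21)    # km
T, X = np.meshgrid(t, x, indexing="ij")
N = 1800.0 * T - 30.0 * X
q, rho, v = states_from_cumulative_flow(N, t[1] - t[0], x[1] - x[0])
```

Because N is linear in t and x here, the finite differences are exact up to rounding; on real detector counts the grid spacing would control the approximation error.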
In traffic flow studies, researchers identified the fundamental diagram (FD), which describes the relationship among flow, speed, and density:
Definition 3. The fundamental diagram is defined as the relationship among flow, speed, and density, as shown in Eqs. 4-5.
$$ v = V(\rho) \tag{4} $$
$$ q = \rho V(\rho) \tag{5} $$
where V(·) denotes the density-speed function. Macroscopic traffic flow models were proposed based on a continuum fluid approximation to describe the aggregated behavior of traffic, and they can generally be classified into three basic formulations. The well-known first-order Lighthill-Whitham-Richards (LWR) model (Lighthill and Whitham, 1955; Richards, 1956) is formulated in Eqs. 6-7.
$$ \partial_t \rho + \partial_x (\rho v) = 0 \tag{6} $$
$$ v = V(\rho) \tag{7} $$
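To make Eqs. 6-7 concrete, the sketch below advances the LWR model with a Godunov-type (demand/supply) finite-volume scheme in the spirit of the cell transmission model, assuming a Greenshields speed function V(ρ) = v_f(1 − ρ/ρ_jam). The speed function, parameter values, closed road ends, and function names are illustrative assumptions; the scheme conserves vehicles exactly and keeps densities in [0, ρ_jam] when the CFL condition v_f·Δt/Δx ≤ 1 holds.

```python
import numpy as np


def greenshields_speed(rho, vf, rho_jam):
    """Assumed density-speed function V(rho) = vf * (1 - rho / rho_jam)."""
    return vf * (1.0 - rho / rho_jam)


def demand(rho, vf, rho_jam):
    """Sending flow: rho V(rho) below the critical density, capacity above it."""
    r = np.minimum(rho, 0.5 * rho_jam)     # Greenshields critical density is rho_jam / 2
    return r * greenshields_speed(r, vf, rho_jam)


def supply(rho, vf, rho_jam):
    """Receiving flow: capacity below the critical density, rho V(rho) above it."""
    r = np.maximum(rho, 0.5 * rho_jam)
    return r * greenshields_speed(r, vf, rho_jam)


def lwr_step(rho, dt, dx, vf, rho_jam):
    """One explicit Godunov step of Eq. 6 with closed (zero-flux) road ends."""
    inner = np.minimum(demand(rho[:-1], vf, rho_jam), supply(rho[1:], vf, rho_jam))
    flux = np.concatenate(([0.0], inner, [0.0]))     # fluxes at all cell interfaces
    return rho - dt / dx * (flux[1:] - flux[:-1])    # discrete conservation law


vf, rho_jam = 100.0, 150.0        # km/h, veh/km
dx, dt = 0.1, 0.0005              # km, h (CFL number vf * dt / dx = 0.5)
rho = np.full(100, 20.0)
rho[40:60] = 140.0                # a dense platoon in the middle of a 10 km stretch
mass0 = rho.sum() * dx
for _ in range(200):
    rho = lwr_step(rho, dt, dx, vf, rho_jam)
```

Taking the minimum of upstream demand and downstream supply at each interface is what lets this first-order model form queues and shock waves without producing densities outside the physical range.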
The LWR model can describe simple phenomena, such as traffic jams and shock waves; however, it has limited ability to reproduce more complex phenomena.
To overcome such limitations, second-order models use an additional momentum equation to describe the dynamics of speed. For example, the Payne-Whitham (PW) model (Payne, 1971; Whitham, 1975) is formulated by Eqs. 8-9, in which Eq. 9 is the momentum equation.
$$ \partial_t \rho + \partial_x (\rho v) = 0 \tag{8} $$
$$ \partial_t v + v\, \partial_x v = -\frac{v - V(\rho)}{\tau_0} - \frac{c_0^2}{\rho}\, \partial_x \rho \tag{9} $$
where τ0 denotes the relaxation time and c0² denotes a parameter related to driver anticipation. Despite the
success of the PW model and its extensions (Papageorgiou et al., 1989), the PW-like models may produce
non-realistic outputs, such as negative speed (Del Castillo et al., 1994; Daganzo, 1995; Papageorgiou, 1998;
Hoogendoorn and Bovy, 2001).
To overcome this limitation, the second-order Aw-Rascle-Zhang (ARZ) model (Aw and Rascle, 2000; Zhang, 2002) is formulated in Eqs. 10-11, where Eq. 11 is an alternative momentum equation. The original ARZ model has been extended extensively in the literature (Colombo, 2003; Lebacque et al., 2007; Blandin et al., 2013; Fan et al., 2013).
$$ \partial_t \rho + \partial_x (\rho v) = 0 \tag{10} $$
$$ \partial_t \big(v - V(\rho)\big) + v\, \partial_x \big(v - V(\rho)\big) = -\frac{v - V(\rho)}{\tau_0} \tag{11} $$
However, it should be noted that despite the elegance of the differential equation formalization, traffic flow models are difficult to estimate because of their nonlinearity and the measurement errors of real-world observations. Researchers therefore proposed advanced estimation methods to facilitate the application of these models.
2.2. Stochastic estimation methods
To use field data to capture traffic flow uncertainties, estimation models with stochastic extensions were later derived (Seo et al., 2017). For example, TSE can be formulated as a boundary value problem (BVP) based on partial observations (i.e., boundary conditions) (Coifman, 2002; Laval et al., 2012; Kuwahara, 2015; Blandin et al., 2013; Fan et al., 2013). In solving BVPs, the boundary conditions are assumed to be correct. However, real-world measurement errors cannot be ignored.
Considering system and observation noise, data assimilation or inverse modeling techniques were then developed for model estimation and calibration. In the literature, there are three ways to add randomness to traffic models: (a) stochastic initial and boundary conditions, (b) stochastic source terms (e.g., inflows), and (c) a stochastic speed-density relationship or fundamental diagram (Sumalee et al., 2011). To capture measurement errors in the data, a stochastic modeling method adds Gaussian noise to the traffic state estimates (Gazis and Knapp, 1971; Szeto and Gazis, 1972; Gazis and Liu, 2003; Wang and Papageorgiou, 2005; Wang et al., 2007; Sumalee et al., 2011). For example, in view of the nonlinearity of second-order traffic flow models, Gazis and Liu (2003) and Wang and Papageorgiou (2005) added error terms to the model equations and developed an extended Kalman filter (EKF) to estimate a PW-like discrete model (Papageorgiou et al., 1989).
Note that applying the EKF to non-differentiable models (e.g., the Cell Transmission Model) is not rigorous (Blandin et al., 2012). The unscented Kalman filter (UKF) overcomes this shortcoming of the EKF by avoiding analytical differentiation (Mihaylova et al., 2006). The ensemble Kalman filter (EnKF) employs Monte Carlo simulation to handle nonlinear and non-differentiable systems, but it is computationally costly (Work et al., 2008). The particle filter (PF) also uses Monte Carlo simulation and is likewise computationally expensive (Mihaylova and Boel, 2004). These simulation-based methods were further extended to reduce the computational cost.
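To make the ensemble idea concrete, the sketch below implements one analysis step of a stochastic EnKF with perturbed observations, a standard textbook variant rather than the specific filter of Work et al. (2008). The two-cell density state, the linear observation operator `H`, the noise level, and the function name `enkf_update` are illustrative assumptions.

```python
import numpy as np


def enkf_update(ensemble, y_obs, H, obs_var, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    ensemble: (n_members, n_state) forecast states; H: (n_obs, n_state) linear
    observation operator; obs_var: scalar observation-noise variance.
    """
    n = ensemble.shape[0]
    A = ensemble - ensemble.mean(axis=0)            # state anomalies
    HX = ensemble @ H.T                             # predicted observations
    HA = HX - HX.mean(axis=0)                       # observation anomalies
    P_xy = A.T @ HA / (n - 1)                       # state-observation covariance
    P_yy = HA.T @ HA / (n - 1) + obs_var * np.eye(H.shape[0])
    gain = np.linalg.solve(P_yy, P_xy.T).T          # Kalman gain P_xy P_yy^{-1}
    # perturb the observation for each member so the analysis spread stays consistent
    y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var), size=(n, H.shape[0]))
    return ensemble + (y_pert - HX) @ gain.T


rng = np.random.default_rng(42)
prior = rng.normal([40.0, 60.0], 5.0, size=(5000, 2))   # densities of two cells (veh/km)
H = np.array([[1.0, 0.0]])                              # only cell 0 is detected
posterior = enkf_update(prior, np.array([50.0]), H, 25.0, rng)
```

With a prior variance of 25 and an observation variance of 25, the analysis mean of cell 0 should land roughly halfway between 40 and 50. The covariances come from the ensemble itself, which is why no model derivatives are needed and why cost grows with the ensemble size.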
In summary, despite their wide application, the stochastic extension models have two critical theoretical deficiencies: (a) negative sample paths and (b) mean dynamics that do not coincide with the original deterministic dynamics due to nonlinearity (Jabari and Liu, 2012, 2013; Jabari et al., 2014; Pascale et al., 2013; Wada et al., 2017). In view of these deficiencies, stochastic traffic flow models, which trade analytical tractability for relaxed assumptions, were proposed, such as (a) Boltzmann-based methods (Prigogine and Herman, 1971; Paveri-Fontana, 1975), (b) Markovian queuing methods (Davis and Kang, 1994; Kang, 1995; Di et al., 2010; Osorio et al., 2011; Jabari and Liu, 2012), and (c) cellular automaton based methods (Nagel and Schreckenberg, 1992; Gray and Griffeath, 2001; Sopasakis and Katsoulakis, 2006; Sopasakis, 2012).
2.3. Data-driven methods
More recently, with much richer data, researchers started to seek data-driven methods, such as machine learning and Bayesian statistics. Among the existing data-driven methods, the Gaussian process (GP) is a powerful non-parametric function estimator with various successful applications. In traffic modeling, GP-based methods have been applied to traffic speed imputation (Rodrigues and Pereira, 2018; Rodrigues et al., 2018), public transport flows (Neumann et al., 2009), traffic volume estimation and prediction (Xie et al., 2010), travel time prediction (Ide and Kato, 2009), driver velocity profiles (Armand et al., 2013), and traffic congestion (Liu et al., 2013). GPs can capture relationships between random variables without requiring strong assumptions (such as memorylessness).
However, as a data-driven approach, GPs can perform poorly when the training data are scarce and insufficient to reflect the complexity of the system, or when test inputs are far from the training data. Few traffic estimation methods have been developed based on GPs because it is difficult to obtain deductive insights from them and to leverage physics knowledge.
Taking advantage of valuable knowledge from physical models (i.e., classical traffic flow models), we aim to encode this knowledge into GPs to improve their performance, especially when training on scarce data and making estimations in areas with flawed observations. However, using GPs to represent physical knowledge modeled by differential equations has two major difficulties: (a) differential equations are hard to represent as probabilistic terms, such as priors and likelihoods; and (b) in practice, physics knowledge is usually incomplete, and the differential equations can include latent functions and parameters (e.g., unobserved noise, inflows, and outflows), making their representation and joint estimation with GPs even more challenging.
To better encode differential equations in GPs, Alvarez et al. (2009, 2013) proposed latent force models (LFM), in which the GP kernel is obtained by convolving the Green's function of the differential equation with the kernel of the latent forces. Later, Raissi et al. (2017) extended the framework by assuming observable noisy forces and solutions. However, the assumptions of LFM are too restrictive, since many realistic differential equations are nonlinear, or linear but without an analytical Green's function. Moreover, the complete kernel may still be infeasible to obtain. Thus, it is more feasible to use expressive kernels, e.g., deep kernels (Wilson et al., 2016).
In summary, there is a lack of a hybrid framework that combines physics knowledge (i.e., kinematic wave differential equations and the fundamental diagram) with data-driven methods under minimal assumptions and at a reasonable computational cost. This paper aims to fill this gap by proposing a Gaussian process based data-driven method that incorporates tractable physics knowledge.
2.4. Gaussian process and Bayesian inference
A Gaussian process is a general framework that measures the similarity between observations in the training data to estimate unobserved values. Rodrigues and Pereira (2018) and Rodrigues et al. (2018) applied multi-output Gaussian processes to model the complex spatiotemporal patterns of incomplete traffic speed data. The key task is to learn the kernel (i.e., covariance) function between the variables. Previous studies (Calderhead et al., 2009; Barber and Wang, 2014; Heinonen et al., 2018) investigated GPs for ordinary differential equations. Related work assumed that either the noisy forces (Graepel, 2003) or both the noisy forces and the solutions (Raissi et al., 2017) are observable.
To handle forces that are not directly observed, latent force models (LFM) (Alvarez et al., 2009, 2013) first place a prior over the latent forces and then derive the covariance of the solution function via a convolution operation. Despite successful applications, such as transcriptional regulation modeling (Lawrence et al., 2007), the LFM method has two critical deficiencies: (a) it requires linear differential equations and analytical Green's functions, which is restrictive and does not fit traffic flow models; and (b) the convolution procedure is computationally difficult and restrictive.
To address these issues, this paper generalizes the LFM framework to nonlinear differential operators for encoding physics knowledge. The key task is to optimize an objective consisting of the model likelihood on data and a penalty term that encodes constraints over the posterior of the latent variables. Via the penalty term, domain knowledge or constraints are imposed directly on the posteriors rather than through the priors and a complex intermediate computing procedure, which can be more convenient and effective. For computational efficiency, this paper further employs a posterior regularization algorithm to solve the likelihood optimization problem (Ganchev and Das, 2013; Zhu et al., 2014; Libbrecht et al., 2015; Song et al., 2016). To the best of the authors' knowledge, this modeling framework has not yet been developed in other transportation studies. The proposed method is designed to avoid error-prone simple stochastic assumptions and to leverage physics knowledge in a data-driven framework, and it also performs remarkably well when data are scarce and when inflows and outflows are unobserved (e.g., on an arterial stretch).
3. Methodology
3.1. Macroscopic traffic flow model with Physics Regularized Gaussian Process
3.1.1. Gaussian process
Suppose we aim to learn a function f : ℝ^d → ℝ^{d′}, which maps a d-dimensional Euclidean space to a d′-dimensional Euclidean space, from a training set D = (X, Y), where X = [x1, ..., xN]ᵀ is the input matrix, Y = [y1, ..., yN]ᵀ is the output matrix, each xi is a d-dimensional input vector, each yi is a d′-dimensional output vector, f = [f(x1), ..., f(xN)]ᵀ collects the values of the learned function at the training inputs, and N is the sample size. Note that X and Y may have physical meanings only in their feasible domains.
Assumption 1. It is assumed that, given the inputs X, the true outputs f follow a multivariate Gaussian distribution as shown in Eq. 12, where N(·, ·) represents the Gaussian distribution, m denotes the mean, and K represents the covariance matrix.
$$ p(\mathbf{f} \mid \mathbf{X}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}, \mathbf{K}) \tag{12} $$
Note that a Gaussian process over a multi-dimensional input space is also called a Gaussian random field, and the above definition covers multi-dimensional outputs.
Assumption 2. It is assumed that the observations Y are corrupted by isotropic Gaussian noise, as shown in Eq. 13.
$$ p(\mathbf{Y} \mid \mathbf{f}) = \mathcal{N}(\mathbf{Y} \mid \mathbf{f}, \tau^{-1}\mathbf{I}) \tag{13} $$
where τ is the noise precision (inverse variance); isotropic noise means that the noise in each dimension is independent and identically distributed (i.i.d.) with the same variance τ⁻¹.
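Then, assuming a zero mean (m = 0) and marginalizing out f, we obtain the marginal likelihood shown in Eq. 14.
$$ p(\mathbf{Y} \mid \mathbf{X}) = \mathcal{N}(\mathbf{Y} \mid \mathbf{0}, \mathbf{K} + \tau^{-1}\mathbf{I}) \tag{14} $$
where the kernel matrix K is defined in Eq. 15.
$$ [\mathbf{K}]_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) \tag{15} $$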
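The marginal likelihood in Eq. 14 is the quantity typically maximized to learn kernel hyperparameters and τ. A minimal sketch for a single output column, assuming an RBF (squared-exponential) kernel and Cholesky factorization for numerical stability; the kernel choice, hyperparameter values, and function names are illustrative assumptions. If the d′ output columns are treated as independent given a shared kernel, their log likelihoods simply add.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A (n, d) and B (m, d)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)


def log_marginal_likelihood(X, y, tau, lengthscale=1.0, variance=1.0):
    """log N(y | 0, K + tau^{-1} I) for one output column y of shape (n,), as in Eq. 14."""
    n = X.shape[0]
    C = rbf_kernel(X, X, lengthscale, variance) + np.eye(n) / tau
    L = np.linalg.cholesky(C)                              # C = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))    # alpha = C^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))             # log |C|
    return -0.5 * y @ alpha - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(15, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(15)
lml = log_marginal_likelihood(X, y, tau=100.0)
```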
Assumption 3, which requires the kernel to have derivatives of all orders in its domain, is also commonly needed; widely used positive-definite kernels include linear, polynomial, radial-basis, and Laplacian kernels (Fasshauer, 2011).
Assumption 3. The kernel k(·, ·) is assumed to be positive-definite and smooth.
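Given a new input x∗, the function value f(x∗) can be estimated based on Eq. 16.
$$ p(f(\mathbf{x}_*) \mid \mathbf{x}_*, \mathbf{X}, \mathbf{Y}) = \mathcal{N}\big(f(\mathbf{x}_*) \mid \mu(\mathbf{x}_*), \nu(\mathbf{x}_*)\big) \tag{16} $$
where the mean µ(x∗), the variance ν(x∗), and the kernel vector k∗ are calculated in Eqs. 17-19, respectively.
$$ \mu(\mathbf{x}_*) = \mathbf{k}_*^\top (\mathbf{K} + \tau^{-1}\mathbf{I})^{-1} \mathbf{Y} \tag{17} $$
$$ \nu(\mathbf{x}_*) = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top (\mathbf{K} + \tau^{-1}\mathbf{I})^{-1} \mathbf{k}_* \tag{18} $$
$$ \mathbf{k}_* = [k(\mathbf{x}_*, \mathbf{x}_1), \ldots, k(\mathbf{x}_*, \mathbf{x}_N)]^\top \tag{19} $$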
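Eqs. 17-19 translate directly into a few linear-algebra calls. A minimal sketch, assuming an RBF kernel (so k(x∗, x∗) equals the kernel variance) and one output column; the function names and hyperparameter values are illustrative assumptions. The Cholesky factor is reused so that (K + τ⁻¹I)⁻¹ is never formed explicitly.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A (n, d) and B (m, d)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)


def gp_posterior(X, y, X_star, tau, lengthscale=1.0, variance=1.0):
    """Predictive mean (Eq. 17) and variance (Eq. 18) at the rows of X_star."""
    C = rbf_kernel(X, X, lengthscale, variance) + np.eye(len(X)) / tau   # K + tau^{-1} I
    K_star = rbf_kernel(X, X_star, lengthscale, variance)   # column j is k_* for x*_j (Eq. 19)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))     # C^{-1} y
    mu = K_star.T @ alpha                                    # Eq. 17
    V = np.linalg.solve(L, K_star)                           # L^{-1} k_*
    nu = variance - np.sum(V**2, axis=0)                     # Eq. 18 with k(x*, x*) = variance
    return mu, nu


X = np.linspace(0.0, 5.0, 20)[:, None]      # training inputs (n, d) with d = 1
y = np.sin(X[:, 0])                          # one output column
X_star = np.array([[2.5], [100.0]])          # one interior and one far-away test input
mu, nu = gp_posterior(X, y, X_star, tau=1e6)
```

Far from the data, k∗ vanishes, so the mean reverts to the zero prior mean and the variance returns to the prior variance; this is the extrapolation weakness the paper aims to mitigate with physics regularization.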
Once the kernel has been learned from data D, samples of f(x∗) can be drawn via the reparameterization (Kingma and Ba, 2014) shown in Eqs. 20-21, where ε is standard normally distributed.
$$ \epsilon \sim \mathcal{N}(0, 1) \tag{20} $$
$$ f(\mathbf{x}_*) = \mu(\mathbf{x}_*) + \epsilon \sqrt{\nu(\mathbf{x}_*)} \tag{21} $$
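The point of Eqs. 20-21 is that all randomness is moved into ε, so a sample is a deterministic, differentiable function of µ and ν. A minimal NumPy sketch (the function name is an illustrative assumption; NumPy itself computes no gradients, but in an automatic differentiation framework the same expression would let gradients flow through µ and ν):

```python
import numpy as np


def reparameterized_sample(mu, nu, rng, n_samples=1):
    """Draw f(x*) = mu + eps * sqrt(nu) with eps ~ N(0, 1) (Eqs. 20-21).

    mu, nu: predictive means and variances at the test inputs (equal shapes).
    """
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    eps = rng.standard_normal((n_samples,) + mu.shape)    # Eq. 20
    return mu + eps * np.sqrt(nu)                          # Eq. 21


rng = np.random.default_rng(0)
samples = reparameterized_sample([2.0, -1.0], [0.25, 4.0], rng, n_samples=200_000)
```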
Fig. 2 shows the structure of the conventional GP method, where the circled nodes denote random vectors, the shaded nodes represent known vectors, and the arrows indicate conditional probabilities.
[Figure 2 diagram. Nodes: X, x∗, Y, f, f∗; labels: learning, estimation, data, noise, GP.]
Figure 2: The conventional framework for inferring Gaussian process
3.1.2. Latent Force Model
In many applications, physics knowledge expressed as differential equations provides insight into the system's mechanism and can be very useful for both estimation and prediction. In their seminal work, Alvarez et al. (2009, 2013) proposed latent force models (LFM) that use convolution operations to encode physics into GP kernels. They assume that the differential equations are linear and have analytical Green's functions. Under this assumption, the kernel of the target function can be derived by convolving the Green's function with the kernel of the latent functions. LFM considers W output functions f1(x), ..., fw(x), ..., fW(x) and assumes that each output function fw is governed by a linear differential equation.
$$ \mathcal{L} f_w(\mathbf{x}) = u_w(\mathbf{x}) \tag{22} $$
where L is a linear differential operator (Courant and Hilbert, 2008) and u_w is a latent force function.
Lemma 1. If one side of Eq. 22 is one GP, the other side is another GP. The covariance of a GP’s
derivative and the cross-covariance between the GP and its derivative can be obtained by taking derivatives
over the original covariance function.
Lemma 1 is proven by Alvarez et al. (2009, 2013). The reasoning is that applying a linear differential operator to a GP results in another GP (Graepel, 2003), because the derivative of a GP is still a GP (Williams and Rasmussen, 2006).
The latent force function u_w can be further decomposed into a linear combination of R common latent force functions as follows.
$$ u_w(\mathbf{x}) = \sum_{r=1}^{R} s_{rw}\, g_r(\mathbf{x}) \tag{23} $$
where R is the number of common latent force functions and s_{rw} are the entries of the latent matrix s. Since L is linear, if we assign a GP prior over u_w(x), f_w(x) has a GP prior as well. Moreover, if the Green's function, namely the solution of Eq. 24, is available, we obtain Eq. 25.
$$ \mathcal{L}\, G(\mathbf{x}, \mathbf{s}) = \delta(\mathbf{s} - \mathbf{x}) \tag{24} $$
where δ is the Dirac delta function and G is the Green's function.
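$$ f_w(\mathbf{x}) = \int G(\mathbf{x}, \mathbf{s})\, u_w(\mathbf{s})\, d\mathbf{s} \tag{25} $$
Hence, given the kernel for u_w, we can derive the kernel for f_w through the convolution operation shown in Eq. 26.
$$ k_{f_w}(\mathbf{x}_1, \mathbf{x}_2) = \iint G(\mathbf{x}_1, \mathbf{s}_1)\, G(\mathbf{x}_2, \mathbf{s}_2)\, k_{u_w}(\mathbf{s}_1, \mathbf{s}_2)\, d\mathbf{s}_1\, d\mathbf{s}_2 \tag{26} $$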
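To show what the convolution in Eq. 26 involves, the sketch below approximates the LFM kernel numerically for an assumed one-dimensional first-order operator L = d/dx + a with f(0) = 0, whose causal Green's function is G(x, s) = exp(−a(x − s)) for s ≤ x and 0 otherwise, and an assumed RBF kernel on the latent force u. The operator, the trapezoidal quadrature grid on [0, s_max], and all function names are illustrative assumptions; LFM itself derives such kernels in closed form when the integral is tractable.

```python
import numpy as np


def rbf(a, b, lengthscale=0.5, variance=1.0):
    """RBF kernel k_u between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def green_first_order(x, s, decay):
    """Causal Green's function of L = d/dx + decay with f(0) = 0."""
    return np.where(s <= x, np.exp(-decay * (x - s)), 0.0)


def lfm_kernel(x1, x2, decay=1.0, n_quad=400, s_max=5.0):
    """Trapezoidal approximation of Eq. 26 on the source grid s in [0, s_max]."""
    s = np.linspace(0.0, s_max, n_quad)
    w = np.full(n_quad, s[1] - s[0])
    w[0] *= 0.5                      # trapezoid end-point weights
    w[-1] *= 0.5
    G1 = green_first_order(x1[:, None], s[None, :], decay) * w    # (n1, n_quad)
    G2 = green_first_order(x2[:, None], s[None, :], decay) * w    # (n2, n_quad)
    return G1 @ rbf(s, s) @ G2.T     # double sum over s_1 and s_2


x = np.linspace(0.1, 4.5, 20)        # evaluation points must lie inside [0, s_max]
K_f = lfm_kernel(x, x)
```

Because f(0) = 0 is built into G, the approximate variance k_f(0, 0) is close to zero. Even in this one-dimensional linear case the kernel requires a double integral, which illustrates why the text calls the convolution route restrictive once operators become nonlinear or multi-dimensional.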
To deal with multiple outputs, we can place independent GP priors over the common latent functions g_r; each u_w and f_w then obtains a GP prior in turn. Via a similar convolution, we can derive the kernel across different outputs (i.e., the cross-covariance) k_{f_w, f_{w′}}. In this way, the physics knowledge in the Green's function is combined with the kernel for the latent forces. This procedure is used to learn the GP model with a convolved kernel from the training data.
3.1.3. Augmented Latent Force Model
Despite the elegance and success of LFM, its preconditions can be too restrictive. To enable the kernel convolution, LFM requires that the differential equations be linear and have analytical Green's functions. However, many realistic differential equations from traffic flow models are either nonlinear, or linear but without analytical Green's functions, and therefore cannot be exploited. In other cases, even with a tractable Green's function, the complete kernel over all the input variables is still infeasible to obtain. To obtain an analytical kernel after the convolution, we have to convolve Green's functions with smooth kernels. This may prevent us from integrating the physics knowledge into more complex yet highly flexible kernels, such as deep kernels (Wilson et al., 2016). Otherwise, extra approximation methods, such as Monte Carlo approximation, are needed to handle the intractable integral.
Given the differential equation that describes the physics knowledge, the proposed augmented LFM equation is formulated in Eq. 27.
$$ \Psi f(\mathbf{x}) = g(\mathbf{x}) \tag{27} $$
where the differential operator Ψ can be a linear, nonlinear, or numerical differential operator, g(·) represents the unknown latent force function, and f(x) is the function to be estimated from data D. We aim to create a generative component that regularizes the original GP with a differential equation. Using the augmented LFM, the differential equation is encoded into another GP, called the shadow GP. To yield numerical outputs, the kernel of the shadow GP should be efficiently learnable.
Theorem 1. If one side of Eq. 27 is one GP, the other side is another GP.
Proof. The reasoning is that applying a differential operator to a GP results in another GP. The regularization is fulfilled via a valid generative model component rather than by differentiating the process, and hence can be applied to any linear or nonlinear differential operator. Because the resulting covariance and cross-covariance cannot readily be obtained via analytical derivatives, expressive kernels can be learned empirically from data.
The original LFM starts with the right-hand side (RHS) of Eq. 22, assigns it a GP prior, and then uses the convolution operation to obtain the GP prior of the left-hand side (LHS) target function. Since the convolution operation is an integration procedure, it can be restrictive and challenging. In contrast, our approach goes in the reverse direction, i.e., from the LHS to the RHS. We first sample the target function with an expressive kernel, use the differentiation operation to obtain the latent force, and then regularize it with another GP prior. The differentiation operation is more flexible and convenient (Baydin et al., 2018), since it does not require restricting the operator and GP kernels to ensure tractable computation. The computational challenge can be overcome by using automatic differentiation libraries (Baydin et al., 2018) and deep learning techniques (e.g., deep kernels, TensorFlow, PyTorch). Therefore, the shadow GP can be efficiently learned from pseudo observations via differential computations.
3.2. Physics regularized Gaussian process (PRGP)
Involving the shadow GP, the design concept of the proposed PRGP is illustrated in Fig. 3. To enable a Bayesian framework that incorporates the physics knowledge in Eq. 27, we introduce a set of m pseudo observations, ω = [0, ..., 0]ᵀ, to construct a generative component p(ω|X, Y) that acts as a physics-knowledge-based regularizer on the GP model p(Y|X). To generate the pseudo observations ω, an input matrix Z of m locations is given as follows:
$$ \mathbf{Z} = [\mathbf{z}_1, \ldots, \mathbf{z}_m]^\top \tag{28} $$
[Figure 3 diagram. Nodes: X, Y, Z, f, g, ω; operator: Ψ; labels: shadow GP, data GP.]
Figure 3: The proposed framework for physics regularized Gaussian process learning
Then, we sample the posterior function values at each z_j, 1 ≤ j ≤ m, as shown in Eq. 29.
$$ p(f(\mathbf{z}_j) \mid \mathbf{z}_j, \mathbf{X}, \mathbf{Y}) = \mathcal{N}\big(f(\mathbf{z}_j) \mid \mu(\mathbf{z}_j), \nu(\mathbf{z}_j)\big) \tag{29} $$
We apply the differential operator in Eq. 27 to obtain the latent function values at Z, g = [g(z1), ..., g(zm)], which is equivalent to sampling g from the Dirac delta distribution in Eq. 30.
$$ p(\mathbf{g} \mid \mathbf{f}) = \delta(\mathbf{g} - \Psi \mathbf{f}) \tag{30} $$
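Given the latent function values g, we sample the pseudo observations ω from another GP.
$$ p(\boldsymbol{\omega} \mid \mathbf{g}, \mathbf{Z}) = \mathcal{N}(\boldsymbol{\omega} \mid \mathbf{g}, \mathbf{K}) \tag{31} $$
where K is the covariance matrix, and each element is calculated from the kernel k(·, ·) as in Eq. 32.
$$ [\mathbf{K}]_{ij} = k(\mathbf{z}_i, \mathbf{z}_j) \tag{32} $$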
Considering the symmetry of the Gaussian density shown in Eq. 33, sampling the pseudo observations is in essence equivalent to placing another GP prior over the sampled latent force function g. Therefore, this GP prior regularizes the sampled latent function. Through the differential operator Ψ, the regularization propagates back to the target machine f(·).
$$ p(\boldsymbol{\omega} \mid \mathbf{g}, \mathbf{K}) = p(\mathbf{g} \mid \boldsymbol{\omega}, \mathbf{K}) = p(\Psi \mathbf{f} \mid \boldsymbol{\omega}, \mathbf{K}) \tag{33} $$
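Because ω = 0, the regularizer in Eqs. 31-33 reduces to evaluating log N(g | 0, K) at the latent-force values g = Ψf. A minimal sketch, assuming an RBF kernel for the shadow GP and a small diagonal jitter for numerical stability (both illustrative choices not specified in the text; the function names are hypothetical):

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A (m, d) and B (k, d)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)


def physics_log_prior(g, Z, lengthscale=1.0, variance=1.0, jitter=1e-6):
    """log N(omega = 0 | g, K) = log N(g | 0, K) for the shadow GP (Eqs. 31-33)."""
    m = len(g)
    K = rbf_kernel(Z, Z, lengthscale, variance) + jitter * np.eye(m)
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, g)                  # v @ v equals g^T K^{-1} g
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * v @ v - 0.5 * log_det - 0.5 * m * np.log(2.0 * np.pi)


rng = np.random.default_rng(1)
Z = rng.uniform(0.0, 1.0, size=(10, 2))     # e.g. (t, x) pseudo-input locations
g = rng.standard_normal(10)                  # stand-in for Psi f evaluated at Z
```

The log density is largest when g = 0 and decreases as the physics residual grows, so maximizing it pushes the sampled function toward satisfying Eq. 27.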
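Thus, the joint probability of the generative component factorizes into four parts, as shown in Eq. 34.
$$ p(\boldsymbol{\omega},\mathbf{g},\mathbf{f},\mathbf{Z}\mid\mathbf{X},\mathbf{Y}) = p(\mathbf{Z})\,p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y})\,p(\mathbf{g}\mid\mathbf{f})\,p(\boldsymbol{\omega}\mid\mathbf{g}) \tag{34} $$
where the prior of the m input locations p(Z), the posterior p(f|Z,X,Y), and p(ω|g) are given by Eqs. 35-37, respectively.
$$ p(\mathbf{Z}) = \prod_{j=1}^{m} p(\mathbf{z}_j) \tag{35} $$
$$ p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y}) = \prod_{j=1}^{m} \mathcal{N}\big(f(\mathbf{z}_j)\mid\mu(\mathbf{z}_j),\nu(\mathbf{z}_j)\big) \tag{36} $$
$$ p(\boldsymbol{\omega}\mid\mathbf{g}) = \mathcal{N}(\boldsymbol{\omega}\mid\mathbf{g},\mathbf{K}) \tag{37} $$
Note that when no extra knowledge is available, each $\mathbf{z}_j$ can be assumed to be uniformly distributed.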
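The symmetry behind Eq. 33 can be checked numerically. The following minimal NumPy sketch assumes a hypothetical three-point shadow-GP covariance K (RBF kernel with unit length scale plus a small jitter) and hypothetical latent force values g; it evaluates the Gaussian log-density both ways and shows that the all-zero pseudo observations ω favor latent forces closer to zero. The helper `gaussian_logpdf` is an illustrative name, not part of the original method.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """log N(x | mean, cov) for a full covariance matrix, computed via Cholesky."""
    x = np.asarray(x, dtype=float)
    diff = x - np.asarray(mean, dtype=float)
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))    # cov = L L^T
    a = np.linalg.solve(L, diff)                             # a = L^{-1} (x - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))               # log|cov|
    return -0.5 * (a @ a + log_det + x.size * np.log(2.0 * np.pi))

# Hypothetical shadow-GP covariance over m = 3 pseudo inputs (RBF, unit length scale)
z = np.array([0.0, 0.5, 1.0])
K = np.exp(-0.5 * (z[:, None] - z[None, :]) ** 2) + 1e-6 * np.eye(3)

omega = np.zeros(3)                   # pseudo observations, all zero
g = np.array([0.3, -0.1, 0.2])        # hypothetical latent force values g = Psi f

lhs = gaussian_logpdf(omega, g, K)    # log p(omega | g, K), as in Eq. 31
rhs = gaussian_logpdf(g, omega, K)    # log p(g | omega, K): a zero-mean GP prior on g
print(np.isclose(lhs, rhs))           # True

# Shrinking the latent forces toward zero (smaller physics residuals) raises the density
print(gaussian_logpdf(omega, 0.1 * g, K) > lhs)   # True
```

In other words, the pseudo observations act as a zero-mean GP prior on Ψf, which is exactly the regularizing effect described above.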
3.3. Posterior regularized inference algorithm
Posterior regularization is a powerful inference methodology in the Bayesian stochastic modeling framework (Ganchev et al., 2010). Its objective combines the model likelihood on the data with a penalty term that encodes constraints over the posterior of the latent variables. Via the penalty term, domain knowledge or constraints can be imposed directly on the posteriors, rather than through the priors and a complex intermediate computing procedure, which can be more convenient and effective. A variety of successful posterior regularization algorithms have been proposed (He et al., 2013; Ganchev and Das, 2013; Zhu et al., 2014; Libbrecht et al., 2015; Song et al., 2016). For efficient model inference, we marginalize out all latent variables in the joint probability to avoid estimating extra approximate posteriors. We then derive a convenient evidence lower bound that enables the reparameterization. Using the reparameterization and automatic differentiation libraries, we develop an efficient stochastic optimization algorithm based on the posterior regularization inference framework (Ganchev et al., 2010).
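The proposed inference algorithm is derived as follows. The generative component in Eq. 34 is coupled with the original GP in Eq. 14 to obtain a new principled Bayesian model, whose joint probability is given by Eq. 38.
$$ p(\mathbf{Y},\boldsymbol{\omega},\mathbf{g},\mathbf{f},\mathbf{Z}\mid\mathbf{X}) = p(\mathbf{Y}\mid\mathbf{X})\,p(\boldsymbol{\omega},\mathbf{g},\mathbf{f},\mathbf{Z}\mid\mathbf{X},\mathbf{Y}) \tag{38} $$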
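We first marginalize out all latent variables in the generative component, which avoids approximating their posteriors, as shown in Eq. 39. Integrating out g with the Dirac delta in Eq. 30 replaces g by Ψf, and the last equality uses ω = 0 together with the symmetry of the Gaussian density.
$$
\begin{aligned}
p(\boldsymbol{\omega}\mid\mathbf{X},\mathbf{Y}) &= \iiint p(\boldsymbol{\omega},\mathbf{g},\mathbf{f},\mathbf{Z}\mid\mathbf{X},\mathbf{Y})\,d\mathbf{Z}\,d\mathbf{g}\,d\mathbf{f} \\
&= \iint p(\mathbf{Z})\,p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y})\,p(\boldsymbol{\omega}\mid\Psi\mathbf{f},\mathbf{K})\,d\mathbf{Z}\,d\mathbf{f} \\
&= \iint p(\mathbf{Z})\,p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y})\,\mathcal{N}(\boldsymbol{\omega}\mid\Psi\mathbf{f},\mathbf{K})\,d\mathbf{Z}\,d\mathbf{f} \\
&= \mathbb{E}_{p(\mathbf{Z})}\,\mathbb{E}_{p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y})}\big[\mathcal{N}(\Psi\mathbf{f}\mid\mathbf{0},\mathbf{K})\big]
\end{aligned}
\tag{39}
$$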
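To balance the data fit and the physics regularizer, the generative component is raised to a power γ ≥ 0, which controls the strength of the regularization effect, as in Eq. 40.
$$ p(\mathbf{Y},\boldsymbol{\omega}\mid\mathbf{X}) = p(\mathbf{Y}\mid\mathbf{X})\,p(\boldsymbol{\omega}\mid\mathbf{X},\mathbf{Y})^{\gamma} \tag{40} $$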
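The objective is to maximize the log-likelihood in Eq. 41.
$$
\begin{aligned}
\log p(\mathbf{Y},\boldsymbol{\omega}\mid\mathbf{X}) &= \log p(\mathbf{Y}\mid\mathbf{X}) + \gamma\log p(\boldsymbol{\omega}\mid\mathbf{X},\mathbf{Y}) \\
&= \log\mathcal{N}(\mathbf{Y}\mid\mathbf{0},\mathbf{K}+\tau^{-1}\mathbf{I}) + \gamma\log\mathbb{E}_{p(\mathbf{Z})}\mathbb{E}_{p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y})}\big[\mathcal{N}(\Psi\mathbf{f}\mid\mathbf{0},\mathbf{K})\big]
\end{aligned}
\tag{41}
$$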
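However, the log-likelihood is intractable because the expectation sits inside the logarithm. To address this problem, Jensen's inequality (log E[·] ≥ E[log ·]) is applied to obtain the evidence lower bound L in Eq. 42.
$$ \log p(\mathbf{Y},\boldsymbol{\omega}\mid\mathbf{X}) \ge \mathcal{L} = \log\mathcal{N}(\mathbf{Y}\mid\mathbf{0},\mathbf{K}+\tau^{-1}\mathbf{I}) + \gamma\,\mathbb{E}_{p(\mathbf{Z})}\mathbb{E}_{p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y})}\big[\log\mathcal{N}(\Psi\mathbf{f}\mid\boldsymbol{\omega},\mathbf{K})\big] \tag{42} $$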
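Because the expectation in Eq. 42 sits outside the logarithm, its physics term can be estimated by plain Monte Carlo averaging. The sketch below is a toy illustration under loudly stated assumptions: a 1-D input z ∈ [0, 1], a hypothetical reparameterized target f̂(z) = µ(z) + s(z)ε with µ(z) = sin(2πz) and s(z) = 0.1 + 0.05z, a stand-in operator Ψ = d/dz in place of the traffic flow operators of Section 3.4, and an RBF shadow covariance with an arbitrary length scale. All helper names are hypothetical.

```python
import numpy as np

# Hypothetical reparameterized target on z in [0, 1]:
#   f_hat(z) = mu(z) + s(z) * eps,  mu(z) = sin(2*pi*z),  s(z) = 0.1 + 0.05*z.
# With eps held fixed, the toy operator Psi = d/dz gives Psi f_hat = mu'(z) + s'(z) * eps.
def mu_prime(z):
    return 2.0 * np.pi * np.cos(2.0 * np.pi * z)

def s_prime(z):
    return 0.05 * np.ones_like(z)

def rbf_cov(z, length=0.3, jitter=1e-4):
    """Shadow-GP covariance K over the pseudo inputs (RBF kernel plus jitter)."""
    d = z[:, None] - z[None, :]
    return np.exp(-0.5 * (d / length) ** 2) + jitter * np.eye(z.size)

def log_normal_zero_mean(x, cov):
    """log N(x | 0, cov) via a Cholesky factorization."""
    L = np.linalg.cholesky(cov)
    a = np.linalg.solve(L, x)
    return -0.5 * (a @ a + 2.0 * np.sum(np.log(np.diag(L))) + x.size * np.log(2.0 * np.pi))

def physics_term(dmu, ds, m=10, n_samples=200, seed=0):
    """Monte Carlo estimate of E_p(Z) E_p(f|Z,X,Y)[log N(Psi f | 0, K)] (second term of Eq. 42)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.uniform(0.0, 1.0, size=m)      # Z ~ p(Z), uniform as suggested for Eq. 35
        eps = rng.standard_normal(m)           # reparameterization noise, one per z_j (Eq. 36)
        psi_f = dmu(z) + ds(z) * eps           # latent force g = Psi f at the pseudo inputs
        total += log_normal_zero_mean(psi_f, rbf_cov(z))
    return total / n_samples

wavy = physics_term(mu_prime, s_prime)
# A target whose residual Psi f is identically zero satisfies the toy "physics"
flat = physics_term(lambda z: np.zeros_like(z), lambda z: np.zeros_like(z))
print(wavy, flat, flat > wavy)
```

With the same seed, both calls see identical Z and ε, so the function with zero residual always receives the higher (less negative) physics term, which is the behavior the regularizer is meant to reward.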
Bishop (2006) shows that a general evidence lower bound (ELBO) follows from decomposing the log-likelihood into the ELBO plus a Kullback-Leibler (KL) divergence, which is non-negative. Accordingly, Eq. 42 is an ELBO of the log-likelihood. However, the ELBO is still intractable because the expectation has no analytical form. Since the expectation now lies outside the logarithm, L can be estimated without bias from Monte Carlo samples and maximized via stochastic optimization, as shown in Alg. 1.
Algorithm 1: The stochastic inference algorithm
Result: Learned kernel parameters
1 Initialization;
2 while the stopping criteria are not reached do
3   Sample a set of input locations Z;
4   Estimate the mean µ and the variance ν of f using Eqs. 17-18;
5   Generate a parameterized sample of the posterior target function values f by the reparameterization in Eqs. 20-21;
6   Substitute the parameterized samples f to obtain an unbiased estimate of the ELBO L in Eq. 42;
7   Calculate ∇θL, an unbiased stochastic gradient of L, via automatic differentiation;
8   Update the parameters θ via the gradient ascent step in Eq. 43;
9 end
$$ \theta_{t+1} = \theta_t + \alpha\nabla_{\theta}\mathcal{L} \tag{43} $$
where α is the learning rate and θ denotes all trainable parameters.
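Algorithm 1 is, at its core, stochastic gradient ascent driven by reparameterized samples. The following toy sketch is not the authors' implementation and keeps only that skeleton: the objective is a hypothetical L(θ) = E_ε[−(θ + ε − 3)²] with ε ~ N(0, 1), whose maximizer is θ = 3; the sample f = θ + ε plays the role of the reparameterization f = µ + σε; and each step applies the update of Eq. 43 with a hand-derived gradient in place of automatic differentiation. The batch size and learning rate are arbitrary choices.

```python
import numpy as np

def toy_objective_grad(theta, eps):
    """Reparameterized gradient of L(theta) = E_eps[-(theta + eps - 3)^2], eps ~ N(0, 1).

    With f = theta + eps, d/dtheta [-(f - 3)^2] = -2 (theta + eps - 3);
    averaging over sampled eps gives an unbiased estimate of grad L.
    """
    return np.mean(-2.0 * (theta + eps - 3.0))

def stochastic_gradient_ascent(theta0=0.0, lr=0.05, n_iter=500, batch=64, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(n_iter):                      # fixed iteration budget as the stopping rule
        eps = rng.standard_normal(batch)         # sample noise for the reparameterization
        grad = toy_objective_grad(theta, eps)    # unbiased stochastic gradient of L
        theta = theta + lr * grad                # Eq. 43: theta_{t+1} = theta_t + alpha * grad
    return theta

theta_hat = stochastic_gradient_ascent()
print(theta_hat)   # the exact maximizer of L is theta = 3
```

The plus sign in the update is what makes this ascent rather than descent; the paper's experiments use Adam instead of this plain step.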
To justify Alg. 1, we need to show that maximizing the ELBO indeed acts as a regularization, as follows.
Theorem 2. Maximizing the lower bound of the log-likelihood is equivalent to imposing a soft constraint over the posterior of the target function in the original GP.
Proof. Although the proposed inference algorithm is developed for a hybrid model rather than a pure GP, the evidence lower bound optimized by Alg. 1 is a typical posterior regularization objective (Ganchev et al., 2010): it estimates a pure GP model and meanwhile penalizes the posterior of the target function to encourage consistency with the differential equations. Jointly maximizing the term
$$ \mathbb{E}_{p(\mathbf{Z})}\mathbb{E}_{p(\mathbf{f}\mid\mathbf{Z},\mathbf{X},\mathbf{Y})}\big[\log\mathcal{N}(\Psi\mathbf{f}\mid\boldsymbol{\omega},\mathbf{K})\big] $$
in the lower bound L encourages all possible latent force functions obtained from the target function f(·) via the differential operator Ψ to be consistent with samples from the same shadow GP. This can be viewed as a soft constraint over the posterior of the target function in the original GP model. Therefore, although it is developed for inference of a hybrid model, the algorithm is equivalent to estimating the original GP model with soft constraints on its posterior distribution. Thus, the physics knowledge regularizes the learning of the target function in the original GP.
To apply the proposed method with multiple differential equations (namely the FD, conservation law, and momentum equation), Fig. 4 shows the multi-equation multi-output framework for applying the proposed method to model the stochastic traffic flow process.
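Figure 4: The proposed framework for multi-output multi-equation PRGP learning [diagram: data GPs GP_1(K_1), ..., GP_{d′}(K_{d′}) relate inputs X = [(x, t)_i]_{2×N} to outputs Y = [(q, ρ, v)_i]_{3×N}; at pseudo inputs Z = [(x, t)_j]_{2×m}, the target functions f_1, ..., f_{d′} give values [(q, ρ, v)_j]_{3×m}, the operators Ψf_1(q, ρ, v), ..., Ψf_{d′}(q, ρ, v) give latent forces [g_{1,j}]_{1×m}, ..., [g_{w,j}]_{1×m}, and the shadow GPs tie these to pseudo observations ω = [0, ..., 0]ᵀ]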
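The log-likelihood and the ELBO of the multi-output multi-equation traffic flow model are formulated in Eq. 44, where $[\mathbf{Y}]_i$ denotes the i-th output dimension and $\gamma_w$ weights the w-th differential equation.
$$ \log p(\mathbf{Y},\boldsymbol{\omega}\mid\mathbf{X}) \ge \mathcal{L} = \sum_{i=1}^{d'}\log\mathcal{N}\big([\mathbf{Y}]_i\mid\mathbf{0},\mathbf{K}_i+\tau^{-1}\mathbf{I}\big) + \sum_{w=1}^{W}\gamma_w\,\mathbb{E}_{p(\mathbf{Z})}\mathbb{E}_{p(\mathbf{f}_w\mid\mathbf{Z},\mathbf{X},\mathbf{Y})}\big[\log\mathcal{N}(\Psi\mathbf{f}_w\mid\boldsymbol{\omega},\mathbf{K}_w)\big] \tag{44} $$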
3.3.1. Expressive kernels
Expressive kernels are non-parametric smooth covariance functions, such as the well-known squared exponential automatic relevance determination (SE-ARD) kernel, the radial basis function (RBF) kernel (Bishop, 2006), and deep kernels (Wilson et al., 2016). The employed kernel functions are as follows.
The SE-ARD kernel is formulated in Eq. 45.
$$ k(\mathbf{x}_i,\mathbf{x}_j) = \sigma^2\exp\big(-(\mathbf{x}_i-\mathbf{x}_j)^\top\operatorname{diag}(\boldsymbol{\eta})\,(\mathbf{x}_i-\mathbf{x}_j)\big) \tag{45} $$
where diag(η) is the diagonal matrix with the vector η on its diagonal, and σ and η are kernel parameters.
The RBF kernel is formulated in Eq. 46.
$$ k(\mathbf{x}_i,\mathbf{x}_j) = \exp\Big(-\frac{\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^2}{2\sigma^2}\Big) \tag{46} $$
where σ is the kernel parameter.
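A minimal NumPy sketch of Eqs. 45-46 follows. The function names and the row-per-sample input layout are assumptions for illustration, and the two inputs stand for hypothetical (x, t) pairs; a larger entry of η makes the kernel more sensitive to that input dimension.

```python
import numpy as np

def se_ard_kernel(X1, X2, sigma, eta):
    """SE-ARD kernel (Eq. 45): sigma^2 * exp(-(x_i - x_j)^T diag(eta) (x_i - x_j)).

    X1: (n1, d) and X2: (n2, d) arrays, one input (e.g. an (x, t) pair) per row.
    eta: length-d vector of non-negative relevance weights.
    """
    diff = X1[:, None, :] - X2[None, :, :]                    # (n1, n2, d) pairwise differences
    weighted_sq = np.einsum("abd,d,abd->ab", diff, np.asarray(eta, dtype=float), diff)
    return sigma ** 2 * np.exp(-weighted_sq)

def rbf_kernel(X1, X2, sigma):
    """RBF kernel (Eq. 46): exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0],
              [1.0, 0.5]])                     # two hypothetical (x, t) inputs
K_ard = se_ard_kernel(X, X, sigma=2.0, eta=[1.0, 4.0])
K_rbf = rbf_kernel(X, X, sigma=1.0)
print(K_ard)
print(K_rbf)
```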
3.3.2. Algorithm complexity
The time complexity of inference for the original GP is O(N³), and that of the shadow GP is O(m³). Thus, with d′ output dimensions, the total time complexity for the inference of the two GPs is O((Nd′)³ + m³). To store the kernel matrices of the original GP and the shadow GP, the space complexity is O((Nd′)² + m²). In the testing phase, the time needed for model estimation is empirically marginal (less than 1 ms).
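As a rough, back-of-the-envelope illustration of these orders of magnitude (not a result reported in this study), take the N = 2,880 training samples of Section 4.3.1 and the m = 10 pseudo observations of Section 4.2, and assume d′ = 3 outputs (q, ρ, v) as in Fig. 4:

$$
\begin{aligned}
(Nd')^3 &= (2880 \times 3)^3 = 8640^3 \approx 6.45 \times 10^{11}, \qquad m^3 = 10^3, \\
(Nd')^2 &= 8640^2 = 74{,}649{,}600 \ \text{entries} \approx 0.6\ \text{GB in 64-bit floats}, \qquad m^2 = 10^2.
\end{aligned}
$$

Under these assumptions the shadow GP adds a negligible cost next to the data GP, so the choice of m hardly affects the overall complexity at this scale.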
3.4. Physics regularized traffic state estimation
To apply the proposed method, the traffic flow models need to be converted to the form of Eq. 27. In this study, we aim to encode three classical traffic flow models, LWR, PW, and ARZ, into the GP and compare their performance under the PRGP framework. The converted LWR, PW, and ARZ models are presented as follows. In PRGP, the stochastic conservation law of LWR is formulated in Eq. 47.
$$ \Psi f_1(q,\rho,v) = \partial_t\rho + \partial_x q = g_1 \tag{47} $$
The stochastic PW model is formulated in Eqs. 48-49.
$$ \Psi f_1(q,\rho,v) = \partial_t\rho + \partial_x(\rho v) = g_1 \tag{48} $$
$$ \Psi f_2(q,\rho,v) = \partial_t v + v\,\partial_x v + \frac{v - V(\rho)}{\tau_0} + \frac{c_0^2}{\rho}\,\partial_x\rho = g_2 \tag{49} $$
The stochastic ARZ model is formulated in Eqs. 50-51.
$$ \Psi f_1(q,\rho,v) = \partial_t\rho + \partial_x(\rho v) = g_1 \tag{50} $$
$$ \Psi f_2(q,\rho,v) = \partial_t\big(v - V(\rho)\big) + v\,\partial_x\big(v - V(\rho)\big) + \frac{v - V(\rho)}{\tau_0} = g_2 \tag{51} $$
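Each operator in Eqs. 47-51 is a residual that vanishes when the physics holds exactly. In PRGP the derivatives come from automatic differentiation of the GP, but the idea can be sanity-checked on a grid. The sketch below swaps in finite differences (`numpy.gradient`) and uses a hypothetical traveling density wave ρ(x, t) = 0.05 + 0.03 exp(−(x − ct − 3)²) with flux q = cρ; this pair satisfies the conservation law of Eq. 47 exactly, so only discretization error should remain. The wave, speed c, and grid are illustrative assumptions.

```python
import numpy as np

def lwr_residual(rho, q, dx, dt):
    """Finite-difference estimate of Psi f_1 = d(rho)/dt + d(q)/dx (Eq. 47).

    rho, q: arrays of shape (n_t, n_x) on a uniform grid (rows: time, columns: space).
    """
    drho_dt = np.gradient(rho, dt, axis=0, edge_order=2)
    dq_dx = np.gradient(q, dx, axis=1, edge_order=2)
    return drho_dt + dq_dx

c = 0.8                                        # hypothetical wave speed
x = np.linspace(0.0, 10.0, 401)
t = np.linspace(0.0, 5.0, 201)
T, Xg = np.meshgrid(t, x, indexing="ij")       # both of shape (len(t), len(x))
rho = 0.05 + 0.03 * np.exp(-((Xg - c * T - 3.0) ** 2))   # rho(x, t) = h(x - c t)
q = c * rho                                    # flux consistent with the conservation law
res = lwr_residual(rho, q, x[1] - x[0], t[1] - t[0])
print(np.abs(res).max())                       # only discretization error remains
```

Replacing q with a flux that does not match the wave (for example 2cρ) leaves a clearly nonzero residual, which is the signal the shadow GP penalizes.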
4. Numerical Tests with Field Data
4.1. Case setting
To evaluate the performance of the proposed PRML framework, we apply the three PRGP models to estimate the traffic flow on a stretch of the I-15 interstate freeway in Utah, U.S. The Utah Department of Transportation (UDOT) has installed sensors every few miles along the freeway. Each sensor counts the number of vehicles passing every minute, measures the speed of each vehicle, and sends the data to a central database, the Performance Measurement System (PeMS). The collected real-time data and road conditions are available online to the public. For model evaluation, data from August 5 to August 11, 2019 were collected by four sensors on I-15 in Utah. The input variables include the location coordinates of each sensor and the time of each reading. The studied stretch is illustrated in Fig. 5, where the yellow line indicates the studied freeway segments and the blue bars represent the locations of traffic detectors. The data are shuffled and randomly split into a training set and a testing set.
Figure 5: The stretch of the studied freeway segment which includes four detectors
4.2. Implementation
The deep kernel can adopt any neural network structure, such as a feed-forward neural network, and can be fine-tuned to achieve better empirical results. Incorporating the SE-ARD and RBF kernels, the compound kernels of the d′-dimensional original GP and the W-dimensional shadow GP are computed as shown in Fig. 6. The procedure for estimating the target traffic states q and v for any given input (x, t) is illustrated in Fig. 7. In the multi-output multi-equation PRGP, "d′-dimensional" means that one compound kernel is created for each dimension of y, and "W-dimensional" means that one compound kernel is created for each differential equation. Note that the structure of the GPs can also be fine-tuned to achieve better empirical performance.
Figure 6: The structure of the proposed loss function [diagram: an ARD kernel on the input data X gives K and S = K + τ⁻¹I, with −log p(Y_i|X) = 0.5 log|S| + 0.5 Y_iᵀ S⁻¹ Y_i; randomized inputs Z give predictions f̂, the physics equations give g_w = Ψf̂, and an RBF kernel gives C_w, with −log p(ω|X, Y_i) ∝ Σ_w γ_w [0.5 log|C_w| + 0.5 g_wᵀ C_w⁻¹ g_w]; the objective is min L = −Σ_i log p(Y_i|X) − Σ_i log p(ω|X, Y_i)]
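The loss structure of Fig. 6 can be written in a few lines of NumPy. This sketch is an illustration under assumptions rather than the authors' TensorFlow code: it reads S as K + τ⁻¹I with log|S| its log-determinant, drops additive constants as Fig. 6 does, follows Eq. 44 in summing the physics penalty once over the W equations, and takes hypothetical kernel matrices and latent forces g_w as given. The function names are hypothetical.

```python
import numpy as np

def nll_term(y, K, tau):
    """-log p(Y_i | X) up to a constant: 0.5 log|S| + 0.5 y^T S^{-1} y, S = K + I / tau."""
    S = K + np.eye(K.shape[0]) / tau
    L = np.linalg.cholesky(S)                                  # S = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))        # alpha = S^{-1} y
    return np.sum(np.log(np.diag(L))) + 0.5 * y @ alpha        # sum(log diag L) = 0.5 log|S|

def physics_penalty(g_list, C_list, gammas):
    """sum_w gamma_w [0.5 log|C_w| + 0.5 g_w^T C_w^{-1} g_w] for the shadow GPs."""
    total = 0.0
    for g, C, gamma in zip(g_list, C_list, gammas):
        L = np.linalg.cholesky(C)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, g))
        total += gamma * (np.sum(np.log(np.diag(L))) + 0.5 * g @ alpha)
    return total

def prgp_loss(Y_cols, K_list, tau, g_list, C_list, gammas):
    """Loss to minimize: data terms over the d' outputs plus the weighted physics terms."""
    data = sum(nll_term(y, K, tau) for y, K in zip(Y_cols, K_list))
    return data + physics_penalty(g_list, C_list, gammas)

x = np.linspace(0.0, 1.0, 5)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2 ** 2)  # hypothetical data-GP kernel
g = np.array([0.1, -0.2, 0.05])                            # hypothetical latent forces, m = 3
C = 0.5 * np.eye(3) + 0.5                                  # hypothetical PD shadow covariance
print(prgp_loss([np.sin(3.0 * x)], [K], 100.0, [g], [C], [1.0]))
```

Cholesky factors are used instead of explicit inverses and determinants because they are cheaper and numerically more stable for positive definite matrices.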
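Figure 7: The structure of estimation [diagram: an ARD kernel on the input data X gives K and S = K + τ⁻¹I; the output data Y give the mean µ = KS⁻¹Y; a scale σ (its formula, involving β, is garbled in extraction) and a random number ε ~ N(0, 1) give the estimate f̂ = µ + σε]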
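The estimation step of Fig. 7 amounts to a GP posterior followed by one reparameterized draw f̂ = µ + σε. Because the σ formula in Fig. 7 is not recoverable here, the sketch below swaps in the textbook GP posterior variance (Williams and Rasmussen, 2006) and uses an RBF kernel with unit signal variance; the function names and data are hypothetical.

```python
import numpy as np

def rbf(A, B, length=1.0):
    """RBF kernel matrix between row-stacked inputs A (n, d) and B (k, d)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length ** 2)

def predict_and_sample(X, y, X_new, tau, rng, length=1.0):
    """Textbook GP posterior at X_new and one reparameterized draw f_hat = mu + sigma * eps."""
    S = rbf(X, X, length) + np.eye(len(X)) / tau            # S = K + tau^{-1} I
    K_new = rbf(X_new, X, length)                            # cross-covariance K(X_new, X)
    mu = K_new @ np.linalg.solve(S, y)                       # posterior mean
    # Posterior variance diag(k(x, x) - K_new S^{-1} K_new^T), with k(x, x) = 1 for this kernel
    var = 1.0 - np.sum(K_new * np.linalg.solve(S, K_new.T).T, axis=1)
    sigma = np.sqrt(np.maximum(var, 0.0))                    # guard tiny negative round-off
    eps = rng.standard_normal(len(X_new))                    # eps ~ N(0, 1)
    return mu, sigma, mu + sigma * eps

rng = np.random.default_rng(0)
X = np.array([[0.0], [1.0], [2.0]])      # hypothetical 1-D training inputs
y = np.array([0.0, 1.0, 4.0])            # hypothetical training outputs
X_new = np.array([[0.5], [10.0]])        # one point between the data, one far away
mu, sigma, f_hat = predict_and_sample(X, y, X_new, tau=1e6, rng=rng)
print(mu, sigma, f_hat)
```

Far from the data, the mean reverts to zero and σ approaches the prior standard deviation of 1, while near the data the draw stays close to the observations.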
In the experiments, the parameters of the proposed method are set as follows: (a) the number of pseudo observations is m = 10, and (b) the regularization strength γ is fine-tuned numerically. The proposed inference algorithm is implemented in the TensorFlow framework, and the Adam optimizer (Kingma and Ba, 2014) is used to update the parameters.
4.3. Results Analysis
4.3.1. Comparison with Pure Machine Learning Models
To demonstrate the advantage of the proposed PRML framework over pure ML models, this subsection compares the three PRGP models, LWR-PRGP, PW-PRGP, and ARZ-PRGP, with a pure GP and other popular ML models, namely the multilayer perceptron, support vector machine, and random forest (Bishop, 2006). Recall also that one main contribution of PRML is that its performance is more explainable. Hence, this study further adopts another physical model, the well-known heat equation, which is not suitable for modeling traffic flows, to show that classical traffic flow models are indispensable in the PRML framework. The heat equation is formulated in Eq. 52.
$$ \frac{\partial f_h(x,t)}{\partial t} = \beta_1\nabla^2 f_h(x,t) \tag{52} $$
Note that the inputs of the proposed PRGP-based methods and of the classical traffic flow models differ. Classical traffic flow models often require on-ramp and off-ramp flow observations as inputs, whereas the proposed method treats on-ramp and off-ramp flows as unobserved and does not require such data. Training each model for 500 iterations on 2,880 samples takes 10,480 seconds on average on a workstation equipped with a 3.5 GHz 6-core CPU. In the testing phase, the estimation time is empirically marginal (less than 1 second), similar to that of all ML models. Note that the computation can be accelerated by about five times if an NVIDIA CUDA-capable GPU is used.
Figs. 8-9 compare the flow and speed estimates with the ground truth in the studied case. An estimate is considered accurate if the slope of its trend line is close to 1 and the intercept is close to 0. The results show that both the pure GP and the proposed PRGP models perform well in estimating flows and speeds.
Figure 8: Comparison between flow estimation by GP and PRGPs and the ground truth. Panels: (a) GP, (b) LWR-PRGP, (c) PW-PRGP, (d) ARZ-PRGP
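To quantify the precision of the outputs, the Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) of each dimension are used as performance metrics, defined in Eqs. 53-54.
$$ \mathrm{RMSE}_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\Big(\frac{[\mathbf{y}_j]_i - [\mathbf{f}_j]_i}{\sigma_i}\Big)^2}, \quad \forall j \in \{1,\ldots,d'\} \tag{53} $$
$$ \mathrm{MAPE}_j = \frac{100\%}{N}\sum_{i=1}^{N}\Big\lvert\frac{[\mathbf{y}_j]_i - [\mathbf{f}_j]_i}{[\mathbf{y}_j]_i}\Big\rvert, \quad \forall j \in \{1,\ldots,d'\} \tag{54} $$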
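The evaluation quantities can be computed as follows. In this sketch, σ_i in Eq. 53 is treated as an optional scale that defaults to 1 (plain RMSE); this is an assumption, since σ_i is not defined in this section. The trend line is assumed to be a least-squares fit of the estimates against the ground truth, and the sample values are hypothetical.

```python
import numpy as np

def rmse(y, f, sigma=None):
    """Eq. 53 for one output dimension; sigma=None means sigma_i = 1 (plain RMSE)."""
    r = np.asarray(y, dtype=float) - np.asarray(f, dtype=float)
    if sigma is not None:
        r = r / np.asarray(sigma, dtype=float)
    return np.sqrt(np.mean(r ** 2))

def mape(y, f):
    """Eq. 54 for one output dimension, in percent."""
    y, f = np.asarray(y, dtype=float), np.asarray(f, dtype=float)
    return 100.0 * np.mean(np.abs((y - f) / y))

def trend_line(y, f):
    """Least-squares line f ~ slope * y + intercept (estimate vs. ground truth)."""
    slope, intercept = np.polyfit(np.asarray(y, dtype=float), np.asarray(f, dtype=float), 1)
    return slope, intercept

y_true = np.array([120.0, 150.0, 180.0, 210.0])   # hypothetical ground-truth flows
y_est = np.array([115.0, 155.0, 170.0, 220.0])    # hypothetical estimates
print(rmse(y_true, y_est), mape(y_true, y_est), trend_line(y_true, y_est))
```

An accurate estimator in the sense used above yields a slope near 1 and an intercept near 0.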
Table 1 summarizes the results of the comparable baselines and the proposed methods on the same dataset. Among the four pure ML models, the GP clearly outperforms the others, providing more accurate estimates of both flows and speeds. The GP yields an RMSE of 39.74 veh/5min and a MAPE of 13.70% for flow, and an RMSE of 2.76 mph and a MAPE of 2.64% for speed, while the other three produce much higher RMSEs and MAPEs for both flow and speed. Further comparison between the pure GP and the three PRGP models reveals that the PRGP models improve the accuracy of flow estimation (e.g., the flow RMSE drops from 39.74 to 34.75 veh/5min with ARZ-PRGP), while their speed estimates remain comparable to those of the pure GP. The improvement is modest because the pure GP already achieves very good estimation performance, leaving limited room for improvement by the PRGP. Moreover, to validate PRGP's contribution to making the results more explainable, the comparison with Heat-PRGP, which uses the physical knowledge from the heat equation, shows that a physical model that cannot precisely describe traffic flow patterns can even degrade the capability of the PRGP. Further evidence is that PW and ARZ, which are improved versions of LWR, improve the flow estimates of the PRGP compared with LWR.

Figure 9: Comparison between speed estimations by GP and PRGPs and the ground truth. Panels: (a) GP, (b) LWR-PRGP, (c) PW-PRGP, (d) ARZ-PRGP
4.3.2. Comparison with physical models (Traffic Flow Models)
To provide physical baselines for the performance comparison, the LWR, PW, and ARZ models are calibrated with the obtained field data. For model calibration, we follow the method of Akwir et al. (2018), where a hybrid scheme of a neural network and nonlinear partial differential equations is used to dynamically adjust all outputs of the three models to obtain their calibrated parameters. Figs. 10-11 plot the estimated flow and speed from the three physical models against the ground truth. The estimates are clearly biased for both flow and speed.
Table 1: Comparison of the results of the proposed method and the baseline methods

| Method | Flow RMSE (veh/5min) | Flow MAPE | Speed RMSE (mph) | Speed MAPE |
|---|---|---|---|---|
| Multilayer perceptron | 113.95 | 30.80% | 13.61 | 19.91% |
| Support Vector Machine | 124.84 | 34.24% | 9.58 | 13.01% |
| Random Forest | 108.24 | 27.60% | 8.66 | 12.02% |
| pure GP | 39.74 | 13.70% | 2.76 | 2.64% |
| LWR-PRGP | 37.19 | 12.77% | 2.96 | 2.65% |
| PW-PRGP | 35.45 | 12.42% | 3.02 | 2.68% |
| ARZ-PRGP | 34.75 | 11.48% | 2.90 | 2.72% |
| Heat-PRGP | 79.51 | 23.49% | 5.20 | 6.75% |
Figure 10: Estimated flow by the calibrated physical models vs. ground truth. Panels: (a) LWR, (b) PW, (c) ARZ
Figure 11: Estimated speed by the calibrated physical models vs. ground truth. Panels: (a) LWR, (b) PW, (c) ARZ
To better assess the models' estimation accuracy, Table 2 compares the estimation errors of the proposed methods and the calibrated physical models. The proposed method significantly outperforms the physics-based baselines, reducing the flow RMSE by around 80 veh/5min (about 120 veh/5min for ARZ) and the flow MAPE by around 18 percentage points, and reducing the speed RMSE by around 7 mph and the speed MAPE by around 15 percentage points. Hence, the estimation performance of traffic flow models can be greatly improved if they are encoded into an ML framework, whose ML portion properly captures the real-world uncertainties of flow and speed.
Table 2: Comparison of the results of the proposed methods and the physics-based methods

| Method | Flow RMSE (veh/5min) | Flow MAPE | Speed RMSE (mph) | Speed MAPE |
|---|---|---|---|---|
| Calibrated LWR | 115.75 | 32.96% | 9.88 | 14.4% |
| LWR-regularized GP | 37.19 | 12.77% | 2.96 | 2.76% |
| Calibrated PW | 115.80 | 30.00% | 10.41 | 18.2% |
| PW-regularized GP | 35.5 | 12.42% | 3.02 | 2.68% |
| Calibrated ARZ | 155.20 | 32.00% | 12.71 | 18.4% |
| ARZ-regularized GP | 34.75 | 11.48% | 2.90 | 2.72% |
4.3.3. Robustness study
As mentioned earlier, the proposed PRML framework is expected to be more robust than pure ML models on noisy datasets. Hence, in this subsection, 50% of the training data are replaced by flawed data generated with 100 veh/5min noise in flows, while the testing data are kept unchanged and free of noise for model evaluation. Also, since the GP outperforms the multilayer perceptron, support vector machine, and random forest in both flow and speed estimation, only the robustness of GP and PRGP is examined in this subsection. Table 3 and Figs. 12-13 summarize their estimation performance with the noisy training data. The results show that the GP has limited resistance to highly biased data, e.g., those caused by traffic detector malfunctions. The three PRGP models greatly outperform the pure GP in flow estimation, reducing the RMSE by about 170 veh/5min and the MAPE by over 100 percentage points. Hence, the proposed PRML framework is much more robust than pure ML models when the input data are subject to unobserved random noise, owing to PRML's capability of using physical knowledge to regularize the ML training process. The results also show that the heat equation does not capture the dynamics of traffic flow, and only well-developed traffic flow models can improve the accuracy of the Gaussian process.
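The corruption protocol described above can be reproduced along these lines. The phrase "100 veh/5min noise" leaves the noise model open, so this sketch assumes additive zero-mean Gaussian noise with a standard deviation of 100 veh/5min applied to a randomly chosen half of the training flows; the function name and the constant toy flows are hypothetical.

```python
import numpy as np

def corrupt_training_flows(flow, frac=0.5, noise_sd=100.0, seed=0):
    """Return a copy of `flow` in which a random fraction of records carries added noise.

    Assumption: flawed records are modeled as flow + N(0, noise_sd^2), in veh/5min.
    Only training data should be passed in; the testing data stay untouched.
    """
    rng = np.random.default_rng(seed)
    flow = np.asarray(flow, dtype=float)
    noisy = flow.copy()
    n_bad = int(round(frac * flow.size))
    idx = rng.choice(flow.size, size=n_bad, replace=False)      # records to corrupt
    noisy[idx] = flow[idx] + rng.normal(0.0, noise_sd, size=n_bad)
    return noisy, idx

train_flow = np.full(2880, 400.0)            # hypothetical constant training flows
noisy_flow, bad_idx = corrupt_training_flows(train_flow)
print(len(bad_idx), np.count_nonzero(noisy_flow != train_flow))
```

Sampling indices without replacement guarantees that exactly the requested fraction of records is corrupted while the rest stay identical to the clean data.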
Table 3: Comparison of the estimation accuracy with noisy training dataset

| Method | Flow RMSE (veh/5min) | Flow MAPE | Speed RMSE (mph) | Speed MAPE |
|---|---|---|---|---|
| pure GP | 212.17 | 135.19% | 5.96 | 3.35% |
| GP-LWR | 41.78 | 9.73% | 6.01 | 3.46% |
| GP-PW | 41.11 | 9.60% | 4.43 | 3.30% |
| GP-ARZ | 35.37 | 9.51% | 3.06 | 2.72% |
| GP-HEAT | 215.01 | 138.29% | 4.31 | 33.6% |

Figure 12: Comparison between flow estimation and ground truth with noisy training dataset. Panels: (a) GP, (b) GP-LWR, (c) GP-PW, (d) GP-ARZ
Figure 13: Comparison between speed estimation and ground truth with noisy training dataset. Panels: (a) GP, (b) GP-LWR, (c) GP-PW, (d) GP-ARZ

5. Conclusions and Future Research Directions
In the literature, traffic flow models have been well developed to explain traffic phenomena, but they face theoretical difficulties in stochastic formulation and rigorous estimation. With the increasing availability of data, data-driven methods are prevalent and developing fast, but they lack sensitivity to irregular events and lose effectiveness with sparse data. To address the issues of both, this paper investigates a hybrid framework that combines their advantages. It proposes a stochastic modeling framework that captures the random detection noise and the latent unobserved factors of traffic data while leveraging the well-defined fundamental diagram, conservation law, and momentum conditions. The traffic state indicators (i.e., flow, speed, density) are assumed to be multivariate Gaussian distributed. A physics regularized Gaussian process (PRGP) is proposed to encode the physics knowledge in the Bayesian inference structure as a shadow Gaussian process, which is proven to regularize the conventional constraint-free Gaussian process as a soft constraint. To estimate the proposed PRGP, a posterior regularized inference algorithm is derived and implemented with auto-differentiation libraries. The computational complexity is cubic in the product of the sample size and the output dimension, i.e., O((Nd′)³ + m³). A preliminary real-world case study is conducted on PeMS detection data collected from a freeway segment in Utah, testing the well-known continuous traffic flow models (i.e., LWR, PW, ARZ). In comparison with pure machine learning methods and pure physical models, the numerical results justify the effectiveness and robustness of the proposed method.
Potential directions for future research include: (1) extending the proposed method to leverage other models for traffic state estimation, such as a discrete macroscopic traffic flow model regularized Gaussian process; (2) extending the proposed method to other problems, such as a microscopic behavior model regularized Gaussian process for vehicle trajectory prediction; and (3) extending the physics regularization methodology to other machine learning algorithms, such as random forest and support vector machine, to combine general physics knowledge with learning tasks.
References
Akwir, N.A., Chedjou, J.C., Kyamakya, K., 2018. Neural-network-based calibration of macroscopic traffic
flow models, in: Recent Advances in Nonlinear Dynamics and Synchronization. Springer, pp. 151–173.
Alvarez, M., Luengo, D., Lawrence, N.D., 2009. Latent force models, in: Artificial Intelligence and Statistics,
pp. 9–16.
Alvarez, M.A., Luengo, D., Lawrence, N.D., 2013. Linear latent force models using gaussian processes.
IEEE transactions on pattern analysis and machine intelligence 35, 2693–2705.
Armand, A., Filliat, D., Ibanez-Guzman, J., 2013. Modelling stop intersection approaches using gaussian
processes, in: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013),
IEEE. pp. 1650–1655.
Aw, A., Rascle, M., 2000. Resurrection of "second order" models of traffic flow. SIAM journal on applied
mathematics 60, 916–938.
Barber, D., Wang, Y., 2014. Gaussian processes for bayesian estimation in ordinary differential equations,
in: International Conference on Machine Learning, pp. 1485–1493.
Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M., 2018. Automatic differentiation in machine
learning: a survey. Journal of machine learning research 18.
Bishop, C.M., 2006. Pattern recognition and machine learning. Springer.
Blandin, S., Argote, J., Bayen, A.M., Work, D.B., 2013. Phase transition model of non-stationary traffic
flow: Definition, properties and solution method. Transportation Research Part B: Methodological 52,
31–55.
Blandin, S., Couque, A., Bayen, A., Work, D., 2012. On sequential data assimilation for scalar macroscopic
traffic flow models. Physica D: Nonlinear Phenomena 241, 1421–1440.
Calderhead, B., Girolami, M., Lawrence, N.D., 2009. Accelerating bayesian inference over nonlinear dif-
ferential equations with gaussian processes, in: Advances in neural information processing systems, pp.
217–224.
Coifman, B., 2002. Estimating travel times and vehicle trajectories on freeways using dual loop detectors.
Transportation Research Part A: Policy and Practice 36, 351–364.
Colombo, R.M., 2003. Hyperbolic phase transitions in traffic flow. SIAM Journal on Applied Mathematics
63, 708–721.
Courant, R., Hilbert, D., 2008. Methods of Mathematical Physics: Partial Differential Equations. John
Wiley & Sons.
Daganzo, C.F., 1995. Requiem for second-order fluid approximations of traffic flow. Transportation Research
Part B: Methodological 29, 277–286.
Davis, G.A., Kang, J.G., 1994. Estimating destination-specific traffic densities on urban freeways for ad-
vanced traffic management. 1457.
Del Castillo, J., Pintado, P., Benitez, F., 1994. The reaction time of drivers and the stability of traffic flow.
Transportation Research Part B: Methodological 28, 35–60.
Di, X., Liu, H.X., Davis, G.A., 2010. Hybrid extended kalman filtering approach for traffic density estimation
along signalized arterials: Use of global positioning system data. Transportation Research Record 2188,
165–173.
Duan, Y., Lv, Y., Liu, Y.L., Wang, F.Y., 2016. An efficient realization of deep learning for traffic data
imputation. Transportation research part C: emerging technologies 72, 168–181.
Fan, S., Herty, M., Seibold, B., 2013. Comparative model accuracy of a data-fitted generalized aw-rascle-
zhang model. arXiv preprint arXiv:1310.8219.
Fasshauer, G.E., 2011. Positive definite kernels: past, present and future. Dolomite Research Notes on
Approximation 4, 21–63.
Ganchev, K., Das, D., 2013. Cross-lingual discriminative learning of sequence models with posterior regular-
ization, in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing,
pp. 1996–2006.
Ganchev, K., Gillenwater, J., Taskar, B., et al., 2010. Posterior regularization for structured latent variable
models. Journal of Machine Learning Research 11, 2001–2049.
Gazis, D., Liu, C., 2003. Kalman filtering estimation of traffic counts for two network links in tandem.
Transportation Research Part B: Methodological 37, 737–745.
Gazis, D.C., Knapp, C.H., 1971. On-line estimation of traffic densities from time-series of flow and speed
data. Transportation Science 5, 283–301.
Graepel, T., 2003. Solving noisy linear operator equations by gaussian processes: Application to ordinary
and partial differential equations, in: ICML, pp. 234–241.
Gray, L., Griffeath, D., 2001. The ergodic theory of traffic jams. Journal of Statistical Physics 105, 413–452.
He, L., Gillenwater, J., Taskar, B., 2013. Graph-based posterior regularization for semi-supervised structured
prediction, in: Proceedings of the Seventeenth Conference on Computational Natural Language Learning,
pp. 38–46.
Heinonen, M., Yildiz, C., Mannerstrom, H., Intosalmi, J., Lahdesmaki, H., 2018. Learning unknown ode
models with gaussian processes. arXiv preprint arXiv:1803.04303.
Hoogendoorn, S.P., Bovy, P.H., 2001. State-of-the-art of vehicular traffic flow modelling. Proceedings of the
Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 215, 283–303.
Ide, T., Kato, S., 2009. Travel-time prediction using gaussian process regression: A trajectory-based ap-
proach, in: Proceedings of the 2009 SIAM International Conference on Data Mining, SIAM. pp. 1185–
1196.
Jabari, S.E., Liu, H.X., 2012. A stochastic model of traffic flow: Theoretical foundations. Transportation
Research Part B: Methodological 46, 156–174.
Jabari, S.E., Liu, H.X., 2013. A stochastic model of traffic flow: Gaussian approximation and estimation.
Transportation Research Part B: Methodological 47, 15–41.
Jabari, S.E., Zheng, J., Liu, H.X., 2014. A probabilistic stationary speed–density relation based on Newell's
simplified car-following model. Transportation Research Part B: Methodological 68, 205–223.
Kang, J.G., 1995. Estimation of destination-specific traffic densities and identification of parameters on
urban freeways using Markov models of traffic flow. University of Minnesota.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kuwahara, M., 2015. Theory, solution method and applications of kinematic wave. Interdisciplinary Infor-
mation Sciences 21, 63–75.
Laval, J.A., He, Z., Castrillon, F., 2012. Stochastic extension of newell’s three-detector method. Trans-
portation Research Record 2315, 73–80.
Lawrence, N.D., Sanguinetti, G., Rattray, M., 2007. Modelling transcriptional regulation using gaussian
processes, in: Advances in Neural Information Processing Systems, pp. 785–792.
Lebacque, J.P., Mammar, S., Salem, H.H., 2007. Generic second order traffic flow modelling, in: Trans-
portation and Traffic Theory 2007. Papers Selected for Presentation at ISTTT17.
Li, L., Li, Y., Li, Z., 2013. Efficient missing data imputing for traffic flow by considering temporal and
spatial dependence. Transportation research part C: emerging technologies 34, 108–120.
Libbrecht, M.W., Hoffman, M.M., Bilmes, J.A., Noble, W.S., 2015. Entropic graph-based posterior regu-
larization: Extended version, in: Proceedings of the International Conference on Machine Learning.
Lighthill, M.J., Whitham, G.B., 1955. On kinematic waves ii. a theory of traffic flow on long crowded roads.
Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 229, 317–345.
Liu, S., Yue, Y., Krishnan, R., 2013. Adaptive collective routing using gaussian process dynamic congestion
models, in: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and
data mining, ACM. pp. 704–712.
Mihaylova, L., Boel, R., 2004. A particle filter for freeway traffic estimation, in: 2004 43rd IEEE Conference
on Decision and Control (CDC)(IEEE Cat. No. 04CH37601), IEEE. pp. 2106–2111.
Mihaylova, L., Boel, R., Hegyi, A., 2006. An unscented kalman filter for freeway traffic estimation, IFAC.
Nagel, K., Schreckenberg, M., 1992. A cellular automaton model for freeway traffic. Journal de physique I
2, 2221–2229.
Neumann, M., Kersting, K., Xu, Z., Schulz, D., 2009. Stacked gaussian process learning, in: 2009 Ninth
IEEE International Conference on Data Mining, IEEE. pp. 387–396.
Ni, D., Leonard, J.D., 2005. Markov chain monte carlo multiple imputation using bayesian networks for
incomplete intelligent transportation systems data. Transportation research record 1935, 57–67.
Osorio, C., Flotterod, G., Bierlaire, M., 2011. Dynamic network loading: a stochastic differentiable model
that derives link state distributions. Procedia-Social and Behavioral Sciences 17, 364–381.
Papageorgiou, M., 1998. Some remarks on macroscopic traffic flow modelling. Transportation Research
Part A: Policy and Practice 32, 323–329.
Papageorgiou, M., Blosseville, J.M., Hadj-Salem, H., 1989. Macroscopic modelling of traffic flow on the
boulevard peripherique in paris. Transportation Research Part B: Methodological 23, 29–47.
Pascale, A., Gomes, G., Nicoli, M., 2013. Estimation of highway traffic from sparse sensors: Stochastic
modeling and particle filtering, in: 2013 IEEE International Conference on Acoustics, Speech and Signal
Processing, IEEE. pp. 6158–6162.
Paveri-Fontana, S., 1975. On boltzmann-like treatments for traffic flow: a critical review of the basic model
and an alternative proposal for dilute traffic analysis. Transportation research 9, 225–235.
Payne, H., 1971. Models of freeway traffic and control. mathematical models of public systems.
Polson, N., Sokolov, V., 2017a. Bayesian particle tracking of traffic flows. IEEE Transactions on Intelligent
Transportation Systems 19, 345–356.
Polson, N.G., Sokolov, V.O., 2017b. Deep learning for short-term traffic flow prediction. Transportation
Research Part C: Emerging Technologies 79, 1–17.
Prigogine, I., Herman, R., 1971. Kinetic theory of vehicular traffic. Technical Report.
Raissi, M., Perdikaris, P., Karniadakis, G.E., 2017. Machine learning of linear differential equations using
gaussian processes. Journal of Computational Physics 348, 683–693.
Richards, P.I., 1956. Shock waves on the highway. Operations research 4, 42–51.
Rodrigues, F., Henrickson, K., Pereira, F.C., 2018. Multi-output gaussian processes for crowdsourced traffic
data imputation. IEEE Transactions on Intelligent Transportation Systems 20, 594–603.
Rodrigues, F., Pereira, F.C., 2018. Heteroscedastic gaussian processes for uncertainty modeling in large-scale
crowdsourced traffic data. Transportation research part C: emerging technologies 95, 636–651.
Seo, T., Bayen, A.M., Kusakabe, T., Asakura, Y., 2017. Traffic state estimation on highway: A compre-
hensive survey. Annual reviews in control 43, 128–151.
Song, Y., Zhu, J., Ren, Y., 2016. Kernel bayesian inference with posterior regularization, in: Advances in
Neural Information Processing Systems, pp. 4763–4771.
Sopasakis, A., 2012. Lattice free stochastic dynamics. Communications in Computational Physics 12,
691–702.
Sopasakis, A., Katsoulakis, M.A., 2006. Stochastic modeling and simulation of traffic flow: asymmetric
single exclusion process with arrhenius look-ahead dynamics. SIAM Journal on Applied Mathematics 66,
921–944.
Sumalee, A., Zhong, R., Pan, T., Szeto, W., 2011. Stochastic cell transmission model (SCTM): A stochastic
dynamic traffic model for traffic state surveillance and assignment. Transportation Research Part B:
Methodological 45, 507–533.
Szeto, M.W., Gazis, D.C., 1972. Application of Kalman filtering to the surveillance and control of traffic
systems. Transportation Science 6, 419–439.
Tak, S., Woo, S., Yeo, H., 2016. Data-driven imputation method for traffic data in sectional units of road
links. IEEE Transactions on Intelligent Transportation Systems 17, 1762–1771.
Tan, H., Feng, G., Feng, J., Wang, W., Zhang, Y.J., Li, F., 2013. A tensor-based method for missing traffic
data completion. Transportation Research Part C: Emerging Technologies 28, 15–27.
Tan, H., Wu, Y., Cheng, B., Wang, W., Ran, B., 2014. Robust missing traffic flow imputation considering
nonnegativity and road capacity. Mathematical Problems in Engineering 2014.
Tang, J., Zhang, G., Wang, Y., Wang, H., Liu, F., 2015. A hybrid approach to integrate fuzzy c-means based
imputation method with genetic algorithm for missing traffic volume data estimation. Transportation
Research Part C: Emerging Technologies 51, 29–40.
Wada, K., Usui, K., Takigawa, T., Kuwahara, M., 2017. An optimization modeling of coordinated traffic
signal control based on the variational theory and its stochastic extension. Transportation Research
Procedia 23, 624–644.
Wang, Y., Papageorgiou, M., 2005. Real-time freeway traffic state estimation based on extended Kalman
filter: A general approach. Transportation Research Part B: Methodological 39, 141–167.
Wang, Y., Papageorgiou, M., Messmer, A., 2007. Real-time freeway traffic state estimation based on
extended Kalman filter: A case study. Transportation Science 41, 167–181.
Whitham, G., 1975. Linear and nonlinear waves. Modern Book Incorporated.
Williams, C.K., Rasmussen, C.E., 2006. Gaussian processes for machine learning. Volume 2. MIT Press,
Cambridge, MA.
Wilson, A.G., Hu, Z., Salakhutdinov, R., Xing, E.P., 2016. Deep kernel learning, in: Artificial Intelligence
and Statistics, pp. 370–378.
Work, D.B., Tossavainen, O.P., Blandin, S., Bayen, A.M., Iwuchukwu, T., Tracton, K., 2008. An ensemble
Kalman filtering approach to highway traffic estimation using GPS enabled mobile devices, in: 2008 47th
IEEE Conference on Decision and Control, IEEE, pp. 5062–5068.
Wu, Y., Tan, H., Qin, L., Ran, B., Jiang, Z., 2018. A hybrid deep learning based traffic flow prediction
method and its understanding. Transportation Research Part C: Emerging Technologies 90, 166–180.
Xie, Y., Zhao, K., Sun, Y., Chen, D., 2010. Gaussian processes for short-term traffic volume forecasting.
Transportation Research Record 2165, 69–78.
Yin, W., Murray-Tuite, P., Rakha, H., 2012. Imputing erroneous data of single-station loop detectors
for nonincident conditions: Comparison between temporal and spatial methods. Journal of Intelligent
Transportation Systems 16, 159–176.
Zhang, H.M., 2002. A non-equilibrium traffic model devoid of gas-like behavior. Transportation Research
Part B: Methodological 36, 275–290.
Zhong, M., Lingras, P., Sharma, S., 2004. Estimation of missing traffic counts using factor, genetic, neural,
and regression techniques. Transportation Research Part C: Emerging Technologies 12, 139–166.
Zhu, J., Chen, N., Xing, E.P., 2014. Bayesian inference with posterior regularization and applications to
infinite latent SVMs. The Journal of Machine Learning Research 15, 1799–1847.