Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro...

11
1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The metro system is playing an increasingly im- portant role in the urban public transit network, transferring a massive human flow across space everyday in the city. In recent years, extensive research studies have been conducted to improve the service quality of metro systems. Among them, crowd management has been a critical issue for both public transport agencies and train operators. In this paper, by utilizing accumulated smart card data, we propose a statistical model to predict in-situ passenger density, i.e., number of on-board passengers between any two neighbouring stations, inside a closed metro system. The proposed model performs two main tasks: i) forecasting time-dependent Origin-Destination (OD) matrix by applying mature statistical models; and ii) estimating the travel time cost required by different parts of the metro network via truncated normal mixture distributions with Expectation- Maximization (EM) algorithm. Based on the prediction results, we are able to provide accurate prediction of in-situ passenger density for a future time point. A case study using real smart card data in Singapore Mass Rapid Transit (MRT) system demonstrate the efficacy and efficiency of our proposed method. Index Terms—ExpectationMaximization algorithm, metro sys- tems, passenger crowding prediction, origin-destination matrix, smart card data, truncated normal distribution I. I NTRODUCTION Metro, as one of the most efficient public transport modes, has been an important component in urban development for land-scarce and rail-rely countries like Japan and Singapore. However, as the urban population grows, the increasing public transport demands, especially during peak hours, brings up concern regarding both passenger safety and metro operation security. In such cases, real-time prediction of passenger density inside the metro network is highly demanded for both system operators and passengers. Accurate and fine- grain prediction can support metro agencies for better train operation planning, assist to detect potential abnormal traffic flow and render fast remedial strategies, and provide real-time traffic information to passengers for better travel planning and overcrowding avoidance. Nowadays, the automated fare collection system provides accessibility of massive metro trip data (a.k.a., smart card data) for public transport studies. With the AFC system, This research is supported by the National Research Foundation, Singapore under its International Research Centres in Singapore Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore. X. Tian is with Living Analytics Research Centre, Singapore Management University, Singapore 188065 (email: [email protected]) C. Zhang is with Department of Industrial Engineering, Tsinghua Univer- sity, Beijing 100084, China (email: [email protected]) B. Zheng is with School of Information Systems, Singapore Management University, Singapore 188065 (email: [email protected]) when passengers tap their cards on an entry/exit card reader, information such as date, time, location (station ID), and travel direction are recorded. Such data can be used for study of passenger flow distribution, origin-destination (OD) matrix, travel time distribution, trip purposes, passenger route choice preferences, travel behavior analysis, etc. [1] presents a comprehensive review of uses of smart card data in public transit. Though there exist some works performing passenger flow prediction by utilizing AFC data, most of them were conducted at the aggregate level, e.g., estimation of passenger inflow and outflow at each station, or the passenger flow of each OD pair. Yet methods providing finer-grain in-situ passen- ger density estimation across the whole metro network, i.e., number of on-board passengers in each rail segment between two neighboring stations, are not available. This is due to the fact that metro is a closed system, and AFC data only captures trip information of boarding/alighting stations and timestamps, without tracking passengers’ real-time locations inside the metro system. Furthermore, as most metro systems are designed to be fault tolerant, there could be multiple routes linking an origin station to a destination station. However the actual route taken by an individual passenger remains unknown, which brings additional challenges to prediction of in-situ passenger density. To address the above mentioned issues, in this research work, we propose a statistical inference framework, namely PIPE, to provide accurate and fine-grain prediction of in- situ passenger density across the metro network, by utilizing collected AFC data. PIPE involves two tasks: a) forecasting time-dependent OD matrices; This aims to infer the potential number of passengers in each OD path; and b) inferring travel time variability of each transit link (to be defined in Section III) of the metro network. This aims to infer how the passengers inferred by a) distribute in different segments of the network. In particular, we directly apply existing machine learning algorithms for the first task. The core of the framework is the second task. Here we propose a white-box statistical inference model by assuming that the travel time distribution for each transit link of the metro system follows a truncated Gaussian distribution. We infer the distribution parameters together with the population-level route choice probabilities using truncated Gaussian mixture models. By leveraging the inferences of both tasks, we can achieve accurate prediction of in-situ passenger density inside the metro system. Note that PIPE is different from the in-train passenger load prediction models [2]–[4], which generally require a arXiv:2009.02880v1 [cs.LG] 7 Sep 2020

Transcript of Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro...

Page 1: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

1

Crowding Prediction of In-Situ Metro PassengersUsing Smart Card Data

Xiancai Tian, Chen Zhang, Baihua Zheng

Abstract—The metro system is playing an increasingly im-portant role in the urban public transit network, transferringa massive human flow across space everyday in the city. Inrecent years, extensive research studies have been conductedto improve the service quality of metro systems. Among them,crowd management has been a critical issue for both publictransport agencies and train operators. In this paper, by utilizingaccumulated smart card data, we propose a statistical modelto predict in-situ passenger density, i.e., number of on-boardpassengers between any two neighbouring stations, inside a closedmetro system. The proposed model performs two main tasks: i)forecasting time-dependent Origin-Destination (OD) matrix byapplying mature statistical models; and ii) estimating the traveltime cost required by different parts of the metro networkvia truncated normal mixture distributions with Expectation-Maximization (EM) algorithm. Based on the prediction results,we are able to provide accurate prediction of in-situ passengerdensity for a future time point. A case study using real smart carddata in Singapore Mass Rapid Transit (MRT) system demonstratethe efficacy and efficiency of our proposed method.

Index Terms—ExpectationMaximization algorithm, metro sys-tems, passenger crowding prediction, origin-destination matrix,smart card data, truncated normal distribution

I. INTRODUCTIONMetro, as one of the most efficient public transport modes,

has been an important component in urban development forland-scarce and rail-rely countries like Japan and Singapore.However, as the urban population grows, the increasing publictransport demands, especially during peak hours, brings upconcern regarding both passenger safety and metro operationsecurity. In such cases, real-time prediction of passengerdensity inside the metro network is highly demanded forboth system operators and passengers. Accurate and fine-grain prediction can support metro agencies for better trainoperation planning, assist to detect potential abnormal trafficflow and render fast remedial strategies, and provide real-timetraffic information to passengers for better travel planning andovercrowding avoidance.

Nowadays, the automated fare collection system providesaccessibility of massive metro trip data (a.k.a., smart carddata) for public transport studies. With the AFC system,

This research is supported by the National Research Foundation, Singaporeunder its International Research Centres in Singapore Funding Initiative. Anyopinions, findings and conclusions or recommendations expressed in thismaterial are those of the author(s) and do not reflect the views of NationalResearch Foundation, Singapore.

X. Tian is with Living Analytics Research Centre, Singapore ManagementUniversity, Singapore 188065 (email: [email protected])

C. Zhang is with Department of Industrial Engineering, Tsinghua Univer-sity, Beijing 100084, China (email: [email protected])

B. Zheng is with School of Information Systems, Singapore ManagementUniversity, Singapore 188065 (email: [email protected])

when passengers tap their cards on an entry/exit card reader,information such as date, time, location (station ID), andtravel direction are recorded. Such data can be used forstudy of passenger flow distribution, origin-destination (OD)matrix, travel time distribution, trip purposes, passenger routechoice preferences, travel behavior analysis, etc. [1] presentsa comprehensive review of uses of smart card data in publictransit.

Though there exist some works performing passenger flowprediction by utilizing AFC data, most of them were conductedat the aggregate level, e.g., estimation of passenger inflowand outflow at each station, or the passenger flow of eachOD pair. Yet methods providing finer-grain in-situ passen-ger density estimation across the whole metro network, i.e.,number of on-board passengers in each rail segment betweentwo neighboring stations, are not available. This is due tothe fact that metro is a closed system, and AFC data onlycaptures trip information of boarding/alighting stations andtimestamps, without tracking passengers’ real-time locationsinside the metro system. Furthermore, as most metro systemsare designed to be fault tolerant, there could be multiple routeslinking an origin station to a destination station. Howeverthe actual route taken by an individual passenger remainsunknown, which brings additional challenges to prediction ofin-situ passenger density.

To address the above mentioned issues, in this researchwork, we propose a statistical inference framework, namelyPIPE, to provide accurate and fine-grain prediction of in-situ passenger density across the metro network, by utilizingcollected AFC data. PIPE involves two tasks: a) forecastingtime-dependent OD matrices; This aims to infer the potentialnumber of passengers in each OD path; and b) inferringtravel time variability of each transit link (to be definedin Section III) of the metro network. This aims to inferhow the passengers inferred by a) distribute in differentsegments of the network. In particular, we directly applyexisting machine learning algorithms for the first task. Thecore of the framework is the second task. Here we proposea white-box statistical inference model by assuming that thetravel time distribution for each transit link of the metrosystem follows a truncated Gaussian distribution. We inferthe distribution parameters together with the population-levelroute choice probabilities using truncated Gaussian mixturemodels. By leveraging the inferences of both tasks, we canachieve accurate prediction of in-situ passenger density insidethe metro system.

Note that PIPE is different from the in-train passengerload prediction models [2]–[4], which generally require a

arX

iv:2

009.

0288

0v1

[cs

.LG

] 7

Sep

202

0

Page 2: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

2

lot of additional information, such as on-board headcountdata, train timetables or passenger GPS data as input. YetPIPE only requires AFC data. Furthermore, compared with in-train crowding prediction, PIPE aims to infer the macroscopicspatiotemporal passenger density inside the network, ratherthan the route or train each individual passenger chooses orboards. In conclusion, PIPE’s contribution is twofold:(a) PIPE represents the first attempt to make fine-grain pre-

diction of in-situ passenger density in each rail segmentacross a closed metro system.

(b) By taking the metro network topological structure intoconsideration, PIPE proposes a truncated Gaussian mix-ture model to infer the travel time of each rail segmentof the metro system, and to analyze the population-levelroute choice preferences, which brings high interpretabil-ity to the modelling result.

The remainder of this paper is organized as follows. Sec-tion II reviews existing works related to this paper. Sec-tion III presents the preliminaries of our work, including tripreconstruction process and generation of route choice sets.Section IV details the proposed PIPE framework. Section Vreports a case study using Singapore metro AFC data. Finally,Section VI provides concluding remarks.

II. LITERATURE REVIEW

Some of the most related research topics include i) pas-senger flow prediction, ii) OD matrix prediction, iii) traveltime estimation and route selection, and iv) spatiotemporalcrowding analysis.Passenger Flow Prediction. Passenger flow prediction hasbeen extensively studied in many literature works, amongwhich statistical and machine learning-based approaches havebecome increasingly attractive. In particular, [5]–[7] proposedseveral time series models, such as autoregressive integratedmoving average (ARIMA) model and state space model, forshort-term passenger flow prediction at each metro station.Besides, online decomposition based methods, such as non-negative matrix factorization model [8] and wavelet decompo-sition model [9], have also been proposed, where the passengerflow features are extracted and used for prediction.

Recently deep neural network-based methods have also beendeveloped for extracting the complex spatial and temporaldependence structure of traffic flows. Various network struc-tures have been applied for metro passenger flow prediction,such as the most classical multiscale radial basis functionnetworks [10], stacked auto-encoder [11], Long Short TermMemory (LSTM) model [12], sequence-to-sequence modelwith attention mechanism [13], etc. However, all the modelsdeveloped so far only predict passenger counts, i.e., the inflowand outflow, at the station level, regardless of where thepassengers come from or where they are headed, let alonehow they are distributed in the system.OD Matrix Forecasting. OD matrix forecasting aims at pre-dicting trip demands between different nodes (metro stationsin our case) of the network. Some pioneer works use linearstatistical models for analysis. For example, [14] used Kalmanfilters to predict time-dependent OD flows of drivers, based

on traffic volume and average speed data collected using on-board sensors. A similar approach was also proposed in [15]for short-term forecasting of OD Matrix for bus boardingin China. [16] uses ARIMA model for dynamic forcastingof time-dependent OD flows in a dutch passenger rail. Onecrucial limitation of the aforementioned linear models is thatthey cannot represent complex relationship that non-linearmodels do.

Recently, newly developed deep learning models, e.g.,Recurrent Neural Networks (RNN), have also been widelyused for sequence prediction problems. For example, [17]applies LSTM to predict OD matrix in the metro system.[18] proposed a matrix factorization embedded graph CNNfor city ground transportation OD matrics prediction. [19]proposes a Multi-Perspective Graph Convolutional Networks(MPGCN) with LSTM to extract temporal features for ODmatrix prediction. Yet these methods treat the metro systemas a black box and only predict the enter/exit of passengers,but cannot track the passenger locations or infer the in-situpassenger density.

Travel Time Estimation and Route Selection. To better trackpassengers’ trajectory inside the metro system, many recentresearches rely on statistical modelling to infer travel timevariability and passenger route choices.

These models generally characterize trip travel time betweeneach OD pair as mixture distributions from its candidateroutes. Travel time of each candidate route can then bedecomposed into time of each transit link, such as the plat-form waiting time, in-vehicle travel time, transfer time, etc.Passengers’ route selection can be influenced by a varietyof factors such as the number of transfers required and thetotal travel time. Different assumptions on the travel time,such as constant [20], Gaussian distribution [21]–[23], Poissondistribution [24], have also been discussed in the literature.However, some of them require additional knowledge, suchas the train schedule table [20], [22] and train crowdinginformation [21], which is not always available in reality.Furthermore, all these methods aimed at recovering the routestaken by individual passengers, and did not provide a solutionto make macroscopic forecasting of in-situ passenger densityacross the metro system.

Spatiotemporal Crowding Analysis. To our best knowledge,only a few works utilize AFC data for crowdedness estimationor passenger distribution inference inside metro systems. Inparticular, [25] constructed a regression model to extract thespatial distribution of passengers by dividing them into twogroups, based on whether they are travelling on the train orwaiting at the platform. However, this method focused on asingle track scenario that is oversimplified. [20], [26] proposedan empirical probability model to estimate the route choiceprobabilities from the perspective of individual passengerbased on the AFC data and train operating time table, andfurther extracted spatio-temporal segmentation information oftrips as a by-product. As an alternative, some models also aimat directly estimating the in-train passenger density. For exam-ple, [4] constructed a LSTM encoder-predictor combined witha contextual representation for train load prediction. [27] also

Page 3: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

3

proposed Boosted Regression Tree Ensemble for both train-centered prediction and station-centered crowdedness predic-tion. All the above models require additional information, suchas the train operating timetable and passenger load of eachtrain car. However, these types of information are not alwaysavailable, which hinder their applications in general cases.

III. PRELIMINARY

In this section, we first propose a trip reconstruction processin Subsection III-A, which decomposes a trip into a sequenceof travel steps. We formulate the metro system as an undirectednetwork and generate a feasible route set for each OD pair ofthe metro network in Subsection III-B. These two steps layfoundation for in-situ passenger density prediction.

A. Trip Reconstruction

We model a metro network as a general transportation graphG(S,E,L), consisting of a set of metro stations S, a set ofedges E, and a set of metro lines L. A station s ∈ S couldbe either a normal station that is crossed by only one metroline or an interchange station that is crossed by multiple lines.An edge (or a segment, interchangeably) e(si, sj , l) ∈ E isdefined as a segment on a train line l ∈ L that connectsthe two neighbouring stations si and sj without passing anyother station. Stations si and sj are adjacent if there is anedge e(si, sj , l) ∈ E between them. Note that there couldbe multiple edges between two adjacent stations (si, sj), cor-responding to different metro lines. This undirected networkformulation is reasonable since most metro systems in theworld are bi-directional. However, the techniques developedin this paper could be easily extended to support the casewhere a metro system has single-directional lines and shouldbe modelled as a directed graph.

A route rij from an origin station si to a destination stationsj is a sequence of adjacent edges 〈e1, · · · , eLrijij 〉 that couldbring passengers from station si to station sj . In this paper, weonly consider simple routes without loop, so that each routeonly visits a station at most once.

We denote T rijij as the corresponding travel time requiredwhen a passenger takes a particular route rij to travel fromthe origin station si to the destination station sj (note therecould be multiple possible routes which will be detailed later).As illustrated in Fig. 1, a trip normally consists of threecomponents: the entry component, the travel component, andthe exit component. If the route taken requires transfers, anadditional transfer component is involved. Accordingly, wecan model T rijij by decomposing it into travel time of thefollowing four kinds of travel components described above.• T gsi represents the time required by an entry link, con-

sisting of the walking time from an entry turnstile at theorigin station si to the platform and the waiting time fornext train at the platform.

• T ce represents the time required by a travel link, consistingof time spent in travelling on edge e;

• T qs represents the time required by a transfer link, con-sisting of the walking time from one metro platform to

another, and the waiting time for the next train at aninterchange station s; and

• T asj represents the time required by an exit link, i.e., thewalking time from the platform to the turnstiles at thedestination station sj .

Hereafter the term transit link is used to refer to one com-ponent of a trip via a metro system, which contributes to thetotal time required by a trip from entering the origin stationto exiting the destination station.

Given an edge e(si, sj , lx) connecting station si and stationsj along service line lx, we assume the travel time requiredfrom si to sj via service line lx is identically distributed asthat required from sj to si via the same line. This assumptiongenerally holds for most metro systems. Yet our analyticframework could be easily extended to cases when the traveltime from si to sj is asymmetric, even along the same serviceline.

After decomposing a trip into four different types of transitlinks, we can sum up the time spent on each transit link ofrij and calculate the total travel time required by rij as statedin Equation (1).

Trij = T gsi +∑L

rijij

b=1T ceb +

∑s∈S

rijij

T qs + T asj . (1)

Here Srijij refers to the set of interchange stations on route

rij where passengers make transfers. If we can derive thetravel time required by each transit link involved in rij , wecan predict the exit time of a trip, given its entry time. Inaddition, we can also infer the position of a passenger in themetro system at any time point before (s)he ends the trip.

B. Route Choice Set Generation

Commonly, in a metro system, there could be multipleroutes for some OD pairs. In the following, the term routechoice set corresponding to each OD pair 〈i, j〉, denoted asRij , represents all the routes used by passengers to travel fromsi to sj . We could adopt different strategies to generate Rij ,such as edge elimination and k-shortest-paths. In this paper,considering the number of stations in a metro system is usuallyin the scale of either tens or hundreds (e.g., as the largestmetro system in the world, New York City Subway has in total400+ stations). We simply adopt brute-force-search algorithmto form Rij for different OD pairs.

In the search process, note that NOT all the available routesare actually practical, e.g., passengers do not prefer a routethat is much longer or with too more transfers than others.Therefore we exclude routes that satisfy at least one of thefollowing criteria from Rij : i) routes with any loops; ii) routesthat are not the shortest path (in terms of number of transitlinks) but require more than σ transfers; and iii) routes with β(> 1) times number of links than the shortest route rminij , i.e.,the route with a minimum number of transit links out of Rod.The controlling parameters β and σ could be set according tothe assumptions of passengers’ behavior. For example, in ourstudy, we set both β and σ to be two. The notation Mij standsfor the number of routes inside the route choice set Rij .

Page 4: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

4

Fig. 1. Transit links of a trip in a metro system

metrosystemPredictionofin-situpassenger

density

Reconstructedtrips

Inferenceofroutepreferences

Predictionofaligtingrate

Routechoiceset

Smartcarddata

Inferenceoftraveltime

ForecastingofODmatrix

Fig. 2. PIPE framework for predicting the metro crowding of in-situpassengers.

IV. IN-SITU PASSENGER DENSITY PREDICTION

In this section, we propose a framework namely PIPE topredict the in-situ passenger density in the metro system basedon AFC data.

Given a metro system G(S,E,L), for a particular day, wewould like to predict the passenger density on edge e at somepoint t in the future, i.e., Xe(t). This can be formulated as:

Xe(t) =∑i∈S

∑j 6=i

∫τ<t

Vij(τ)Pij|eP (Ti,e = t− τ)dτ (2)

where• Vij(τ): the number of passengers boarding at si at timeτ and later alighting at sj ,

• Pij|e: the probability that a passenger boarding at si andalighting at sj would take a route containing edge e,

• P (Ti,e = t − τ): the probability that it takes time t − τfor a passenger boarding at si to reach edge e.

Similarly, we can model the number of passengers alightingat sj in future time t, i.e., Xj(t), as:

Xj(t) =∑

i∈S∧i 6=j

∫τ<t

Vij(τ)P (Tij = t− τ)dτ (3)

where• P (Tij = t − τ): the probability that it takes time t − τ

for a passenger to travel from si to sj .Without loss of generality, we assume the AFC data set D

including in total N historical metro trips, i.e., D = ∪Nn=1trn.Each metro trip is represented as tr = (id, o, d, to, td, c),where id is an encrypted unique string identifying a smartcard, o is the origin station, d is the destination station, toand td record the timestamp when the passenger enters thestation o and exits the station d respectively, and c ∈ C refersto the passenger category (e.g., C = {child, adult, senior,student} in Singapore). T = td − to captures the real traveltime required by this trip. For brevity, we also represent a tripas tr = (id, o, d, to, T, c). Based on D, we want to predictXe(t) and Xj(t). To achieve the prediction goals, we need to

infer Vij(τ), the distributions of Ti,e and Tij , and Pij|e. Theseactually can be divided into two tasks in PIPE: i) forecastingthe number of passengers travelling between each OD pairgiven the entry time window τ 1, i.e., Vij(τ); and ii) estimatingthe travel time parameters for all transit links and route choiceprobabilities for all routes in Rij . Fig. 2 plots the architectureof the proposed solution, we detail each of its components insubsequent subsections.

A. Prediction of Vij(τ)

As introduced in Section II, many works have been devotedto predict Vij(τ) for a particular day (here without confusion,we omit the day subscription for brevity) using statistical ormachine learning techniques in the last decade. In this paper,we consider the following candidates.

Define Xink,i(t), Xout

k,i (t), Vk,ij(t) as the passenger inflowand outflow of si, the passenger flow from si to sj at timewindow t in day k of the training data set. Assume we havek = 1, . . . ,K days of data. The first vanilla candidate is acalendar model using the historical average of the K daysto predict OD matrix for the testing day, , i.e., predictingVij(τ) for a given time window τ by averaging trip countsthat occurred in that same time window of the K days, i.e.,Vij(τ) =

∑Kk=1 Vk,ij(τ)/K.

Besides the vanilla method, we explore the following ma-chine learning and deep learning algorithms:• Linear regression models: We consider

Vij(τ) =∑s∈S

∆∑l=1

[as,lij X

ins (τ − l) + bs,lij X

outs (τ − l)

](4)

where as,lij , bs,lij are regression coefficients of passengerinflow and outflow at station s with a lag order ofl respectively, ∆ is the maximum lag order decidedbased on validation performance. Considering the numberof inputs is high, we introduce regularization to thecoefficients by using Lasso [28] and Ridge [29]) to filterout unrelated inputs.

• Random forest model [30]: Each decision tree takeslagged passenger inflow Xin

s (t− l) and outflow Xouts (t−

l), l = 1, . . . ,∆, as predictors, the final prediction is themean predictions of all individual trees.

• The time series ARIMA model [31]: We formulate theproblem as an univariate time series prediction problem.Trip counts of each OD pair Vk,ij(t) are sorted by dateand time in ascending order and use ARIMA for model

1In this paper, we break time into 20-minute time windows

Page 5: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

5

fitting. In particular, we adopt a walk-forward validationmethod to evaluate model performance. It is a practiceused to evaluate time series models when the model isexpected to be updated sequentially as new observationsare available. For each time window in the testing day, amodel constructed based on the training dataset will beused for prediction. Then the observation of the currenttime window will be added to the training dataset andthe process repeat. When performing multi-step aheadprediction, say m-step ahead, the prediction results fromthe past windows Vij(τ +1), . . . , Vij(τ +m−1), insteadof the ground truth values, are taken as observations forpredicting Vij(τ +m).

• LSTM [32]: The OD matrix of the 100 past time windowsτ− l, l = 1, . . . , 100 are taken as inputs to predict Vij(τ).For the training step, we used a validation-based earlystopping [33]. This method allows to avoid overfittingby stopping the training of the model when the loss ofthe validation set stops decreasing. We use the ADAMimplementation of Stochastic gradient descent (SGD)for weights optimization. The model architecture consista LSTM layer with 9, 000 hidden units and a fullyconnected layer.

For ARIMA and LSTM, it is to be noted that in our case forsome long traval time routes, when predicting Vij(τ), Vij(τ −1), Vij(τ − 2),...,Vij(τ − d) are possibly unknown. This isbecause passengers of Vij(τ − d) can still be in the middle oftrip and have not tapped out yet. Here d is the order of lagdepending on the travel duration form station i to station j.For this case, we simply remove these d windows’ data fromthe input set. Take Singapore MRT network as an example, asalmost all trips can be finished within two hours except that aMRT break down happened, we can set d = 6 (, i.e., 2 hours)whenever forecasting Vij(τ). For example, when performingone-step ahead prediction of Vij(τ), only Vij(τ − 7), Vij(τ −8),... are considered as inputs.

B. Estimation of P (Ti,e), P (Ti,j) and Pij|e

For a certain OD pair 〈i, j〉, we assume i) it has m =1, . . . ,Mij possible routes and ii) the route m includes Lmijedges. Then, the total travel time of a trip taking route m canbe decomposed into the travel time of different transit links, asstated in Equation (5). For notation convenience, we denote allthese transit links as set Hmij , and the total travel time equalsthe sum of the time required by each transit link in Hmij :

Tmij = T gsi +

Lmij∑b=1

T ceb +∑s∈Smij

T qs + T asj =∑h∈Hmij

Th. (5)

In order to consider travel time variability, we assumetravel time T gs , T ce , T qs and T as follow truncated Gaussiandistributions [34], i.e., T gs ∼ TN(µgs , σ

gs , a

gs , b

gs), T ce ∼

TN(µce,σce, a

ce, b

ce), T qs ∼ TN(µqs, σ

qs , a

qs, b

qs), and T as ∼

TN(µas , σas , a

as , b

as), where the probability distribution func-

tion of the truncated Gaussian distribution TN(µ, σ, a, b) isdefined as

x ∼ TN(µ, σ, a, b) =

{φ( x−µσ )

σ(Φ( b−µσ )−Φ( a−µσ ))if a ≤ x ≤ b,

0 otherwise .

Then, the distribution of Tmij can be also approximated bya truncated Gaussian distribution [35] as

Tmij ∼ TN(µmij , σmij , a

mij , b

mij ), (6)

where

µmij = µgsi +

Lmij∑b=1

µceb +∑s∈Smij

µqs + µasj =∑h∈Hmij

µh (7)

σm2

ij = σg2

si +

Lmij∑b=1

σc2

eb+∑s∈Smij

σq2

s + σa2

sj =∑h∈Hmij

σ2h (8)

amij = agsi +

Lmij∑b=1

aceb +∑s∈Smij

aqs + aasj =∑h∈Hmij

ah (9)

bmij = bgsi +

Lmij∑b=1

bceb +∑s∈Smij

bqs + basj =∑h∈Hmij

bh (10)

For OD pairs 〈i, j〉 with single possible route, i.e., Mij = 1,the travel time Tij ∼ TN(µ1

ij , σ1ij , a

1ij , b

1ij). For OD pair 〈i, j〉

with multiple possible routes, i.e., Mij > 1, we assume Tijfollows a truncated Gaussian mixture distributions:

Tij ∼∑Mij

m=1πm,cij TN(µmij , σ

mij , a

mij , b

mij ). (11)

Here, πm,cij is the probability that passengers choose routermij out of Rij , and

∑Mij

m=1 πm,cij = 1, with c represent-

ing the category of passengers. For example, in Singa-pore, there are four categories of passengers, i.e., C ={Adult, Child, Senior, Student}. Based on our observation,route preferences could differ between passenger categories.For example, seniors may prefer more comfort routes whichrequires longer travel time but are less crowded, since oldpeople are more flexible in terms of time and yet are physicallymore vulnerable. In contrast, commuters, who generally rushfor time, probably prefer the shortest routes even though theyare super crowded.

Now we discuss about how to estimate the above truncatedGaussian mixture models. In particular, we use parameterset Θ to represent {µgs , σgs}, {µqs, σqs}, {µas , σas} for anystation s ∈ S, {µce, σce} for any edge e ∈ E, and πij =

{π1,cij , . . . , π

Mij ,cij ;∀Mij > 1,∀c ∈ C}. We use maximum

likelihood estimation to estimate Θ based on accumulatedAFC data D = {trn(id, o, d, to, T, c), n = 1, . . . , N}. As foreach trip in D, if its OD pair has more than one possible route,the route choice information is missing. We propose to useExpectation-Maximization (EM) method to estimate the routechoice preferences together with θ. In particular, suppose themissed route choice information for trip trn is known as Zn.Here, if only one route is available, Zn = 1; if in total Mn

routes are available, Zn = [Zn1, . . . , ZnMn]. Znm = 1 if the

route taken by trn is rmondn and Znm = 0 otherwise. Then,

Page 6: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

6

we have Z = {Zn, n = 1, . . . , N}. Consequently, the fulllikelihood of a particular Θ given the AFC card dataset Dand route choice Z can be formulated as

L (Θ|D,Z) =∏N

n=1

[IMondn=1TN

(tn|µ1

n, σ1n, a

1n, b

1n

)+ IMondn>1

Mondn∏m=1

(πm,cnondn

TN (tn|µmn , σmn , amn , bmn ))Znm]

.

(12)

For label convenience, we abuse the notation µmn = µmondn =∑h∈Hmondn

µh, σm2

n = σm2

ondn=∑h∈Hmondn

σ2h, amn =

amondn =∑h∈Hondn

ah, and bmn = bmondn =∑h∈Hmondn

bh.Taking the logarithm of Equation (12), we can get the log-likelihood as

l(Θ|D,Z) =∑N

n=1

{IMondn=1 ln

(TN (tn|µn, σn, an, bn)

)+IMondn>1

∑M

m=1Znm

[ln(πm,cnondn

)

+ ln(TN (tn|µmn , σmn , amn , bmn )

)]}.

(13)

However, in reality, the route information Znm is unknownand hence Equation (13) cannot be solved directly. The idea ofEM algorithm is to iteratively estimate Θ(k+1) by maximizingthe expectation of the complete log-likelihood function, i.e.,EZ|D,Θ(k) [l (Θ|Z,D)], given the current estimated param-eters Θ(k). In our formulation, this can be achieved viareplacing Znm by E

(Znm|D,Θ(k)

)in Equation (13). In

particular,

Znm =E(Znm|D,Θ(k)

)=

πm,cn(k)ondn

TN(tn|µm(k)

n , σm(k)n , a

m(k)n , b

m(k)n

)∑Mondnm=1 π

m,cn(k)ondn

TN(tn|µm(k)

n , σm(k)n , a

m(k)n , b

m(k)n

)(14)

Then, we have

EZ|D,Θk [l (Θ|Z,D)] = l(D, Z|Θ

)=

N∑n=1

{IMondn=1 ln

(TN (tn|µn, σn, an, bn)

)+IMondn>1

M∑m=1

Znm

[ln(πm,cnondn

)+ ln

(TN (tn|µmn , σmn , amn , bmn )

)]}.

(15)

We reformulate the parameters and update {µgs , σg2

s },{µas , σa

2

s }, {µqs, σq2

s } for s ∈ S, {µce, σc2

e } for e ∈ E, andπm,cij = {π1,c

ij , . . . , πMij ,cij ;∀Mij > 1,∀c ∈ C} separately.

For example, to maximize {µgs , σgs}, we extract the part ofEquation (15) that relates to {µgs , σg

2

s } as:

l(µgs , σg2

s ) =

N∑n=1

{I(Mondn=1,on=s) ln

(TN(tn|µn, σn, an, bn)

)+I(Mondn>1,on=s)

M∑m=1

Znm

[ln(πm,cnondn

)

+ ln(TN (tn|µmn , σmn , amn , bmn )

)]}(16)

The maximization of Equation (16) has no closed formsolution. Consequently we apply stochastic gradient descent(SGD) to update the value of {µgs , σg

2

s } in a iterative way. Inparticular, the first derivatives of (15) with respect to µgs andσg

2

s are G = [ ∂l∂µgs

, ∂l

∂σg2s

] with

∂l

∂µgs=

N∑n=1

I(Mondn=1,on=s)

[1

σn

φ( bn−µnσn)− φ(an−µnσn

)

Φ( bn−µnσn)− Φ(an−µnσn

)

+(Yn − µn)

σ2n

]+

N∑n=1

I(Mondn>1,on=s)

[M∑m=1

Zlm 1

σmn

φ(bmn −µ

mn

σmn)− φ(

amn −µmn

σmn)

Φ(bmn −µmnσmn

)− Φ(amn −µmnσmn

)+

(Yn − µmn )

σm2

n

,(17)

∂l

∂σg2

s

=

N∑n=1

I(Mondn>1,on=s)

[M∑m=1

Znm

((Yn − µmn )2

2σm4

n

− 1

2σm2

n

+1

2σm3

n

(bmn − µmn )φ(bmn −µ

mn

σmn)− (amn − µmn )φ(

amn −µmn

σmn)

Φ(bmn −µmnσmn

)− Φ(amn −µmnσmn

)

+

N∑n=1

I(Mondn=1,on=s)

[(Yn − µn)2

2σ4n

− 1

2σ2n

+1

2σ3n

(bn − µn)φ( bn−µnσn)− (an − µn)φ(an−µnσn

)

Φ( bn−µnσn)− Φ(an−µnσn

)

].

(18)

Similarly, we can estimate {µgs , σg2

s }, {µce, σc2

e }, {µqs, σq2

s }and {µas , σa

2

s } for all transit links.Thereafter, the updated πm,cij becomes

πm,cij =

∑Nn=1 I(on=i,dn=j,cn=c)Znm∑Mij

m=1

∑Nn=1 I(on=i,dn=j,cn=c)Znm

(19)

Now we talk about how to set the truncation points fordifferent transit links. In specific, truncation points [a, b] fortransfer link T qs are set as [0, 2 ∗ (w0 + l0)], where w0 isthe default transfer walking time2, i.e., the walking time fromone metro platform to another, and l0 is the train headway oftrain service l. As to the truncation points for T gs , T as and T ce ,we estimate their values according to the following iterativealgorithm 1.

2In the context of Singapore, we set w0 as 2 minutes.

Page 7: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

7

Algorithm 1: Estimation of truncation points

Data: D = ∪Nn=1tr(idn, on, dn, tn, cn); length of edgeelen for e ∈ E; maximum train travel speed lvmaxfor l ∈ L

Result: Estimated {ags , bgs}, {aas , bas} for s ∈ S, {ace, bce}for e ∈ E

initializationInitialize {ags , bgs}, {aas , bas}, {ace, bce} as 0Estimationfor e(si, sj , l) ∈ E do

Y ← {tr ∈ D|(tr.o = si ∧ tr.d = sj) ∨ (tr.o =sj ∧ tr.d = si)}tmin ← mintr∈Y (tr.t), tmax ← maxtr∈Y (tr.t)bce ← tmin, ace ← elen/lvmaxbgsi ← max(bgsi , tmax − tmin)basi ← max(basi , tmax − tmin)bgsj ← max(bgsj , tmax − tmin)basj ← max(basj , tmax − tmin)

DT17

DT18

DT8 DT10

DT7

DT6

DT5

DT3

DT2

DT13

DT1

CC27

CC26

CC25

CC24

CC23

CC21

CC20

CC19 DT9

CC16

CC17 CC12

CC14

CC11

CC10

CC8

CC7

CC6

CC5

CC3

CC2

DT15 CC4

DT16 CE1

EW16 NE3

NS27 CE2

CC28

NE4 DT19

NE5

NE7 DT12

NE8

NE9

NE10

NE11

NE12 CC13

NE13

NE14

NE15

NE16

NE17

CC29NE1

NS28

NS2

NS3

NS4

NS5 NS13

NS14

NS15

NS16

NS17 CC15

NS18

NS19

NS20

NS24 NE6 CC1

NS23

NS22

NS10NS9 NS11NS7 NS8

NS21 DT11

EW29

EW28

EW15

EW21 CC22

EW23

EW22

EW20

EW19

EW18

EW17

EW14 NS26

EW13 NS25

EW12 DT14

EW11

EW10

EW8 CC9 EW7 EW5EW6 EW4

EW3

EW1

CG2CG1

EW9

NS1 EW24EW27 EW26 EW25

EW2

8

9 10

7

6

5

4

2

11

12

3

1

Pioneer

Joo Koon

Jurong East

Bukit Gombak

Yew Tee

Bukit Batok

Choa Chu Kang

Boon Lay

Lakeside

ChineseGarden

Pasir Ris

PayaLebar

BedokEunos

Kallang

Clementi

Dover

Buona Vista

Commonwealth

Queenstown

Redhill

Tiong Bahru

Outram Park

Ra�es Place

City Hall

Bugis

Lavender

KembanganTanahMerah

Simei

Expo

ChangiAirport

Tanjong Pagar

Aljunied

Tampines

Bishan

Ang Mo Kio

Yio Chu Kang

Orchard

Khatib

Braddell

Novena

Yishun

Toa Payoh

Somerset

Newton

WoodlandsKranji

Marsiling Admiralty

Sembawang

Marina South Pier

Dhoby Ghaut

Marina Bay

Labrador Park

Pasir Panjang

Haw Par Villa

Kent Ridge

one-north

Caldecott

Marymount

Bartley

Lorong Chuan

Tai Seng

MacPherson

Dakota

Mountbatten

Stadium

Nicoll Highway

Esplanade

Bras Basah

Promenade

Bayfront

BotanicGardens

Farrer Road

Holland Village

Telok Blangah

HarbourFront

Serangoon

Clarke Quay

Sengkang

Chinatown

Little India

Farrer Park

Boon Keng

Potong Pasir

Woodleigh

Kovan

Buangkok

Hougang

Punggol

BukitPanjang

Downtown

Telok Ayer

Cashew

Tan Kah KeeSixth Avenue

King Albert Park

Beauty World

Hillview

Rochor

Stevens

Footnote: * Denote stations which are currently not in operation along existing lines

North South Line

North East Line

Downtown Line

Circle Line

Mass Rapid Transit System Map, C 2016 Singapore

East West Line

Fig. 3. Singapore MRT Network Map (as of May 2016)

It is noted that if more information about each individualstation is available, w0 can be set differently for differentstations.

V. CASE STUDYIn this section, we apply the proposed PIPE framework in

the context of Singapore Mass Rapid Transit (MRT) system. Inthe following, we first introduce the dataset used in this studyand the data preprocessing steps to remove outlier trip data inSection V-A; we then present the case study results, includingprediction results of time-dependent OD matrices, travel timeparameters, alighting rate at each MRT station and in-situpassenger density across the metro system in Section V-B, todemonstrate the prediction performance of the proposed PIPEframework.

A. Data Set

1) Singapore MRT System: Singapore’s MRT system playsan increasingly important role in Singapore public transitnetwork. Up to May 2016, the MRT network, as shown in

TABLE IEZ-LINK MRT RECORD SAMPLES

card id id type c entry date-time to

exit datetimetd

origin ido

destinationid d

02***5F adult 2016-01-2508:20:04

2016-01-2508:27:27

35 12

02***5F adult 2016-01-2518:13:57

2016-01-2518:21:25

12 35

02***5F adult 2016-01-2608:13:51

2016-01-2608:21:21

35 12

02***5F adult 2016-01-2618:31:45

2016-01-2618:38:11

12 35

Figure 3, consists of 102 stations, 7 MRT lines (including twoline extensions), and 114 edges between adjacent stations.

2) EZ-Link Card Data: EZ-Link card is the smart cardused in Singapore for the payment of public transport trips.251, 089, 965 MRT trip records, which were collected fromall the working days from January 1 to May 31 in 2016, areutilized as the data source in our study, since the passengerflow patterns of working days is significantly different fromthat of weekends and public holidays. However, our frameworkcan be easily applied to include weekend/public holiday dataprediction as well. As introduced in Section III, each MRTtrip is represented as Tr(id, o, d, to, td, c) or Tr(id, o, d, T, c),with four sample records listed in Table I.

3) Data Pre-Processing: Due to AFC system deficiencyand other technical limitations, some trips are not properlycaptured. Three types of noisy data are removed before weproceed with the data analysis: i) duplicate records for thesame trip; ii) outlier trip records with extremely long traveltime identified based on the interquartile range (IQR) rule;and iii) trip records with missing information. Consequently,about 5.3% of the records have been identified as noisy data.As the size of noisy data is significantly smaller than thatof the valid data, we assume that the removal of those noisyrecords will not bias our analysis.

B. Case Study Result and Discussion

In the following, we present the performance of PIPE,including i) OD matrix prediction, ii) travel time distributioninference, iii) alighting rate prediction and iv) in-situ passengerdensity prediction respectively.

1) OD Matrix Prediction: As presented in Section IV-A,in this paper we consider six different methods to predictOD matrix for a particular time window in a new day, basedon previously observed OD matrices and passenger inflowand outflow data of each metro station. For each method,we performed a grid search to select the meta parameters,e.g., the number of lags ∆, that can lead to the best results.We divide the five months’ EZ-link data into three disjointdatasets as follows: 70% of the data are used as training set,20% are used as the validation set to perform model selection,and the remaining 10% are used as testing set to evaluatethe models. We adopt the Mean Square Error (MSE) as themain performance metric. Table II reports the MSE of the100 most busy OD pairs, which covers 13.30% of the wholeAFC dataset, with different prediction ahead time steps, wherethe ahead time step refers to the length between current timewindow and the time window to be predicted.

Page 8: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

8

TABLE IIOD MATRIX PREDICTION ERROR (MSE)

ModelAhead time 1 4 6

Vanilla model 249.11 249.11 249.11Linear Regression with lasso 190.83 280.23 351.28Linear regression with ridge 191.20 283.76 354.14ARIMA 2136.98 2315.65 2382.65LSTM 205.21 228.00 235.10Random Forest 159.44 182.57 209.86

As can be observed from the results, among all the models,random forest produces the best results in general, followedby two linear regression models, while ARIMA and LSTMhave comparably worse performance. It indicates Vij(τ) ismore related to Xin

s (τ − l), Xouts (τ − l), l = 1, . . . , than to

Vij(τ − l), l = 1, . . .. This is because in Xins (τ − l), Xout

s (τ −l), l = 1, . . ., both temporal relation between the predictedvalue and the lagged passenger inflow and outflow, and spatialrelation between different stations are taken into consideration.Furthermore, compared with the linear models, random forestcan better capture nonlinear relation between Vij(τ) andXins (τ−l), Xout

s (τ−l), l = 1, . . .. Of course, some other non-linear spatio-temporal models can be also applied in practice,if better prediction performance can be achieved. Since thispart is not the focus of PIPE, in our following analysis, we justselect random forest for predicting Vij(τ), and other methodsare left open to practitioners. In addition, as we can observe,as the ahead time increases, it has negative influence on allthe models, excepted the vanilla model. This is reasonable. Asthe ahead time step m increase, more accumulated predictionerrors in Vij(τ+1), . . . , Vij(τ+m−1) are used as model input,and consequently deteriorate the prediction performance. Asto the vanilla model, since it simply utilizes historical averageas the prediction, it would not be influenced by the predictionahead step.

To better demonstrate the prediction results, three OD pairsare selected to report the random forest’s one-step aheadprediction performance for a particular day in Fig. 4. For allthree sample OD pairs, random forest is able to capture smalltemporal fluctuations of Vij(τ) much more accurately than thevanilla model. Take 〈Boon Lay, Jurong East〉 in Fig. 4 (a) asan example, random forest achieves a MSE of 22.86, whichis significantly lower than 334.29 achieved by vanilla model.

2) Travel Time Prediction: Similar as OD matrix predic-tion, here we use the 70% training data to infer the traveltime distribution of all the transit links based on the pro-posed truncated Gaussian mixture model. For demonstrationpurpose, we decompose the route from Tanah Merah stationto Changi Airport station in Singapore, and report the traveltime distribution for each of its transit links in Fig. 5. Notethat both Tanah Merah station and Changi Airport station arelocated at the Changi Airport Branch (CG) line, as shown inFig. 3. They are EW4 and CG2 respectively, with a distanceof 2 segment away along the CG line. There are in totalfour travel links that contribute to the travel time from TanahMerah station to Changi Airport station, as reported in Fig. 5,including a) entry link at Tanan Merah station, b) in-train travellink from Tanan Merah (EW4) to Expo station (CG1), c) in-

train travel link from Expo (CG1) to Changi Airport station(CG2), d) exit link at Changi Airport station. We notice thatthe mean of transit link a) is relatively larger than that ofother stations, this is because that CG line has a higher trainheadway (i.e., 6-9 minutes) than other service lines (i.e., 2-6minutes) and consequently passengers at EW4 usually spendmore time waiting for trains. We also notice that transit linkc) takes much longer time than transit link b), this is becausethe physical distance between CG1 and CG2 is much longerthan that between EW4 and CG1.

Based on the estimated travel time distribution of eachtransit link, we can infer the travel time distribution for eachOD pair. We first present results of some OD pairs with singleroute (i.e., |Rod|= 1) in Fig. 6, where the blue solid line is theempirical distribution of the travel time calculated by kerneldensity estimation based on the training data set, the orangedash line is the probability density function of our estimatedtruncated Gaussian distribution. As observed, PIPE is able toprovide a nice fit of the empirical travel time distribution.

We next present the results corresponding to OD pairs withmultiple routes in Fig. 7. Different from the results in Fig. 6,we could observe multiple modes, with each representingthe travel time distribution of one particular route and themagnitude of the mode value being proportional to its routechoice probability. As we can observe, PIPE is able to provideaccurate predictions on both travel time and route choiceprobabilities.

Furthermore, we analyze route preferences of differentsmart card types, which are captured by the weights oftruncated Gaussians in the mixture model. Fig. 8 shows routechoice probabilities of different smart card types for threeselected OD pairs. The numerical value in each cell representsthe estimated πcod. It shows that in Fig. 8 (a), where thereare three candidate routes bringing passengers from Admiraltystation to Farrer Park station, different smart card types sharesimilar route choice preferences. Most of the passengers preferroute 0, only small ratio of passengers take route 2, while nopassengers are willing to use route 1 to complete their trips.In contrast, in Fig. 8 (b), the route choice preferences variesacross smart card types. Both child and student passengerprefer route 0, while adults like route 1 more, and seniorcitizens like route 0 and route 1 almost equally. Anotherinteresting phenomenon we observed is, in many cases, eventhere are multiple candidate routes available, some of them arerarely or never travelled by people. For example, as shown inFig. 8 (c), almost all passengers take route 0 and very fewpassengers take route 1 when they travel from Ang Mo Kiostation to Kent Ridge station.

3) Alighting Rate Forecasting: Now based on the estima-tion of OD matrix, travel time parameters and route choiceprobabilities, we use the remaining 10% dataset to evaluate theprediction performance of PIPE. We first use PIPE to predictthe alighting rate (a.k.a., outflow) at each station based onEquation (3), whose ground truth is known. Take one-stepahead prediction results as an example. Fig. 9 reports thepredicted alighting rate of PIPE for three selected stations. Theactual outflow together with the prediction result of the vanillamodel are also reported. As can be observed, PIPE can track

Page 9: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

9

6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24Trip entry time

0

50

100

150

200

250

Trip

cou

ntActualRandom forestCalendar model

(a) Boon Lay to Jurong East

6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24Trip entry time

0

50

100

150

200

Trip

cou

nt

ActualRandom forestCalendar model

(b) Pasir Ris to Tampines

6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24Trip entry time

0

100

200

300

400

500

Trip

cou

nt

ActualRandom forestCalendar model

(c) Raffles Place to City HallFig. 4. OD matrix prediction results

2 4 6 8 10 12Travel time (minutes)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

Pro

babi

lity

dens

ity

(a) Entry link at EW4 (µ=7.03)

1.0 1.5 2.0 2.5 3.0Travel time (minutes)

0.0

0.5

1.0

1.5

2.0

2.5P

roba

bilit

y de

nsity

(b) In-train travel link from EW4 toCG1 (µ=1.01)

2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0Travel time (minutes)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Pro

babi

lity

dens

ity

(c) In-train travel link from CG1 toCG2 (µ=4.97)

0 2 4 6 8Travel time (minutes)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

Pro

babi

lity

dens

ity

(d) Exit link at CG2 (µ=1.63)

Fig. 5. Travel time fit for travel links of sample route (Tanah Merah (EW4) to Changi Airport (CG2))

30.0 32.5 35.0 37.5 40.0 42.5 45.0 47.5 50.0Travel time (minutes)

0.000

0.025

0.050

0.075

0.100

0.125

0.150

0.175

Pro

babi

lity

dens

ity

ActualPIPE

(a) Admiralty to Boon Lay

20.0 22.5 25.0 27.5 30.0 32.5 35.0 37.5Travel time (minutes)

0.000

0.025

0.050

0.075

0.100

0.125

0.150

0.175

Pro

babi

lity

dens

ity

ActualPIPE

(b) Jurong East to Woodlands

20 22 24 26 28 30Travel time (minutes)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

Pro

babi

lity

dens

ity

ActualPIPE

(c) Beauty World to PromenadeFig. 6. Travel time fit for single-route OD pairs

45 50 55 60 65 70Travel time (minutes)

0.00

0.02

0.04

0.06

0.08

0.10

0.12

Pro

babi

lity

dens

ity

ActualPIPE

(a) Admiralty to HarbourFront

32.5 35.0 37.5 40.0 42.5 45.0 47.5Travel time (minutes)

0.00

0.05

0.10

0.15

0.20

Pro

babi

lity

dens

ity

ActualPIPE

(b) Hougang to Kent Ridge

40 45 50 55 60 65Travel time (minutes)

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

Pro

babi

lity

dens

ity

ActualPIPE

(c) Yew Tee to City HallFig. 7. Travel time fit for multi-route OD pairs

Adult Child Senior Citizen StudentSmart card type

01

2R

oute

inde

x

0.70 0.75 0.64 0.78

0.00 0.00 0.00 0.00

0.29 0.25 0.36 0.22

0.0

0.2

0.4

0.6

0.8

1.0

(a) Admiralty to Farrer Park

Adult Child Senior Citizen StudentSmart card type

01

23

Rou

te in

dex

0.23 0.78 0.52 1.00

0.64 0.22 0.48 0.00

0.13 0.00 0.00 0.00

0.00 0.00 0.00 0.00

0.0

0.2

0.4

0.6

0.8

1.0

(b) Chinatown to Raffles Place

Adult Child Senior Citizen StudentSmart card type

01

Rou

te in

dex

0.94 1.00 0.92 1.00

0.06 0.00 0.08 0.00

0.0

0.2

0.4

0.6

0.8

1.0

(c) Ang Mo Kio to Kent RidgeFig. 8. Route choice preferences by smart card type

the small temporal fluctuations of outflow of the testing daywell, while vanilla model fails to capture lots of details, suchas the magnitudes of the peak hour. Though the performanceof PIPE deteriorate as the prediction ahead step increases, itstill performs consistently better than the vanilla method. Fur-

thermore, we consider directly using random forest to predictXouti (τ) by taking Xin

i (τ − l), Xouti (τ − l), l = 1, . . . ,∆ as

input. This can be regarded as another baseline of PIPE, whichuses a black box model for alighting rate prediction, withoutconsidering the travel behaviors or traces of passengers inside

Page 10: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

10

TABLE IIIALIGHTING RATE PREDICTION ERROR (MSE)

ModelAhead time 1 4 6

Calendar model 6840.64 6840.64 6840.64Random Forest 3328.21 4826.65 5268.05PIPE 2433.20 2591.15 2963.79

the metro system. As shown in Table III, PIPE outperformsrandom forest by a large extent in terms of MSE, whichdemonstrates the efficacy and advantages of our white-boxmodel.

4) In-situ Passenger Density Prediction: Last but not least,we visualize the prediction results of in-situ passenger density.It is noted that we cannot achieve the ground truth data for thispart. We first choose the most busy line, i.e., East-West (EW)line to illustrate the passenger density of different segmentsover time. For better spatio-temporal pattern illustration, weshow the passenger densities along two train directions, i.e.,to east direction and to west direction, separately. As shownin Fig. 10, there are two commuting peak periods, one is themorning peak hour occurring around 8am, when people lefthome and make trip to office. Therefore we can observe amorning peak originating from western residential districts(e.g., Jurong East station) all the way to central businessdistricts (CBD) (e.g., City Hall station) in Fig. 10 (a), andanother morning peak originating from eastern residentialareas (e.g., Bedok Station) all the way to CBD in Fig. 10 (b).The other peak hour occurs at about 6pm, when people get offwork and make trip home or go to places for entertainmentactivities like dinner and shopping. Consequently, we canobserve a evening peak originating from CBD to easternresidential districts in Fig. 10 (a), and another evening peakoriginating from CBD to western residential areas in Fig. 10(b).

Then we evaluate the spatial distribution of passenger den-sity by reporting in-situ passenger density snapshots of thewhole metro system at particular time point. For example,Fig. 11 reports the passenger density distribution of the wholeSingapore MRT system at 9am. Here the node size indicatesthe number of passengers inside each MRT station at thecorresponding timestamp, and color intensity of each segmentis proportional to the number of passengers traveling on thesegment. Obviously there are a few MRT stations servinglarger number of passengers than the rest, most of these busystations are transit hubs where people make transfers or stop byfor activities like dining. We also notice that the most crowdededge is the one between Yio Chu Kang station and Khatibstation, this is because this edge is the longest in the MRTsystem, and in most of the time there are two trains runningon the same edge, while for other edges, in most of the time,there is only one single train.

VI. CONCLUSION

In this paper, we proposed a statistical inference frameworkPIPE that makes fine-grain and accurate in-situ passengerdensities prediction across the metro network. PIPE con-ducted inference tasks including time-dependent OD matrices

forecasting, study of travel time distribution of any travellink inside the metro network, and inference of route choiceprobabilities. Based on derived parameters from the inferencetasks, we further estimate the passenger flow properties interms of alighting rate at each metro station and in-situ passen-ger density distribution. We apply our solution in SingaporeMRT network, and the satisfactory prediction performancedemonstrates its applicability and efficiency.

REFERENCES

[1] M.-P. Pelletier, M. Trepanier, and C. Morency, “Smart card data usein public transit: A literature review,” Transportation Research Part C:Emerging Technologies, vol. 19, no. 4, pp. 557–568, 2011.

[2] L. Heydenrijk-Ottens, V. Degeler, D. Luo, N. van Oort, and J. van Lint,“Supervised learning: Predicting passenger load in public transport,” inCASPT, 2018.

[3] G. Vandewiele, P. Colpaert, O. Janssens, J. Van Herwegen, R. Verborgh,E. Mannens, F. Ongenae, and F. De Turck, “Predicting train occupanciesbased on query logs and external data sources,” in WWW’17 Companion,pp. 1469–1474.

[4] K. Pasini, M. Khouadjia, A. Same, F. Ganansia, and L. Oukhellou, “Lstmencoder-predictor for short-term train load forecasting,” in ECML PKDD2019, pp. 535–551.

[5] L. Hong, W. Li, and W. Zhu, “Assigning passenger flows on a metronetwork based on automatic fare collection data and timetable,” DiscreteDynamics in Nature and Society.

[6] X. Xu, Y. Dou, Z. Zhou, T. Liao, Y. Lu, and Y. Tan, “Railway passengerflow forecasting based on time series analysis with big data,” in CCDC.IEEE, 2018, pp. 3584–3590.

[7] E. Chen, Z. Ye, C. Wang, and M. Xu, “Subway passenger flow predictionfor special events using smart card data,” IEEE Transactions on ITS,vol. 21, no. 3, pp. 1109–1120, 2019.

[8] Y. Gong, Z. Li, J. Zhang, W. Liu, Y. Zheng, and C. Kirsch, “Network-wide crowd flow prediction of sydney trains via customized online non-negative matrix factorization,” in ICKM, 2018, pp. 1243–1252.

[9] Y. Sun, B. Leng, and W. Guan, “A novel wavelet-svm short-timepassenger flow prediction in beijing subway system,” Neurocomputing,vol. 166, pp. 109–121, 2015.

[10] Y. Li, X. Wang, S. Sun, X. Ma, and G. Lu, “Forecasting short-termsubway passenger flow under special events scenarios using multiscaleradial basis function networks,” Transportation Research Part C: Emerg-ing Technologies, vol. 77, pp. 306–328, 2017.

[11] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow predictionwith big data: a deep learning approach,” IEEE Transactions on ITS,vol. 16, no. 2, pp. 865–873, 2014.

[12] Y. Liu, Z. Liu, and R. Jia, “Deeppf: A deep learning based architecturefor metro passenger flow prediction,” Transportation Research Part C:Emerging Technologies, vol. 101, pp. 18–34, 2019.

[13] S. Hao, D.-H. Lee, and D. Zhao, “Sequence to sequence learningwith attention mechanism for short-term passenger flow prediction inlarge-scale metro system,” Transportation Research Part C: EmergingTechnologies, vol. 107, pp. 287–300, 2019.

[14] K. Ashok and M. E. Ben-Akiva, “Estimation and prediction of time-dependent origin-destination flows with a stochastic mapping to pathflows and link flows,” Transportation Science, vol. 36, no. 2, pp. 184–198, 2002.

[15] X. Chen, S. Guo, L. Yu, and B. Hellinga, “Short-term forecasting oftransit route od matrix with smart card data,” in ITSC. IEEE, 2011,pp. 1513–1518.

[16] E. Van der Hurk, L. G. Kroon, G. Maroti, and P. Vervest, “Dynamicforecast model of time dependent passenger flows for disruption man-agement,” in CASPT, 2012, pp. 23–27.

[17] F. Toque, E. Come, M. K. El Mahrsi, and L. Oukhellou, “Forecastingdynamic public transport origin-destination matrices with long-shortterm memory recurrent neural networks,” in ITSC. IEEE, 2016, pp.1071–1076.

[18] J. Hu, B. Yang, C. Guo, C. S. Jensen, and H. Xiong, “Stochastic origin-destination matrix forecasting using dual-stage graph convolutional,recurrent neural networks,” in ICDE. IEEE, 2020, pp. 1417–1428.

[19] H. Shi, Q. Yao, Q. Guo, Y. Li, L. Zhang, J. Ye, Y. Li, and Y. Liu, “Pre-dicting origin-destination flow via multi-perspective graph convolutionalnetwork,” in ICDE. IEEE, 2020, pp. 1818–1821.

Page 11: Crowding Prediction of In-Situ Metro Passengers Using ... · 1 Crowding Prediction of In-Situ Metro Passengers Using Smart Card Data Xiancai Tian, Chen Zhang, Baihua Zheng Abstract—The

11

6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24Trip entry time

0

100

200

300

400

500

600Al

ight

ing

rate

ActualPIPECalendar model

(a) Bartley

6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24Trip entry time

0

500

1000

1500

2000

2500

Alig

htin

g ra

te

ActualPIPECalendar model

(b) City Hall

6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24Trip entry time

0

250

500

750

1000

1250

1500

1750

Alig

htin

g ra

te

ActualPIPECalendar model

(c) PunggolFig. 9. Alighting rate prediction results

0 6 12 18 24

Joo Koon

Pioneer

Boon Lay

Lakeside

Chinese Garden

Jurong East

Clementi

Dover

Buona Vista

Commonwealth

Queenstown

Redhill

Tiong Bahru

Outram Park

Tanjong Pagar

Raffles Place

City Hall

Bugis

Lavender

Kallang

Aljunied

Paya Lebar

Eunos

Kembangan

Bedok

Tanah Merah

Simei

Tampines

Pasir Ris 0

500

1000

1500

2000

2500

(a) To east direction

0 6 12 18 24

Joo Koon

Pioneer

Boon Lay

Lakeside

Chinese Garden

Jurong East

Clementi

Dover

Buona Vista

Commonwealth

Queenstown

Redhill

Tiong Bahru

Outram Park

Tanjong Pagar

Raffles Place

City Hall

Bugis

Lavender

Kallang

Aljunied

Paya Lebar

Eunos

Kembangan

Bedok

Tanah Merah

Simei

Tampines

Pasir Ris 0

500

1000

1500

2000

(b) To west direction

Fig. 10. East-West line passenger density

Yio Chu Kang

Khatib

Fig. 11. In-situ passenger density @ 9am

[20] J. Zhao, F. Zhang, L. Tu, C. Xu, D. Shen, C. Tian, X.-Y. Li, and Z. Li,“Estimation of passenger route choice pattern using smart card data forcomplex metro systems,” IEEE Transactions on ITS, vol. 18, no. 4, pp.790–801, 2016.

[21] L. Sun, Y. Lu, J. G. Jin, D.-H. Lee, and K. W. Axhausen, “An integratedbayesian approach for passenger flow assignment in metro networks,”Transportation Research Part C: Emerging Technologies, vol. 52, pp.116 – 131, 2015.

[22] X. Xu, L. Xie, H. Li, and L. Qin, “Learning the route choice behaviorof subway passengers from afc data,” Expert Systems with Applications,vol. 95, pp. 324–332, 2018.

[23] X. Tian, B. Zheng, Y. Wang, H.-T. Huang, and C.-C. Hung, “Tripde-coder: Study travel time attributes and route preferences of metrosystems from smart card data,” arXiv preprint arXiv:2005.01492, 2020.

[24] N. Colombo, R. Silva, and S. M. Kang, “Tomography of the londonunderground: a scalable model for origin-destination data,” in NIPS,2017, pp. 3062–3073.

[25] L. Sun, D.-H. Lee, A. Erath, and X. Huang, “Using smart card datato extract passenger’s spatio-temporal density and train’s trajectory ofmrt system,” in SIGKDD international workshop on urban computing,

2012, pp. 142–148.[26] F. Zhang, J. Zhao, C. Tian, C. Xu, X. Liu, and L. Rao, “Spatiotemporal

segmentation of metro trips using smart card data,” IEEE Transactionson Vehicular Technology, vol. 65, no. 3, pp. 1137–1149, 2015.

[27] E. Jenelius, “Data-driven metro train crowding prediction based on real-time load data,” IEEE Transactions on ITS, vol. 21, no. 6, pp. 2254–2265, 2019.

[28] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Jour-nal of the Royal Statistical Society: Series B (Methodological), vol. 58,no. 1, pp. 267–288, 1996.

[29] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimationfor nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67,1970.

[30] A. Liaw, M. Wiener et al., “Classification and regression by randomfor-est,” R news, vol. 2, no. 3, pp. 18–22, 2002.

[31] S. Makridakis and M. Hibon, “Arma models and the box–jenkinsmethodology,” Journal of Forecasting, vol. 16, no. 3, pp. 147–163, 1997.

[32] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neuralcomputation, vol. 9, no. 8, pp. 1735–1780, 1997.

[33] L. Prechelt, “Early stopping-but when?” in Neural Networks: Tricks ofthe trade. Springer, 1998, pp. 55–69.

[34] Y. Wang, W. Dong, L. Zhang, D. Chin, M. Papageorgiou, G. Rose,and W. Young, “Speed modeling and travel time estimation based ontruncated normal and lognormal distributions,” Transportation researchrecord, vol. 2315, no. 1, pp. 66–72, 2012.

[35] F. Cozman and E. Krotkov, “Truncated gaussians as tolerance sets,”Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-94-35, September 1994.