
Taxi demand forecasting: A HEDGE based tessellation strategy for improved accuracy

Neema Davis, Gaurav Raina, Krishna Jagannathan

Abstract—A key problem in location-based modeling and forecasting lies in identifying suitable spatial and temporal resolutions. In particular, judicious spatial partitioning can play a significant role in enhancing the performance of location-based forecasting models. In this paper, we investigate two widely used tessellation strategies for partitioning city space, in the context of real-time taxi demand forecasting. Our study compares (i) Geohash tessellation, and (ii) Voronoi tessellation, using two distinct taxi demand data sets, over multiple time scales. For the purpose of comparison, we employ classical time-series tools to model the spatio-temporal demand. Our study finds that the performance of each tessellation strategy is highly dependent on the city geography, spatial distribution of the data, and the time of the day, and that neither strategy performs optimally across the forecast horizon. We propose a hybrid algorithm that picks the best tessellation strategy at each instant, based on their recent performance. Our hybrid algorithm is a non-stationary variant of the well-known HEDGE algorithm for choosing the best advice from multiple experts. We show that the hybrid strategy performs consistently better than either of the two tessellation strategies across the data sets considered, at multiple time scales, and with different performance metrics. We achieved an average accuracy of above 80% per km² for both data sets considered at 60 minute aggregation levels.

Keywords—Taxi Demand, Forecasting, Time-series, Geohash, Voronoi, HEDGE.

I. INTRODUCTION

Mobile application based e-hailing taxi services are gaining popularity across the world due to the advancement in GPS based smart phone technologies. These services can be used to supplement the use of public transit and other traditional modes of transportation. A key challenge faced by these fast growing taxi services is the demand-supply mismatch problem. During peak hours, the demand for taxis surpasses the available supply, creating unmet demand. During off-peak hours, the scenario reverses and the vacant taxis cruise for longer periods to find passengers. These issues lead to dynamic surge pricing, along with reduced customer satisfaction and low driver utilization. Therefore, it is crucial to devise accurate location-specific demand forecasting algorithms, so as to gain prior knowledge of under-supplied and over-supplied areas. This knowledge can be used to mitigate the demand-supply imbalance by re-routing vacant taxis, to compensate for the unmet demand. One of the key steps towards devising efficient location-based forecasting algorithms is the selection of appropriate spatial resolution techniques. Each cell should contain sufficient demand density for prediction algorithms to be effective. At the same time, the driver should not travel a large distance before finding a passenger. Hence, a carefully planned tessellation strategy is a key step in any location-based modeling exercise.

N. Davis, G. Raina and K. Jagannathan are with the Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai 600 036, India. E-mail: {ee14d212, gaurav, krishnaj}@ee.iitm.ac.in

A. Related works

Demand-supply levels in taxi services have been studied in the literature [1], [2]. Identification and modeling of passenger hot spots for rerouting taxi drivers is also a widely researched area. Some of the commonly employed modeling techniques are the Auto Regressive Integrated Moving Average model (ARIMA) and its variants [3], [4], Exponential Weighted Average (EWA) models [5], Nearest Neighbour clustering [6], [7], and Neural Networks (NN) [8]. Irrespective of the modeling technique employed, the preliminary step in modeling a location-based entity such as passenger demand is to spatially partition the space. In the transportation literature, two common tessellation strategies are used. Spatial aggregations are either performed using grids, where the space is partitioned into square or rectangular grids of fixed area [3], [8], [9]; or using polygons, where the space is partitioned into regular or irregular polygons of variable area [10], [11], [12]. It is common practice in the transportation literature to consider either one of these tessellation styles for spatial partitioning.

In our previous work [13], we tessellated the city of Bengaluru, India, into fixed-sized partitions known as geohashes, obtained from Geohash tessellation. We observed that for those regions with low demand density, fixed sized partitions resulted in data scarcity, which led to low model accuracy. To improve the accuracy, we explored the spatial correlation between the neighbouring geohashes. The performance limitation due to the chosen tessellation strategy motivates us to conduct a comparison study of various tessellation strategies. It is natural to ask the following questions in this context: (i) How can one decide the tessellation scheme to be used? (ii) How sensitive is the performance of the models to the tessellation strategy? (iii) Can we arrive at a tessellation strategy that performs well for a broad range of data sets? We aim to address these questions in this paper. To the best of our knowledge, an extensive study of tessellation techniques and their effects on model performance has not been conducted in the past. In this paper, we explore the relationships between tessellation strategies, demand densities, and city geographies. This is one of the features that distinguishes our work from the existing literature.

For comparing fixed and variable sized tessellation styles, we choose Geohash tessellation, and Centroidal Voronoi tessellation with K-Means [16] respectively. We note that each tessellation strategy has been shown to outperform the other


[14], [15]. In [14], the authors showed that the Geohash technique performs better at partitioning data than the Voronoi technique for their data set, while in [15], the authors preferred Voronoi over Geohash for forecasting supply of drivers in the demand dense localities. In this paper, we favour a partition based clustering technique such as K-Means [16] over other clustering techniques. The reader is referred to [17] and [18] for a comprehensive survey of various clustering algorithms. In the survey paper [18], K-Means is listed as a promising clustering algorithm for large data sets, due to its low time complexity, and high computing efficiency. K-Means has a linear memory and time complexity, which is ideal for our very large data sets. It has also been shown that K-Means performs reasonably well in comparison with other clustering techniques; for example, DBSCAN [20] and Hierarchical clustering [19], among others. In addition, K-Means has been widely used in the literature to generate Centroidal Voronoi polygons [21]. Of the widely used modeling techniques, Exponential Smoothing and ARIMA models are known for their accuracy, ease of implementation, and high computational speed [22]. Their modeling performance has been shown to be comparable to other methods such as Bayesian networks, Support Vector Regression (SVR), and Artificial Neural Networks (ANN) [23], [24]. The authors in [3] proposed an improved ARIMA based prediction method to forecast the spatial-temporal variation of passengers in the hot spots with a prediction error of 5.8%. Using a time varying Poisson process and ARIMA model, the work in [4] obtained an accuracy of 76% for passenger demand on taxi services. Clustering along with Exponential Weighted Moving Average models was used to predict passenger demand hot spots with a 79.6% hit ratio in [5]. The subway ridership demand was analysed in [25] using a combined ARIMA-GARCH model with a maximum error of 7%. This promising performance of regression and smoothing based models in the context of passenger demand modeling, along with high computational speeds, makes them appealing choices for our comparison study.

We aim to perform an extensive city wide spatio-temporal analysis and compare the two tessellation techniques for two independent data sets – an e-hailing mobile application based demand data set in Bengaluru, India and a street hailing based taxi demand data set in New York city, USA. Further, in addition to comparing the two strategies, we also propose a hybrid strategy by using a combination of the two tessellation strategies based on their past performances, suitable for any data set. To the best of our knowledge, no similar study has been made in the transportation literature to use a combination of tessellation strategies.

B. Our contributions

The demand data points are segregated into clusters using the K-Means clustering algorithm. The cluster centroids that we obtain are used to partition the city into geohashes and Voronoi cells. In previous studies, road intersections [10], [12] and bus stops [26] were employed to act as tessellation centers. In contrast, we use K-Means cluster centroids to act as the tessellation centers. Different time-series modeling techniques

are applied on the Geohash and the Voronoi aggregated data from each centroid to compare the two techniques. The data is aggregated and analyzed at two different time-scales; at 15 minutes to enable real-time decisions, and at 60 minutes to observe patterns on longer time scales. While comparing the two tessellation strategies, we observe that the performance of the strategies varies with different data sets. They also show time-dependent variations within each data set. In order to deal with this apparent non-stationarity, we need an algorithm that can pick the best tessellation strategy at every time instant.

The method of using advice from multiple experts¹ was first introduced in [27], and later generalized in [28] to arrive at the well-known HEDGE algorithm. Specifically, by adopting a multiplicative update of the weight parameter, the authors of [28] were able to produce algorithms performing almost as well as the best expert in the pool. The authors in [29] extended the HEDGE algorithm to include non-stationary experts, by introducing a discounting factor. We modify the discounted variant of HEDGE to arrive at a hybrid tessellation strategy for enhanced taxi demand forecasts.

The key contributions and findings of this paper can be summarized as follows:

1) We conduct a comparative study of Voronoi and Geohash tessellation strategies for spatial demand partitioning.

2) We highlight the dependence of the performance of the tessellation strategy on the time of the day, and on the properties of the data set.

3) We find that models based on Voronoi tessellation have superior performance compared to models based on Geohash tessellation for demand scarce cells. On the other hand, Geohash tessellation performs better than Voronoi tessellation in demand dense cells.

4) We develop a hybrid tessellation strategy using a HEDGE based combining algorithm. We find that our hybrid strategy always performs at least as well as the best strategy out of the list of experts.

5) Our algorithm picks the best tessellation strategy for each instant in the forecasting horizon, across various temporal and spatial resolutions.

At a broader level, our work demonstrates the potential of improving location-based forecasts by efficiently partitioning large-scale temporal and spatial data for taxi hailing services. The rest of the paper is organized as follows. In Section II, we outline the problem setting and provide a brief description of the data. In Section III, the tessellation strategies are outlined. The time-series modeling procedure is explained, along with some preliminary results, in Section IV. The proposed model combining algorithm and the results are provided in Section V, and we conclude in Section VI.

II. PROBLEM SETTING

Let y_t be the number of taxi bookings (i.e., the demand) observed at time t in a particular region R. In order to model y_t, we can represent the bookings as a time-sequence, where the sequence contains points aggregated over the space R at

¹ In our case, the “experts” refer to the tessellation strategies.


times t_1, ..., t_n. The space R can be fixed-sized or variable-sized. In our case, we denote them as geohash and Voronoi cells respectively. Geohash partitioning of a city is straightforward, as it does not incorporate neighbour information or the volume of the data. On the other hand, to perform Voronoi partitioning, we need to run a clustering algorithm to cluster nearby points together. In order to reduce the passenger search time, we aim to route drivers to regions of area no more than 1 km². The parameters in the clustering algorithm are set accordingly. We define the density of a K-Means cluster centroid as the number of data points closer to that centroid than to any other centroid. We assume that a unique smallest distance K-Means centroid exists for each data point.

For an idle taxi, we aim to provide the driver with the location of the nearest suitable K-Means centroid. The demand aggregated from a Voronoi cell may vary in magnitude from the demand aggregated from its corresponding geohash. Hence, for a standard comparison of the two techniques, we normalize the demand by the area of the partition to obtain demand per km². The area normalized demand D_norm for the P-th geohash/Voronoi partition is computed as follows:

D_{\text{norm}} = \frac{\text{aggregate}(d) \text{ over period } sp}{\text{area}(P)},   (1)

where d refers to the demand in P, and sp is the sampling period of 15 minute or 60 minute intervals. This D_norm is used to generate the time-sequences for further analysis. The Symmetric Mean Absolute Percent Error (SMAPE) and Mean Absolute Scaled Error (MASE) [30] are the error performance metrics that are considered in this paper, and are defined in Section IV. See Fig. 1 for a schematic representation of the work that will be conducted in this paper. First, we tessellate the city using Voronoi and Geohash strategies. Then, we apply time-series modeling techniques on the data, and finally, we develop a hybrid tessellation strategy based on the dHEDGE algorithm. The data sets used for our study are mentioned below.
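As an illustration of Eq. (1), the following is a minimal sketch of how the area-normalized demand could be computed per partition and per sampling period using pandas; the file names, column names, and cell areas are assumptions for illustration, not the authors' pipeline.

```python
import pandas as pd

# bookings: one row per booking with a timestamp and its partition id (assumed schema)
bookings = pd.read_csv("bookings.csv", parse_dates=["timestamp"])
cell_area_km2 = pd.read_csv("cell_areas.csv", index_col="cell_id")["area_km2"]

# aggregate demand d per cell over each 60 minute sampling period sp
counts = (bookings
          .groupby(["cell_id", pd.Grouper(key="timestamp", freq="60min")])
          .size()
          .rename("demand"))

# Eq. (1): area-normalized demand per km^2, used to build the per-cell time-sequences
d_norm = counts / counts.index.get_level_values("cell_id").map(cell_area_km2).to_numpy()
```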

A. Data set description

The Bengaluru taxi demand data is acquired from a leading Indian e-hailing taxi service provider. The data contains GPS traces of taxi passengers booking a taxi by logging into the mobile application. The data is available for a period of two months, 1st of January 2016 to 29th of February 2016. The data set contains latitude-longitude coordinates of the logged in customer, along with his/her identification number, session duration and time stamp. The latitude and longitude coordinates of the city are 12.9716 N, 77.5946 E, with an area of approximately 740 km².

The New York yellow taxi cab data set is publicly available at [31]. The data set contains GPS traces of a street hailing yellow taxi service. This data set differs from the mobile application based data set, both in terms of the volume of the data and the city structure. The geographical structure of Bengaluru is radial, while that of New York city is linear. We considered the period of January-February 2016 for analysis. We extract the pick up locations and time stamps from the data to form the demand data. The latitude and longitude coordinates of New York city are 40.7128 N, 74.0059 W, with an area of approximately 780 km².

Fig. 1: Flow chart of the study in this paper: tessellation strategies (Geohash, Voronoi), time-series models (STL with ARIMA, ETS, Naive, Random drift; TBATS), and the dHEDGE combining algorithm.

III. TESSELLATION STRATEGIES

As motivated in Section II, we perform spatial partitioning of both cities using two tessellation strategies for the purpose of modeling and comparison. We generate clusters of demand points for each city, and then, the centroids of these clusters aid in the generation of tessellation cells.

A. K-Means algorithm

K-Means [16] is a widely used unsupervised learning algorithm to classify a given data set into a certain number of clusters (assume k clusters) fixed a priori. This algorithm aims to minimize the squared error function given as:

J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| y_i^{(j)} - c_j \right\|^2,   (2)

where \| y_i^{(j)} - c_j \|^2 is the squared Euclidean distance between a data point y_i^{(j)} and its center c_j, and n is the total number of data points. For ensuring that the average cluster area remains close to 1 km², the number of centers K is set to 740 and 780 for Bengaluru and New York city respectively. For each data point, the algorithm calculates the distance from the data point to each cluster. If the data point is closest to its own cluster, we leave it where it is; else, we move it into the closest cluster. The algorithm stops when no data point is reassigned. By sorting these clusters in descending order of density, we can find locations that generate relatively higher demand (hot spots) compared to other locations.
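A minimal sketch of this clustering step, assuming scikit-learn is available and the pickup coordinates are loaded as an N x 2 NumPy array; the file name and the value of K are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# demand_points: N x 2 array of (latitude, longitude) booking locations (assumed available)
demand_points = np.load("bengaluru_pickups.npy")

# K = 740 keeps the average cluster area close to 1 km^2 for Bengaluru
kmeans = KMeans(n_clusters=740, n_init=10, random_state=0).fit(demand_points)

centroids = kmeans.cluster_centers_        # tessellation centers (sites)
labels = kmeans.labels_                    # index of the nearest centroid for each point

# density of a centroid = number of points closest to it; sorting exposes hot spots
density = np.bincount(labels, minlength=740)
hot_spots = centroids[np.argsort(density)[::-1]]
```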

B. Voronoi tessellation

Voronoi tessellation is a spatial partitioning method that divides space according to the nearest-neighbour rule. Specifically, each point, called a seed or site, is associated with the region (i.e., the Voronoi cell) that is closer to it than all other points in the space. A Voronoi tessellation is called a Centroidal Voronoi tessellation when the generating site of each Voronoi cell is also its mean (center of mass). In our work, these sites are obtained from the K-Means algorithm. Based on the closeness of sites, this tessellation strategy produces polygon partitions of varying areas. For example, the 740 demand centers result in 740 variable-sized quadrilateral partitions for Bengaluru. Note that the overall time complexity of the Voronoi and K-Means algorithm is O(n log n). See Fig. 2a and Fig. 3a for heat maps generated from Voronoi tessellations of Bengaluru and New York city.

Fig. 2: Heat maps obtained after partitioning Bengaluru into Voronoi cells and 6-level geohashes: (a) Voronoi tessellation, (b) Geohash tessellation (longitude vs. latitude, color scale by sample count).

Fig. 3: Heat maps obtained after partitioning New York city into Voronoi cells and 6-level geohashes: (a) Voronoi tessellation, (b) Geohash tessellation (longitude vs. latitude, color scale by sample count).

The partitioned cells are color-coded according to their sample size, for ease of representation on the color scale.
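The Centroidal Voronoi cells can be generated directly from the K-Means centroids; the following is a minimal sketch using SciPy, assuming the `centroids` array from the earlier K-Means snippet is available.

```python
from scipy.spatial import Voronoi

# centroids: K x 2 array of K-Means cluster centers used as Voronoi sites (assumed available)
vor = Voronoi(centroids)

cells = {}
for i in range(len(centroids)):
    region = vor.regions[vor.point_region[i]]      # vertex indices of the cell around site i
    if not region or -1 in region:
        continue                                   # skip unbounded cells at the city boundary
    cells[i] = vor.vertices[region]                # polygon vertices (lat, lon) of the cell

# each booking is then aggregated into the cell of its nearest centroid (its K-Means label)
```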

C. Geohash tessellation

Geohash is a technique that hashes the latitude and longitude coordinates into a character string. It is an extension of the grid based method, with a simple naming convention. Note that with Geohash (first G capitalized), we refer to the technique of encoding a coordinate pair into a single string, while geohash (all small letters) refers to the string itself. Geohash can be visualized as a division of Earth into 32 planes, each of which can be divided again into 32 planes, and so on. We refer to these divisions as Geohash levels by defining level x as the division that results in geohashes of string length x. A 6-level geohash spans a grid of area 1.2 km × 0.6 km, covering approximately 1 km², whereas a 5-level geohash spans an area of 4.9 km × 4.9 km, covering approximately 25 km². We choose 6-level grids over 5-level for better re-routing of idle taxis. In terms of time complexity, this algorithm is O(1). See Fig. 2b and Fig. 3b for heat maps obtained from Geohash tessellations of Bengaluru and New York city.
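A minimal sketch of the encoding step, assuming the pygeohash package (any geohash library exposes an equivalent call); the coordinates are Bengaluru's city-center values quoted above.

```python
import pygeohash as pgh  # assumption: pygeohash is installed; any geohash encoder works

lat, lon = 12.9716, 77.5946
cell_6 = pgh.encode(lat, lon, precision=6)   # 6-character geohash, ~1.2 km x 0.6 km cell
cell_5 = pgh.encode(lat, lon, precision=5)   # 5-character geohash, ~4.9 km x 4.9 km cell

# every booking whose coordinates encode to the same 6-character string falls in the same
# cell, so spatial aggregation is an O(1) hash of each (lat, lon) pair
print(cell_6, cell_5)
```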

D. Observations

1) Bengaluru data set

On observing the heat maps generated by the two tessellation techniques from Fig. 2, we make the following inferences. Geohash is a region-oriented city map partition approach, and therefore results in some cells that are highly dense, and in some cells that are highly sparse. On the other hand, Voronoi cells are more uniformly distributed in terms of density (see Fig. 4a and Fig. 4b). The Voronoi tessellation technique tries to uniformly distribute samples among the partitions. This, in fact, increases the chance of finding a passenger in a cell as there are fewer “demand scarce” cells. On the other hand, this process of uniformly distributing samples increases the partition size to above 1 km² for some Voronoi cells, as seen in Fig. 4a. As a result, the driver might have to traverse a larger distance to find a passenger. The size of the smallest Voronoi partition observed is 0.10 km² and 85% of the partitions are below 2 km². The area of a geohash cell remains constant at 0.72 km². So when the driver is re-routed to a cell, a vacant taxi has to spend less time searching in a geohash when compared to its corresponding Voronoi cell, provided that sufficient demand density exists in the cell. However, at times, this advantage comes at the expense of masking the actual demand hot spots. In some locations, typically near the city center, there can be more than one hot spot in a 1 km² area. Due to the inflexible structure of a geohash, these multiple hot spots are considered as a single hot spot. Note that if the service providers are more inclined towards re-allocating drivers to fixed sized locations, multiple hot spots in a geohash might not be an issue. With the Voronoi technique, we have the added flexibility of slicing one geohash into further cells. So from the perspective of identifying a hot spot, the Voronoi tessellation seems to be a better strategy than the Geohash tessellation for the Bengaluru data set.

Fig. 4: Histograms to show the distribution of the area per partition, and the samples per partition for the two tessellation techniques: (a) cell area (Bengaluru), (b) cell volume (Bengaluru), (c) cell area (New York city), (d) cell volume (New York city).

2) New York data set

Fig. 3 is a zoomed-in heat map of the Manhattan borough. We note that 92.5% of the total generated demand originated from Manhattan over the period of study. Thus, the spatial data distribution is highly heterogeneous, resulting in very closely packed Voronoi cells in the Manhattan borough. In fact, the smallest partition is of the size 0.08 km² and 75% of the partitions are below 0.10 km², as observed in Fig. 4c. Routing drivers to such a small area is infeasible and economically not viable for any service provider. The Geohash strategy, on the other hand, is not a density-dependent technique and hence, assigns geohashes uniformly. From Fig. 3a, we observe that there are inter-island tessellations, even though all the centroids are on the mainland. From the perspective of re-routing drivers, this is not very appealing. This problem does not arise in Fig. 3b and the 6-level geohashes are confined to either side of the river. Because of the inter-island tessellations, Voronoi does not appear to be a suitable spatial tessellation strategy for New York city, which is spread over multiple islands. If the Voronoi tessellation is to perform better, extensive parameter tuning of the K-Means algorithm will have to be conducted. The Voronoi tessellation has to be performed on each borough to avoid inter-island tessellations, which adds to computational complexity. For this data set, the sheer simplicity of the Geohash tessellation makes it an attractive strategy.

The inferences and comparisons made above are based on the spatial aggregation of the data. In the next section, we perform comparisons based on the temporally aggregated data. The spatially aggregated data for each Voronoi and Geohash partition is temporally aggregated and modeled using time-series techniques.

IV. TIME-SERIES MODELING

Before modeling the data, we remove duplicate user IDs, if they appear multiple times in a 30 minute interval per partition. We assume that it is unlikely for a passenger to book multiple taxis within that time, from the same cell. A Box-Cox transformation is applied to stabilize the variance of the raw data, thus making the data amenable for linear processing [32]. Each K-Means demand centroid has two time-series associated with it; one using the demand aggregated at the Voronoi cell level, and the other with the demand aggregated at the geohash level. When these sequences are plotted, we observe that the series shows significant trend and seasonal patterns. On performing a spectral analysis, we find strong daily and weekly seasonality. A computationally efficient approach to real-time forecasting is to use linear time series models. Hence, the data is modeled using single and double-seasonal linear parametric time-series models. The city comprises various activity zones such as residential, entertainment, office, school zones, etcetera. The demand patterns for each of these zones differ, and hence a single time-series model may not be a satisfactory fit for all the time-sequences. We perform the time-series modeling exercise for each tessellation strategy separately, at two time scales: 15 minute and 60 minute.
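The variance stabilization and the seasonality check can be reproduced with standard tools; a minimal sketch assuming NumPy/SciPy and a strictly positive demand series (the +1 shift to avoid zeros is an assumption, not the authors' stated preprocessing).

```python
import numpy as np
from scipy import stats

# demand: 1-D array of area-normalized demand for one cell at 60 minute resolution (assumed)
demand = np.load("cell_series_60min.npy") + 1.0      # shift avoids zeros before Box-Cox
transformed, lam = stats.boxcox(demand)              # variance-stabilizing transform

# spectral check: daily and weekly cycles appear as peaks near periods of 24 and 168 samples
centered = transformed - transformed.mean()
power = np.abs(np.fft.rfft(centered)) ** 2
periods = 1.0 / np.fft.rfftfreq(len(centered), d=1.0)[1:]   # periods in hours (skip f = 0)
```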

A. Shortlisted models

Some of the widely used single-seasonal exponential smoothing models are Holt-Winters, Auto-Regressive and Seasonal Auto-regressive Integrated Moving Average (ARIMA and SARIMA) models, and Seasonal and Trend Decomposition using LOESS (STL) combined with non-seasonal exponential smoothing [32, Chapters 6-8]. Some of the common double-seasonal models are Double Seasonal Holt-Winters (DSHW) [32] and Trigonometric BATS (TBATS) [33]. In order to make sure that these models work better than naive alternatives, we compare the aforementioned models against a simple averaging model, the Naive and Seasonal Naive models, and the Random drift model. It was observed that in a few cells, the simple alternatives indeed performed better than the parametric time-series models. We consider a simple seasonal averaging model as the baseline model for our analysis. If a model performs better than the baseline model, it is kept, else it is discarded. We now briefly explain the models with which almost all the activity zones are well-modeled for our data.

1) Baseline model: In order to forecast for a particular time step, the mean of all the previous season samples corresponding to that time step is computed. Our baseline model is as follows:

\hat{y}_{t+1} = \frac{1}{S} \sum_{i=1}^{S} y_{t+1-im},   (3)

where y_t is the demand at time t, S is the number of seasons and m is the seasonality period. For example, if a weekly forecast is considered, the forecast for a Sunday at 12 a.m. is the average of all Sunday 12 a.m. demands in the past. Then, for data aggregated over 60 minutes, S is the number of lookback Sundays, and m is 168 (i.e., 24 hours × 7 days). A sketch of this baseline, alongside the STL-based forecasts, appears after this list.

Fig. 5: Plots obtained by performing residual diagnostics for model evaluation: (a) sample histogram of residuals, (b) ACF significance levels, (c) resemblance to white noise.

2) TBATS: The TBATS model is a state space model introduced in [33] for forecasting time-series with multiple seasonal periods, high frequency seasonal periods, non-integer seasonality, and calendar effects. TBATS is an acronym for Trigonometric (T), Box-Cox transform (B), ARMA errors d_t (A), Trend b_t (T) and Seasonal components s^{(i)}_t (S). The seasonal components are represented using trigonometric Fourier series. The equations for an h-step ahead additive trend, multiplicative seasonality TBATS model prediction are as follows:

Forecast:  \hat{y}_{t+h|t} = (l_t + h b_t) \prod_i s^{(i)}_{t-m_i+h},
Level:     l_t = \alpha \left( \frac{y_t}{\prod_i s^{(i)}_{t-m_i+h}} \right) + (1-\alpha)(l_{t-1} + b_{t-1}),
Trend:     b_t = \beta (l_t - l_{t-1}) + (1-\beta) b_{t-1},
Seasonal:  s^{(i)}_t = \sum_{j=1}^{k_i} s_{ij,t},
where  s^{(i)}_{j,t} = s_{ij,t-1} \cos\lambda^i_j + s^*_{ij,t-1} \sin\lambda^i_j + \gamma_i,  and  \lambda^i_j = \frac{2\pi j}{m_i}.   (4)

Here \alpha, \beta, and \gamma are the smoothing parameters, m_i \;\forall\; i \in \{1, \ldots, T\} are the seasonal periods, s^*_i is the complex conjugate of s_i, and k_i is the number of harmonics needed for the i-th seasonal component.

3) Seasonal and Trend decomposition using Loess (STL): STL is a time-series decomposition technique where the data is decomposed into seasonal (S_t) and non-seasonal (NS_t) components:

y_t = S_t + NS_t,
NS_t = T_t + R_t,
S_{T+h} = S_{T+h-km}, \quad k = \lfloor (h-1)/m \rfloor + 1.   (5)

Here, T_t denotes the trend, R_t is the irregular component, \lfloor u \rfloor is the integer part of variable u, h is the prediction horizon, and m is the seasonal period. The non-seasonal component is modeled using techniques such as ARIMA, ExponenTial Smoothing models (ETS), Random drift, Naive, etcetera. The seasonal component, after a Seasonal Naive forecast for the seasons, is added to the non-seasonal component to form the final model (see the sketch following this list).
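The STL-plus-ARIMA pipeline of Eq. (5) and the baseline of Eq. (3) can be sketched as follows, assuming statsmodels and pandas are available; the file name, the (2,1,2) ARIMA order, and the daily period of 24 are illustrative choices, not the exact configuration selected for every cell.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.forecasting.stl import STLForecast

# series: hourly area-normalized demand of one cell, indexed by timestamp (assumed available)
series = pd.read_csv("cell_demand_60min.csv", index_col=0, parse_dates=True).iloc[:, 0]

# STL strips the seasonal part; ARIMA models the non-seasonal remainder, and the seasonal
# part is re-added via a Seasonal Naive forecast, mirroring Eq. (5)
stlf = STLForecast(series, ARIMA, model_kwargs={"order": (2, 1, 2)}, period=24)
stl_forecast = stlf.fit().forecast(24)               # forecasts for the next 24 hours

# Baseline of Eq. (3): average of the S previous same-season observations (weekly season)
def seasonal_average(y, S=4, m=24 * 7):
    return np.mean([y[-i * m] for i in range(1, S + 1)])

baseline_next = seasonal_average(series.to_numpy())
```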

B. Model validation

The suitability of the models for the data can be analyzed as follows:

• Plotting the histogram of the residuals: The histogram should follow a Gaussian distribution with the residual mean as the mean of the distribution.

• Plotting the Auto Correlation Function (ACF) of the residuals: The residuals should lie within the significance band of the ACF plot, i.e., within the ±2/√N band, where N is the sample size.

• Resemblance with white noise: The residuals should appear to be uncorrelated and should pass a statistical test for correlation; for example, the Box-Ljung test. The Box-Ljung statistic is a function of the accumulated sample autocorrelations up to any specified time lag.

We refer the reader to [32, Chapter 2, Section 2.6] for an elaborate explanation of the aforementioned residual diagnostic tools. Results of residual diagnostics performed on the Voronoi time-series generated from an entertainment zone in Bengaluru are plotted in Fig. 5. Here, STL decomposition with ARIMA(2,1,2) performed satisfactorily and the residuals from the fit appear to be uncorrelated.
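These diagnostics are easy to automate; a minimal sketch with statsmodels, assuming the in-sample one-step errors of a fitted model have been exported to a file (the file name is a placeholder).

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# residuals: 1-D array of in-sample forecast errors from a fitted model (hypothetical export)
residuals = np.loadtxt("stl_arima_residuals.txt")

# Box-Ljung test at daily and weekly lags: large p-values support the white-noise hypothesis
lb = acorr_ljungbox(residuals, lags=[24, 168], return_df=True)
print(lb[["lb_stat", "lb_pvalue"]])

# ACF significance band of +/- 2/sqrt(N): sample autocorrelations of white noise should fall
# inside this band for almost all lags
band = 2.0 / np.sqrt(len(residuals))
```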

The performance of the models can further be evaluated by employing performance metrics such as the Symmetric Mean Absolute Percent Error (SMAPE) and the Mean Absolute Scaled Error (MASE). The SMAPE is a symmetrized version of the Mean Absolute Percent Error (MAPE), which is defined if, at all future time points, the point forecasts and actuals are both not zero. For a forecast horizon N considered, SMAPE is defined as follows:

\text{SMAPE} = \frac{100}{N} \sum_{t=1}^{N} \frac{|y_t - \hat{y}_t|}{y_t + \hat{y}_t},   (6)


TABLE I: Best performing models for the data sets, where a.e. means additive errors, and d.t. means damped trend.

Strategy | Bengaluru | New York
Geohash  | STL (ETS, a.e., d.t.), STL (ARIMA, BoxCox) | STL (ETS, a.e., d.t.), STL (Random drift)
Voronoi  | STL (ETS, a.e., d.t.), TBATS (ARMA error, BoxCox, trend) | STL (ETS, a.e., d.t.), STL (ARIMA, BoxCox), STL (Naive)

TABLE II: Preliminary results for the Bengaluru and New York city data sets.

Data set  | Error metric | 60 minute (Vor / Geo) | 15 minute (Vor / Geo)
Bengaluru | SMAPE (%)    | 17.6 / 21.7           | 37 / 40.4
Bengaluru | MASE         | 0.75 / 0.73           | 0.52 / 0.53
New York  | SMAPE (%)    | 15.2 / 15.6           | 33.6 / 31.2
New York  | MASE         | 0.60 / 0.59           | 0.62 / 0.47

where y_t is the actual demand and \hat{y}_t is the forecast at time t. Even though this metric is widely used, SMAPE is a scale-dependent error, and is not well-suited for intermittent demand. Hence, we also evaluate our models against MASE. For a time-series of forecast horizon N and seasonal period m, MASE is defined as:

\text{MASE} = \frac{1}{N} \sum_{t=1}^{N} \left( \frac{|y_t - \hat{y}_t|}{\frac{1}{N-m} \sum_{t=m+1}^{N} |y_t - y_{t-m}|} \right).   (7)

For a non-seasonal time-series, m = 1. The denominator of the equation is the mean absolute error of the one-step Seasonal Naive forecast method on the training set. This error metric compares the models with the standard Naive model. A MASE value < 1 ensures that the models perform better than the Naive technique. MASE can be used to compare forecasts across data sets with different scales, as the metric is independent of the scale of the data. The smaller the value of the metric, the better the model. The selected models passed the residual diagnostics and performed satisfactorily with SMAPE and MASE. In Table I, we list the shortlisted models with which Bengaluru and New York city are well-modeled at 60 minute aggregation levels. The errors encountered by the individual models are not listed because the modeling technique that works for one scenario might not provide good performance with another scenario.
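Both metrics are straightforward to compute; a minimal sketch following Eqs. (6)-(7), with the MASE denominator taken as the Seasonal Naive error on the training set as described in the text (the weekly period m = 168 corresponds to the 60 minute aggregation case).

```python
import numpy as np

def smape(actual, forecast):
    # Eq. (6): symmetric MAPE over the forecast horizon, in percent
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 / len(actual) * np.sum(np.abs(actual - forecast) / (actual + forecast))

def mase(actual, forecast, train, m=168):
    # Eq. (7): absolute errors scaled by the in-sample one-step Seasonal Naive error
    actual, forecast, train = (np.asarray(a, float) for a in (actual, forecast, train))
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    return np.mean(np.abs(actual - forecast) / scale)
```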

Fig. 6: The variation of SMAPE with the sample size in each partition at 60 minute aggregation level: (a) Bengaluru, (b) Manhattan-Bronx.

C. Preliminary results

After tessellating the city space into geohashes and Voronoi cells, we ran the time-series models mentioned in the previous section on the spatially and temporally aggregated area-normalized demand. In general, we observe strong heterogeneity in the city demand across both spatial and temporal dimensions. It was observed that for cells with low density, the Voronoi tessellation performs better than the Geohash tessellation (Fig. 6). The optimal density based partition in Voronoi cells appears to be the reason for this behavior. For high density cells, Geohash based models perform at least as well as Voronoi based models. The comparable performance along with the low computational complexity makes Geohash tessellation the preferred choice for data dense locations. The numerical results are summarized below:

1) Using the models mentioned in Section IV-A, we achieve an average accuracy of about 80% per km² for both data sets, at 60 minute aggregation levels.

2) From Table II, we notice the absence of a universal winner on evaluating the performances of Geohash and Voronoi tessellation based models on 740 Bengaluru and 780 New York city demand centers.

3) On average, Voronoi tessellation appears to outperform Geohash tessellation for Bengaluru. On the other hand, Geohash tessellation emerges as the preferred technique for most of the tested cases for New York city.

The Empirical Cumulative Distribution Function (ECDF) of the average errors obtained from the models based on all the Voronoi and Geohash partitions are plotted in Fig. 7. We see that the Voronoi technique performs better than the Geohash technique at 15 and 60 minute aggregation levels in Fig. 7a and Fig. 7b for Bengaluru. The errors from the Voronoi based models reach the upper bound faster than errors from the Geohash based models. On the other hand, for New York city, Fig. 7c and Fig. 7d show that the Geohash technique has better performance over the Voronoi technique. Additionally, we conducted the Kolmogorov-Smirnov test (K-S test) [34] to check if the ECDFs differ significantly. The results obtained provide sufficient evidence that the Voronoi technique is statistically better than the Geohash technique for Bengaluru and vice versa for New York city, consistent with the inferences from Fig. 7. We find that the Geohash tessellation performs better than Voronoi for New York city, while the Voronoi tessellation has better performance over Geohash for Bengaluru. Further, we proceed to plot the instantaneous errors obtained from the models based on the two tessellation techniques in Fig. 8a for Bengaluru at 15 minute aggregation level. Each point on the graph corresponds to the mean of errors obtained from all Geohash/Voronoi cells at that time instant. We can see that during early mornings and late nights, Geohash based models work better compared to Voronoi based models. For the rest of the day, i.e., the daylight hours, Voronoi based models tend to produce better forecasts compared to Geohash based models.
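The two-sample K-S comparison mentioned above can be reproduced with SciPy; a minimal sketch assuming the per-cell average errors of each strategy are collected in two arrays (the variable names are placeholders).

```python
from scipy.stats import ks_2samp

# voronoi_errors, geohash_errors: per-cell average SMAPE values for one city (assumed arrays)
stat, p_value = ks_2samp(voronoi_errors, geohash_errors)

# a small p-value is evidence that the two empirical error distributions (ECDFs) differ,
# i.e., that one tessellation strategy is statistically better for this city
print(stat, p_value)
```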

Fig. 7: ECDF plots of the Voronoi and Geohash strategies using MASE and SMAPE as the metrics. V-60, V-15, G-60, and G-15 correspond to performances of Voronoi and Geohash based models at 60 and 15 minute aggregation levels respectively. Panels: (a) Bengaluru (SMAPE), (b) Bengaluru (MASE), (c) New York city (SMAPE), (d) New York city (MASE).

Fig. 8: The performances of models based on tessellation techniques (a) without the HEDGE, (b) after the HEDGE without discounting, and (c) after the HEDGE with discounting.

Note that even within a single city, a unique tessellation technique does not have better performance over the other at all time horizons. This time varying behavior of the models prompts us to combine the models for better location-based forecasts, which we do in the next section.

V. MODIFIED HEDGE FOR COMBINING MODELS

Consider a learning framework, where models provide recommendations. Each model is referred to as an expert. Assuming that the predictor has complete knowledge about all the past decisions and performances of the experts, the goal is to perform as well as the best expert in the pool. This exercise belongs to the class of ensemble learning where we make use of multiple experts to obtain better predictive performance than could be obtained from any of the constituent experts alone. In our context, whenever the Geohash tessellation performs better than the Voronoi tessellation at a time step t, the ideal decision would be to choose the Geohash technique as the preferred tessellation strategy for that time step. For all demand centers, the forecasts at that instant should then be made from their geohash cells.

On observing the performances of both strategies at 15 and 60 minute aggregation levels, we see that a single tessellation technique is unable to yield optimal results for the entire forecast horizon. The best strategy varies with time. A plausible approach towards addressing this issue is to combine the two experts. As an initial step towards that goal, we apply the classic HEDGE algorithm as proposed in [28] to combine the models (Fig. 8b). We see that the decision maker is able to follow the best expert, the model based on Voronoi in this case, till a crossover occurs. The algorithm is unable to adapt to the shifting nature of the experts. In order to adapt to such a non-stationary environment, the dHEDGE [29] was proposed, where the authors modify the conventional HEDGE to include an exponential discounting factor. By exponentially filtering the past performances of the experts, the dHEDGE takes into account the non-stationarity of the process. By normalizing the weights of each expert at every instant, the dHEDGE provides a set of coefficients for experts. This can then be used to generate a convex combination of expert opinions. Our problem does not require normalized weights. Since there are only two experts, at every instant we can pick the expert which has the maximum weight, i.e., the expert with the minimum loss. We modify the dHEDGE algorithm to pick an expert if it has a higher weight than the other expert. The original algorithm has a generic loss function L(x, y) ∈ [0, 1]. For illustrative purposes, the authors in [29] have used a discrete loss function L(x, y) = [[x ≠ y]], where [[·]] is the indicator function, x is the observed symbol and y is the predicted symbol. For their cellular LTE network application, that loss function outputs 0 for all the experts who predicted the symbol correctly at that instant, and outputs 1 for all other experts. On the other hand, we build our loss functions based on the errors observed for each strategy. Our modified dHEDGE algorithm is given in Algorithm 1.

Algorithm 1: Modified dHEDGE algorithm for selecting the best expert

1 Parameters: Learning parameter β ∈ [0, 1], Discounting factor γ ∈ [0, 1]
2 Initialization: Set w_k[1] = W > 0 for k ∈ {1, 2} s.t. \sum_{i=1}^{2} w_i[1] = 1
3 for t = 1, 2, . . . do
4     Choose expert with index = \arg\max_i (w_i[t])
5     Error e_v = MEAN(forecast errors from models based on Voronoi at t)
6     Error e_g = MEAN(forecast errors from models based on Geohash at t)
7     Loss at t: l_1[t] = e_v / (e_v + e_g), l_2[t] = e_g / (e_v + e_g)
8     Update weights as w_i[t+1] = w_i[t]^γ · β^{l_i[t]}
9 end

The weights can be initialized either uniformly or based on some prior knowledge about the experts. The factor γ gives more leverage to the observations made in the recent past compared to the observations in the distant past. Thus, the expert that has been performing well in the recent past is boosted. By tuning the parameter γ, we can get the dHEDGE algorithm to respond to sudden changes in the average error performance. By setting γ = 1, this algorithm becomes the HEDGE algorithm with no forgetting factor. When the modified dHEDGE is used, we observe that the algorithm tends to choose the best expert quite rapidly compared to the original HEDGE (Fig. 8c).
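A minimal sketch of Algorithm 1, assuming the per-step mean forecast errors of the two strategies are already available as arrays; the β and γ values shown are illustrative (the paper tunes them on a separate validation day).

```python
import numpy as np

def modified_dhedge(voronoi_errors, geohash_errors, beta=0.1, gamma=0.7):
    """Pick the better tessellation strategy at each time step (Algorithm 1).

    Returns an array of choices per step: 0 = Voronoi, 1 = Geohash.
    """
    errors = np.column_stack([voronoi_errors, geohash_errors]).astype(float)
    w = np.array([0.5, 0.5])                       # uniform initial weights, summing to 1
    choices = []
    for e in errors:
        choices.append(int(np.argmax(w)))          # line 4: expert with the larger weight
        loss = e / max(e.sum(), 1e-12)             # line 7: normalized losses in [0, 1]
        w = (w ** gamma) * (beta ** loss)          # line 8: discounted multiplicative update
    return np.array(choices)

# example: choices = modified_dhedge(vor_smape_per_step, geo_smape_per_step)
```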

The modified dHEDGE algorithm was run on the two data sets at 15 and 60 minute sampling periods. The parameters β and γ for each scenario were chosen based on an independent validation set spanning over 24 hours. The performance of the algorithm can be seen in Fig. 9. We can see that the algorithm picks the best shifting expert, by giving more weightage to the behavior of that expert in the recent past (decided by γ). Note that in Fig. 9a, the plots of Voronoi and dHEDGE overlap as Voronoi is consistently the best performer throughout the forecasting horizon. The last two plots in each of Fig. 9a-9d show the cumulative mean error behaviors of the strategies. The cumulative mean error plots reveal that the modified dHEDGE performs at least as well as the best expert. Whenever the experts in consideration have similar cumulative errors, the hybrid strategy has an error lower than both experts; else, the hybrid strategy consistently has a cumulative error equal to the best performing expert in the recent past. We also analyze the performance of the strategies on various boroughs of New York city, to avoid the issue of inter-island Voronoi tessellations. To further demonstrate the flexibility of our algorithm in adapting to various scenarios, we repeated the simulations for temporal aggregation levels of 5 and 30 minutes.

Multiple spatial resolutions are also explored to validate the applicability of our algorithm. Irrespective of all these considerations, when dHEDGE was applied, the hybrid strategy still turned out to be the winner for all the data sets considered, with different metrics at multiple temporal and spatial scales. Table III summarizes the performances of the strategies on the data sets, with SMAPE as the metric. Similar results were observed using MASE, and hence are not included. We observe that the modified dHEDGE improves the prediction accuracy by picking the best strategy for that particular instant, for that city, at that temporal and spatial resolution. Note that our policy does not switch continuously between strategies every time step. This is because the dHEDGE chooses the winner strategy for a time step based on neither the current nor the immediate past performances of the experts. The dHEDGE makes a switch only if one strategy performs better than the other for a period of time in the past, which is decided by the discounting factor γ. On tracking the performance of our strategy for all the test cases mentioned in Table III, we observe an average of only 1.5 switches per day at 60 minute aggregation levels. In other words, for 24 time steps, the hybrid strategy made less than 2 switches on average for all tested scenarios. For 30 minute (i.e., 48 steps) and 15 minute (i.e., 96 steps) aggregation levels, the average switches made were only 3.2 and 8.4 per day. With finer temporal resolutions, the number of switches increases, which is inevitable, considering the increase in the number of time steps in the forecast horizon. We feel that the observed number of switches is acceptable, in return for a significant improvement in accuracy.

Based on our observations from two independent data sets, we find that there is no universal winning tessellation technique that works for all data sets. However, an algorithm that combines different tessellation strategies has the potential to enhance prediction performance across a broad range of data sets.

VI. CONCLUSIONS

Efficient spatial partitioning is a key step towards better location-based demand modeling and forecasting. We performed a comparison of two widely used spatial partitioning techniques, i.e., Voronoi tessellation and Geohashing. Our study was conducted using two distinct data sets; (i) a mobile application based taxi demand data set from Bengaluru, India and (ii) a street hailing yellow taxi service data set from New York city, USA. A K-Means clustering algorithm was employed to generate demand clusters that acted as generating sites for the tessellations. In order to compare the Voronoi and Geohash based models at multiple time scales, time-series techniques were applied to the spatially partitioned data at 15 and 60 minute temporal aggregation levels. The shortlisted models that were found to be suitable were ARIMA, ETS, Naive, and Random drift, which were decomposed using STL, and TBATS. We noted that neither the Geohash nor the Voronoi tessellation technique proved to be optimal for the entire forecast horizon. Further, models based on Voronoi had superior performance over models based on Geohash when the demand density was low, and vice versa.

Fig. 9: The prediction performance of the dHEDGE (β, γ) algorithm based model against single strategy based models at 15 and 60 minute aggregation levels for the two cities with SMAPE and MASE as the metrics are plotted in (a)-(d). In (e), the first two subplots correspond to the dHEDGE performance on Manhattan-Bronx boroughs, and the last two subplots correspond to the dHEDGE performance on Brooklyn-Queens boroughs respectively, at 15 minute aggregation levels.

TABLE III: The table contains the performance comparison of Voronoi, Geohash and dHEDGE using SMAPE, for the entire set of simulations performed. We observe that the dHEDGE has a superior performance for all the tested combinations on all 4 data sets. The unique set of parameters (β, γ) obtained for each test case is given in parentheses after the dHEDGE error.

Data set / Tessellation size | 5 minute: Vor / Geo / dHEDGE (β, γ) | 15 minute | 30 minute | 60 minute
Bengaluru, 6-level geohash and K = 740 | 63.2 / 60.8 / 60.7 (0.1, 0.4) | 37.1 / 40.4 / 36.5 (0.1, 0.4) | 25.6 / 29.6 / 25.6 (0.1, 0.1) | 17.6 / 21.7 / 17.6 (0.1, 0.1)
Bengaluru, sub 5-level geohash and K = 114 | 23 / 20.7 / 20.5 (0.1, 0.1) | 12.8 / 19.3 / 12.6 (0.1, 0.6) | 9.7 / 16.7 / 9.6 (0.1, 0.1) | 6.9 / 12.8 / 7.0 (0.1, 0.3)
Bengaluru, 5-level geohash and K = 30 | 10.9 / 10.5 / 9.7 (0.1, 0.9) | 7.4 / 9.4 / 7.3 (0.1, 0.7) | 6.4 / 7.4 / 6.4 (0.9, 0.7) | 5.4 / 5.8 / 5.4 (0.9, 0.9)
New York city, 6-level geohash and K = 780 | 61.4 / 40.0 / 40.0 (0.1, 0.1) | 33.6 / 31.2 / 31 (0.1, 0.1) | 24.8 / 24.4 / 23.8 (0.1, 0.6) | 15.2 / 15.6 / 15.2 (0.1, 0.1)
New York city, sub 5-level geohash and K = 120 | 23.7 / 15.9 / 15.5 (0.8, 0.8) | 13.9 / 12.4 / 12.1 (0.7, 0.2) | 11 / 11.6 / 10.6 (0.1, 0.3) | 9.4 / 11.9 / 9.3 (0.1, 0.4)
New York city, 5-level geohash and K = 31 | 10.9 / 10.5 / 10.4 (0.1, 0.9) | 9.7 / 8.7 / 8.3 (0.1, 0.7) | 11.2 / 7.7 / 7.9 (0.1, 0.1) | 10.6 / 12.8 / 10.8 (0.1, 0.8)
Manhattan-Bronx, 6-level geohash and K = 169 | 29.9 / 20.6 / 20.6 (0.1, 0.2) | 17 / 16.5 / 16.5 (0.1, 0.3) | 13.2 / 11.1 / 11.1 (0.1, 0.1) | 9.15 / 10.5 / 9.15 (0.1, 0.1)
Manhattan-Bronx, sub 5-level geohash and K = 26 | 11.6 / 14.7 / 11.4 (0.1, 0.8) | 8.3 / 12.9 / 8.4 (0.1, 0.3) | 7.4 / 10.2 / 7.5 (0.1, 0.2) | 6.8 / 9.3 / 6.9 (0.1, 0.1)
Manhattan-Bronx, 5-level geohash and K = 7 | 7.9 / 7.2 / 6.9 (0.1, 0.7) | 6.2 / 6.7 / 6.2 (0.7, 0.9) | 5.5 / 5.7 / 5.1 (0.7, 0.2) | 5.9 / 5.4 / 5.5 (0.6, 0.9)
Brooklyn-Queens, 6-level geohash and K = 460 | 90.2 / 57.2 / 57.2 (0.1, 0.1) | 97.6 / 74.9 / 74.2 (0.1, 0.1) | 73.9 / 41.8 / 41.8 (0.1, 0.1) | 62.6 / 52.9 / 52.9 (0.1, 0.1)
Brooklyn-Queens, sub 5-level geohash and K = 71 | 73.0 / 35.1 / 35.1 (0.1, 0.1) | 49.3 / 27.2 / 27.2 (0.1, 0.1) | 37 / 22.9 / 22.9 (0.1, 0.4) | 26.2 / 15.9 / 15.8 (0.1, 0.3)
Brooklyn-Queens, 5-level geohash and K = 18 | 42.4 / 19.5 / 19.5 (0.1, 0.4) | 23.1 / 28.8 / 23.1 (0.1, 0.8) | 18.2 / 22.8 / 18.2 (0.1, 0.8) | 12.9 / 18 / 12.9 (0.9, 0.9)

While Voronoi tessellation appeared to be the recommended strategy for tessellating Bengaluru, Geohash tessellation was the suggested strategy for New York city. We concluded that the tessellation strategy is very dependent on the demand density in each partition, and the geography of the city.

The lack of a clear winning strategy prompted us to devise a hybrid tessellation strategy by combining models based on the well known HEDGE algorithm. We modified dHEDGE, which is a discounted version of the HEDGE algorithm, to suit our context. Our modified algorithm always picked the best possible strategy at each time step in the forecast horizon. It is appealing that the resulting tessellation performance is not impacted by the time of the day, or the features of the underlying data set. The hybrid strategy was the winning strategy for all the data sets considered, performing consistently better at multiple time scales with different performance metrics.

This paper was directed towards a comparison of tessellation strategies, with a focus on combining strategies to enhance performance. There are numerous avenues that merit further investigation. To improve prediction accuracy, techniques such as Support Vector Regression and Artificial Neural Networks can be explored; recent developments in Recurrent Neural Networks have also shown promise in improving predictions [8]. It will certainly be interesting to consider the supply side of the data as well, to build a more comprehensive demand-supply modelling framework.

ACKNOWLEDGEMENTS

This work was undertaken as part of the Cognizant-Ernet IIT Madras project. The authors are also thankful to Vishnu Raj, Gopal Krishna Kamath, Saarthak Jain, and Venkata Siva Sai Prasad for many helpful discussions.

REFERENCES

[1] D. Shao, W. Wu, S. Xiang, and Y. Lu, "Estimating taxi demand-supply level using taxi trajectory data stream," in Proc. of IEEE International Conference on Data Mining Workshop, pp. 407–413, 2015.

[2] C. Kamga, M. A. Yazici, and A. Singhal, "Hailing in the rain: temporal and weather-related variations in taxi ridership and taxi demand-supply equilibrium," in Transportation Research Board 92nd Annual Meeting, 2013.

[3] X. Li, G. Pan, Z. Wu, G. Qi, S. Li, D. Zhang, W. Zhang, and Z. Wang, "Prediction of urban human mobility using large-scale taxi traces and its applications," Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121, 2012.

[4] L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, "Predicting taxi–passenger demand using streaming data," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1393–1402, 2013.

[5] K. Zhang, Z. Feng, S. Chen, K. Huang, and G. Wang, "A framework for passengers demand prediction and recommendation," in Proc. of IEEE International Conference on Services Computing, pp. 340–347, 2016.

[6] J. Lee, I. Shin, and G. L. Park, "Analysis of the passenger pick-up pattern for taxi location recommendation," in Proc. of International Conference on Networked Computing and Advanced Information Management, pp. 199–204, 2008.

[7] Y. Dong, S. Qian, K. Zhang, and Y. Zhai, "A novel passenger hotspots searching algorithm for taxis in urban area," in Proc. of IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 175–180, 2017.

[8] J. Xu, R. Rahmatizadeh, L. Boloni, and D. Turgut, "Real-time prediction of taxi demand using recurrent neural networks," IEEE Transactions on Intelligent Transportation Systems (Early Access), 2017.

[9] S. Phithakkitnukoon, M. Veloso, C. Bento, A. Biderman, and C. Ratti, "Taxi-aware map: Identifying and predicting vacant taxis in the city," in Proc. of International Joint Conference on Ambient Intelligence, pp. 86–95, 2010.

[10] Y. Ding, Y. Li, K. Deng, H. Tan, M. Yuan, and L. M. Ni, "Dissecting regional weather-traffic sensitivity throughout a city," in Proc. of IEEE International Conference on Data Mining, pp. 739–744, 2015.

[11] J. Shen, X. Liu, and M. Chen, "Discovering spatial and temporal patterns from taxi-based floating car data: a case study from nanjing," GIScience & Remote Sensing, vol. 54, no. 5, pp. 1–22, 2017.

[12] Y. Zhang, B. Li, and K. Ramayya, "Learning individual behavior using sensor data: The case of gps traces and taxi drivers," 2016. [Online]. Available: https://ssrn.com/abstract=2779328.

[13] N. Davis, G. Raina, and K. Jagannathan, "A multi-level clustering approach for forecasting taxi travel demand," in Proc. of IEEE International Conference on Intelligent Transportation Systems, pp. 223–228, 2016.

[14] R. Outersterp, "Partitioning of spatial data in publish-subscribe messaging systems," Master's thesis, University of Twente, 2016.

[15] R. Gelda, K. Jagannathan, and G. Raina, "Forecasting supply in voronoi regions for app-based taxi hailing services," in International Conference on Advanced Logistics and Transport, 2017.

[16] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. of the Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297, 1967.

[17] A. Fahad, N. Alshatri, Z. Tari, A. Alamri, I. Khalil, A. Y. Zomaya, S. Foufou, and A. Bouras, "A survey of clustering algorithms for big data: Taxonomy and empirical analysis," IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 267–279, 2014.

[18] D. Xu and Y. Tian, "A comprehensive survey of clustering algorithms," Annals of Data Science, vol. 2, no. 2, pp. 165–193, 2015.

[19] M. Kaushik and M. B. Mathur, "Comparative study of k-means and hierarchical clustering techniques," International Journal of Software & Hardware Research in Engineering, vol. 2, no. 6, pp. 93–97, 2014.

[20] S. Chakraborty, N. Nagwani, and L. Dey, "Performance comparison of incremental k-means and incremental dbscan algorithms," arXiv preprint arXiv:1406.4751, 2014.

[21] J. Burns, "Centroidal voronoi tessellations," 2009. [Online]. Available: https://www.whitman.edu/Documents/Academics/Mathematics/burns.pdf.

[22] L. Moreira-Matias, J. Mendes-Moreira, J. F. de Sousa, and J. Gama, "Improving mass transit operations by using avl-based systems: A survey," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 1636–1653, 2015.

[23] C. Chen, Y. Wang, L. Li, J. Hu, and Z. Zhang, "The retrieval of intra-day trend and its influence on traffic prediction," Transportation Research Part C: Emerging Technologies, vol. 22, pp. 103–118, 2012.

[24] M. Lippi, M. Bertini, and P. Frasconi, "Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 871–882, 2013.

[25] C. Ding, J. Duan, Y. Zhang, X. Wu, and G. Yu, "Using an arima-garch modeling approach to improve subway short-term ridership forecasting accounting for dynamic volatility," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 4, pp. 1054–1064, 2017.

[26] Y. Liu, C. Liu, N. J. Yuan, L. Duan, Y. Fu, H. Xiong, S. Xu, and J. Wu, "Intelligent bus routing with heterogeneous human mobility patterns," Knowledge and Information Systems, vol. 50, no. 2, pp. 383–415, 2017.

[27] V. G. Vovk, "Aggregating strategies," in Proc. of Computational Learning Theory, pp. 371–383, 1990.

[28] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in Proc. of European Conference on Computational Learning Theory, pp. 23–37, 1995.

[29] V. Raj and S. Kalyani, "An aggregating strategy for shifting experts in discrete sequence prediction," arXiv preprint arXiv:1708.01744, 2017.

[30] R. J. Hyndman, A. B. Koehler, R. A. Ahmed, H. Booth, and R. D. Snyder, "Another look at forecast-accuracy metrics for intermittent demand," Foresight: The International Journal of Applied Forecasting, vol. 4, no. 4, pp. 43–46, 2006.

[31] N.Y.C. Taxi & Limousine Commission, "TLC trip record data," 2017. [Online]. Available: http://www.nyc.gov/html/tlc/html/about/trip record data.shtml.

[32] R. J. Hyndman and G. Athanasopoulos, Forecasting: principles and practice. OTexts, 2014.

[33] A. M. De Livera, R. J. Hyndman, and R. D. Snyder, "Forecasting time series with complex seasonal patterns using exponential smoothing," Journal of the American Statistical Association, vol. 106, no. 496, pp. 1513–1527, 2011.

[34] F. J. Massey Jr, "The kolmogorov-smirnov test for goodness of fit," Journal of the American Statistical Association, vol. 46, no. 253, pp. 68–78, 1951.

Neema Davis is currently pursuing her Dual M.S.-Ph.D. degree in Electrical Engineering at the Indian Institute of Technology Madras. She obtained her B. Tech. in Electronics and Communication Engineering from Mahatma Gandhi University, Kottayam, in 2013. Her research interests lie in transportation planning, predictive data analysis, and intelligent transportation systems.

Gaurav Raina holds a faculty position in Electrical Engineering at the Indian Institute of Technology Madras, and is also a visiting research fellow in Mathematics at the University of Cambridge, UK. His research interests are in the design and performance evaluation of communication networks, intelligent transportation systems, control theory, and non-linear systems.

Krishna Jagannathan obtained his B. Tech. in Electrical Engineering from IIT Madras in 2004, and the S.M. and Ph.D. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (MIT) in 2006 and 2010, respectively. During 2010-2011, he was a visiting post-doctoral scholar in Computing and Mathematical Sciences at Caltech, and an off-campus post-doctoral fellow at MIT. Since November 2011, he has been an assistant professor in the Department of Electrical Engineering, IIT Madras. He worked as a consultant at the Mathematical Sciences Research Center, Bell Laboratories, Murray Hill, NJ, in 2005, and as an engineering intern at Qualcomm, Campbell, CA, in 2007. His research interests lie in the stochastic modeling and analysis of communication networks, transportation networks, network control, and queuing theory. Dr. Jagannathan serves on the editorial boards of the journals IEEE/ACM Transactions on Networking and Performance Evaluation. He is the recipient of a best paper award at WiOpt 2013, and the Young Faculty Recognition Award for excellence in Teaching and Research at IIT Madras (2014).