T-drive: driving directions based on taxi trajectories
Transcript of T-drive: driving directions based on taxi trajectories
T-Drive: Driving Directions Based on Taxi Trajectories
Jing Yuan1,2, Yu Zheng2, Chengyang Zhang3, Wenlei Xie2
Xing Xie2, Guangzhong Sun1, Yan Huang3
1 School of Computer Science and Technology, University of Science and Technology of China2 Microsoft Research Asia, 4F, Sigma Building, No. 49 Zhichun Road, Beijing 100190, China
3 Computer Science Department, University of North Texas{yuzheng,xingx}@microsoft.com, [email protected], [email protected]
ABSTRACTGPS-equipped taxis can be regarded as mobile sensors prob-ing traffic flows on road surfaces, and taxi drivers are usuallyexperienced in finding the fastest (quickest) route to a des-tination based on their knowledge. In this paper, we minesmart driving directions from the historical GPS trajecto-ries of a large number of taxis, and provide a user with thepractically fastest route to a given destination at a given de-parture time. In our approach, we propose a time-dependentlandmark graph, where a node (landmark) is a road segmentfrequently traversed by taxis, to model the intelligence oftaxi drivers and the properties of dynamic road networks.Then, a Variance-Entropy-Based Clustering approach is de-vised to estimate the distribution of travel time between twolandmarks in different time slots. Based on this graph, wedesign a two-stage routing algorithm to compute the prac-tically fastest route. We build our system based on a real-world trajectory dataset generated by over 33,000 taxis in aperiod of 3 months, and evaluate the system by conductingboth synthetic experiments and in-the-field evaluations. Asa result, 60ā70% of the routes suggested by our method arefaster than the competing methods, and 20% of the routesshare the same results. On average, 50% of our routes areat least 20% faster than the competing approaches.
KeywordsDriving directions, time-dependent fast route, taxi trajecto-ries, T-Drive, landmark graph
Categories and Subject DescriptorsH.2.8. [Database applications]: Data mining, Spatialdatabases and GIS.
General TermsAlgorithms, Experimentation
1. INTRODUCTION
Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.GIS ā10, November 2-5, 2010. San Jose, CA, USACopyright cā2010 ACM 978-1-4503-0428-3/10/11 ...$10.00.
Finding efficient driving directions has become a daily ac-tivity and been implemented as a key feature in many mapservices like Google and Bing Maps. A fast driving routesaves not only the time of a driver but also energy consump-tion (as most gas is wasted in traffic jams). In practice,big cities with serious traffic problems usually have a largenumber of taxis traversing on road surfaces. For the sake ofmanagement and security, these taxis have already been em-bedded with a GPS sensor, which enables a taxi to report onits present location to a data center in a certain frequency.Thus, a large number of time-stamped GPS trajectories oftaxis have been accumulated and are easy to obtain.
Intuitively, taxi drivers are experienced drivers who canusually find out the fastest route to send passengers to adestination based on their knowledge (we believe most taxidrivers are honest although a few of them might give passen-gers a roundabout trip). When selecting driving directions,besides the distance of a route, they also consider other fac-tors, such as the time-variant traffic flows on road surfaces,traffic signals and direction changes contained in a route, aswell as the probability of accidents. These factors can belearned by experienced drivers but are too subtle and diffi-cult to incorporate into existing routing engines. Therefore,these historical taxi trajectories, which imply the intelligenceof experienced drivers, provide us with a valuable resourceto learn practically fast driving directions.
In this paper, we propose to mine smart driving directionsfrom a large number of real-world historical GPS trajecto-ries of taxis. As shown in Figure 1, taxi trajectories areaggregated and mined in the Cloud to answer queries fromordinary drivers or Internet users. Given a start point anddestination, our method can suggest the practically fastestroute to a user according to his/her departure time and basedon the intelligence mined from the historical taxi trajecto-ries. As the taxi trajectories are constantly updated in theCloud, the suggested routes are state-of-the-art.
When proposing the above-mentioned strategy, two ma-
Figure 1: A cloud-based driving directions service
99
jor concerns come to peopleās minds. First, some routes onwhich a taxi can quickly traverse might not be feasible fornormal drivers, e.g., the carpool tracks in some highwaysand a few bus-taxi-preserved tracks in a city. But, in mostcases, especially in many urban cities like New York and Bei-jing, private cars can share the same tracks with taxis. Thatis, taxisā trajectories can still be referenced by other driverswhen finding driving directions in an urban city. Second,the historical trajectory-based approach might not be ag-ile enough to handle some urgent accidents in contrast toreal-time traffic analysis. Intrinsically, the traffic flows of acity follow some patterns unless some emergent events hap-pen, such as serious accidents, traffic control and road-works.Given that the probability of these events is much lower thanthat of regular traffic patterns, our method is still very use-ful in most situations. At the same time, besides the trafficflow, our method also implicitly incorporates additional fac-tors, such as direction changes and traffic signals. Moreover,this method can find the fastest route in a future time andneeds less online communication for data transition. Thus,our solution and the real-time-based approach can comple-ment each other.
However, we need to face the following three challengeswhen performing our method.Intelligence Modeling: As a user can select any place
as a source or destination, there would be no taxi trajectoryexactly passing the query points. That is, we cannot answeruser queries by directly mining trajectory patterns from thedata. Therefore, how to model taxi driversā intelligence thatcan answer a variety of queries is a challenge.Data Sparseness and Coverage: We cannot guarantee
there are sufficient taxis traversing on each road segmenteven if we have a large number of taxis. That is, we cannotaccurately estimate the speed pattern of each road segment.Low-sampling-rate Problem: To save energy and com-
munication loads, taxis usually report on their locations ina very low frequency, like 2-5 minutes per point. This in-creases the uncertainty of the routes traversed by a taxi[11].As shown in Figure 2, there could exist four possible routes(š 1-š 4) traversing the sampling points š and š.
Figure 2: Low-sampling-rate problem
In our approach, we model a large number of historical taxitrajectories with a time-dependent landmark graph, in whicha node (landmark) represents a road segment frequently tra-versed by taxis. Based on this landmark graph, we performa two-stage routing algorithm that first searches the land-mark graph for a rough route (represented by a sequence oflandmarks) and then finds a refined route sequentially con-necting these landmarks. The contributions of this paper liein the following aspects:
ā We propose the notion of a landmark graph, whichwell models the intelligence of taxi drivers based on thetaxi trajectories and reduces the online computation ofroute-finding.
ā We devise Variance-Entropy-Based Clustering (calledVE-Clustering) to learn the time-variant distributionsof the travel times between any two landmarks.
ā We build our system by using a real-world trajectorydataset generated by 33,000+ taxis in a period of 3months, and evaluate the system by conducting bothsynthetic experiments and in-the-field evaluations (per-formed by real drivers). The results show that ourmethod can find out faster routes with less online com-putation than competing methods.
In the remainder of this paper, we first formally define ourproblem in Section 2 and give an overview of our approach inSection 3. Then we elaborate on the time-dependent land-mark graph construction in Section 4 and route computingin Section 5. After that, we report on the evaluation in Sec-tion 6. Finally, we discuss related work in Section 7 andconclude this paper in Section 8.
2. PROBLEM DEFINITIONIn this section, we first introduce some terms used in this
paper, then define our problem.
Definition 1. (Road Segment): A road segment š is a di-rected (one-way or bidirectional) edge that is associated witha direction symbol (š.ššš), two terminal points (š.š , š.š), anda list of intermediate points describing the segment using apolyline. If š.ššš=one-way, š can only be traveled from š.š to š.š, otherwise, people can start from both terminal points,i.e., š.š ā š.š or š.š ā š.š . Each road segment has a lengthš.ššššš”ā and a speed constraint š.š šššš, which is the maxi-mum speed allowed on this road segment.
Definition 2. (Dynamic Road Network): A dynamic roadnetwork šŗš is a directed graph, šŗš = (šš, šøš), where šš
is a set of nodes representing the terminal points of roadsegments, and šøš is a set of edges denoting road segments.The time needed for traversing an edge is dynamic at leastin the following two aspects: (1) Time-dependent. Typically,the traffic flow on a road surface varies over days of the weekand time of day, e.g., a road could become crowded in rushhours while be quite smooth at other times. (2) Location-variant. Different roads have different time-variant trafficpatterns. For instance, some streets could still be very fasteven in the morning rush. However, the rush hours of a fewroads may last for a whole day.
Definition 3. (Route): A route R is a set of connectedroad segments, i.e., š : š1 ā š2 ā ā ā ā ā šš, where šš+1.š =šš.š, (1 ā¤ š < š). The start point and end point of a routecan be represented as š .š = š1.š and š .š = šš.š.
Definition 4. (Taxi Trajectory): A taxi trajectory šš is asequence of GPS points pertaining to one trip. Each pointš consists of a longitude, latitude and a time stamp š.š”, i.e.,šš : š1 ā š2 ā ā ā ā ā šš, where 0 < šš+1.š” ā šš.š” < ā³š(1 ā¤ š < š). ā³š defines the maximum sampling intervalbetween two consecutive GPS points.
Problem Definition: Given a user query with a start pointšš , a destination šš and a departure time š”š, find the fastestroute š in a dynamic road network šŗš = (šš, šøš) which islearned from a trajectory archive š“.
100
3. OVERVIEWAs shown in Figure 3, the architecture of our system con-
sists of three major components: Trajectory Preprocessing,Landmark Graph Construction, and Route Computing. Thefirst two components operate offline and the third is runningonline. The offline parts only need to be performed onceunless the trajectory archive is updated.
Trajectory Preprocessing : This component first seg-ments GPS trajectories into effective trips, then matcheseach trip against the road network. 1) Trajectory segmenta-tion: In practice, a GPS log may record a taxiās movementof several days, in which the taxi could send multiple pas-sengers to a variety of destinations. Therefore, we partitiona GPS log into some taxi trajectories representing individualtrips according to the taximeterās transaction records. Thereis a tag associated with a taxiās reporting when the taximeteris turn on or off, i.e., a passenger gets in or out of the taxi. 2)Map matching : We employ our IVMM algorithm [14], whichhas a better performance than existing map-matching algo-rithms when dealing with the low-sampling-rate trajectories,to map each GPS point of a trip to the corresponding roadsegment where the point was recorded. As a result, a taxitrajectory is converted to a sequence of road segments.
Landmark Graph Construction : We separate the week-day trajectories from the weekend ones, and build a land-mark graph for weekdays and weekends respectively. Whenbuilding the graph, we first select the top-š road segmentswith relatively more projections (i.e., being frequently tra-versed by taxis) as the landmarks. Then, we connect twolandmarks with a landmark edge if there are a certain num-ber of trajectories passing these two landmarks. Later, weestimate the distribution of travel time of each landmarkedge by using the VE-clustering algorithm. Now, a time-dependent landmark graph is ready for online computation.Figure 4 demonstrates the key concept of our work.
Route Computing : Given a query (šš , šš, š”š), we carryout a two-stage routing algorithm to find out the fastestroute. In the first stage, we perform a rough routing thatsearches the time-dependent landmark graph for the fastestrough route represented by a sequence of landmarks. Inthe second stage, we conduct a refined routing algorithm,which computes a detailed route in the real road network tosequentially connect the landmarks in the rough route.
4. TIME-DEPENDENTLANDMARKGRAPHThis section first describes the construction of the time-
dependent landmark graph, and then details the travel timeestimation of landmark edges.
4.1 Building the Landmark Graph
Figure 3: System overview
Figure 4: Hierarchical architecture
Definition 5. (Landmark): A landmark is one of the top-šroad segments that are frequently traversed by taxi driversaccording to the trajectory archive.
The reason why we use ālandmarkā to model the taxidriversā intelligence is that: 1) The notion of landmarks fol-lows the natural thinking pattern of people, and can giveusers a more understandable and memorable presentationof driving directions beyond detailed descriptions. For in-stance, the typical pattern that people introduce a route toa driver is like this ātake I-405 South at NE 4th Street, thenchange to I-90 at exit 11, and finally exit at Qwest Fieldā.Instead of giving turn-by-turn directions, which a driver can-not remember, people prefer to use a few landmarks (like NE4th Street) that highlight key directions to the destination.2) The sparseness and low-sampling-rate of the taxi trajec-tories do not support the speed estimation for each roadsegment while we can estimate the traveling time betweentwo landmarks. Meanwhile, the low-sampling-rate trajecto-ries cannot offer sufficient information for inferring the exactroute traversed by a taxi (refer to Figure 2). Thus, we canonly use a road segment instead of their terminal points asa landmark.
Here, we detect the top-š road segments as the landmarksinstead of setting up a fixed threshold, since a threshold willvary in the scale of taxi trajectories. Later, we connect twolandmarks according to definitions 6, 7 and 8.
Definition 6. (Transition): Given a trajectory archive š“,a time threshold š”ššš„, two landmarks š¢, š£, arriving time š”š,leaving time š”š, we say š = (š¢, š£; š”š, š”š) is a transition if thefollowing conditions are satisfied:(I) There exists a trajectory šš = (š1, š2, . . . , šš) ā š“, aftermap matching, šš is mapped to a road segment sequence(š1, š2 . . . , šš). ā š, š, 1 ā¤ š < š ā¤ š s.t. š¢ = šš, š£ = šš .(II) šš+1, šš+2, . . . , ššā1 are not landmarks.(III) š”š = šš.š”, š”š = šš .š” and the travel time of this transitionis š”š ā š”š ā¤ š”ššš„.
Definition 7. (Candidate Edge and Frequency): Given twolandmarks š¢, š£ and the trajectory archive š“, let šš¢š£ be theset of the transitions connecting (š¢, š£). If šš¢š£ ā= ā , we sayš = (š¢, š£; šÆš¢š£) is a candidate edge, where
šÆš¢š£ = {(š”š, š”š)ā£(š¢, š£; š”š, š”š) ā šš¢š£}
records all the historical arriving and leaving times. Thesupport of š, denoted as š.š š¢šš, is the number of transitionsconnecting (š¢, š£), i.e., ā£šš¢š£ā£. The frequency of š is š.š š¢šš/š ,denoted as š.šššš, where š represents the total duration oftrajectories in archive š“.
Definition 8. (Landmark Edge): Given a candidate edge šand a minimum frequency threshold šæ, we say š is a landmarkedge if š.šššš ā„ šæ.
101
Definition 9. (Landmark Graph): A landmark graph šŗš =(šš, šøš) is a directed graph that consists of a set of landmarksšš (conditioned by š) and a set of landmark edges šø condi-tioned by šæ and š”ššš„.
The threshold šæ is used to eliminate the edges seldom tra-versed by taxis, as the fewer taxis that pass two landmarks,the lower accuracy of the estimated travel time (between thetwo landmarks) could be. Additionally, we set the š”ššš„ valueto remove the landmark edges having a very long travel time.Due to the low-sampling-rate problem, sometimes, a taximay consecutively traverse three landmarks while no pointis recorded when passing the middle (second) one. This willresult in that the travel time between the first and thirdlandmark is very long. Such kinds of edges would not onlyincrease the space complexity of a landmark graph but alsobring inaccuracy to the travel time estimation (as a fartherdistance between landmarks leads to a higher uncertainty ofthe traversed routes).
We observe (from the taxi trajectories) that different week-days (e.g., Tuesday and Wednesday) almost share similartraffic patterns while the weekdays and weekends have dif-ferent patterns. Therefore, we build two different landmarkgraphs for weekdays and weekends respectively. That is, weproject all the weekday trajectories (from different weeks andmonths) into one weekday landmark graph, and put all theweekend trajectories into the weekend landmark graph.
Figure 5 (A)-(C) illustrate an example of building thelandmark graph. If we set š = 4, the top-4 road segments(š1, š3, š6, š9) with more projections are detected as land-marks. Note that the consecutive points (like š3 and š4)from a single trajectory (šš4) can only be counted once for aroad segment (š10). This aims to handle the situation thata taxi was stuck in a traffic jam or waiting at a traffic lightwhere multiple points may be recorded on the same road seg-ment (although the taxi driver only traversed the segmentonce), as shown in Figure 5 (C). After the detection of land-marks, we convert each taxi trajectory from a sequence ofroad segments to a landmark sequence, and then connect twolandmarks with an edge if the transitions between these twolandmarks conform to Definition 8 (supposing šæ=1 in thisexample). Figure 6 shows the detailed algorithm for land-mark graph building where š is a collection of road segmentsequences (derived from original taxi trajectories).
4.2 Travel Time EstimationSince the road network is dynamic (refer to Definition 2),
we can use neither the same nor a predefined time partitionmethod for all the landmark edges. Meanwhile, as shownin Figure 7(a), the travel times of transitions pertaining to
Algorithm 1: LandmarkGraphConstruction
Input: a road network šŗš, a collection of road segmentsequences š , the number of landmarks š, thethresholds šæ and š”ššš„
Output: landmark graph šŗš
š ā ā , šø ā ā , Count[ ]ā 0;1foreach road segment sequence š ā š do2
foreach road segment š ā š do3Count[r]++4
šš ā Top-š(Count[ ], š);5foreach road segment sequence š ā š do6
š āConvert(š,šš)/* Converted to landmark sequence */7for š ā 1 to ā£šā£ ā 1 do8
š¢ ā š[š], š£ ā š[š + 1] ;9if š£.š” ā š¢.š” < š”ššš„ then10
if šš¢š£ = (š¢, š£; šÆš¢š£) /ā šø then11šø.Insert(šš¢š£)12
šš¢š£.š š¢šš++;13šÆš¢š£ .Add((š¢.š”, š£.š”))14
foreach edge š = (š¢, š£; šÆš¢š£) ā šø do15if š.šššš < šæ then šø.Remove(š) ;16
return šŗš ā (šš, šø);17
Figure 6: Landmark graph construction algorithm
a landmark edge clearly gather around some values (like aset of clusters) rather than a single value or a typical Gaus-sian distribution, as many people expected. This may beinduced by 1) the different number of traffic lights encoun-tered by different drivers, 2) the different routes chosen bydifferent drivers traveling the landmark edge, and 3) driversāpersonal behavior, skill and preferences. Therefore, differ-ent from existing methods [9, 12] regarding the travel timeof an edge as a single-valued function based on time of day,we consider a landmark edgeās travel time as a set of distri-butions corresponding to different time slots. For example,as shown in the bottom part of Figure 5 D), 41 percent ofdrivers need to spend 10ā¼14 minutes (refer to the red bar)to traverse š16 (š1 ā š6) in the time slot 17:00ā19:00, while44 percent of drivers can accomplish this transition with a5ā¼10 minute cost (represented by the yellow bar) and therest of them need less than 5 minutes (denoted as the greenbar).
Moreover, the distribution will change over time of day,e.g., the time slots 9:00ā14:00 and 14:00ā17:00 have differ-ent distributions. Additionally, the distributions of differentedges, such as š13 and š16, change differently over time. Toaddress this issue, we develop the VE-Clustering algorithm,which is a two-phase clustering method, to learn differenttime partitions for different landmark edges based on thetaxi trajectories. In the first phase, called V-clustering, we
7 9 14 17 19 24
0.20.40.60.8
1
time of day (hour)
pro
porti
on 3-5min
5-10min
10-14min
r2
Tr1 r3
r9
r8
r6
r1
Tr2
Tr5
Tr3
Tr4
A) Matched taxi trajectories B) Detected landmarks C) A landmark graph
r9
r3r1
r6
r9
r3r1
r6
p1 p2
D) Travel time estimation
p3 p4
r4
r5r7
r10
e16
e96
e93
e13
e637 11 16 19 21 24
0.20.40.60.8
1
time of day (hour)
prop
ortio
n
2-4min
4-9min
e13
e16
Figure 5: Landmark graph construction
102
0 5 10 15 200
100
200
300
400
500
time of day (hour)
tra
ve
l tim
e (
se
co
nd
s)
(a) Transitions of a landmark Edge
0 5 10 15 200
100
200
300
400
500
time of day (hour)
tra
ve
l tim
e (
se
co
nd
s)
cluster 0cluster 1cluster 2
(b) V-Clustering result
6.9 8.6 10.9 15.5 19.1 21.7 240
0.2
0.4
0.6
0.8
time of day (hour)
pro
po
rtio
n
cluster 0cluster 1cluster 2
(c) VE-Clustering result
Figure 7: An example of VE-Clustering Algorithm
cluster the travel times of transitions pertaining to a land-mark edge into several categories based on the variance ofthese transitionsā travel times. In the second phase, termedE-clustering, we employ the information gain to automati-cally learn a proper time partition for each landmark edge.Later, we can estimate the distributions of travel times indifferent time slots of each landmark edge.
Figure 7 demonstrates an example of the VE-Clusteringalgorithm. Given a landmark edge š = (š¢, š£; šÆš¢š£), our goalis to estimate the travel time from š¢ to š£ based on šÆš¢š£ (šš¢š£
is the collection of (š”š, š”š) pairs of š defined in Definition 7).Figure 7(a) plots the travel time of the transitions (on week-days during 3 months) pertaining to a real landmark edge ina two dimensional space, where the š„ and š¦ axes denote thearriving time (š”š) and travel time (š”š ā š”š) respectively. Asthe number of clusters and the boundary of these clustersvary in different landmark edges, we conduct the followingV-Clustering instead of using some k-means-like algorithmor a predefined partition.
V-Clustering : We first sort šÆš¢š£ according to the valuesof travel time (š”š ā š”š), and then partition the sorted listšæ into several sub-lists in a binary-recursive way. In eachiteration, we first compute the variance of all the travel timesin šæ. Later, we find theābestāsplit point having the minimalweighted average variance (WAV) defined as Equation 1:
WAV(š;šæ) =ā£šæ1(š)ā£
ā£šæā£Var(šæ1(š)) +
ā£šæ2(š)ā£
ā£šæā£Var(šæ2(š)) (1)
where šæ1(š) and šæ2(š) are two sub-lists of šæ split at thešth element and Var represents the variance. This best split
point leads to a maximum decrease of
ā³š (š) = Var(šæ) ā WAV(š;šæ). (2)
The algorithm terminates when maxš{ā³š (š)} is less thana threshold. As a result, we can find out a set of splitpoints dividing the whole list šæ into several clusters š¶ ={š1, š2, . . . , šš}, each of which represents a category of traveltimes.1 As shown in Figure 7(b), the travel times of the land-mark edges have been clustered into three categories plottedin different colors and symbols.
E-Clustering : This step aims to split the x-axis intoseveral time slots such that the travel times have a rela-tively stable distribution in each slot. After V-Clustering,we can represent each travel time š¦š with the category itpertains to (š(š¦š)), and then sort the pair collection šš„š ={(š„š, š(š¦š))}
šš=1 according to š„š (arriving time). The infor-
mation entropy of the collection šš„š is given by:
Ent(šš„š) = ā
šā
š=1
šš log(šš) (3)
where šš is the proportion of a category šš in the collection.The E-Clustering algorithm runs in a similar way to the V-Clustering to iteratively find out a set of split points. Theonly difference between them is that, instead of the WAV,we use the weighted average entropy of šš„š defined as:
WAE(š;šš„š) =ā£šš„š
1 (š)ā£
ā£šš„šā£Ent(šš„š
1 (š)) +ā£šš„š
2 (š)ā£
ā£šš„šā£Ent(šš„š
2 (š))
in the E-Clustering, where šš„š1 and šš„š
2 are two subsets ofšš„š when split at the šth pair. The best split point inducesa maximum information gain[6] which is given by
ā³šø(š) = Ent(šš„š) ā WAE(š;šš„š).
The recursive partition within a set stops iff the MDLPCcriterion is satisfied[6]. As demonstrated in Figure 7(c), wecan compute the distribution of the travel times in each timeslot after the E-Clustering process.
Figure 8 shows the framework of the VE-Clustering al-gorithm where š„š = š”š, š¦š = š”š ā š”š. Figure 9 details theprocedure of V-Clustering. As E-Clustering is similar to V-Clustering, we skip the details of the E-Clustering algorithm.
Algorithm 2: Variance-Entropy-Based Clustering
Input: a set of points š = {(š„š, š¦š)šš=1} ā R Ć R
Output: a sequence of distributions š·1, š·2, . . . , š·š
šš¦ ā sorted sequence {š¦š}šš=1 order by š¦š asending;1
š¦ š šššš” ā ā ;2š¦ š šššš” āV-Clustering(šš¦,šæš£,š¦ š šššš”);3š¶ = {š1, š2, . . . , šš} āConvert(šš¦, š¦ š šššš”);4/* Convert šš¦ into clusters according to š¦_š šššš” */
šš„š ā sort {(š„š, š(š¦š))šš=1} order by š„š asending;5
/* š(š¦š) ā š¶ is the cluster of š¦š */
š„ š šššš” ā ā ;6š„ š šššš” āE-Clustering(šš„š,šæš,š„ š šššš”);7/* Divide x-axis into several slots */
for š ā 1 to ā£š„ š šššš”ā£ do8š·š āComputeDistribution(šš„š,š,š„ š šššš”);9/* Compute the distribution of slot š */
return š· = {š·1, š·2, . . . , š·š};10
Figure 8: Variance-Entropy-Based Clustering
1We can use some outlier detection algorithms or intervalestimation approaches to handle noisy points.
103
Algorithm 3: V-Clustering
Input: sorted sequence šæ = {š¦š}šš=1, threshold šæš£ , a set of
split points &š¦ š šššš”; /* a global variable */
Output: a set of split points š¦ š šššš”š āVar({š¦š}
šš=1); /* the initial variance */1
š ā² ā minš{WAV(š;šæ)};2š ā argminš{WAV(š;šæ)};3if š ā š ā² < šæš£/ā£šæā£ then return š¦ š šššš”;4else5
š¦ š šššš”.Add(š);6
š¦ š šššš” āV-Clustering(šæš1,šæš£,š¦ š šššš”);7
š¦ š šššš” āV-Clustering(šæš2,šæš£,š¦ š šššš”);8
Figure 9: V-Clustering procedure
5. ROUTE COMPUTINGThis section introduces the routing algorithm, which con-
sists of two stages: rough routing in the landmark graph andrefined routing in the real road network.
5.1 Rough RoutingBesides the traffic condition of a road, the travel time of a
route also depends on drivers. Sometimes, different driverstake different amounts of time to traverse the same routeat the same time slot. The reasons lie in a driverās drivinghabit, skills and familiarity of routes. For example, peoplefamiliar with a route can usually pass the route faster thana new-comer. Also, even on the same path, cautious peoplewill likely drive relatively slower than those preferring todrive very fast and aggressively. To catch the above factorcaused by individual drivers, we define the optimism indexas follows:
Definition 10. (Optimism Index ) The optimism index š¼indicates how fast a person would like to drive as comparedto taxi drivers. The higher rank (position in taxi drivers),the faster the person would like to drive.
For example, š¼ = 0.9 means a person usually drives asfast as the top 10% (i.e., 1-0.9) fast-driving taxi drivers. 0.2means that drivers can only outperform the bottom 20% oftaxi drivers. In practise, the š¼ can be learned from a driverāshistorical trajectories, or set by themselves, or be configuredto the mean expected value (or the median) if not given.
Given a userās optimism index š¼, we can determine his/hertime cost for traversing a landmark edge š in each time slotbased on the learnt travel time distribution. For example,Figure 10(a) depicts the travel time distribution of an land-mark edge in a given time slot (š1 ā¼ š5 denotes 5 categoriesof travel times). Then, we convert this distribution into acumulative frequency distribution function and fit a contin-uous cumulative frequency curve [1] shown in Figure 10(b).
c1 c2 c3 c4 c50
0.1
0.2
0.3
0.4
clusters
0.10
0.30 0.32
0.25
0.03
pro
po
rtio
n
(a) Travel time distribution
150170 225 255 305 3500
0.3
0.7
1
0.10
0.40
0.72
0.97 1.00
travel time (seconds)
Ī±
272197
(b) Cumulative frequency
Figure 10: Travel time w.r.t. optimism index
Note this curve represents the distribution of travel time in agiven time slot. That is, the travel times of different driversin the same time slot are different. So, we cannot use asingle-valued function. For example, given š¼=0.7, we canfind out the corresponding travel time is 272 seconds, whileif we set š¼=0.3 the travel time becomes 197 seconds.
Now the rough routing problem becomes the typical time-dependent fastest path problem. The complexity of solv-ing this problem depends on whether the network satisfiesthe āFIFOā (first in, first out) property2: āIn a networkšŗ = (š,šø), if A leaves node š¢ starting at time š”1 and Bleaves node š¢ at time š”2 ā„ š”1, then B cannot arrive at š£ be-fore A for any arc (u,v) in šøā. In practise, many networks,particularly transportation networks, exhibit this behavior[3]. If a driverās route spans more than one time slot, we usethe method proposed in[4] to refine the travel time cost tobe FIFO.
In the rough routing, we first search š (in our system,we set š = 3) nearest landmarks for šš and šš respec-tively (a spatial index is used), and formulate š Ć š pairof landmarks. For each pair of landmarks, we find the time-dependent fastest route on the landmark graph by using theLabel-Setting algorithm [3]. For any visited landmark edge,we use the optimism index (or expected travel time) to de-termine the travel time. The time costs for traveling fromšš and šš to their nearest landmarks are estimated in termsof speed constraint.
For example, in Figure 11 (A), if we start at time š”š = 0,the fastest route from šš to šš is šš ā š3 ā š4 ā šš. Whenwe arrive at š3, the time stamp is 0.1, the travel time ofš34 is 1, then the total time of this route is 0.1+1+0.1=1.2.However, if we start at š”š = 1, the route šš ā š1 ā š2 ā ššnow becomes the fastest rough route since when we arrive atš3, the travel time of the š34 becomes 2 and the total timeof the previous route is now 2.2.
Figure 11: Route computing
5.2 Refined RoutingThis stage finds in the real road network a detailed fastest
route that sequentially passes the landmarks of a rough routeby dynamic programming. Assume š1, š2, . . . , šš is the land-mark sequence obtained from the rough routing. RecallDefinition 1, each landmark šš has its start point šš.š andend point šš.š. Let šš (š) and šš(š) be the earliest leav-ing times (after traversing šš) at nodes šš.š and šš.š re-spectively. Let š (š, š, š) be the travel time of the fastestroute from road node š to š without crossing node š. Letš”š š(š) = šš.ššššš”ā/šš.š šššš, i.e., the time (estimated basedon speed constraint) for traveling from šš.š to šš.š, and
š”šš (š) =
{š”š š(š) if šš is bidirectional
ā if šš is one-way.
2If the network is non-FIFO, the problem is at least NP-Hardwhen waiting at a node is not allowed[12].
104
Using these notations, we have the initial states šš (1) andšš(1) as follows:
šš (1) = š (šš , š1.š, š1.š ) + š”šš (1)
šš(1) = š (šš , š1.š , š1.š) + š”š š(1)(4)
As shown in Figure 11 (B), let š šš š = š (šš.š , šš+1.š, šš+1.š )
denote the time of the fastest route (using speed constraintin real road network) which starts from point šš.š and endsat point šš+1.š without crossing šš+1.š in road network šŗš.Then š š
šš, š šš š , š š
šš can be similarly defined. Now we havethe state transition equations:
šš (š + 1) = min{šš (š) + š šš š, šš(š) + š š
šš} + š”šš (š + 1)
šš(š + 1) = min{šš (š) + š šš š , šš(š) + š š
šš } + š”š š(š + 1)(5)
After šš (š) and šš(š) are computed, the total travel timefor the optimal route in the real road network is:
min{šš (š) + š (šš.š , šš, šš.š), šš(š) + š (šš.š, šš, šš.š )}
In practise, we can compute š šš š, š š
šš, š šš š , š š
šš and corre-sponding routes in parallel (for 1 ā¤ š ā¤ š ā 1) by utilizingthe Dijkstra or A*-like Algorithms with a simple modifica-tion (by ignoring node š). Then the final route is a by-product of the dynamic programming since we only need todetermine the direction for each landmark road segment.
6. EVALUATIONIn this section, we conduct extensive experiments using
both synthetic queries and in-the-field evaluations.
6.1 Settings
6.1.1 DataRoad Network: We perform the evaluation based on the
road network of Beijing, which has 106,579 road nodes and141,380 road segments.Taxi Trajectories: We build our system based on a real
trajectory dataset generated by over 33,000 taxis over a pe-riod of 3 months. The total distance of the data set is morethan 400 million kilometers and the total number of GPSpoints reaches 790 million. The average sampling interval ofthe data set is 3.1 minutes per point and the average distancebetween two consecutive points is about 600 meters. Afterthe preprocessing, we obtain a trajectory archive containing4.96 million trajectories.Real-User Trajectories: We use a 2-month driving history
of 30 real drivers recorded by GPS trajectories to evaluatetravel time estimation. This data is a part of the releasedGeoLife dataset[17, 16], and the average sampling interval isabout 10s. That is, we can easily determine the exact roadsegments a driver traversed and corresponding travel times.
6.1.2 Evaluation FrameworkWe evaluate our work according to the following 3 steps.1) Evaluating landmark graphs: We build a set of land-
mark graphs with different values of š ranging from 500 to13000. The threshold šæ is set to 10, i.e., at least ten timesper day traversed by taxis (in total over 900 times in a periodof 3 months) and š”ššš„ is set to 30 minutes. We project eachreal-user trajectory to our time-dependent landmark graph,and use the landmark graph to estimate the travel time of
the trajectory. We study the accuracy of the time estima-tion changing over š and š¼. We also investigate the accuracychanging over the scale of the taxi trajectory dataset.
2) Evaluation based on synthetic queries: We generate1200 queries with different geo-distances and departure times.The geo-distances between the start point and destinationranges from 3 to 23km and follows a uniform distribution.The departure time ranges from 6am to 10pm and was gen-erated randomly in different time slots.
We compare our approach with the speed-constraint-based(denoted as SC) method and a real-time-traffic-analysis-based(termed RT) method in the aspects of efficiency and effec-tiveness. The SC method (offered by Google and Bing Maps)is based on the shortest path algorithm like A* using thespeed constraint of each road segment. The RT method firstestimates the speed of each segment at a given time accord-ing to the road sensor readings and the GPS readings of thetaxis traversing on the road segment, and then calculates thefastest route according to the estimated speeds.
3) In-the-field evaluation: We conduct two types of in thefield studies: 1) The same driver traverses the routes sug-gested by our method and baselines at different times. 2)Two drivers (with similar driving skills and habits) traveldifferent routes (recommended by different methods) simul-taneously. As shown in Table 1, we conducted both types ofevaluations multiple times and recorded each traverse witha GPS logger. Later, we perform a statistical comparison inthe aspects of travel time and distance.
Table 1: Trajectories of the In-the-field Study
Evaluation 1 Evaluation 2
Num. Trajectories 360 60Num. Users 30 2Total Distance (km) 5304 814Total Duration (hour) 165.24 25.09Evaluation Days 10 6
6.2 Evaluating Landmark GraphsFigure 12 visualizes two landmark graphs when š = 500
and š = 4000. The red points represent landmarks andblue lines denote landmark edges. Generally, the graph (š =4000) well covers Beijing city, and its distribution follows ourcommonsense knowledge.
(a) š=500 (b) š=4000
Figure 12: Visualized landmark graphs
In Table 3, the second column (SR(N)) denotes the storageratio between the number of landmarks and that of originalroad nodes. The third column shows the number of land-mark edges. The last column (SR(E)) is the storage ratiobetween the number of landmark edges and that of road seg-ments. Clearly, our landmark graph is only a small subset of
105
Table 3: Storage Cost of Landmark Graphs
š SR(N) Num. Edges SR(E)
2,000 0.019 5,518 0.0394,000 0.038 10,999 0.0786,000 0.056 15,850 0.1128,000 0.075 21,219 0.1501,0000 0.094 25,901 0.1831,2000 0.113 30,901 0.219
the original road network and the storage cost is lightweight.
By mapping the real-user trajectories to the road seg-ments, we calculate the travel times of these trajectoriesbased on our landmark graph. Then, we compare our esti-mated time with the real travel time (logged by GPS) usingthe criteria error ratio (ER) defined by:
ER =estimated timeāreal travel time
real travel time.
For example, as shown in Figure 13, a route is comprisedof 7 road segments, where š1, š3, š5 and š7 are landmarksand š1 ā š3 and š5 ā š7 are two landmark edges. Let š”1represent the travel time of landmark edge š1 ā š3 givena departure time š”0 and an optimism index š¼. If a roadsegment is not covered by a landmark edge, like š4, we use itslength divided by the speed constraint to estimate the timecost. So, the estimated time cost of this route is š”1 + š”2 + š”3,and ER=((š”1 + š”2 + š”3) ā š )/š where š is the real traveltime of this route.
Figure 13: Time estimation for usersā routes
Figure 14 shows the ER changing over š and š¼ basedon the real-user trajectories. Clearly, when š¼ = 0.6, theerror ratio achieves the best performance (especially whenš = 10000, ERā¤ 1%). The results validate that the land-mark graph can well model the dynamic road network andprecisely estimate the travel time of a route.
0 0.2 0.4 0.6 0.8 1ā0.2
0
0.2
0.4
0.6
optimism index Ī±
err
or
ratio
k=4000
k=6000
k=8000
k=10000
Figure 14: Error ratio w.r.t. optimism index
Figure 15 illustrates the ER changing over the the scale oftaxi trajectories (measured by the number of taxis per km2).The results show that we can get an acceptable performanceas long as there are over 5 taxis in a region of 1km2.
6.3 Evaluation Based on Synthetic QueriesWe use two criteria (Fast Rate 1 and Fast Rate 2) to com-
pare the effectiveness between method A and method B (B
5 10 15 20ā0.1
0
0.1
0.2
0.3
number of taxis /km2
err
or
ratio
Ī±=0.7
Ī±=0.6
Ī±=0.5
(a) š=4000
5 10 15 20ā0.1
0
0.1
0.2
0.3
number of taxis /km2
err
or
ratio
Ī±=0.7
Ī±=0.6
Ī±=0.5
(b) š=10000
Figure 15: Error ratio over scale of taxi trajectories
is the baseline):
FR1 =Number(Aās travel time<Bās travel time)
Number(queries)
FR2 =Bās travel timeāAās travel time
Bās travel time.
FR1 represents how many routes suggested by method A arefaster than that of baseline method B, and FR2 reflects towhat extent the routes suggested by A are faster than thebaselineās. Meanwhile, we use SR to represent the ratio ofmethod Aās routes being equivalent to the baselineās.
Figure 16 and 17 and Table 4 show the overall performance(FR1, FR2 and SR) of our method. When calculating theFR1, FR2, and SR, both our method and the RT approachuse the SC method as a baseline.
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
Ī±
FR
1
k=3000
k=6000
k=9000
(a) w.r.t. š¼
1000 3000 5000 7000 900011000130000
0.2
0.4
0.6
0.8
1
k
ratio FR1
SR
FR1+SR
(b) w.r.t. š
Figure 16: Overall performance
Table 4: FR1, SR ofTDrive and RT methods
š¼ š FR1 SR
0.4 6,000 0.509 0.2810.4 9,000 0.647 0.2220.6 6,000 0.511 0.2720.6 9,000 0.653 0.2160.7 6,000 0.544 0.227
TDrive
0.7 9,000 0.672 0.214
RT approach 0.206 0.671
>0 >0.1 >0.2 >0.3 >0.4 >0.50
0.2
0.4
0.6
pro
port
ion o
f ro
ute
s
FR2
k=5000
k=7000
k=9000
RT
Figure 17: FR2 over š
Figure 16 studies the overall FR1 of our method changingover š and š¼. When š = 9000, the lowest FR1 is still over60%, i.e., 60% of the routes suggested by our method arefaster than that of the SC approach. Figure 16(b) furtherdetails the FR1 and SR of our method when š¼ = 0.7 (due tothe page limitation, we only present the results of a few š¼ inthe later evaluations). Here, FR1 is being enhanced with theincrease of š when š < 9000, and becomes stable when š >9000. That is, it is not necessary to keep on expanding thescale of a landmark graph to achieve a better performance.Also, as shown in Table 4, our method outperforms the RTapproach in terms of FR1, and most routes (67%) suggestedby the RT approach are the same as that of the SC method.Figure 17 plots the FR2 of ours and RT. For example, when
106
š = 9000, over 50% routes suggested by our method are atleast 20% faster than the SC approach.
We further study the FR1 of our approach and RT in dif-ferent time slots in Figure 18, and investigate their perfor-mance influenced by the geo-distance between the start pointand destination of a query in Figure 19. As shown in Figure18, both our method and the RT approach have a stable per-formance in different time slots on weekdays and weekends.Moreover, our method has a 30% (on average) improvementover the RT approach when š ā„ 5000. As depicted in Figure19, the FR1 of both our method and the RT approach growas the distance increases, and our method is more capableof answering queries with a longer distance, e.g., FR1>0.8when distance>20KM. As a longer distance between a sourceand destination means that more road segments with vari-ous traffic conditions are involved, our method and RT bothshow a clear advantage over the SC approach.
6 9 12 15 18 21
0.2
0.4
0.6
time of day (hour)
FR
1
k=5000
k=7000
k=9000
RT
(a) Weekdays
6 9 12 15 18 21
0.2
0.4
0.6
time of day (hour)
FR
1
k=5000
k=7000
k=9000
RT
(b) Weekends
Figure 18: FR1 w.r.t. time of day
6 9 12 15 18 210
0.2
0.4
0.6
0.8
1
distance (km)
FR
1
k=5000
k=7000
k=9000
RT
Figure 19: FR1 overgeo-distance
4000 6000 8000 10000 120000
10
20
30
40
k
ave
rage a
ccess
# (
Ć 1
03)
rough routing
refined routing
our total
SC
RT
Figure 20: Average accesscost per query
The reason why our method outperforms the RT approachis: 1) Coverage: Many road segments have neither embeddedroad sensors nor taxis traveling on them at a given time.At this moment, the speed constraint of a road segment isused to represent the real time traffic on the road segment.That is also the reason why the RT approach returns manyof the same routes as the SC method. 2) Spareness: ina time interval, the estimated travel time is still not veryaccurate if the number of the taxis is not large enough. 3)Open challenges: As compared to the history-based method,the RT approach is more vulnerable to noise, such as trafficlights, human factors (pedestrians crossing a street), andtaxis looking for parking places and passengers.
Though outperforming the baselines, our method still hasless than 12% (see Table 4, š¼=0.7, š=9000) of routes fallingbehind the SC method. The reason is that: 1) our timeestimation method is not perfect, and 2) the three challengesmentioned in the Introduction cannot be fully tackled 100%.However, after studying these fall-behind routes, we find thatthey are only slightly (on average, FR2=-3%) slower than the
SC method.Figure 20 reports the efficiency of our method by using
the average node accesses per query. Obviously, our two-stage routing approach is more efficient than the baselines.The reasons are that: 1) The landmark graph is a smallsubset of the original road network; 2) the rough route hasspecified the key directions, hence, reduces the searching area(Figure 21 gives an example); 3) the detailed route betweentwo consecutive landmarks can be computed in parallel.
(a) The SC method (b) T-Drive
Figure 21: An example of searching areas
6.4 In-the-Field EvaluationTable 5 and Table 6 respectively show the results of the
two types of in-the-field evaluations (refer to Section 6.1.2for details). In these two tables, the symbol ā³ stands forthe difference value of distance or duration, š 1 representsthe ratio of our routes outperforming the baseline (GoogleMaps), and š 2 denotes to what extent our routes are beyondthat of the baseline. For example, as shown in Table 5, 80.8%of the routes suggested by our T-Drive system are faster thanthat of Google Maps and on average our routes save 11.9%of time (T-test: š < 0.005). In Table 6, we also recordthe wait time of a route which indicates how long a driveremained stationary due to the traffic lights or traffic jams,e.g., on average, our routes save 26.7% more time than thebaseline approach. When doing the in-the-field evaluations,we set š¼=0.6 and š=9000, where š¼ is learned from theseusersā driving histories (trajectories, refer to Figure 14).
Table 5: In-the-field Evaluation 1T-Drive Google ā³ R1 R2
Distance 13.91km 15.56km 1.65km 0.517 0.106Duration 25.80min 29.28min 3.48min 0.808 0.119
Table 6: In-the-field Evaluation 2T-Drive Google ā³ R1 R2
Distance 13.58km 13.55km -0.03km 0.367 -0.002Duration 23.18min 27.00min 3.82min 0.750 0.141WaitTime 4.77min 6.50min 1.73min 0.633 0.267
7. RELATEDWORK
7.1 Time-Dependent Fastest RouteThe time-dependent fastest route problem is first consid-
ered in [2]. [5] suggested a straightforward generalization ofthe Dijkstra algorithm but the authors did not notice it doesnot work for a non-FIFO network[12]. However, under theFIFO assumption, paper [3] provides a generalization of Di-jkstra algorithm that can solve the problem with the sametime complexity as the static fastest route problem.
7.2 Traffic-Analysis-Based Approaches
107
As a very complex problem, urban traffic flow analysishas been studied based on the readings of road sensors andfloating-car-data [9, 13]. These works follow the paradigmofāsensor dataātraffic flowādriving directionā, and are use-ful in detecting unexpected traffic jams and accidents. Themajor challenge of such kinds of solutions is the small cov-erage and sparse density of the sensor data. For example,the traversing speed of a highway with enough road sen-sors or floating cars can be accurately estimated, while theinferred speed of many service roads, streets and lanes (with-out enough sensors) are not that precise[8]. Given that userscan select any locations as destinations, sometimes the pathfinding algorithms based on the inferred real-time trafficmight not perform as well as we expect.
Different from the above methods, our approach is basedon many taxi driversā intelligence mined from their histor-ical trajectories. This intelligence has implied all the keyfactors (including traffic flows and signals, etc.) for findinga fast driving route. Actually, GPS-embedded taxis can beregarded as mobile sensors probing real-time traffic on roads,and the accumulated historical GPS trajectories reflect thelong-term traffic patterns of a city. As the traffic flows of acity follow some patterns in most cases, our method is veryvaluable in finding practically fast driving routes for users.
7.3 History-Learning-Based ApproachesZheng et al.[17, 16, 15] propose several novel approaches
to learn the transportation modes from GPS data. Papers[10, 18] present some probabilistic based methods to predicta driverās destination and route based on historical GPS tra-jectories. Although [18] also uses GPS trajectories generatedby 25 taxis, this work aims to predict a driverās destinationinstead of providing the fastest route that a user can follow.Paper [7] computes the fastest route by taking into accountthe driving and speed patterns learned from historical GPStrajectories. Our method differs from this work in the fol-lowing aspects. First, we do not explicitly detect speed anddriving patterns from the taxi trajectories. Instead, we usethe concept of landmarks to summarize the intelligence oftaxi drivers. The notion of landmarks follows peopleās nat-ural thinking patterns, and can improve efficiency of routefinding. Second, our approach is driven by a real datasetwhile paper [7] is based on the assumption of synthetic data.Actually, real data causes some challenges, e.g., low sam-pling rate and sparseness of trajectories. Moreover, we con-sider the time-variant and location-dependent properties ofreal-world traffic flows.
8. CONCLUSIONThis paper presents an approach that finds out the practi-
cally fastest route to a destination at a given departure timein terms of taxi driversā intelligence learned from a largenumber of historical taxi trajectories. In our method, wefirst construct a time-dependent landmark graph, and thenperform a two-stage routing algorithm based on this graphto find the fastest route. We build a real system with real-world GPS trajectories generated by over 33,000 taxis in aperiod of 3 months, and evaluate the system with extensiveexperiments and in-the-field evaluations. The results showthat our method significantly outperforms both the speed-constraint-based and the real-time-traffic-based method inthe aspects of effectiveness and efficiency. Given over 5 taxisin a region of 1km2, more than 60% of our routes are faster
than that of the speed-constraint-based approach, and 50%of these routes are at least 20% faster than the latter. Onaverage, our method can save about 16% of time for a trip,i.e., 5 minutes per 30-minutes driving.
We agree that a recommended route would become crowdedif many people take it. This is the common problem ofpath-finding, and this problem is even worse (than ours)in present shortest-path and real-time-traffic-based methods(as our method can be customized for different drivers). Inthe future, we can reduce this problem by using some strate-gies, such as load balance (offer top three routes) and dataupdate (in a relatively fast frequency). Another direction inwhich we are going to move forward is combining real-timetraffic information with our approach.
9. REFERENCES[1] R. Chhikara and L. Folks. The inverse Gaussian
distribution: theory, methodology, and applications. 1989.[2] K. Cooke and E. Halsey. The shortest route through a
network with time-dependent internodal transit times. J.Math. Anal. Appl, 14(492-498):78.
[3] B. C. Dean. Continuous-time dynamic shortest pathalgorithms. Masterās thesis, Massachusetts Institute ofTechnology, 1999.
[4] B. Ding, J. Yu, and L. Qin. Finding time-dependentshortest paths over large graphs. In Proc. EDBT, pages205ā216. ACM, 2008.
[5] S. Dreyfus. An appraisal of some shortest-path algorithms.Operations Research, 17(3).
[6] Fayyad and Irani. Multi-interval discretization ofcontinuous-valued attributes for classification learning.Proc. IJCAI, pages 1022ā1027, 1993.
[7] H. Gonzalez, J. Han, X. Li, M. Myslinska, and J. Sondag.Adaptive fastest path computation on a road network: Atraffic mining approach. In Proc. VLDB, 2007.
[8] A. Guhnemann, R. Schafer, K. Thiessenhusen, andP. Wagner. Monitoring traffic and emissions by floating cardata. Institute of transport studies Australia, 2004.
[9] E. Kanoulas, Y. Du, T. Xia, and D. Zhang. Finding fastestpaths on a road network with speed patterns. In Proc.ICDE, 2006.
[10] J. Krumm and E. Horvitz. Predestination: Inferringdestinations from partial trajectories. LNCS, 4206:243ā260.
[11] Y. Lou, C. Zhang, Y. Zheng, X. Xie, W. Wang, andY. Huang. Map-matching for low-sampling-rate GPStrajectories. In Proc. GIS. ACM, 2009.
[12] A. Orda and R. Rom. Shortest-path and minimum-delayalgorithms in networks with time-dependent edge-length.JACM, 37(3):625, 1990.
[13] D. Pfoser, S. Brakatsoulas, P. Brosch, M. Umlauft,N. Tryfona, and G. Tsironis. Dynamic travel time provisionfor road networks. In Proc. GIS. ACM, 2008.
[14] J. Yuan, Y. Zheng, C. Zhang, and X. Xie. Aninteractive-voting based map matching algorithm. In ProcMDM, 2010.
[15] Y. Zheng, Y. Chen, Q. Li, X. Xie, and W. Ma.Understanding transportation modes based on GPS datafor Web applications. ACM Transactions on the Web,4(1):1ā36, 2010.
[16] Y. Zheng, Q. Li, Y. Chen, X. Xie, and W. Ma.Understanding mobility based on GPS data. In Proc.Ubicomp, pages 312ā321, 2008.
[17] Y. Zheng, L. Liu, L. Wang, and X. Xie. Learningtransportation mode from raw gps data for geographicapplications on the web. In Proc. WWW, pages 247ā256,2008.
[18] B. Ziebart, A. Maas, A. Dey, and J. Bagnell. Navigate like acabbie: Probabilistic reasoning from observed context-awarebehavior. In Proc. Ubicomp, pages 322ā331.
108