Traffic Prediction in a Bike-Sharing System
Yexin Li, Yu Zheng, Huichu Zhang, Lei Chen
The Hong Kong University of Science and Technology
Microsoft Research, Beijing, China
Bike-Sharing System
Bike-sharing systems are widely available.
Check out a bike at the origin station, ride to the destination, and check the bike in at the destination station.
Current Problem
Skewed distributions of bike usage, both spatial and temporal: some stations have no bikes, while others have no docks.
(Figure: spatial distribution and temporal distribution of bike usage)
An Ideal Solution
Predict bike usage at each station.
Reallocate bikes by trucks.
(Figure: hourly bike usage at stations S1 and S2, 8am to 11am)
Bike usage is chaotic at an individual station!
(Figure: daily bike usage at a station over one month)
A Practical Solution
(Figure: D) daily check-out of cluster C1 between 7am and 8am; E) transition variance for stations S1, S2 and for clusters C1, C2)
Observations
Bike usage of a cluster is more predictable.
Inter-cluster transition is more stable.
Prediction for each station is unnecessary:
Users check out/in bikes at a random station.
Events affect an area instead of a single station.
Our solution
Cluster stations into groups.
Predict bike usage of each station cluster.
Reallocate bikes between station clusters.
Challenges
Cluster definition: which features to consider when clustering.
Impacted by multiple factors: meteorology and events.
Data imbalance: # sunny hours >> # rainy hours; the condition (11.7 °C, 4.6 mph) never happened in NYC during 01/4-31/9, 2014.
(Figure: weather distribution over sunny, rainy, foggy and snowy hours; sample of temperature (°C) and wind speed (mph) over time (h))
Correlation between clusters: a larger check-out at cluster A comes with a larger check-in at cluster B.
Framework of Our Solution
Bipartite station clustering
Hierarchical prediction of check-out: predict bike usage of the entire city, then predict the check-out proportion of each cluster
Learning: transition matrix and trip duration
Check-in inference: probability and expectation
Motivation of Bipartite Station Clustering
Stations in one cluster should be close to each other.
Stations in one cluster should perform similarly.
Inter-cluster transition is more stable.
Check-out proportion is more stable.
(Figure: transitions among clusters C1 to C5, less stable vs. more stable)
Bipartite Station Clustering Procedure
1. Geo-clustering into K1 clusters
2. T-matrix generation
3. T-clustering into K2 clusters
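The slides do not say which algorithm performs geo-clustering. As a minimal sketch, assuming plain k-means (Lloyd's algorithm) on station coordinates, step 1 could look like this; the function name `kmeans`, the deterministic farthest-point seeding, and the sample coordinates are illustrative assumptions, and Euclidean distance on latitude/longitude is only an approximation that is acceptable within one city.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means on tuples of floats (hypothetical helper, not the authors' code).

    Centroids are seeded deterministically: start from points[0], then repeatedly
    add the point farthest from its nearest existing centroid.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(iters):
        # assignment step: every point joins its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


# Usage with made-up station coordinates (lat, lon):
stations = {"S1": (40.750, -73.990), "S2": (40.752, -73.988),
            "S3": (40.700, -74.010), "S4": (40.701, -74.012)}
geo_labels, _ = kmeans(list(stations.values()), k=2)
print(dict(zip(stations, geo_labels)))
```

The same helper could in principle be reused for step 3 by clustering rows of the T-matrix instead of coordinates.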
(Figure: T-matrix of geo-clusters 1 to 4, built from transition vectors such as TS1: (0.1, 0.3, 0.2, 0.4), TS2: (0.1, 0.2, 0.4, 0.3) and TS7: (0.5, 0.2, 0.2, 0.1))
T-matrix Generation
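$$A_x = \begin{bmatrix} 0.1 & 0.3 & 0.2 & 0.4 \\ 0.1 & 0.2 & 0.4 & 0.3 \\ \vdots & \vdots & \vdots & \vdots \\ 0.5 & 0.2 & 0.2 & 0.1 \end{bmatrix}$$
Each row of $A_x$ is one transition vector (TS1, TS2, ..., TS7).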
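A minimal sketch of T-matrix generation, assuming each row TS_k is station S_k's share of trips ending in each of the K1 geo-clusters; that reading of the figure, the function name `t_matrix`, and the input layout are assumptions. The rows would then be grouped into K2 clusters in the T-clustering step.

```python
from collections import Counter, defaultdict


def t_matrix(trips, geo_label, stations, k1):
    """Build one row per station: the fraction of its trips that end in each geo-cluster.

    trips:     iterable of (origin_station, destination_station) pairs
    geo_label: dict mapping station -> geo-cluster index in range(k1)
    stations:  station ids, fixing the row order of the matrix
    """
    counts = defaultdict(Counter)
    for origin, dest in trips:
        counts[origin][geo_label[dest]] += 1
    matrix = []
    for s in stations:
        total = sum(counts[s].values())
        # a station with no outgoing trips gets an all-zero row
        matrix.append([counts[s][g] / total if total else 0.0 for g in range(k1)])
    return matrix


# Usage with hypothetical trips: A sends 1 trip to geo-cluster 0 and 3 to geo-cluster 1.
geo = {"A": 0, "B": 1, "C": 1}
print(t_matrix([("A", "B"), ("A", "C"), ("A", "A"), ("A", "B")], geo, ["A", "B"], 2))
```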
Motivation of Hierarchical Prediction
Bike usage in the entire city is more regular and can be predicted more accurately.
The city-level prediction bounds the total prediction error at the lower (cluster) level.
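To see why the upper level bounds the total error, consider splitting a city-level forecast across clusters by predicted proportions: the cluster predictions always add up to the city forecast, so the error of their sum equals the city-level error. A minimal sketch; the function name and the defensive renormalisation are assumptions.

```python
def hierarchical_checkout(city_total, proportions):
    """Split a city-level check-out prediction across clusters (hypothetical helper).

    proportions: dict cluster -> predicted share of check-outs (ideally summing to 1)
    Returns dict cluster -> predicted check-out.
    """
    s = sum(proportions.values())
    # renormalise so the cluster predictions sum exactly to city_total
    return {c: city_total * p / s for c, p in proportions.items()}


pred = hierarchical_checkout(1000, {"C1": 0.2, "C2": 0.1, "C3": 0.7})
print(pred, sum(pred.values()))
```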
(Figure: daily check-out of the entire traffic, in units of 10^3, compared with the daily check-out of a single cluster; hierarchical prediction first predicts bike usage of the entire city, then the check-out proportion of each cluster)
Bike Usage of the Entire City
Solution: Gradient Boosting Regression Tree (GBRT)
Feature Extraction
Features: day, hour, and weather (temperature, wind speed)
(Figure: entire traffic per day for 6:00am-7:00am, 7:00am-8:00am and 8:00am-9:00am; entire traffic by day of week, Mon. to Sun., with 13th Aug. (rainy) and 25th Sep. (windy) marked; entire traffic from Feb. to Aug. 2014, over which the temperature keeps increasing)
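The slides name GBRT but give no hyperparameters. Below is a from-scratch sketch of gradient boosting with squared loss and depth-1 regression trees (stumps); the feature layout (for example [day, hour, temperature, wind speed]) and the values of `n_trees` and `lr` are assumptions. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would be the usual choice.

```python
def fit_stump(X, r):
    """Best single split (feature, threshold) for residuals r under squared error.

    Returns (feature, threshold, left_value, right_value).
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[f] <= thr]
            right = [ri for row, ri in zip(X, r) if row[f] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lv, rv)
    if best is None:  # no valid split (all rows identical): predict a constant
        mean = sum(r) / len(r)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    f, thr, lv, rv = stump
    return lv if row[f] <= thr else rv


def fit_gbrt(X, y, n_trees=50, lr=0.1):
    """Gradient boosting: each stump fits the current residuals y - pred."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # for squared loss, the negative gradient is just the residual
        r = [yi - pi for yi, pi in zip(y, pred)]
        st = fit_stump(X, r)
        stumps.append(st)
        pred = [p + lr * stump_predict(st, row) for p, row in zip(pred, X)]
    return base, lr, stumps


def predict_gbrt(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(st, row) for st in stumps)


# Usage with toy data: traffic jumps when the single feature (hour) exceeds 1.
model = fit_gbrt([[0], [1], [2], [3]], [0, 0, 10, 10])
print(predict_gbrt(model, [0]), predict_gbrt(model, [3]))
```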
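Check-out Proportion Prediction
The check-out proportion at time t is a similarity-weighted average of the proportions in the H historical time slots t-H, ..., t-1:
$$\hat{P}_t = \frac{\sum_{i=1}^{H} W(f_i, f_t) \times P_i}{\sum_{i=1}^{H} W(f_i, f_t)}$$
The weight multiplies a time similarity, a weather similarity, and a temperature & wind speed similarity, where $p$ is temperature and $v$ is wind speed:
$$W(f_i, f_t) = \lambda_1(i, t) \times \lambda_2(w_i, w_t) \times K\big((p_i, v_i), (p_t, v_t)\big)$$
Time:
$$\lambda_1(t_1, t_2) = \mathbb{1}_{t_1, t_2} \times \rho_1^{\Delta h(t_1, t_2)} \times \rho_2^{\Delta d(t_1, t_2)}$$
Weather, $\lambda_2$ (symmetric; upper triangle shown):
        snowy  rainy  foggy  sunny
snowy   1      α1     α2     α3
rainy          1      α4     α5
foggy                 1      α6
sunny                        1
Temperature & wind speed: kernel $K$ over $(p, v)$.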
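A minimal sketch of the multi-similarity weighted average. Everything beyond the formulas is an assumption chosen for illustration: the indicator is taken to be 1 when both slots share the same day type (weekday/weekend), ρ1 = 0.9 and ρ2 = 0.95, the α values are made up, and K is a Gaussian kernel with bandwidth `sigma`. The feature tuple layout and all function names are hypothetical.

```python
import math

# Hypothetical weather similarities alpha_1..alpha_6 (made-up values, not from the slides).
ALPHA = {("snowy", "rainy"): 0.6, ("snowy", "foggy"): 0.3, ("snowy", "sunny"): 0.1,
         ("rainy", "foggy"): 0.5, ("rainy", "sunny"): 0.2, ("foggy", "sunny"): 0.4}


def weather_sim(w1, w2):
    """lambda_2: symmetric table lookup with 1 on the diagonal."""
    if w1 == w2:
        return 1.0
    return ALPHA.get((w1, w2), ALPHA.get((w2, w1)))


def time_sim(t1, t2, rho1=0.9, rho2=0.95):
    """lambda_1 for slots t = (day_index, hour, is_weekend); indicator assumed = same day type."""
    d1, h1, we1 = t1
    d2, h2, we2 = t2
    if we1 != we2:
        return 0.0
    return rho1 ** abs(h1 - h2) * rho2 ** abs(d1 - d2)


def kernel(a, b, sigma=5.0):
    """Gaussian kernel on (temperature, wind speed); an assumed choice for K."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * sigma ** 2))


def msi_predict(history, target):
    """history: list of (features, proportion_vector); features = (time, weather, (temp, wind))."""
    weights = [time_sim(f[0], target[0]) * weather_sim(f[1], target[1]) * kernel(f[2], target[2])
               for f, _ in history]
    total = sum(weights)
    if total == 0:
        raise ValueError("no similar historical slot")
    k = len(history[0][1])
    return [sum(w * p[j] for w, (_, p) in zip(weights, history)) / total for j in range(k)]


fa = ((10, 8, False), "sunny", (20.0, 5.0))
fb = ((11, 8, True), "sunny", (20.0, 5.0))
print(msi_predict([(fa, [0.2, 0.8]), (fb, [0.6, 0.4])], fa))
```

Because the result is a weighted average of proportion vectors, it still sums to 1 whenever each historical vector does.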
Transition Matrix & Trip Duration
Transition probability: the probability that a bike will be checked in to cluster $C_j$ given that it is checked out from cluster $C_i$ in time slot $t$.
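One straightforward way to estimate these probabilities is from historical trip counts: among trips checked out from $C_i$ in a slot, the fraction checked in to $C_j$. A minimal sketch, assuming the time slot is simply the hour of day; the trip-record layout and function name are hypothetical.

```python
from collections import Counter


def transition_probs(trips, hour):
    """Estimate P(check-in cluster = C_j | check-out cluster = C_i, slot = hour).

    trips: iterable of (out_cluster, in_cluster, out_hour) tuples
    Returns dict (C_i, C_j) -> probability; pairs never observed are omitted (probability 0).
    """
    out_counts = Counter()
    pair_counts = Counter()
    for ci, cj, h in trips:
        if h == hour:
            out_counts[ci] += 1
            pair_counts[(ci, cj)] += 1
    return {(ci, cj): n / out_counts[ci] for (ci, cj), n in pair_counts.items()}


trips = [("C1", "C2", 8), ("C1", "C3", 8), ("C1", "C2", 8), ("C2", "C1", 8), ("C1", "C3", 9)]
print(transition_probs(trips, 8))
```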
(Figure: example inter-cluster transition probabilities among clusters C1 to C4)
Trip duration: fitted with a log-normal distribution.
(Figure: density of trip duration, in units of 10^3 s, for trips from C1 to C3 and from C1 to C2, with the fitted curve)
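A log-normal fit has a closed-form maximum-likelihood estimate: μ and σ are the mean and the population standard deviation of the log durations. A minimal sketch for one cluster pair; the function names are illustrative, and durations are assumed to be positive values in seconds.

```python
import math
import statistics


def fit_lognormal(durations):
    """MLE fit of a log-normal: mu, sigma = mean and population std of log(durations)."""
    logs = [math.log(d) for d in durations]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """P(T <= x) for a log-normal trip duration T."""
    if x <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))


mu, sigma = fit_lognormal([420.0, 600.0, 900.0, 1500.0])
print(mu, sigma, lognormal_cdf(1200.0, mu, sigma))
```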
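Check-in Inference
Check-in is inferred from the predicted check-out, the transition probabilities, and the trip durations:
$$I_{C_i,t} = E_{i,1} + E_{i,2}$$
$E_{i,1}$ is the expected number of bikes that will be checked out from the $m$ clusters $C_j$ during the next 60 minutes and checked in to $C_i$ within them; it combines the predicted check-out $O_{C_j,t}$, the transition probability $P_{C_j,C_i,t}$, and the trip-duration distribution $D$.
$E_{i,2}$ is the expected number of bikes already on the road that will be checked in to $C_i$, obtained by summing the probability $P_{B,i}$ over each on-road bike $B$.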
(Figure: expectation of on-road bikes to each cluster C1 to C4, and bikes that will be borrowed in each minute from the 1st to the 60th)
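The slides give the structure $I_{C_i,t} = E_{i,1} + E_{i,2}$ but not every detail, so the following is a hypothetical sketch rather than the authors' formula. It assumes durations in minutes with a log-normal distribution per cluster pair, check-outs spread evenly over minutes 1 to 60 (a bike taken out in minute m must arrive within the remaining 60 - m minutes), and, for an on-road bike, the probability that its trip ends in the next hour given it has already lasted `elapsed` minutes. All names and parameters are illustrative.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """P(T <= x) for a log-normal trip duration T (minutes)."""
    if x <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))


def expected_checkins(i, checkout_pred, trans, dur, on_road):
    """Hypothetical sketch of I_{C_i,t} = E_{i,1} + E_{i,2} for the next 60 minutes.

    checkout_pred: dict cluster j -> predicted check-outs in the hour
    trans:         dict (j, i) -> transition probability from j to i
    dur:           dict (j, i) -> (mu, sigma) of a log-normal trip duration in minutes
    on_road:       list of (j, minutes_elapsed) for bikes checked out before t
    """
    e1 = 0.0
    for j, o in checkout_pred.items():
        p = trans.get((j, i), 0.0)
        if p == 0.0:
            continue
        mu, sigma = dur[(j, i)]
        # o / 60 bikes leave in each minute m and must arrive within 60 - m minutes
        e1 += sum(o / 60 * p * lognormal_cdf(60 - m, mu, sigma) for m in range(1, 61))
    e2 = 0.0
    for j, elapsed in on_road:
        p = trans.get((j, i), 0.0)
        if p == 0.0:
            continue
        mu, sigma = dur[(j, i)]
        f_now = lognormal_cdf(elapsed, mu, sigma)
        f_end = lognormal_cdf(elapsed + 60, mu, sigma)
        # probability the trip ends in the next hour, given it has not ended yet
        if f_now < 1.0:
            e2 += p * (f_end - f_now) / (1.0 - f_now)
    return e1 + e2


print(expected_checkins("C2", {"C1": 60}, {("C1", "C2"): 0.5},
                        {("C1", "C2"): (math.log(15), 0.5)}, [("C1", 5)]))
```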
Experiments
Datasets:
Citi-Bike data in New York City
Meteorology data in New York City
Capital Bikeshare in Washington D.C.
Meteorology data in Washington D.C.
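Metric: Error Rate
$$ER = \frac{\sum_{i=1}^{m} \left|\hat{X}_{C_i,t} - X_{C_i,t}\right|}{\sum_{i=1}^{m} X_{C_i,t}}$$
where $\hat{X}_{C_i,t}$ and $X_{C_i,t}$ are the predicted and actual traffic of cluster $C_i$ at time $t$, summed over the $m$ clusters.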
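The error rate is the total absolute error over the m clusters divided by the total actual traffic, so it is scale-free across time slots. A minimal sketch for one time slot; the function name is illustrative.

```python
def error_rate(pred, actual):
    """ER = sum_i |pred_i - actual_i| / sum_i actual_i over the m clusters in one slot."""
    total = sum(actual)
    if total == 0:
        raise ValueError("no actual traffic in this time slot")
    return sum(abs(p - a) for p, a in zip(pred, actual)) / total


# Absolute errors 2 + 2 + 0 = 4 over an actual total of 25.
print(error_rate([12, 8, 5], [10, 10, 5]))
```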
Data Released: http://research.microsoft.com/apps/pubs/?id=255961
Experiments: error rate (GC = geo-clustering, BC = bipartite clustering)

Check-out   All Hours      Anomalous Hours
Methods     GC     BC      GC     BC
HA          0.353  0.355   1.964  1.968
ARMA        0.346  0.346   2.276  2.273
GBRT        0.311  0.314   0.696  0.683
HP-KNN      0.298  0.299   0.692  0.685
HP-MSI      0.288  0.282   0.637  0.503
Clustering Results
Check-in    All Hours      Anomalous Hours
Methods     GC     BC      GC     BC
HA          0.347  0.352   1.837  1.835
ARMA        0.340  0.344   2.152  2.143
GBRT        0.309  0.309   0.681  0.671
HP-KNN      0.302  0.295   0.694  0.684
HP-MSI      0.297  0.290   0.642  0.506
P-TD        0.335  0.302   0.498  0.445
Accuracy improvement >0.03 for all hours
>0.18 for anomalous hours
Conclusions
Bipartite station clustering: cluster stations based on locations and transitions.
Hierarchical prediction improves the accuracy: it bounds the total error at the lower level; >0.03 improvement for all hours.
Multi-similarity-based model: deals with data imbalance; >0.18 improvement for anomalous hours.
Thanks!
Contact: Dr. Yu Zheng
Released data: http://research.microsoft.com/apps/pubs/?id=255961