Post on 13-Jan-2015
Model-based clustering for BSS usage mining, a case study with the Velib’ system of Paris
Etienne Côme
15/10/2012
Outline
Bike Sharing Systems (BSS)
What is fun with BSS?
    Relatively new systems
    Rapidly diffusing (EU and US nowadays, Hangzhou, ...)
    Important successes
    Abundant usage data
    In interesting and original forms:
        Origins / Destinations + timestamps
        Real-time station balances
    Interesting and new research problems
Etienne Côme (IFSTTAR) Model-based clustering for BSS usage mining 15/10/2012 2 / 75
Outline
1. Introduction
    Problematics
    Usage data : trips records
    Velib’ in few numbers and pictures
    Tools and approach
2. Stations clustering using temporal usage profiles
    Data representation : count time series
    Generative model : naive Poisson mixture
    Analysis of the results on the Velib’ dataset
3. Latent Dirichlet Allocation (LDA) for trips activity recognition
    Data representation : dynamical O/D matrices
    Generative model under LDA
    Analysis of the results on the Velib’ dataset
Problematics
Operational objectives
    Planning new systems: position, size of the stations
    Quality of service: bike re-dispatch, ...
    ...
Mining objectives
    Building predictive models of usage
    Finding spatio-temporal patterns
    Better understanding of the usages
    ...
Raw data
Trips data
    departure time stamp
    departure station
    arrival time stamp
    arrival station
    type of subscription
⇒ Will be converted into contingency tables (i.e. tensors of counts)
Data sources
    ⇒ Velib’, 2 months
    Open data: Barclays (London), Boston, ...
Velib’ in a few numbers
BSS size:
    1200 stations
    ≈ 40 000 slots
    ≈ 16 000 bikes
    ≈ 100 000 trips/day
    27% of trips = day subscription
    73% of trips = year subscription
Global behavior
[Figure: two histograms of the number of trips, by distance (km) and by duration (min); annotations: day subscription free use limit, year subscription free use limit]
FIG. 1: Histograms of trips lengths and durations
Temporal effects
[Figure: number of trips over time, Monday to Sunday, one curve per subscription type (short, long)]
FIG. 2: Number of Trips / hour (short / long subscriptions)
Temporal effects
[Figure: average number of trips against the hour of the day (0 to 22)]
FIG. 3: Number of trips on week days / on week-ends
Spatial effects
FIG. 4: Incoming trips map [6h,7h] for week days
Spatial effects
[Figure: mean activity / hour against the distance from the center ("Les Halles") in km]
FIG. 5: Stations activities / distance to "Les Halles"
Approach, exploratory data analysis
General methodology
    Use clustering algorithms to find interesting patterns in the data
    Compare the clusters found with the city geography and sociology
    ⇒ Extract important factors that influence BSS behavior.
2 developments:
    1. Find clusters of stations with similar temporal usage patterns
    2. Find latent activities that govern the BSS dynamics
Tools, model based clustering
General methodology
    Imagine a data generation process
    ⇒ which includes non-observed (latent) variables
    Latent variables can be discrete or continuous
Examples of latent variables
    Species for flowers
    Topics for texts
    Communities for graph vertices
    ...
Generative approach
Clustering
Model-based clustering :
1. Draw the cluster of sample (i)
2. Depending on the cluster, draw the observed values of (i)
[Figure: density f(x) against x]
FIG. 6: Example of 1D Gaussian mixture model
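The two-step generative process above can be sketched in Python as follows. This is only an illustration and not the author's code: the function name `sample_gaussian_mixture` and the values of `pi`, `mu` and `sigma` are assumptions chosen to give a two-component mixture.

```python
import numpy as np

def sample_gaussian_mixture(n, pi, mu, sigma, seed=0):
    """Draw n samples from a 1D Gaussian mixture through its latent clusters."""
    rng = np.random.default_rng(seed)
    pi, mu, sigma = map(np.asarray, (pi, mu, sigma))
    # Step 1: draw the cluster of each sample, Z_i ~ M(1, pi).
    z = rng.choice(len(pi), size=n, p=pi)
    # Step 2: draw the observed value from the Gaussian of that cluster.
    x = rng.normal(loc=mu[z], scale=sigma[z])
    return z, x

# Hypothetical parameters (not taken from the slides).
z, x = sample_gaussian_mixture(1000, pi=[0.3, 0.7], mu=[-40.0, 10.0], sigma=[10.0, 15.0])
```

Clustering reverses this process: only `x` is observed and the cluster labels `z` have to be inferred.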
Data generation process
Graphical model representation
1. Draw the cluster of sample (i)
$$Z_i \sim \mathcal{M}(1, \pi)$$
⇒ π prior proportions of the clusters.
2. Depending on the cluster draw the observed values of (i)
$$p(x \mid Z_{ik} = 1) = f(x; \theta_k), \quad \forall k \in \{1, \ldots, K\}.$$
⇒ f can be tuned to exploit specificities of the problem.
Model based clustering framework
Task and tools
    Inferring the parameters:
    ⇒ EM algorithm or Variational EM for complex models
    Finding the clustering
    ⇒ Byproducts of EM
    Fixing the number of clusters
    ⇒ Model selection criterion: BIC, AIC, ICL, perplexity.
Stations clustering using temporal usage profiles
Objectives:
    Find groups of stations with similar temporal usage profiles
    Temporal usage profiles = incoming, outgoing activity / hour
    Taking into account the week-day / week-end discrepancy
    With a model for count data
    Cross the results with possible explanatory variables: population, employment, amenities, ...
Data representation : count time series
Observed data :
$X^{out}_{sdt}$: number of bikes taken at station $s$ during day $d$ at hour $t$
$X^{in}_{sdt}$: number of bikes returned at station $s$ during day $d$ at hour $t$
$$X_{sd} = (X^{in}_{sd1}, \ldots, X^{in}_{sd24}, X^{out}_{sd1}, \ldots, X^{out}_{sd24})$$
⇒ $X$ tensor of size $N \times D \times T$.
⇒ temporal behavior / stations.
Variables
    $X_{sd}$ (observed): number of bikes leaving/arriving
    $Z_s$ (latent): cluster of station $s$
    $W_d$ (observed): cluster of day $d$ (week / week-end)
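As a rough sketch of how trip records could be turned into this count tensor, the Python snippet below stacks hourly arrival and departure counts per station and day. The record layout (station, day index, hour for departure and arrival) and the function name `count_tensor` are assumptions made for the example; the slides do not describe the actual preprocessing.

```python
import numpy as np

def count_tensor(trips, n_stations, n_days, n_hours=24):
    """Build X[s, d, :] = (hourly 'in' counts, then hourly 'out' counts)."""
    x_in = np.zeros((n_stations, n_days, n_hours), dtype=int)
    x_out = np.zeros((n_stations, n_days, n_hours), dtype=int)
    for dep_station, dep_day, dep_hour, arr_station, arr_day, arr_hour in trips:
        x_out[dep_station, dep_day, dep_hour] += 1   # bike taken
        x_in[arr_station, arr_day, arr_hour] += 1    # bike returned
    # Concatenate along the time axis, as in X_sd = (in_1..in_24, out_1..out_24).
    return np.concatenate([x_in, x_out], axis=2)

# Hypothetical records: (dep. station, dep. day, dep. hour, arr. station, arr. day, arr. hour)
trips = [(0, 0, 8, 1, 0, 8), (0, 0, 8, 1, 0, 9), (1, 1, 18, 0, 1, 18)]
X = count_tensor(trips, n_stations=2, n_days=2)
```

With this layout the last axis has length 48 (24 arrival slots followed by 24 departure slots).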
Generative model : naive Poisson mixture
FIG. 7: Graphical model representation
Parameters, Θ
    $\alpha_s$ = station attractivity effects
    $\pi = (\pi_1, \ldots, \pi_K)$ cluster proportions
    $\lambda = (\lambda_{klt})$ temporal profiles of the clusters
Generative model
Naive Poisson mixture
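$$Z_s \sim \mathcal{M}(1, \pi)$$
$$X_{sd1} \perp\!\!\!\perp \ldots \perp\!\!\!\perp X_{sdT} \mid \{Z_{sk} = 1, W_{dl} = 1\}$$
$$X_{sdt} \mid \{Z_{sk} = 1, W_{dl} = 1\} \sim \mathcal{P}(\alpha_s \lambda_{klt})$$
Constraints
$$\sum_{l,t} D_l \lambda_{klt} = DT, \quad \forall k \in \{1, \ldots, K\},$$
with $D_l$ the number of days in cluster $l$.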
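To make the generative model concrete, here is a small Python simulation of it. All sizes and parameter values are hypothetical, and the gamma draws used to create `lam` and `alpha` are only a convenient way to obtain positive numbers, not part of the model in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, T, K, L = 6, 14, 24, 2, 2            # hypothetical sizes

# W[d, l] = 1 if day d belongs to day cluster l (0 = week, 1 = week-end).
W = np.zeros((D, L), dtype=int)
is_weekend = (np.arange(D) % 7) >= 5
W[np.arange(D), is_weekend.astype(int)] = 1
Dl = W.sum(axis=0)                         # number of days in each day cluster

# Temporal profiles, rescaled so that sum_{l,t} D_l * lam[k, l, t] = D * T.
lam = rng.gamma(2.0, 1.0, size=(K, L, T))
lam *= (D * T) / np.einsum("l,klt->k", Dl, lam)[:, None, None]

pi = np.array([0.5, 0.5])                  # cluster proportions
alpha = rng.gamma(2.0, 2.0, size=N)        # station scaling factors

z = rng.choice(K, size=N, p=pi)            # Z_s ~ M(1, pi)
day_cluster = W.argmax(axis=1)             # index l of each day
rate = alpha[:, None, None] * lam[z][:, day_cluster, :]
X = rng.poisson(rate)                      # X_sdt ~ P(alpha_s * lam_klt)
```

Under the constraint, `alpha[s]` is the expected count per time slot of station `s`, so the profiles `lam` only carry the shape of the activity.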
Parameters estimation, likelihood
Marginal likelihood
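$$L(\Theta; X) = \sum_s \log\Big( \sum_k \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \Big) \tag{1}$$
Completed likelihood
$$L_c(\Theta; X, Z) = \sum_{s,k} Z_{sk} \log\Big( \pi_k \prod_{d,t,l} p(X_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \Big) \tag{2}$$
where $Z$ is unknown.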
EM algorithm ⇒ straightforward solution for parameter estimation
E step
Conditional expectation of Lc given the current parameters
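$$E[L_c(\Theta, x, Z) \mid x, \Theta^{(q)}] = \sum_{s,k} t_{sk} \log\Big( \pi_k \prod_{d,t,l} p(x_{sdt}; \alpha_s \lambda_{klt})^{W_{dl}} \Big) \tag{3}$$
with $t_{sk}$ the posterior probabilities:
$$t_{sk} = \frac{\pi^{(q)}_k \prod_{d,t,l} p(x_{sdt}; \alpha^{(q)}_s \lambda^{(q)}_{klt})^{W_{dl}}}{\sum_{k'} \pi^{(q)}_{k'} \prod_{d,t,l} p(x_{sdt}; \alpha^{(q)}_s \lambda^{(q)}_{k'lt})^{W_{dl}}} \tag{4}$$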
M step
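Maximization of the lower bound with respect to the parameters
    $\alpha_s$: mean station activity, $\hat{\alpha}_s = \frac{1}{DT} \sum_{d,t} X_{sdt}$,
    $\pi_k$: proportion of cluster $k$, $\hat{\pi}_k = \frac{1}{N} \sum_s t_{sk}$
    $\lambda_{klt}$: activity of time frame $t$ for cluster $k$, for week days or during the week-end (day cluster $l$)
$$\hat{\lambda}_{klt} = \frac{1}{\sum_s t_{sk} \alpha_s \sum_d W_{dl}} \sum_{s,d} t_{sk} W_{dl} X_{sdt} \tag{5}$$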
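The E and M steps can be put together in a short NumPy sketch. It is a minimal reading of equations (4) and (5), not the author's implementation: the function name, the random initialisation of the responsibilities, the fixed number of iterations and the synthetic data (two groups of stations with a morning or an evening peak) are all assumptions. The log x! term of the Poisson density is dropped in the E step because it does not depend on the cluster.

```python
import numpy as np

def em_poisson_mixture(X, W, K, n_iter=50, seed=0):
    """EM for the naive Poisson mixture. X: (N, D, T) counts, W: (D, L) one-hot."""
    rng = np.random.default_rng(seed)
    N, D, T = X.shape
    W = W.astype(float)
    Dl = W.sum(axis=0)                                    # days per day cluster
    S = np.einsum("sdt,dl->slt", X.astype(float), W)      # counts summed per day cluster
    alpha = np.maximum(X.sum(axis=(1, 2)) / (D * T), 1e-12)
    t = rng.dirichlet(np.ones(K), size=N)                 # random initial t_sk
    for _ in range(n_iter):
        # M step: cluster proportions and equation (5).
        pi = t.mean(axis=0)
        den = np.maximum((t * alpha[:, None]).sum(axis=0), 1e-12)
        lam = np.einsum("sk,slt->klt", t, S) / (den[:, None, None] * Dl[None, :, None])
        # E step: equation (4), computed in the log domain for stability.
        log_lam = np.log(np.maximum(lam, 1e-12))
        log_post = (np.log(np.maximum(pi, 1e-12))[None, :]
                    + np.einsum("slt,klt->sk", S, log_lam)
                    - alpha[:, None] * np.einsum("l,klt->k", Dl, lam)[None, :])
        log_post -= log_post.max(axis=1, keepdims=True)
        t = np.exp(log_post)
        t /= t.sum(axis=1, keepdims=True)
    return pi, lam, alpha, t

# Synthetic data: half of the stations peak in the morning, half in the evening.
rng = np.random.default_rng(1)
N, D, T = 40, 14, 24
W = np.zeros((D, 2))
W[np.arange(D), ((np.arange(D) % 7) >= 5).astype(int)] = 1
hours = np.arange(T)
morning = 0.2 + np.exp(-0.5 * ((hours - 8) / 2.0) ** 2)
evening = 0.2 + np.exp(-0.5 * ((hours - 18) / 2.0) ** 2)
profiles = np.where(np.arange(N)[:, None] < N // 2, morning, evening)
X = rng.poisson(5.0 * profiles[:, None, :] * np.ones((1, D, 1)))
pi, lam, alpha, t = em_poisson_mixture(X, W, K=2)
```

A hard clustering can then be read off as `t.argmax(axis=1)`. In practice one would also monitor the likelihood (1) to decide when to stop, and restart from several initialisations.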
Results
Setting
    One month of data (September)
    Number of clusters (K=8) set manually
    ⇒ good trade-off between interpretability and fit of the clustering
Outputs
    $Z_s$: cluster of station $s$
    $\lambda_k$: temporal profile of cluster $k$
    $\alpha_s$: attractivity of station $s$
Railway stations
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
Parks
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
Spare time, night
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
Spare time, night and week-end
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
Housing
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
[Figure: Housing, with a legend in inhabitants / ha from 0 to 1 200]
Employment (1)
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
Employment (2)
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
[Figure: Employment (1 and 2), with a legend in jobs / ha from 0 to 2 000]
Mixed usage
[Figure: departures and arrivals activity against the hour of the day, week and week-end panels]
Crossing with population/employments/services rates
                    inhab./ha   jobs/ha   services/ha   shops/ha
                       162        237        4.2          3.7
Spare time (1)         367        189        6.3          4.4
Spare time (2)         261        322        7.7          6.9
Parks                  172         90        2            1.7
Railway stations       209        206        2.4          1.8
Housing                375        108        3.8          2.7
Employment (1)         138        409        4.5          2.8
Employment (2)         157        456        5.7          5.6
Mixed usage            301        163        3.8          2.8
TAB. 1: Mean of each cluster with respect to population, employment, services and shops densities. Sources: "Recensement 2008", "Base permanente des équipements", Insee.
Conclusion on stations clustering
Discussion on the model
    Model adapted to counts
    Scaling factors for stations are important
    Stations described by incoming and outgoing flow dynamics
    Taking into account week-day / week-end differences
Discussion on the results
    Clusters are interpretable
    Population, employment and amenities densities are highly explanatory for the clusters
    Temporal profiles are also interpretable and informative
Latent Dirichlet Allocation (LDA) for trips activity recognition
Objectives
    Decompose the trips into interpretable clusters
    ⇒ look for stationarities and change points in the OD dynamics
    LDA with documents = small bags of successive trips
Analyse the clusters found with respect to their:
    Temporal positions, cycles
    Spatial distribution of flows
    Spatial distribution of incoming / outgoing flows per station
Data representation : dynamical O/D matrices
Observed data :
$X_{ijt}$: number of bikes that were
    1. taken at station $i$
    2. returned at station $j$
    3. at time $t$
$t \in \{1, \ldots, N_t\}$ :
$i, j \in \{1, \ldots, N_s\}$ : set of stations
⇒ $X_{ijt}$ tensor of dimension $N_s \times N_s \times N_t$.
⇒ taking into account spatial and temporal BSS behavior
LDA, background
LDA = Latent Dirichlet Allocation
    Bayesian mixture for discrete data
    ⇒ originally designed to find topics in text corpora
    Each document (bag of words) is a mixture of topics
    Each topic has its own vector of word probabilities
FIG. 8: Graphical model representation of LDA.
LDA for dynamical O/D matrices analysis
Hypothesis:
    Local stationarity of BSS behaviour / OD
    Cyclostationarity: week, day
Small bags of successive trips ≈ stationarity of OD
⇒ Documents (bags of words) = bags of successive trips (5000), with:
    Words = Origin/Destination couples
    Topics = Latent activities
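The document construction above can be sketched as follows: trips sorted by time are cut into consecutive bags, and each bag is summarised by its O/D counts. The function name, the integer station encoding and the choice to drop an incomplete last bag are assumptions of this example. The array is indexed as `X[t, i, j]`, which is the tensor $X_{ijt}$ with the bag index moved to the front.

```python
import numpy as np

def bags_of_trips(origins, destinations, n_stations, bag_size):
    """Cut time-ordered trips into bags and count O/D couples per bag."""
    origins = np.asarray(origins)
    destinations = np.asarray(destinations)
    # One "word" id per Origin/Destination couple.
    words = origins * n_stations + destinations
    n_bags = len(words) // bag_size          # the incomplete last bag is dropped
    X = np.zeros((n_bags, n_stations * n_stations), dtype=int)
    for t in range(n_bags):
        chunk = words[t * bag_size:(t + 1) * bag_size]
        X[t] = np.bincount(chunk, minlength=n_stations * n_stations)
    return X.reshape(n_bags, n_stations, n_stations)

# Hypothetical trips, already sorted by departure time.
X_bags = bags_of_trips(origins=[0, 0, 1, 2, 1], destinations=[1, 1, 0, 0, 2],
                       n_stations=3, bag_size=2)
```

With the bag size of 5000 quoted in the slides and roughly 100 000 trips per day, one bag would cover about an hour of activity on average.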
For each activity $a$, draw an O/D matrix generator:
$$\Lambda^a \sim \mathcal{D}(\beta)$$
For each "bag of trips" $t \in \{1, \ldots, N_t\}$:
1. Draw the activity proportions: $\pi_t \sim \mathcal{D}(\alpha)$
2. For each trip of the bag $t$:
    Draw its activity $A$: $A \sim \mathcal{M}(1, \pi_t)$
    Draw an O/D couple $D$ using the generator of activity $A$: $D \sim \mathcal{M}(1, \Lambda^A)$
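The generative process above can be simulated directly, which is a useful way to check one's reading of the model. In this sketch the function name, the symmetric Dirichlet parameters `alpha=0.5` and `beta=0.5` and the small sizes are all hypothetical. Drawing the O/D couples of one activity with a single multinomial call is equivalent to drawing them one trip at a time.

```python
import numpy as np

def sample_bags(n_bags, bag_size, n_stations, n_activities,
                alpha=0.5, beta=0.5, seed=0):
    """Simulate bags of trips under the LDA generative process."""
    rng = np.random.default_rng(seed)
    n_od = n_stations * n_stations
    # One O/D generator per activity: Lambda^a ~ D(beta).
    Lam = rng.dirichlet(np.full(n_od, beta), size=n_activities)
    # Activity proportions of each bag: pi_t ~ D(alpha).
    props = rng.dirichlet(np.full(n_activities, alpha), size=n_bags)
    X = np.zeros((n_bags, n_od), dtype=int)
    for t in range(n_bags):
        # Activity of each trip of the bag: A ~ M(1, pi_t).
        activities = rng.choice(n_activities, size=bag_size, p=props[t])
        per_activity = np.bincount(activities, minlength=n_activities)
        for a, n_a in enumerate(per_activity):
            # O/D couples of the n_a trips of activity a: D ~ M(1, Lambda^a).
            X[t] += rng.multinomial(n_a, Lam[a])
    return (Lam.reshape(n_activities, n_stations, n_stations),
            props,
            X.reshape(n_bags, n_stations, n_stations))

Lam, props, X_sim = sample_bags(n_bags=4, bag_size=100, n_stations=3, n_activities=2)
```

Inference goes the other way: only the counts are observed, and the generators and the proportions are estimated, for instance with variational EM.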
Fixing the number of activities
perplexity analysis
    Perplexity = f(likelihood of test data)
    Clear drop-off at K=5
[Figure: perplexity (about 155 000 to 165 000) against the number of latent activities K (4 to 12)]
FIG. 9: Perplexity on the September dataset with respect to the number of latent activities.
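The slides only state that perplexity is a function of the likelihood of test data. A common definition, assumed here, is the exponential of minus the average log-likelihood per held-out trip. The sketch below also assumes that the activity proportions of the held-out bags are already available, whereas in practice they have to be inferred; the function name and argument layout are hypothetical.

```python
import numpy as np

def perplexity(counts, props, Lam):
    """counts: (n_bags, V) held-out O/D counts, props: (n_bags, A), Lam: (A, V)."""
    counts = np.asarray(counts, dtype=float)
    # Probability of each O/D couple in each bag: sum_a pi_ta * Lambda^a.
    p = np.asarray(props) @ np.asarray(Lam)
    mask = counts > 0                      # avoid evaluating 0 * log(0)
    loglik = (counts[mask] * np.log(p[mask])).sum()
    return float(np.exp(-loglik / counts.sum()))

# Sanity check: with uniform generators over V couples, the perplexity is V.
V = 9
uniform = np.full((2, V), 1.0 / V)
value = perplexity(counts=[[3, 0, 1, 0, 0, 2, 0, 0, 4], [1, 1, 1, 1, 1, 1, 1, 1, 1]],
                   props=[[0.3, 0.7], [0.5, 0.5]], Lam=uniform)
```

Lower values mean that the model predicts the held-out trips better, which is why the curve is inspected for a drop as K grows.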
Temporal results : πt
[Figure: trips/hour (0 to 9 000) over time, with date ticks at April 11, April 18 and April 25]
FIG. 10: Temporal evolution of πt
Remarks :
    Cyclostationarity clearly visible (even holidays)
    Low mixture between the latent activities
    Interpretable temporal clusters: Home↔Work, Lunch, ...
Spatial results : Λa as flows
FIG. 11: Latent activity "House→Work commute", flows (blue for f=10/10 000)
FIG. 12: Latent activity "Lunch", flows (blue for f=10/10 000)
FIG. 13: Latent activity "Work→House commute", flows (blue for f=10/10 000)
FIG. 14: Latent activity "Evening", flows (blue for f=10/10 000)
FIG. 15: Latent activity "Spare time", flows (blue for f=10/10 000)
Incoming / Outgoing specificities, question :
Which stations have an increased in/out-degree for a latent activity a ?
Introduce the station incoming specificities $IS^a_s$ and outgoing specificities $OS^a_s$:
$$IS^a_s = \log(p^{in_a}_s / p^{in_g}_s), \quad OS^a_s = \log(p^{out_a}_s / p^{out_g}_s), \tag{6}$$
with $p^{in_a}_s, p^{out_a}_s$ the probabilities that a trip ends/starts in station $s$ for activity $a$:
$$p^{in_a}_s = \sum_j \Lambda^a_{js}, \quad p^{out_a}_s = \sum_j \Lambda^a_{sj},$$
and $p^{in_g}_s, p^{out_g}_s$ the global probabilities that a trip ends/starts in station $s$:
$$p^{in_g}_s = \frac{\sum_{j,t} X_{jst}}{\sum_{i,j,t} X_{ijt}}, \quad p^{out_g}_s = \frac{\sum_{j,t} X_{sjt}}{\sum_{i,j,t} X_{ijt}}.$$
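Equation (6) translates into a few array reductions. The sketch below assumes that `Lam_a` is the $N_s \times N_s$ matrix $\Lambda^a$ (origin on the first axis, destination on the second) and that `X` is stored as an $N_s \times N_s \times N_t$ array; the function name is hypothetical. Stations without any trip would give a division by zero and need separate handling.

```python
import numpy as np

def specificities(Lam_a, X):
    """Return (IS, OS), the incoming and outgoing specificities of each station."""
    p_in_a = Lam_a.sum(axis=0)               # sum_j Lambda^a[j, s]
    p_out_a = Lam_a.sum(axis=1)              # sum_j Lambda^a[s, j]
    total = X.sum()
    p_in_g = X.sum(axis=(0, 2)) / total      # sum_{j,t} X[j, s, t] / total
    p_out_g = X.sum(axis=(1, 2)) / total     # sum_{j,t} X[s, j, t] / total
    return np.log(p_in_a / p_in_g), np.log(p_out_a / p_out_g)

# Sanity check: an activity equal to the global flows has zero specificity.
X_demo = np.arange(1, 37).reshape(3, 3, 4)
Lam_global = X_demo.sum(axis=2) / X_demo.sum()
IS, OS = specificities(Lam_global, X_demo)
```

A positive value means that the station receives (or emits) a larger share of the trips of activity $a$ than of the trips overall.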
Spatial results : incoming specificities
FIG. 16: Latent activity "House→Work commute", stations incoming specificity
Spatial results : outgoing specificities
FIG. 17: Latent activity "House→Work commute", stations outgoing specificity
Expected bike balance, question :
Positive/negative bike balance of stations for a latent activity a ?
The O/D matrix $D$ follows a multinomial law with parameters $N_{dep}$ (number of trips) and $\Lambda^a$:
$$D \sim \mathcal{M}(N_{dep}, \Lambda^a),$$
The bike balance $B_s$ for a station $s$ is thus given by:
$$B_s = \overbrace{\sum_j D_{js}}^{\text{Incoming bikes}} - \overbrace{\sum_j D_{sj}}^{\text{Outgoing bikes}}$$
And the expectation of the balance vector $B$ is thus equal to:
$$E[B] = N_{dep} \, ((\Lambda^a)^t - \Lambda^a) \, v, \tag{7}$$
with $v = (1, \ldots, 1)^t$.
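Equation (7) is a single matrix-vector product. The sketch below uses a hypothetical two-station activity in which 60% of the trips go from station 0 to station 1 and 40% go the other way; the function name is an assumption.

```python
import numpy as np

def expected_balance(Lam_a, n_dep):
    """E[B] = N_dep * ((Lambda^a)^t - Lambda^a) v, with v a vector of ones."""
    v = np.ones(Lam_a.shape[0])
    return n_dep * (Lam_a.T - Lam_a) @ v

# Hypothetical activity over two stations.
Lam_demo = np.array([[0.0, 0.6],
                     [0.4, 0.0]])
balance = expected_balance(Lam_demo, n_dep=10)   # station 0 loses 2 bikes, station 1 gains 2
```

The balances always sum to zero, since every bike taken somewhere is returned somewhere.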
Spatial results : expected bike balance
[Figure legend: balance, from -30 to 30]
FIG. 18: Latent activity "House→Work commute", stations expected balances with Ndep = 10 000
"Lunch", incoming specificity
FIG. 19: Stations incoming specificity
"Lunch", outgoing specificity
FIG. 20: Stations outgoing specificity
"Lunch", balance
[Figure legend: balance, from -30 to 30]
FIG. 21: Stations expected balances with Ndep = 10 000
"Work→House commute", incoming specificity
FIG. 22: Stations incoming specificity
"Work→House commute", outgoing specificity
FIG. 23: Stations outgoing specificity
"Work→House commute", balance
[Figure legend: balance, from -30 to 30]
FIG. 24: Stations expected balances with Ndep = 10 000
"Evening" incoming specificity
FIG. 25: Stations incoming specificity
"Evening", outgoing specificity
FIG. 26: Stations outgoing specificity
"Evening", balance
[Figure legend: balance, from -30 to 30]
FIG. 27: Stations expected balances with Ndep = 10 000
"Spare time", incoming specificity
FIG. 28: Stations incoming specificity
"Spare time", outgoing specificity
FIG. 29: Stations outgoing specificity
"Spare time", balance
[Figure legend: balance, from -30 to 30]
FIG. 30: Stations expected balances with Ndep = 10 000
Conclusion on LDA for activities recognition
    Interpretable latent activities
    Gives a good picture of the city "pulse" and geography
    Better understanding of the system behaviour
    Strong evidence of cyclostationarity
    Week-day / week-end pattern
Thanks for your attention
@comeetie, etienne.come@ifsttar.fr
Ifsttar
Centre de Marne-la-Vallée
Batiment le “Descartes 2”
2, rue de la Butte Verte, F-93166 Noisy le Grand cedex
Email: etienne.come@ifsttar.fr
Tel. +33 (0)1 45 92 56 57
Site : www.ifsttar.fr