Techniques for Inferring Mileage from the Department for Transport’s MOT Data Set
R. Eddie Wilson
Jillian Anable (Aberdeen), Sally Cairns (TRL/UCL), Tim Chatterton (UWE), Oliver Turnbull (Bristol) and others
EPSRC grants EP/J004758/1, EP/K000438/1
Faculty of Engineering, University of Bristol
March 25, 2015
UK MOT (Ministry of Transport) test
- MOT: the UK’s annual safety inspection for all road vehicles older than 3 years
- Since 2005, the results have been captured and stored digitally
- Since November 2010, the DfT has published this data online, spanning back to 2005
- Key interest: the odometer reading recorded at each test
R.E. Wilson et al (UoB) Temporal Mileage Rates March 25, 2015 2 / 36
A sample of the published data
- But the tests are grouped by year and do not “link” the vehicles (a problem fixed in more recent releases, at my prompting!)
Here’s a trick . . .
- Concatenate all files and sort by the “mystery” identifier. You get lots of blocks like this:
- We can follow individuals around and infer their mileage (rate) between consecutive test dates!
- For example, in the interval from 2008-08-11 to 2009-08-05 (359 days), I drove 132,299 − 123,259 = 9,040* miles, at an average rate of 25.18 miles per day.
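
The trick can be sketched in a few lines of Python. This is only an illustration: the record layout `(vehicle_id, test_date, odometer_miles)` and the identifiers `"abc"` and `"xyz"` are assumptions, not the format of the published files.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records pooled from several yearly files:
# (vehicle_id, test_date, odometer_miles). The layout is an assumption.
records = [
    ("abc", date(2009, 8, 5), 132299),
    ("xyz", date(2008, 3, 1), 40210),
    ("abc", date(2008, 8, 11), 123259),
]

def blocks_by_vehicle(rows):
    """Group tests by identifier and sort each block by test date."""
    blocks = defaultdict(list)
    for vehicle_id, test_date, miles in rows:
        blocks[vehicle_id].append((test_date, miles))
    for tests in blocks.values():
        tests.sort()
    return dict(blocks)

def daily_rate(first, second):
    """Average miles per day between two tests of the same vehicle."""
    (d1, x1), (d2, x2) = first, second
    return (x2 - x1) / (d2 - d1).days

blocks = blocks_by_vehicle(records)
first_test, second_test = blocks["abc"]
rate = daily_rate(first_test, second_test)
```

With the two readings quoted on the slide, the interval is 359 days long and the rate rounds to 25.18 miles per day.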
Basic analysis object: intervals and their attributes
- Re-arrange blocks of same-vehicle data into consecutive pairs of tests:
            First test                      Second test
Interval    date t1      miles x1  place1   date t2      miles x2  place2
1           2005-08-26    99777    BS       2006-08-18   105420    BS
2           2006-08-18   105420    BS       2007-08-13   113709    BS
3           2007-08-13   113709    BS       2008-08-11   123259    BS
4           2008-08-11   123259    BS       2008-08-11   123259    BS
5           2008-08-11   123259    BS       2009-08-05   132299    BS
- To which can be linked vehicle-specific attributes: VAUXHALL, ASTRA LS 8V, WHITE, P (fuel), 1598 (cc), 1999 (year)
- E.g. during interval 3, I drove at an average rate of (123259 − 113709)/364 = 26.24 miles per day, but we don’t know how my mileage was distributed during that period.
- These mileage rates are (more or less) complete across the vehicle population, even after cleaning.
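
A minimal sketch of how the interval table might be built from one vehicle's sorted tests, using the readings from the table above. The dictionary keys and the choice to leave the rate undefined for same-day pairs are assumptions made for illustration.

```python
from datetime import date

# Test history of the example vehicle, already sorted by date.
tests = [
    (date(2005, 8, 26), 99777),
    (date(2006, 8, 18), 105420),
    (date(2007, 8, 13), 113709),
    (date(2008, 8, 11), 123259),
    (date(2008, 8, 11), 123259),  # repeat test on the same day
    (date(2009, 8, 5), 132299),
]

def build_intervals(tests):
    """Pair consecutive tests and attach days, miles and miles per day."""
    out = []
    for (t1, x1), (t2, x2) in zip(tests, tests[1:]):
        days = (t2 - t1).days
        miles = x2 - x1
        # The rate is undefined for a zero-length (same-day) interval.
        rate = miles / days if days > 0 else None
        out.append({"t1": t1, "t2": t2, "days": days,
                    "miles": miles, "rate": rate})
    return out

ivs = build_intervals(tests)
```

Interval 3 comes out at 364 days and about 26.24 miles per day, matching the worked figure on the slide.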
Population level statistics: straddling rate r(t)
[Figure: average mileage rate against time t, showing an interval between two tests that straddles the observation date t∗.]
- Select all N intervals that straddle a given observation date t∗
- Each interval yields an average (per vehicle) rate r_i.
- The straddling rate r(t∗) is then defined by the average of these averages:
$$ r(t^*) = \frac{1}{N} \sum_{i=1}^{N} r_i . $$
- It is fine for annual statistics: choose t∗ = 1/7/2007, 1/7/2008, 1/7/2009 etc.
- But r(t∗) actually incorporates miles driven over the two year span t∗ − 1 ≤ t < t∗ + 1.
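
The definition can be sketched as follows. The interval data are made up, and the half-open convention `t1 <= t_star < t2` for "straddles" is an assumption; the slides do not say how ties at the endpoints are treated.

```python
from datetime import date

# Hypothetical intervals: (first test date, second test date, miles driven).
intervals = [
    (date(2006, 8, 18), date(2007, 8, 13), 8289),
    (date(2007, 3, 1), date(2008, 2, 27), 3630),
    (date(2007, 8, 13), date(2008, 8, 11), 9550),
]

def straddling_rate(intervals, t_star):
    """Mean of the per-interval daily rates over intervals straddling t_star."""
    rates = [
        miles / (t2 - t1).days
        for t1, t2, miles in intervals
        if t1 <= t_star < t2  # assumed convention for "straddles"
    ]
    if not rates:
        raise ValueError("no interval straddles t_star")
    return sum(rates) / len(rates)

r = straddling_rate(intervals, date(2007, 7, 1))
```

Only the first two intervals straddle 1 July 2007 here, so `r` is the mean of their two daily rates.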
Mileage distributions: new(ish) vehicles
[Figure: normalised frequency against daily mileage (0 to 100). West London vs Kirkcaldy: first registration 2004.]
West London (W): mean 18.2768, median 14.8481
Kirkcaldy (KY): mean 25.5864, median 22.6945
Mileage distributions: older vehicles
[Figure: normalised frequency against daily mileage (0 to 100). West London vs Kirkcaldy: first registration 2000.]
Mileage distributions: even older vehicles
[Figure: normalised frequency against daily mileage (0 to 100). West London vs Kirkcaldy: first registration 1996.]
Mileage distributions: old vehicles
[Figure: normalised frequency against daily mileage (0 to 100). West London vs Kirkcaldy: first registration 1992.]
From the Straddling Rate to the Census Date Rate
[Figure: progression of a vehicle’s odometer with time, then with the tests marked.]
- The tests do not allow you to distinguish the 2 trajectories.
- Distributions derived from the straddling rate suffer anomalous variance because some intervals are very short.
- Solution is to interpolate onto some given census dates . . .
- . . . and use the rates between the census dates. (Also neatly synchronises the data into calendar year comparisons.)
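
A rough sketch of the census date idea, assuming linear interpolation between the two tests that bracket each census date. The slides say only "interpolate", so the linear choice, the function name and the 1 January census dates are all assumptions.

```python
from datetime import date

def interpolate_odometer(tests, census):
    """Linearly interpolate the odometer at a census date.

    Returns None when the census date lies outside the test history.
    """
    for (t1, x1), (t2, x2) in zip(tests, tests[1:]):
        if t2 > t1 and t1 <= census <= t2:
            frac = (census - t1).days / (t2 - t1).days
            return x1 + frac * (x2 - x1)
    return None

tests = [
    (date(2006, 8, 18), 105420),
    (date(2007, 8, 13), 113709),
    (date(2008, 8, 11), 123259),
]
census_dates = [date(2007, 1, 1), date(2008, 1, 1)]
readings = [interpolate_odometer(tests, c) for c in census_dates]

# Rate between census dates, in miles per day.
census_rate = (readings[1] - readings[0]) / (census_dates[1] - census_dates[0]).days
```

Every vehicle then contributes a rate over the same calendar window, whatever its own test dates are.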
Five digit odometer problem
[Figure: frequency (10 mile bins) against odometer reading, with a marked “JUMP”.]
Cleaning: How to Deal with Bad Odometers
Solution 1: don’t worry about it too much
- Compute rates as if all odometers are perfectly correct
- Reject intervals (*) whose rates are outside a reasonable range:
  - Below 0
  - Above 150 miles per day (?)
- Scale population statistics up to compensate for the intervals thus discarded
(*) Nomenclature: will talk of intervals as Bad or Good.
Solution 2: try to identify which individual odometer entries are bad and remove them instead
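
Solution 1 might look like the sketch below. The 150 miles per day bound is the slide's own tentative figure; treating zero-length intervals as Bad, and scaling up by multiplying the mean Good rate by the total number of intervals, are assumptions about what "scale up" means.

```python
MAX_RATE = 150.0  # miles per day, the slide's tentative upper bound

def label(days, miles):
    """Return "G" for a Good interval and "B" for a Bad one."""
    if days <= 0:
        return "B"  # assumption: a zero-length interval has no usable rate
    rate = miles / days
    return "G" if 0.0 <= rate <= MAX_RATE else "B"

def scaled_total_rate(intervals):
    """Population total (miles per day), scaled up for discarded intervals."""
    good = [miles / days for days, miles in intervals
            if label(days, miles) == "G"]
    # Assumed scaling: discarded intervals are given the mean Good rate.
    return len(intervals) * sum(good) / len(good)

# Hypothetical (days, miles) intervals: two Good, one negative, one too fast.
intervals = [(360, 8289), (364, 9550), (359, -90000), (10, 5000)]
labels = [label(d, m) for d, m in intervals]
total = scaled_total_rate(intervals)
```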
When two (or more) Bads make a Good
[Figure: miles driven x against time t for three consecutive tests. The first interval has negative mileage (B), the second has a mileage rate that is too high (B), and the interval spanning both is good (G).]
- The middle odometer entry is (probably) erroneous, due to a missing digit in the data entry?
- The spanning interval without the middle test is (probably) ok.
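
A small worked case, assuming a hypothetical keying error in which the middle reading 123259 loses a digit and is stored as 12325. The 0 to 150 miles per day range is the tentative one from the cleaning slide.

```python
from datetime import date

def rate(a, b):
    """Miles per day between tests a and b, each a (date, miles) pair."""
    return (b[1] - a[1]) / (b[0] - a[0]).days

def is_good(r, max_rate=150.0):
    return 0.0 <= r <= max_rate

first = (date(2007, 8, 13), 113709)
middle = (date(2008, 8, 11), 12325)   # hypothetical: 123259 with a digit dropped
last = (date(2009, 8, 5), 132299)

# The two intervals touching the bad reading are both Bad ...
labels = ["G" if is_good(rate(a, b)) else "B"
          for a, b in [(first, middle), (middle, last)]]
# ... but the interval spanning them, skipping the middle test, is Good.
spanning_ok = is_good(rate(first, last))
```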
Syntactic games
- Represent each vehicle’s intervals as a sequence of B and G. For example BGGGBBGGBGG.
- Try to remove tests to end up with a sequence that is all G.
- Multiple consecutive Bs should be replaced with the spanning interval, which is either G (problem solved) or perhaps B.
- Only remaining problem is singleton B: which end of the bad interval should be removed?
  - Endpoint B: delete the end test (yes, you then need infill)
  - Interior B: a messy mixture of clocking events; clock rollover; (mild) centrally bad cases etc.
  - Look at removing either or both ends so as to generate G. Repeat.
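
The first two steps of the procedure can be sketched as below: label the intervals, then replace each run of two or more consecutive Bs by its spanning interval. Tests are `(day_number, miles)` pairs with made-up values, and singleton Bs are deliberately left untouched, so this is only part of the procedure described above.

```python
MAX_RATE = 150.0  # miles per day

def labels(tests):
    """B/G string for the intervals between consecutive (day, miles) tests."""
    out = []
    for (t1, x1), (t2, x2) in zip(tests, tests[1:]):
        r = (x2 - x1) / (t2 - t1)
        out.append("G" if 0.0 <= r <= MAX_RATE else "B")
    return "".join(out)

def collapse_bad_runs(tests):
    """Drop the tests interior to each run of 2+ consecutive B intervals."""
    lab = labels(tests)
    keep = [True] * len(tests)
    i = 0
    while i < len(lab):
        if lab[i] != "B":
            i += 1
            continue
        j = i
        while j < len(lab) and lab[j] == "B":
            j += 1
        # Intervals i..j-1 are Bad and join tests i..j; drop tests i+1..j-1.
        for k in range(i + 1, j):
            keep[k] = False
        i = j
    return [t for t, k in zip(tests, keep) if k]

# Hypothetical history: the third reading has lost a digit.
tests = [(0, 99777), (357, 105420), (717, 11370), (1081, 123259), (1440, 132299)]
before = labels(tests)
after = labels(collapse_bad_runs(tests))
```

One pass may leave a spanning interval that is itself B, so in practice the step would be repeated until the string stops changing.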
How to deal with multiple tests on the same day (I)
(need to pare down to a single odometer reading per test day)
[Figure: miles driven x against time, with test dates t1 and t2 and intervals marked B.]
- We want to complete the previous syntactic procedure before deciding which test to select for each date.
How to deal with multiple tests on the same day (II)
- Compute 4 rates, from the odometer pairs
$$ (x_1^{\min}, x_2^{\min}), \quad (x_1^{\max}, x_2^{\max}), \quad (x_1^{\min}, x_2^{\max}), \quad (x_1^{\max}, x_2^{\min}) $$
- We call the interval
  - Certainly Bad, if all 4 rates are Bad
  - Certainly Good, if all 4 rates are Good
  - Don’t know (D), if there is a mix
- The D are rare: no great loss in calling them B
- Note: for certainly Bad, there might be a good interval if there are 3 or more distinct tests at both t1 and t2: also rare
- Proceed with previous procedure using certainly Bad and Good.
- Finally, decide which odometer reading at each t to use at the end. (For example: the median value.)
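
The four-rate test can be sketched like this. The function name, the example readings and the 0 to 150 miles per day range are assumptions; the final pick uses the median, which the slide offers only as an example.

```python
from itertools import product
from statistics import median

MAX_RATE = 150.0  # miles per day

def classify(readings1, readings2, days):
    """Label an interval whose end dates each carry one or more readings.

    Returns "G" (certainly Good), "B" (certainly Bad) or "D" (don't know).
    """
    ends1 = (min(readings1), max(readings1))
    ends2 = (min(readings2), max(readings2))
    good = [0.0 <= (x2 - x1) / days <= MAX_RATE
            for x1, x2 in product(ends1, ends2)]
    if all(good):
        return "G"
    if not any(good):
        return "B"
    return "D"

certainly_good = classify([113709, 113712], [123259], 364)
dont_know = classify([113709], [12325, 123259], 364)
certainly_bad = classify([113709], [12325], 364)

# Last step: pare each date down to a single reading, here by the median.
chosen = median([123259, 12325, 123259])
```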
Central Question for Remainder of Talk
Recall that I cannot possibly say anything about an individual’s mileage on finer time scales than one year.
But can I derive something about population level mileage over shorter time scales, e.g. a month?
Possible application: detect the sharp drop in driving in Autumn 2008 following the Lehman Brothers collapse.
How to compute temporal evolution of mileage rates?
- Erm, isn’t it obvious?
- Take a given sequence t_i, i = 1, 2, . . .
- Compute corresponding r(t_i) using straddling procedure
- Pairs (t_i, r(t_i)) reconstruct r(t)
- Actually . . . this process is flawed . . . But just look what we can do with it!
Example of temporal evolution via straddling (WRONG)
[Figure: average average mileage rate (axis 12 to 28) against date, month by month over 2007−2008; legend entries 1991, 1993, 1995, 1997, 1999, 2001, 2003.]
Basic postulate: the population spot rate φ(t)
- Suppose there is a population-level spot rate φ(t) that modulates all vehicles’ mileage (alt. restrict to a population segment).
- Then each vehicle i has an individual spot rate φ_i(t) with
$$ \phi_i(t) = c_i \phi(t) + \text{noise}. $$
  Here c_i = const.; ⟨c_i⟩ = 1; and ⟨noise⟩ = 0, so that φ = ⟨φ_i⟩.
- Let ψ_i(τ) denote miles driven by i between tests at times τ − 1/2 and τ + 1/2. Then
$$ \psi_i(\tau) = \int_{\tau-1/2}^{\tau+1/2} \bigl( c_i \phi(s) + \text{noise} \bigr)\, ds = c_i \int_{\tau-1/2}^{\tau+1/2} \phi(s)\, ds , $$
  where the noise is taken to average out over the interval.
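From the spot rate to the straddling rate
- Thus by averaging over tests that straddle t:
$$ r(t) = \int_{t-1/2}^{t+1/2} \langle \psi_i(\tau) \rangle_i \, d\tau = \int_{t-1/2}^{t+1/2} \langle c_i \rangle \int_{\tau-1/2}^{\tau+1/2} \phi(s)\, ds\, d\tau . $$
- Simplify integral by ⟨c_i⟩ = 1 and reverse the order of integration:
$$ r(t) = \int_{t-1}^{t+1} w(s; t)\, \phi(s)\, ds , $$
[Figure: the triangular kernel w(s; t) against s, rising from 0 at s = t − 1 to a peak of 1 at s = t and falling back to 0 at s = t + 1.]
- Thus φ(t) leads to r(t). But we want to derive φ(t) from r(t) (which is derivable from data).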
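
A numerical sanity check of the change of integration order, under stated assumptions: the kernel is taken to be w(s; t) = 1 − |s − t| on |s − t| ≤ 1 (which is what reversing the order gives and what the triangular figure suggests), and the spot rate `phi` is a made-up test function with a linear trend and a seasonal term.

```python
import math

def phi(s):
    # Hypothetical spot rate in miles per year: trend plus seasonal term.
    return 8000.0 + 500.0 * s - 1000.0 * math.cos(2.0 * math.pi * s)

def midpoint(f, a, b, n=200):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def r_double(t):
    """Straddling rate as the nested integral over tau and s."""
    inner = lambda tau: midpoint(phi, tau - 0.5, tau + 0.5)
    return midpoint(inner, t - 0.5, t + 0.5)

def r_kernel(t):
    """Straddling rate as a single integral against the triangular kernel."""
    w = lambda s: max(0.0, 1.0 - abs(s - t))
    return midpoint(lambda s: w(s) * phi(s), t - 1.0, t + 1.0)

t = 0.3
rd, rk = r_double(t), r_kernel(t)
```

Both forms agree, and for this test function they return the trend 8000 + 500t alone: the seasonal term of period one year is averaged away by the kernel.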
From the straddling rate to the spot rate
- See TR-E 2013 for a whole bunch of Mathematics! Upshot:
$$ r''(t) = \phi(t+1) - 2\phi(t) + \phi(t-1). $$
- Isolate φ(t + 1) to derive a time-stepping scheme to evolve φ(t), with a time-step ∆t (= 1 month, say)
- Compute r(t) from data at a mesh of points t_i, and estimate r''(t) by the divided difference: a natural step size is ∆t.
- In practice, r(t) is noisy, so the difference is applied to a least squares smoothing spline fit.
- Unfortunately, 2 years of initial data for φ(t) are required, at the fine scale resolution ∆t.
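
A noise-free sketch of the time-stepping scheme, φ(t + 1) = r''(t) + 2φ(t) − φ(t − 1), on a monthly mesh. The test function `phi_true` and its kernel average `r_true` (worked out by hand for this function, assuming the triangular kernel 1 − |s − t|) are hypothetical, and no spline smoothing is applied because the synthetic r(t) here has no noise.

```python
import math

def phi_true(t):
    # Hypothetical spot rate (miles per year) used only to test the scheme.
    return 8000.0 + 500.0 * t + 300.0 * t * t - 1000.0 * math.cos(2.0 * math.pi * t)

def r_true(t):
    # Triangular-kernel average of phi_true, computed by hand:
    # the cosine integrates to zero and t^2 picks up an extra 1/6.
    return 8050.0 + 500.0 * t + 300.0 * t * t

def reconstruct(r, phi_init, m):
    """Evolve phi on a mesh with m steps per year.

    r        : straddling rate at every mesh point
    phi_init : phi at the first 2*m mesh points (two years of initial data)
    """
    dt = 1.0 / m
    phi = list(phi_init[: 2 * m])
    for j in range(2 * m, len(r)):
        k = j - m  # mesh index one year before j
        r2 = (r[k + 1] - 2.0 * r[k] + r[k - 1]) / dt ** 2  # divided difference
        phi.append(r2 + 2.0 * phi[k] - phi[k - m])
    return phi

m = 12  # monthly steps
mesh = [-1.0 + j / m for j in range(5 * m)]
r = [r_true(t) for t in mesh]
phi = reconstruct(r, [phi_true(t) for t in mesh[: 2 * m]], m)
max_err = max(abs(p - phi_true(t)) for p, t in zip(phi, mesh))
```

With real data the divided difference would amplify noise in r(t), which is why the slide applies it to a smoothed fit instead.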
Refinement of the straddling rate idea
[Figure: average mileage rate against t, showing the intervals that straddle t∗ and end before t∗ + α.]
- Select only the intervals that straddle t∗ and with right hand ends before t∗ + α, with α ≤ 1 year.
- Call the resulting average of averages the straddle rate r_α(t)
- Crank the handle to give:
$$ r_\alpha''(t) = \frac{1}{\alpha} \bigl[ \phi(t+\alpha) - \phi(t) \bigr] - \frac{1}{\alpha} \bigl[ \phi(t-1+\alpha) - \phi(t-1) \bigr] $$
- Gives time-stepping scheme: but only 1 + α years of initial data required.
- So interest is in α → 0, which gives r_α''(t) ≃ φ'(t) − φ'(t − 1) (natural meaning)
- α → 0 means fewer and fewer intervals, means noisy r_α(t)
Synthetic data set-up
I Choose spot rateφ(t) = 8000 + 500t − 1000 cos 2πt
− 1000[t − 2
]+
(t − 2)2,
I 106 vehicles with tests 1 yearapart, test dates uniformlydistributed through calendaryear
I Vehicle i daily mileage drawnfrom a distribution modulatedby φ(t) and (random) ci .
I Odometer readings on test datesare synthesised by addingindividual vehicle daily totals
[Figure: spot rate phi and straddling rates rbar for alpha = 1.0, 0.25 and 0.1, in miles per year against time in years (from −1 to 4), with a START marker.]
I Periodic component in spot rate $\phi(t)$ is suppressed in straddling rates $r_\alpha(t)$.
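The suppression of the periodic component can be checked with a small simulation. The sketch below is a simplified stand-in for the synthetic set-up, under stated assumptions: the spot rate omits the final term of the slide's $\phi(t)$, each vehicle's mileage is taken as a deterministic $c_i\,\phi(t)$ rather than a sum of random daily totals, the lognormal choice for $c_i$ is arbitrary, and all function names are hypothetical. Since every interval is exactly one year long, the $\cos 2\pi t$ term integrates to zero over each interval and only the trend survives in the straddle rate.

```python
import numpy as np

def cumulative_phi(t):
    # Antiderivative of the assumed spot rate
    # phi(t) = 8000 + 500 t - 1000 cos(2 pi t)   (miles per year).
    return (8000.0 * t + 250.0 * t**2
            - (1000.0 / (2.0 * np.pi)) * np.sin(2.0 * np.pi * t))

def straddle_rate(t_star, alpha, left, right, miles):
    # Keep intervals that straddle t_star and whose right-hand end falls
    # before t_star + alpha, then average their mileage rates.
    mask = (left < t_star) & (right > t_star) & (right < t_star + alpha)
    return np.mean(miles[mask] / (right[mask] - left[mask]))

def expected_rate(t_star, alpha):
    # Mean over s in (t_star, t_star + alpha) of the integral of phi on [s-1, s];
    # the cosine term contributes nothing over a whole year.
    return 7750.0 + 500.0 * (t_star + alpha / 2.0)

rng = np.random.default_rng(0)
n = 200_000
right = rng.uniform(0.0, 5.0, n)        # second test date of each vehicle
left = right - 1.0                      # tests exactly one year apart
c = rng.lognormal(mean=0.0, sigma=0.3, size=n)
c /= c.mean()                           # per-vehicle factor with mean 1
miles = c * (cumulative_phi(right) - cumulative_phi(left))

alpha = 0.25
for t_star in (1.0, 1.25, 1.5, 1.75):
    r = straddle_rate(t_star, alpha, left, right, miles)
    print(f"t*={t_star:.2f}  straddle rate={r:8.1f}  trend only={expected_rate(t_star, alpha):8.1f}")
```

In this simplified setting the spot rate itself swings by about 2,000 miles per year within each year, whereas the straddle rates should track the slowly rising trend to within sampling noise.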
Results with synthetic data: α = ∆t = 0.1 years
[Figure: numerical solution compared with exact value, over times from −1 to 4.]
I Reconstructed φ(t) almost indistinguishable from ground truth.
Straddling rates rα(t) for real-world data
[Figure: alpha-weighted rbar (miles per year) against time, Jan07 to Jan10; legend: 13 wks, 4 wks.]
I Seasonal component shouldn’t be there: underlying assumptions of the theory are broken.
Implicit assumptions in the theory. . .
A1 We assume that tests (odometer readings) are exactly one year apart.
I OKish: the theory can be generalised.
I In fact, marginal failure of this assumption can be used to quantify seasonal variation.
A2 We assume that tests occur at the same frequency on average throughout the year.
I Not true, but the theory is easy to fix.
A3 We assume that a vehicle’s mileage rate is independent of the time of year at which it is tested (and its odometer is read).
I Completely wrong. And very hard to fix.
On A3: it fails because of a pattern in new vehicle registrations throughout the year (in the UK).
Conclusions and Further Work (I)
I Incidental data is beautiful! (and useful and cheap)
I (Inadvertently) the MOT set provides vehicle usage data, which was not the intention of its release, and which is not available elsewhere (at least in this quantity and detail).
I Other data sources might enable huge extensions:
1. Per-vehicle emissions data
2. Fine-scale data (month?) for point of first use
3. Fine-scale location data (LLSOA of registered keepers?)
4. Link vehicles with the same registered keeper / address
Conclusions and Further Work (II)
I Methods developed which extract population-level spot rate mileage from widely spaced individual vehicle odometer readings. Success with synthetic data.
I UK MOT data set: some fixes/patches to theory are needed.
I Please contact me if you know of other datasets (international) in which odometer readings are systematically collected.
I These methods have the potential to complement / replace existing survey-based / link-flow techniques for estimating population-level mileage.