Post on 15-Sep-2020

Lessons From the Trenches: Using Mobile Phone Data for Official Statistics

Maarten Vanhoof

Orange Labs/Newcastle University

M.vanhoof1@newcastle.ac.uk

@Metti Hoof

MaartenVanhoof.com

Mobile Phone Data (Call Detail Records)

Mobile Phone Data (Signaling)

Mobile Phone Data (Call Detail Records)

Metadata

• Caller (phone)

• Called phone

• Timestamp

• Type of event

• Duration of call/Length of text

• Location of celltower

• …
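To make the metadata list concrete, here is a minimal sketch of how one CDR event could be represented. The field names, the comma-separated layout and the sample line are assumptions made for illustration; real operator formats differ, and the fields hidden behind the "…" above are not filled in.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CDRRecord:
    """One call or text event (hypothetical field names)."""
    caller_id: str       # pseudonymised id of the initiating phone
    callee_id: str       # pseudonymised id of the called phone
    timestamp: datetime  # start time of the event
    event_type: str      # "call" or "text"
    size: int            # call duration in seconds, or text length
    antenna_id: str      # cell tower that handled the event


def parse_cdr_line(line: str) -> CDRRecord:
    """Parse one comma-separated line in the assumed six-column layout."""
    caller, callee, ts, kind, size, antenna = line.strip().split(",")
    return CDRRecord(caller, callee, datetime.fromisoformat(ts),
                     kind, int(size), antenna)


# Invented sample line, not real data.
record = parse_cdr_line("u001,u042,2014-03-01T22:15:00,call,134,A17")
```

Note that the location is the cell tower, not the phone: every geographical statement derived from CDR data inherits that resolution.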

Mobile Phone Data (Call Detail Records)

Toole et al. (2015) Coupling Human Mobilities and Social Ties.

Individual indicators: Bandicoot

https://github.com/yvesalexandre/bandicoot

http://bandicoot.mit.edu/demo/

• active days

• number of contacts

• number of interactions

• call duration

• percent nocturnal

• percent initiated interactions

• response delay text

• entropy of contacts

• balance of contacts

• interactions per contact

• inter-event time

• percent pareto interactions

• percent pareto durations

• number of antennas

• entropy of antennas

• percent at home

• radius of gyration

• frequent antennas
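As an illustration of what two of these indicators measure, the sketch below computes "percent nocturnal" and "radius of gyration" from scratch. It is not Bandicoot's implementation: the night window (19:00 to 07:00), the unweighted centroid and the function names are assumptions chosen to keep the example short.

```python
import math
from datetime import datetime


def percent_nocturnal(timestamps, night_start=19, night_end=7):
    """Share of events that fall in the assumed night window."""
    if not timestamps:
        return 0.0
    night = sum(1 for t in timestamps
                if t.hour >= night_start or t.hour < night_end)
    return night / len(timestamps)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def radius_of_gyration_km(positions):
    """Root mean squared distance of antenna positions (one per event)
    from their centroid. The centroid is a plain lat/lon average,
    which is only a reasonable approximation at regional scale."""
    if not positions:
        return 0.0
    n = len(positions)
    c_lat = sum(lat for lat, _ in positions) / n
    c_lon = sum(lon for _, lon in positions) / n
    msd = sum(haversine_km(lat, lon, c_lat, c_lon) ** 2
              for lat, lon in positions) / n
    return math.sqrt(msd)
```

Both functions take the antenna position as the user position, so the resulting values depend on the local antenna density.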

Individual indicators for official statistics

Behavioural

Individual mobility

e.g. diversity of mobility

Contextual

• Car ownership

• Access to public transport

• Income

• Marital status

• Membership

• Home location

• Etc.

Individual indicators for official statistics

Pappalardo,Vanhoof, et al. (2016) An Analytical Framework to Nowcast Well-Being using Mobile Phone Data.

(Geographical) Veracity

Spatial allocation

Spatial delineation

Spatial aggregation

Spatial allocation: Home detection

Pappalardo,Vanhoof, et al. (2016) An Analytical Framework to Nowcast Well-Being using Mobile Phone Data.

Spatial allocation: Home detection

Uncertainty of home allocation algorithms

• No knowledge of how certainly we can geographically pinpoint users

• Because no ground truth is available
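For readers unfamiliar with home detection, here is a sketch of one commonly used heuristic: take the antenna with the most night-time activity as the home antenna. The night window, the function name and the "share" score are assumptions for illustration; this is not presented as the criterion used in the cited work, and the score is not a validated measure of certainty.

```python
from collections import Counter
from datetime import datetime


def detect_home(records, night_start=19, night_end=7):
    """records: iterable of (timestamp, antenna_id) for one user.

    Returns (home_antenna, share), where share is the fraction of
    night-time events observed at that antenna. Returns (None, 0.0)
    when the user has no night-time events at all.
    """
    night = Counter(antenna for ts, antenna in records
                    if ts.hour >= night_start or ts.hour < night_end)
    if not night:
        return None, 0.0
    antenna, count = night.most_common(1)[0]
    return antenna, count / sum(night.values())


# Invented events for one user.
events = [
    (datetime(2014, 3, 1, 23, 5), "A"),
    (datetime(2014, 3, 2, 6, 40), "A"),
    (datetime(2014, 3, 2, 21, 10), "B"),
    (datetime(2014, 3, 2, 14, 0), "C"),  # daytime, ignored
]
home, share = detect_home(events)
```

A low share hints that the allocation is fragile, but without ground truth it cannot say whether the chosen antenna is actually the user's home.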

Spatial allocation: Home detection

Performance

Uncertainty

Vanhoof et al. (Submitted) Investigating Performance and Spatial Uncertainty of Home Detection Criteria for CDR data

Spatial allocation: Solution?

• In the short term, we need to:

  • Create a better understanding of the uncertainty that comes with home detection

  • Test heuristics for home detection on different databases and for different countries

  • Design surveys to gather ground truth at the individual level

• In the long term, we need to:

  • Understand how changes in mobile phone use and in available datasets influence allocation

  • Decide on standardizing home detection and error assessment

  • Design a platform where operators, researchers and policy makers can easily do this and compare results between different datasets

(Geographical) Veracity

Spatial allocation

Spatial delineation

Spatial aggregation

Spatial delineation

• Uneven delineations of space

  • Between antennas (high-density vs. low-density, operator 1 vs. operator 2, ...)

  • Between antennas and administrative regions (cell-tower coverage vs. municipalities)

  • Between different definitions of urban areas (Urban Units vs. Urban Areas)

• These create errors that are poorly understood and challenging to address

• This is relevant for

  • Population Density Estimations

  • Mobility Derivation

  • Parameter estimation (e.g. for urban scaling laws) in statistical analysis

  • Error/uncertainty assessment

Spatial delineation: Mobility Entropy

Vanhoof, et al. (Submitted) Correcting Mobility Entropy from CDR data for large-scale comparison of individual movement patterns
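To show why mobility entropy is sensitive to spatial delineation, the sketch below computes Shannon entropy over visited antennas and then recomputes it after merging antennas into coarser zones. The antenna-to-zone mapping is invented, and the example does not reproduce the correction proposed in the cited paper; it only demonstrates the effect that makes a correction necessary.

```python
import math
from collections import Counter


def mobility_entropy(visits):
    """Shannon entropy (in bits) of the distribution of visited places."""
    counts = Counter(visits)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# One user, one event at each of four antennas (invented data).
antenna_visits = ["a1", "a2", "a3", "a4"]

# Hypothetical coarser delineation: two antennas per zone.
antenna_to_zone = {"a1": "z1", "a2": "z1", "a3": "z2", "a4": "z2"}
zone_visits = [antenna_to_zone[a] for a in antenna_visits]

fine_entropy = mobility_entropy(antenna_visits)
coarse_entropy = mobility_entropy(zone_visits)
```

The movement pattern is identical in both cases, yet the entropy halves under the coarser delineation. Users in dense urban areas with many antennas will therefore tend to score higher than rural users with comparable behaviour.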

Spatial delineation: Urban scaling laws

Cottineau et al. (2016) Paradoxical Interpretations of Urban Scaling Laws
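For context, an urban scaling law is usually written as Y = a · N^β, with N the city population and Y an urban quantity, and β is typically estimated by linear regression on logarithms. The sketch below is a plain least-squares fit under that assumption; the function name and the synthetic data are illustrative only.

```python
import math


def fit_scaling_law(populations, outputs):
    """Estimate (a, beta) in Y = a * N**beta by ordinary least squares
    on log(Y) = log(a) + beta * log(N)."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in outputs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct population sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    a = math.exp(mean_y - beta * mean_x)
    return a, beta


# Synthetic cities that follow Y = 2 * N**1.15 exactly.
populations = [1e4, 1e5, 1e6]
outputs = [2.0 * n ** 1.15 for n in populations]
a, beta = fit_scaling_law(populations, outputs)
```

The link with delineation is that N and Y both depend on where the city boundary is drawn, so running the same fit on different definitions of "city" can yield different values of β.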

Spatial delineation: Solution?

• In the short term, we need to work on:

  • Minimizing the influence of spatial delineations on our measurements

  • Techniques that allow translation between different spatial delineations

  • Assessments of the influence of spatial delineation (geo-computation)

• In the long term, we need to:

  • Rethink the possibilities for standardizing spatial delineations

  • Develop practices in Official Statistics that express the effect of spatial delineation
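One simple technique for translating between delineations is areal weighting: a value attached to a cell-tower coverage area is split over administrative regions in proportion to the overlapping area. The sketch below assumes the intersection areas have already been computed (for example in a GIS) and that users are spread uniformly within each coverage area, which is a strong assumption. All names and numbers are hypothetical.

```python
def areal_weighting(cell_values, overlap_area):
    """Redistribute per-cell counts to regions by area share.

    cell_values:  {cell_id: count}
    overlap_area: {(cell_id, region_id): intersection area}
    Returns {region_id: estimated count}.
    """
    # Total mapped area of each cell (sum of its intersections).
    cell_area = {}
    for (cell, _region), area in overlap_area.items():
        cell_area[cell] = cell_area.get(cell, 0.0) + area

    region_values = {}
    for (cell, region), area in overlap_area.items():
        if cell_area[cell] == 0:
            continue  # degenerate cell, nothing to distribute
        share = area / cell_area[cell]
        region_values[region] = (region_values.get(region, 0.0)
                                 + cell_values.get(cell, 0.0) * share)
    return region_values


# Invented example: cell c1 straddles two municipalities.
cell_values = {"c1": 100, "c2": 40}
overlap_area = {("c1", "r1"): 3.0, ("c1", "r2"): 1.0, ("c2", "r2"): 2.0}
regions = areal_weighting(cell_values, overlap_area)
```

Totals are preserved, but the error introduced by the uniformity assumption is exactly the kind of effect that current practice rarely quantifies.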

(Geographical) Veracity

Spatial allocation

Spatial delineation

Spatial aggregation

Spatial aggregation

• Scale does matter for:

  • Unintended selective filtering (e.g. highly active persons, communities)

  • Objective construction of indicators (e.g. 5 km in Paris or in the Pyrenees)

  • Representativeness of single operators (e.g. distorted market shares)

  • Personal behaviour (e.g. long-distance vs. short-distance trips)

  • Geographical, economic, sociological, ecological, etc. context (e.g. transport infrastructure)

• Still, there is no evidence that current (spatial) aggregation practices take any of these into account when studying mobile phone data.

• In addition, given the rapidly changing nature of mobile phone use, it is my hypothesis that behavioural data is even more prone to this fallacy.

Spatial aggregation

Cell-tower level vs. IRIS level

Population Density Estimation vs. Official Statistics

Relations between indicators

Spatial aggregation: Solution?

• In the short term, we need to work on:

  • Techniques that define the best spatial scale for studying certain processes

    • Both empirical, quantitative (e.g. optimal raster sizes for population density estimations)

    • And theoretical, qualitative (e.g. expert judgment)

  • Techniques that express the changing nature of observations when (spatially) aggregating

    • E.g. representativeness in population terms of single-operator data at different scales

  • Techniques that investigate, or even incorporate, the sensitivity of definitions to spatial scale

    • E.g. fragmented definitions of distance according to scale

  • Techniques that investigate the sensitivity of data to spatial aggregation

    • E.g. spatial autocorrelation

• In the long term, we need to:

  • See how all of this evolves over time as human behaviour and mobile phone use change
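Spatial autocorrelation, mentioned above as one way to probe sensitivity to aggregation, is commonly summarised with Moran's I. The sketch below is a bare-bones version that takes a sparse neighbour-weight dictionary; the four-zone chain and the indicator values are invented for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values:  list of indicator values, one per zone
    weights: {(i, j): w_ij} for neighbouring zone indices i != j
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    w_sum = sum(weights.values())
    if denom == 0 or w_sum == 0:
        raise ValueError("Moran's I is undefined for constant values "
                         "or empty weights")
    num = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / w_sum) * (num / denom)


# Four zones in a row; each zone neighbours the next (symmetric weights).
chain = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}

clustered = morans_i([1, 1, 5, 5], chain)    # similar values adjacent
alternating = morans_i([1, 5, 1, 5], chain)  # dissimilar values adjacent
```

Recomputing the statistic at several aggregation levels (cell tower, IRIS, municipality) gives a first impression of how strongly an indicator's spatial structure depends on the chosen scale.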

Thoughts

• Why start from individual indicators?

  • Privacy issues (newer datasets don’t allow this)

  • Computationally expensive treatment

  • Temporal resolution is far from optimal

  • Difficult to communicate/visualise

• Why not use the ‘big’ aspect of the data and work with patterns?

  • Activity patterns of cell-towers

  • High-level communication/commuting patterns

  • Population presence registration
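As a small example of the pattern-based view, the sketch below builds a normalised 24-hour activity profile for each cell tower. It needs only timestamps and antenna identifiers, no user identifiers, which is why such aggregates are less sensitive from a privacy point of view. The function name and the sample events are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_activity_profiles(events):
    """events: iterable of (timestamp, antenna_id).

    Returns {antenna_id: list of 24 shares}, where each list sums to 1
    and entry h is the share of that antenna's events in hour h.
    """
    counts = defaultdict(lambda: [0] * 24)
    for ts, antenna in events:
        counts[antenna][ts.hour] += 1

    profiles = {}
    for antenna, hours in counts.items():
        total = sum(hours)  # at least 1, since the antenna was seen
        profiles[antenna] = [h / total for h in hours]
    return profiles


# Invented events.
events = [
    (datetime(2014, 3, 3, 9, 5), "A"),
    (datetime(2014, 3, 3, 9, 40), "A"),
    (datetime(2014, 3, 3, 18, 20), "A"),
    (datetime(2014, 3, 3, 23, 0), "B"),
]
profiles = hourly_activity_profiles(events)
```

Profiles like these can then be compared or clustered across towers, although the aggregation choices discussed earlier still apply.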

High-level analysis: Learning Urban Areas

Combes, de Bellefon and Vanhoof (Submitted) Understanding urban centers organization and influence with mobile phone data

Don’t be Batman.

The same problems and scientific questions will persist, only now less visible and, as such, less debated.

Conclusion

• ‘Work from the trenches’ on individual data identifies problems, but

  • Is done by a limited number of researchers

    • Not a priority for operators (never was, never will be)

    • Lack of data and knowledge at the institutions (but they are catching up)

    • Limited rewards in academia, limited scientific community

  • Is threatened by protective measures on data

    • Impossibility to continue pursuing in-depth research

    • Researchers have fled to African data, but the quality of official statistics there is limited

    • Shared platforms for analysis are being developed, but they simplify workflows

  • Is mostly limited to one dataset, one operator

    • Comparison of findings is absolutely necessary for better insights and methods

    • The dream of full population coverage is feasible but needs strong policy

Thank you,

The end.

M.vanhoof1@newcastle.ac.uk

@Metti Hoof

MaartenVanhoof.com