The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs Connect


Page 1: The emergent opportunity of Big Data for Social Good - Nuria Oliver @ PAPIs Connect

Big Data for Social Good

Nuria Oliver, PhD

Scientific Director

Telefonica R&D

Page 2

7+ billion mobile phones worldwide

97% of the world’s population (ITU)

Mobile penetration ranging from 120% to 89% of the population (ITU)

Emerging and developed regions

More time spent on our phones than watching TV or with our partner (US and UK)

Page 5

Mobile Network Infrastructure

● The cell tower (also known as BTS or base station) that a phone connects to provides an approximate location of the phone

● Spacing between towers is 500 m to 1 km in urban areas and 2 to 3 km in rural areas

● Each tower has a “Voronoi cell”: the area in which that tower is the closest one and is therefore assumed to provide the best coverage
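Mapping a location to the tower whose Voronoi cell contains it amounts to a nearest-neighbour lookup. The sketch below assumes hypothetical tower names and planar coordinates in metres; real coverage also depends on antenna sectors, terrain and network load, so the nearest tower is only an approximation of the serving one.

```python
import math

# Hypothetical BTS positions on a local planar grid, in metres.
TOWERS = {
    "BTS_A": (0.0, 0.0),
    "BTS_B": (800.0, 0.0),
    "BTS_C": (400.0, 700.0),
}


def voronoi_cell(point, towers):
    """Return the name of the tower whose Voronoi cell contains `point`.

    A point lies in a tower's Voronoi cell exactly when that tower is the
    closest one, so this is a nearest-neighbour search (ties go to the
    first tower in iteration order).
    """
    return min(towers, key=lambda name: math.dist(point, towers[name]))


if __name__ == "__main__":
    print(voronoi_cell((100.0, 50.0), TOWERS))   # BTS_A
    print(voronoi_cell((700.0, 100.0), TOWERS))  # BTS_B
```

For many towers, a spatial index (e.g. a k-d tree) would replace the linear scan; the result is the same.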

Page 7

Typical Mobile Data

• CDR

• SMS

Consumption | Social Network | Mobility
Call duration | In/Out degree | Radius of gyration
N. events | Delta w.r.t. time window | Travelled distance
Lapse between events | Unique calls per day | Rate of popular antennas
Reciprocated events | Unique SMS per day | Regularity of popular antennas
… | … | …

CDR record:

HR_ORG TLFN_A TLFN_B CD_GEO_A CD_GEO_B DT_ORG CD_SNTD CD_ERB CD_CCC QT_DUR
20:05:31 XXX YYY 3 11 20140519 2 1562 568 33
… … … … … … … … … …

SMS record:

HR_ORG TLFN_A TLFN_B CD_GEO_A CD_GEO_B DT_ORG CD_SNTD QT_TRFG
15:53:54 XXX ZZZ 3 25 20140506 2 1
… … … … … … … …
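As a rough illustration of how the consumption, social network and mobility features in the table can be derived from raw records, here is a minimal sketch that parses the record layout above ending in QT_DUR. The field meanings are assumptions not stated on the slide (TLFN_A = caller, TLFN_B = callee, CD_ERB = serving antenna, QT_DUR = duration in seconds), and the antenna coordinates and the second call are hypothetical. The radius of gyration follows its usual definition, the root-mean-square distance of the event locations from their centre of mass.

```python
import math
from collections import namedtuple

# Column names of the record layout shown above. Their meanings are
# assumptions: TLFN_A = caller, TLFN_B = callee, CD_ERB = serving antenna,
# QT_DUR = call duration (assumed to be in seconds).
CDR_COLUMNS = ["HR_ORG", "TLFN_A", "TLFN_B", "CD_GEO_A", "CD_GEO_B",
               "DT_ORG", "CD_SNTD", "CD_ERB", "CD_CCC", "QT_DUR"]
Cdr = namedtuple("Cdr", CDR_COLUMNS)


def parse_cdr(line):
    """Split one whitespace-separated CDR line into a named record."""
    return Cdr(*line.split())


def user_features(records, user, antenna_xy):
    """Compute a few consumption, social and mobility features for `user`.

    antenna_xy maps an antenna code to hypothetical planar coordinates (km).
    """
    calls = [r for r in records if r.TLFN_A == user]
    n_events = len(calls)
    mean_duration = sum(int(r.QT_DUR) for r in calls) / n_events if calls else 0.0
    out_degree = len({r.TLFN_B for r in calls})  # distinct people called

    # Radius of gyration: root-mean-square distance of the antenna positions
    # of all events from their centre of mass.
    points = [antenna_xy[r.CD_ERB] for r in calls if r.CD_ERB in antenna_xy]
    radius = 0.0
    if points:
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        radius = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2
                               for x, y in points) / len(points))
    return {"n_events": n_events, "mean_duration": mean_duration,
            "out_degree": out_degree, "radius_of_gyration": radius}


if __name__ == "__main__":
    lines = [
        "20:05:31 XXX YYY 3 11 20140519 2 1562 568 33",   # record from the slide
        "21:10:02 XXX WWW 3 11 20140519 2 1563 568 120",  # hypothetical second call
    ]
    records = [parse_cdr(line) for line in lines]
    antennas = {"1562": (0.0, 0.0), "1563": (2.0, 0.0)}  # hypothetical positions
    print(user_features(records, "XXX", antennas))
```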

Page 10

Telefonica Research: Cell Phone Data → Behavioral Insights

Institutions & Policy Makers:

• Urban Planning Tools

• Official Statistics Tools

• Crisis Management Tools

• Public Health Tools

Page 11

Big Data for Social Good @ Telefonica

Crime Prediction: 70% accuracy

Analysis of impact of floods

http://www.wired.co.uk/news/archive/2013-10/17/nuria-oliver

Page 12

Natural Disasters

Page 13

Flooding Through the Lens of Mobile Phone Activity

David Pastor-Escuredo¹, Alfredo Morales-Guzman¹, Yolanda Torres-Fernandez¹, Jean-Martin Bauer², Amit Wadhwa², Carlos Castro-Correa³, Liudmyla Romanoff⁴, Jong Gun Lee⁴, Alex Rutherford⁴, Vanessa Frias-Martinez⁵, Nuria Oliver⁶, Enrique Frias-Martinez⁶, Miguel Luengo-Oroz⁴

1. Universidad Politécnica de Madrid

2. Vulnerability Analysis and Mapping, World Food Programme

3. Coordinacion de Estrategia Digital Nacional, Presidencia de la República de México

4. United Nations Global Pulse

5. University of Maryland

6. Telefonica Research

Page 14

UN Global Pulse is an innovation initiative of the United Nations Secretary-General, driving a big data revolution for development.

Its vision is to see big data used safely and responsibly for the public good; its mission is to accelerate the discovery, development and adoption of big data and real-time analytics for sustainable development and humanitarian action.

Page 15

Tabasco Floods

• Tabasco state frequently suffers flooding

• 800 mm of rain from 31st October to 3rd November 2009 (4x the normal amount)

• 214,000 people affected

• State of emergency declared

Page 16

Reconstructing the Floods

• We combined mobile data with NASA LANDSAT satellite imagery, governmental surveys and census data

• NASA LANDSAT: geolocation of the most affected areas

• Census data: representativeness of the mobile data

• Precipitation data and civil protection reports: timeline of people’s awareness and response

Page 17

CDR Representativeness

● Mobile user representativeness validated against census data

● Population estimates based on CDRs are strongly correlated with official census data and population statistics (R² = 0.97)

● Mobile data can serve as a proxy for population distribution in areas where other data sources are unavailable or unreliable
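The slide reports R² = 0.97 without naming the model. Assuming a simple linear regression of census counts on CDR-based estimates, R² equals the squared Pearson correlation, which can be computed as in this sketch. The input values in the usage example are hypothetical.

```python
def r_squared(x, y):
    """R^2 of the least-squares line y ~ a + b*x, i.e. the squared Pearson correlation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-municipality values: CDR-based estimates vs. census counts.
    cdr_estimate = [1200, 3400, 560, 8100, 2300]
    census = [1500, 3900, 700, 9800, 2600]
    print(round(r_squared(cdr_estimate, census), 3))
```

A high R² shows a strong linear association, not that CDR estimates are unbiased; the fitted slope and intercept are still needed to rescale them to population counts.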

Page 18

Event Detection

x(t) is the raw number of unique phones placing or receiving calls through each antenna (BTS) per day.

● Quantify the impact of the floods by comparing the activity of each BTS against its baseline activity using a z-score-like metric

● BTSs with higher variations in the number of calls during the floods were located in the most affected regions

● Special events (e.g. Christmas) led to similar changes in behavior with respect to the baseline, but across the entire region rather than in specific locations
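The exact "z-score-like" metric is not given on the slide. A minimal sketch, assuming the standard form z(t) = (x(t) − μ)/σ with μ and σ estimated per BTS over a non-flood baseline period, and a hypothetical flagging threshold of |z| > 2:

```python
from statistics import mean, pstdev


def zscore_series(x, baseline):
    """z(t) = (x(t) - mu) / sigma, with mu and sigma estimated on a baseline period."""
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    return [(value - mu) / sigma for value in x]


def flag_anomalies(x_by_bts, baseline_by_bts, threshold=2.0):
    """Return, for each BTS, the day indices t where |z(t)| exceeds `threshold`."""
    flagged = {}
    for bts, series in x_by_bts.items():
        z = zscore_series(series, baseline_by_bts[bts])
        flagged[bts] = [t for t, value in enumerate(z) if abs(value) > threshold]
    return flagged


if __name__ == "__main__":
    # Hypothetical daily counts of unique phones per BTS.
    baseline = {"bts_1": [100, 102, 98, 100], "bts_2": [50, 55, 45, 50]}
    during = {"bts_1": [100, 110, 99], "bts_2": [52, 49, 51]}
    print(flag_anomalies(during, baseline))
```

Normalizing per BTS is what allows a localized disruption (a few flagged antennas) to be told apart from a region-wide event such as Christmas, where most antennas deviate at once.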

Page 20

Actionable Insights

● Patterns in BTS activity can be used to measure the impact of floods

● Behavioral insights are important for emergency services

● People communicated more as a result of the initial impacts of the floods, rather than in response to the recommendations of public protection authorities

● Civil protection warnings did not seem to be an effective way to raise awareness

● Spikes in activity were observed only after the situation had become critical


Page 22

Pandemics

Vanessa Frias-Martinez, Enrique Frias-Martinez et al.

Page 27

Data Analysis

• Call records from 1st January to 31st May 2009

• Mobility computed as the number of distinct BTSs visited

• Stages:

  • Medical Alert: Stage 1 (17th to 27th April)

  • Closing Schools: Stage 2 (28th April to 1st May)

  • Suspension of Essential Activities: Stage 3 (1st to 6th May)

• Baseline: the same periods in a different year (2008)
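A minimal sketch of the mobility measure and the year-on-year comparison described above, assuming that "mobility" is averaged over active users and that records are simple (user, day, BTS) tuples; the record values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def mobility(records, start, end):
    """Mean number of distinct BTSs visited per active user between start and end (inclusive).

    records: iterable of (user_id, day, bts_id) with `day` a datetime.date.
    """
    visited = defaultdict(set)
    for user, day, bts in records:
        if start <= day <= end:
            visited[user].add(bts)
    if not visited:
        return 0.0
    return sum(len(bts_set) for bts_set in visited.values()) / len(visited)


def relative_change(current, baseline):
    """Percentage change of mobility with respect to the baseline period."""
    return 100.0 * (current - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical records; Stage 1 (Medical Alert) ran 17th to 27th April.
    recs_2009 = [("u1", date(2009, 4, 20), "A"), ("u1", date(2009, 4, 21), "B"),
                 ("u2", date(2009, 4, 20), "A")]
    recs_2008 = [("u1", date(2008, 4, 20), "A"), ("u1", date(2008, 4, 21), "B"),
                 ("u1", date(2008, 4, 22), "C"), ("u2", date(2008, 4, 20), "A"),
                 ("u2", date(2008, 4, 23), "D")]
    stage1 = mobility(recs_2009, date(2009, 4, 17), date(2009, 4, 27))
    base = mobility(recs_2008, date(2008, 4, 17), date(2008, 4, 27))
    print(stage1, base, relative_change(stage1, base))
```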

Page 28

[Figure: 80%, 55%, 0%]

Page 29

Agent-based Disease Model

Components: Mobility Model, Social Network Model, Disease Model
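To show how the three components can fit together, here is a toy agent-based SIR sketch. It is not the model from the referenced SocialCom '11 paper: the uniform-random mobility rule, the transmission probability `beta`, the fixed recovery time and all parameter values are hypothetical simplifications.

```python
import random


def simulate(home, contacts, mobility_prob, beta, recovery_steps, seeds, steps, rng):
    """Toy agent-based SIR model; returns the number of infected agents after each step.

    Mobility model: each step, an agent moves to a uniformly random location with
        probability `mobility_prob`, otherwise it stays at its home location.
    Social network model: `contacts` maps an agent to the set of agents it talks to.
    Disease model: each infected agent infects every susceptible social contact or
        co-located agent with probability `beta`, and recovers after
        `recovery_steps` steps of being infected.
    """
    n = len(home)
    locations = sorted(set(home))
    state = ["S"] * n
    infected_for = [0] * n
    for agent in seeds:
        state[agent] = "I"
    history = []
    for _ in range(steps):
        # Mobility model: where is each agent during this step?
        where = [rng.choice(locations) if rng.random() < mobility_prob else home[a]
                 for a in range(n)]
        at_location = {}
        for agent, loc in enumerate(where):
            at_location.setdefault(loc, []).append(agent)
        # Disease model: transmissions through social ties and co-location.
        infected = [a for a in range(n) if state[a] == "I"]
        new_cases = set()
        for a in infected:
            exposed = set(contacts.get(a, ())) | set(at_location[where[a]])
            for b in exposed:
                if state[b] == "S" and rng.random() < beta:
                    new_cases.add(b)
        for a in infected:
            infected_for[a] += 1
            if infected_for[a] >= recovery_steps:
                state[a] = "R"
        for b in new_cases:
            state[b] = "I"
        history.append(state.count("I"))
    return history


if __name__ == "__main__":
    # Hypothetical population: 200 agents spread over 10 home locations.
    home = [i % 10 for i in range(200)]
    contacts = {i: {(i + 1) % 200, (i + 7) % 200} for i in range(200)}
    normal = simulate(home, contacts, 0.3, 0.05, 5, [0], 60, random.Random(1))
    reduced = simulate(home, contacts, 0.1, 0.05, 5, [0], 60, random.Random(1))
    print("peak infected, normal mobility: ", max(normal))
    print("peak infected, reduced mobility:", max(reduced))
```

Comparing runs with different `mobility_prob` values mimics, in a very simplified way, the kind of what-if analysis of reduced mobility discussed on the following slides; a single random run is noisy, so averaging many seeds would be needed for any conclusion.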

Page 30

Impact on disease propagation

Page 31

Impact on disease propagation

[Figure: 10%, 40h]


Page 33

Crime

Joint work with A. Bogomolov, B. Lepri, J. Staiano, F. Pianesi and A. Pentland

Page 34

Crime

Crime affects quality of life and economic development at both the national and the local level.

Several studies explore relationships between crime and socio-economic variables: education, income, unemployment, ethnicity, …

Several studies have shown significant concentrations of crime in small geographical areas: crime hotspots.

Page 35

Crime and Urban Environment

T1: Natural surveillance as a key deterrent for crime: people moving around are “eyes on the street” (Jacobs, 1961); high diversity among the population and a high number of visitors -> less crime

T2: Defensible space theory (Newman, 1972): a high mix of people -> more crime

Page 36

Crime Prediction

People-centric perspective vs. place-centric perspective:

• The people-centric perspective is used for individual or collective criminal profiling

• The place-centric perspective is used for predicting crime hotspots

Page 37

Our Approach

• Data-driven and place-centric approach to crime prediction

• Multimodal approach: people dynamics derived from mobile network data, combined with demographics

• European metropolis: London

• Prediction of crime hotspots, not criminal profiling

Page 38

Multimodal Approach: Data

Smartsteps Dataset: for each Smartsteps cell, a variety of demographic and human dynamics variables were computed every hour for 3 weeks (from December 9 to December 15, 2012 and from December 23, 2012 to January 5, 2013)

Criminal Cases Dataset: criminal cases for December 2012 and January 2013

London Borough Profiles Dataset: open dataset containing 68 metrics about the population of a particular geographic area

Page 39

SmartSteps

• Footfall count: shows the trend in footfall in a specified area hourly, daily, weekly and monthly. Provides a basic profile of the crowd.

• Catchment area: shows which postal sectors your customers are coming from, by hour, day, week and month. Shows the “battleground” for two sites.

• Transport mode: shows flows of crowds between any two points, segmented by road, air, train, etc.

Page 40

SmartSteps Data

For each cell and for each hour, the dataset contains:

• an estimation of how many people are in the cell

• the percentage of these people at home, at work or just visiting the cell

• the gender split (male vs. female)

• the age split (0-20 years, 21-30 years, 31-40 years, …)

Page 41

Crime Data

• Crime geolocation for 2 months (December 2012 to January 2013)

• All reported crimes in the UK, specifying month and year but not the specific day or time

• The median crime count (= 5) is used as the threshold between high-crime and low-crime areas

• Spatial granularity of borough profiles is at LSOA level: LSOAs are small geographical areas defined by the UK Office for National Statistics (mean population: 1,500)
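A minimal sketch of turning crime counts into the binary labels used later for classification, assuming one count per area over the study period. Whether areas exactly at the median count as high or low is not stated on the slide; here they are treated as low (an assumption), and the area codes are hypothetical.

```python
from statistics import median


def label_hotspots(crime_counts):
    """Split areas into 'high' and 'low' crime using the median count as the threshold.

    Areas strictly above the median are labelled 'high'; treating ties at the
    median as 'low' is an assumption.
    """
    threshold = median(crime_counts.values())
    labels = {area: "high" if count > threshold else "low"
              for area, count in crime_counts.items()}
    return labels, threshold


if __name__ == "__main__":
    # Hypothetical crime counts per LSOA.
    counts = {"E01000001": 2, "E01000002": 5, "E01000003": 9}
    print(label_hotspots(counts))
```

Using the median as the cut-off yields roughly balanced classes, which keeps the majority-class baseline near 50% accuracy.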

Page 42

London Borough Profiles Data

68 metrics about the population of a specific geographical area: demographics, households, migrant population, employment, earnings, life expectancy, happiness levels, house prices, etc.

Spatial granularity of borough profiles is at LSOA level: LSOAs are small geographical areas defined by the UK Office for National Statistics (mean population: 1,500)

Page 43

Feature Extraction

From the Smartsteps data we extract:

• 1st-order features (mean, median, min., max., entropy, etc.)

• 2nd-order features on sliding windows of variable length (1 hour, 4 hours, 1 day, etc.) to account for temporal patterns
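A minimal sketch of both feature families, assuming hourly series per cell. The slide lists only example statistics, so the exact feature set here is an assumption: entropy is Shannon entropy over a distribution such as the home / work / visiting split, and the 2nd-order features are taken (hypothetically) as 1st-order statistics of per-window means.

```python
import math
from statistics import mean, median


def entropy(shares):
    """Shannon entropy in bits of a distribution given as non-negative shares.

    Example: the percentages of people at home, at work or visiting a cell.
    """
    total = sum(shares)
    return -sum((s / total) * math.log2(s / total) for s in shares if s > 0)


def first_order(values):
    """1st-order features of a series of hourly values."""
    return {"mean": mean(values), "median": median(values),
            "min": min(values), "max": max(values)}


def second_order(hourly, window, step=None):
    """2nd-order features: 1st-order statistics of per-window means.

    Windows have length `window` hours and start every `step` hours
    (step defaults to window, i.e. non-overlapping windows).
    """
    step = step or window
    window_means = [mean(hourly[i:i + window])
                    for i in range(0, len(hourly) - window + 1, step)]
    return first_order(window_means)


if __name__ == "__main__":
    footfall = [120, 90, 60, 40, 80, 300, 450, 400]  # hypothetical hourly counts
    print(first_order(footfall))
    print(second_order(footfall, window=4))
    print(entropy([55, 30, 15]))  # hypothetical % at home / at work / visiting
```

Repeating `second_order` for several window lengths (1 hour, 4 hours, 1 day, …) and for each Smartsteps variable is what makes the initial feature pool grow into the thousands.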

Page 44

Feature Selection

• Mean decrease in Gini impurity: the feature with the maximum mean decrease in Gini impurity is expected to have the greatest influence in minimizing the out-of-bag error

• The feature selection process produced a reduced subset of 68 features (from an initial pool of about 6,000 features)
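In a Random Forest, the mean decrease in Gini impurity of a feature averages, over all trees and all nodes that split on it, the weighted drop in impurity caused by those splits (scikit-learn exposes this as `feature_importances_` on `RandomForestClassifier`). The stdlib sketch below is a deliberate simplification, not that procedure: it scores each feature by the best single split on it (a decision-stump proxy) and keeps the top k. Feature names and values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_decrease(values, labels):
    """Largest weighted decrease in Gini impurity over all thresholds on one feature."""
    parent, n = gini(labels), len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def select_features(features, labels, k):
    """Return the names of the k features with the largest Gini decrease."""
    scores = {name: best_gini_decrease(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    labels = ["low", "low", "high", "high"]
    features = {"residents_ratio": [0.1, 0.2, 0.7, 0.9],  # hypothetical values
                "age_entropy": [1.2, 0.4, 1.0, 0.6]}
    print(select_features(features, labels, k=1))
```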

Page 45

Classification Approach

• Binary classification task: high-crime area vs. low-crime area

• 10-fold cross-validation

• Classifier: Random Forest (RF)

• RF outperforms logistic regression, support vector machines, neural networks and decision trees
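A stdlib sketch of the 10-fold cross-validation protocol together with the majority-class baseline that the results slide compares against. The classifier is passed in as a `fit_predict` function (a hypothetical interface chosen for this sketch); a Random Forest, e.g. scikit-learn's `RandomForestClassifier`, would be plugged in the same way but is omitted to keep the example dependency-free. Whether the original study used stratified folds is not stated; these folds are plain shuffled splits.

```python
import random
from collections import Counter


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit_predict, k=10, seed=0):
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) -> predictions."""
    if len(y) < k:
        raise ValueError("need at least k samples")
    scores = []
    for test_idx in k_fold_indices(len(y), k, random.Random(seed)):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Baseline: always predict the most frequent label of the training folds."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)


if __name__ == "__main__":
    # Hypothetical data: 30 areas, one feature each, 12 'high' and 18 'low'.
    X = [[i] for i in range(30)]
    y = ["high"] * 12 + ["low"] * 18
    print(cross_val_accuracy(X, y, majority_baseline))
```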

Page 46

Experimental Results

The Smartsteps-based classifier significantly outperforms both the majority-class baseline and the classifier based on borough profiles

Page 47

Experimental Results

~70% accuracy in predicting crime hotspots

[Figure panels: ground-truth; predictions]

Page 48

Relevant Features

• Features encoding daily dynamics have more predictive power than features extracted on a monthly basis

• Relevance of a high number of residents for predicting crime areas: increased ratio of residents -> more crime (in contrast with Newman’s thesis)

• Entropy-based features are useful for predicting crime hotspots: high diversity of functions (home vs. work) and high diversity of people (gender and age) act as eyes on the street, decreasing crime (in line with Jacobs’ thesis)

Page 49

Relevant Features

Only 6 out of the 68 features in the joint model are London Borough features, namely:

• % of working population claiming out-of-work benefits

• Largest migrant population

• % of overseas nationals entering the UK

• % of resident population born abroad

Page 50

Implications

• Our method captures the dynamics of a place rather than extrapolating from previous crime histories, so it can be used in areas where people are less inclined to report crimes

• Our method provides new ways of describing geographical areas: novel risk-inducing or risk-reducing features of geographical areas

Page 51

Relevant Publications

“A gender-centric analysis of calling behavior in a developing country using CDRs”, V. Frias-Martinez, E. Frias-Martinez and N. Oliver, Proceedings of AAAI, 2009

“Prediction of Socioeconomic Levels using Cell Phone Records”, V. Soto, V. Frias-Martinez, J. Virseda and E. Frias-Martinez, International Conference on User Modeling, Adaptation and Personalization (UMAP '11), Industrial Track, Girona, Spain, 2011

“An Agent-Based Model Of Epidemic Spread Using Human Mobility and Social Network Information”, V. Frias-Martinez et al., 3rd International Conference on Social Computing (SocialCom '11), Boston, USA, 2011

Talk at WIRED 2013, London, UK: http://www.wired.co.uk/news/archive/2013-10/17/nuria-oliver

Page 52

Relevant Publications

“Moves on the street: Predicting Crime Hotspots using aggregated anonymized data on people dynamics”, A. Bogomolov, B. Lepri, J. Staiano, E. Letouzé, N. Oliver, F. Pianesi and A. Pentland, Big Data Journal, 2015

“Once Upon a Crime: Towards Crime Prediction from Demographics and Mobile Data”, A. Bogomolov, B. Lepri, J. Staiano, N. Oliver, F. Pianesi and A. Pentland, 16th ACM International Conference on Multimodal Interaction (ICMI 2014), 2014

“Flooding through the Lens of Mobile Phone Activity”, D. Pastor-Escuredo, Y. Torres Fernandez, J.M. Bauer, A. Wadhwa, C. Castro-Correa, L. Romanoff, J.G. Lee, A. Rutherford, V. Frias-Martinez, N. Oliver, E. Frias-Martinez and M. Luengo-Oroz, Proceedings of the IEEE Global Humanitarian Technology Conference (GHTC 2014), Silicon Valley, CA, October 2014

Page 53

Opportunities

Page 54

Temporal and Spatial Granularity

• Big Data can be available in real time or, if not in real time, much more frequently than data is typically collected (e.g. every 5 to 10 years for census data);

• Some kinds of Big Data (e.g. data about the city, collected by sensors placed in the urban infrastructure) can be collected at a significantly finer spatial granularity than with traditional methods;

Accuracy

• It could be argued that some kinds of data that are relevant for the public sector (e.g. migrations) can be collected more accurately by automatic means through Big Data platforms than by the manual means that are the current state of the art.

• In addition, given that there is no human in the loop, the data is less prone to human errors and to potential biases introduced by humans.

Cost and Effort

• Most of the Big Data that could be used for the public sector has already been collected for other purposes. In addition, Big Data is typically collected by automatic means, which makes its collection very cost-efficient;

Page 55

Challenges

Page 57

Risk/Benefit Analysis

• Even when all sorts of precautions are adopted, if by any chance an error in the data extraction led to a privacy leak, the consequences for the business could be devastating

• Hence, the potential benefit would have to be really large to compensate for the risk

• As this is an emerging research area, the benefits are still to be defined

Regulatory/Social

• Lack of updated regulation

• Lack of clear guidelines regarding safe data handling, processing and sharing for humanitarian purposes

• Risk of potential unintended social consequences

• Risk of creating a digital divide: unbalanced access to data and/or to the expertise needed to analyze it and make sense of it

Internal Barriers

• Big Data for Development/Social Good projects are typically not part of any business unit in the Telcos;

Page 58

Technical

• Representativeness of the data, generalization

• Combination of data from multiple sources

• Real-time analysis and prediction

• Lack of ground truth and of interventions to validate the results

• Statistical significance vs. substantive significance

• Correlation vs. causality

Privacy/Security

• Potential privacy risks need to be understood and minimized

• Control and transparency

• Security and traceability of the data

• Clear code of conduct and ethical principles when dealing with data

• Strict access control when appropriate

Page 59

Framework for Data Sharing

Remote Access | Question and Answers | Limited License | Pre-computed Indicators & Synthetic Data

Data protected through:

• Security: secure access to the data

• Anonymization: limited or aggregated data release

Development stage: Exploratory, Applicative

Number of users: low to medium number of users; medium to large number of users and open data
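One simple way to realize "limited or aggregated data release" and pre-computed indicators is to publish only counts of distinct users per cell and suppress cells below a minimum size. The sketch below assumes (user, area, hour) events and a hypothetical threshold of 10; it is not described on the slide, and small-count suppression on its own does not guarantee anonymity.

```python
from collections import Counter


def aggregate_release(events, min_count=10):
    """Count distinct users per (area, hour) cell and suppress cells below min_count.

    events: iterable of (user_id, area, hour) tuples. Only aggregated counts are
    returned; no user identifiers leave this function.
    """
    distinct = {(user, area, hour) for user, area, hour in events}
    counts = Counter((area, hour) for _, area, hour in distinct)
    return {cell: count for cell, count in counts.items() if count >= min_count}


if __name__ == "__main__":
    events = [(f"user{i}", "area_A", 8) for i in range(12)]  # 12 users
    events += [(f"user{i}", "area_B", 8) for i in range(3)]   # 3 users
    events += [("user0", "area_A", 8)]                         # duplicate event
    print(aggregate_release(events))
```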

Page 60

What can we all do to responsibly turn this opportunity into a reality?

Page 61


@nuriaoliver