Post on 14-Dec-2014
www.unglobalpulse.org
@UNGlobalPulse
Download at:
http://www.unglobalpulse.org/BigDataforDevWhitePaper
TABLE OF CONTENTS
Section I: Opportunities
• DATA INTENT AND CAPACITY
• SOCIAL SCIENCE AND POLICY APPLICATIONS
Section II: Challenges
• DATA CHALLENGES
• ANALYTICAL CHALLENGES
Section III: Applications
• WHAT NEW DATA STREAMS BRING TO THE TABLE
• MAKING BIG DATA WORK FOR DEVELOPMENT
Section I: Opportunity
The Data Revolution
Big data
• The three V’s of the digital data deluge:
  • Exponential growth in volume
  • Increasing velocity of data flow
  • Bewildering variety of new data types
Real-time operations in the private sector
• Real-time analysis, real-time decision-making, real-time customer feedback
What Do We Mean by Real-Time?
Global Pulse Definition: “Information about a phenomenon available quickly enough to maintain an accurate reflection of its current state, such that effective action may be taken in response.”
Timeframe for intervention is relative to context:
• Malnutrition: Months
• Starvation: Weeks
• Cholera: Days
• Earthquake: Hours
Section I: Opportunity
Relevance to the Developing World
• As of 2010: 4 billion of the world’s 5 billion mobile phones are in developing countries
• Mobile Services: money transfers, job search, commerce, market prices, social media
Mobile Banking in East Africa: Kenya: 11,000 new users/day, Tanzania: 15,000, Uganda: 18,000
Facebook in Senegal: 100,000 new users per month
Section I: Opportunity
Intent in an Age of Growing Volatility
• Drivers of Volatility: financial shocks, climate change, hyperconnectivity
• Early Warning Today: local impacts invisible or impossible to track as they happen.
• Growing Intent: policy makers are recognizing both the costs of volatility and the need for greater agility.
2011 OECD Report: “[d]isruptive shocks to the global economy are likely to become more frequent and cause greater economic and societal hardship. The economic spill-over effect of events like the financial crisis or a potential pandemic will grow due to the increasing interconnectivity of the global economy and speed with which people, goods and data travel”.
Section I: Opportunity
Data Mining and Data Science
• The availability of real-time digital data is increasing every second.
• Slowly but surely, intent to leverage it as a public good is growing.
• Yet there must also be capacity to understand it -- and use it to change outcomes.
“Data is the new oil. Like oil, it must be refined before it can be used.”
- Andreas Weigend
Section I: Opportunity
Big Data for Development: Getting Started
Illustration: Coping strategies of a hypothetical household facing rising commodity prices and unemployment
OFFLINE BEHAVIORS
• Buy cheaper foods
• Work longer hours
• Reduce energy use
• Draw down savings
• Sell assets
• Borrow from relatives
DIGITAL SIGNATURES
• Depletion of airtime credit
• Smaller mobile airtime purchases
• Failure to repay microloans via mobile financial services
• Changes in calling patterns
• Inbound money transfers
• Searches for jobs, health
• Sales of livestock via mobile trading network
• Venting frustrations on social media
Section I: Opportunity
Big Data for Development: Getting Started
A Loose BD4D Taxonomy:
1. Data Exhaust. Mobile usage, purchases, search, app usage.
2. Online Information. News stories, blogs, Twitter, Facebook, obituaries, job postings, ecommerce.
3. Physical Sensors. Satellite imagery, video, traffic sensors, etc.
4. Crowdsourced Reports. Information actively generated by citizens through mobile phone-based surveys, hotlines, online maps, etc.
Section I: Opportunity
Capacity: Big Data Analytics
Data Analytics and “Reality Mining”
1. Stream Analytics: Continuous analysis over real-time streaming data (social media, calling patterns, online prices, search)
2. Data Mining: Online digestion of semi-structured and unstructured historical data (news items, blog posts)
3. Real-Time Correlation: Integrating fast streams with historical records to provide context to new data
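As a rough illustration of how stream analytics and real-time correlation fit together, the following Python sketch keeps a rolling mean over an incoming stream and expresses it relative to a historical baseline. The class `RollingMean`, the function `ratio_to_baseline`, the baseline value, and the sample stream are all hypothetical and chosen only for illustration; they do not describe any Global Pulse system.

```python
from collections import deque
from statistics import mean


class RollingMean:
    """Stream analytics: running mean over the last `window` observations."""

    def __init__(self, window):
        self.values = deque(maxlen=window)  # old values drop off automatically

    def update(self, value):
        self.values.append(value)
        return mean(self.values)


def ratio_to_baseline(current, baseline):
    """Real-time correlation: express a fresh value relative to history."""
    return current / baseline


# Hypothetical inputs: a baseline mined from historical records, and a
# stream of new daily counts (e.g. mentions of a keyword).
historical_baseline = 100.0
stream = [90, 110, 100, 160, 190]

roller = RollingMean(window=3)
contextualized = [
    round(ratio_to_baseline(roller.update(x), historical_baseline), 2)
    for x in stream
]
print(contextualized)
```

A ratio near 1.0 means the smoothed stream matches the historical norm; the last two values rise well above it, which is the kind of context a raw count alone would not provide.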
Data Visualization Matters!
A word cloud of this whitepaper
Global legal timber trade: Top 5 exporters and costs
Section I: Opportunity
Social Science and Policy Applications
A growing body of evidence:
• Mining mobile location data to detect job loss, migration
• Mining mobile usage to detect mental illness
• Mining Twitter for misuse of antibiotics and other medications
• Mining Facebook for evidence of drinking problems among college students
• Remote sensing of nighttime light emissions for a real-time estimation of GDP
• Crowdsourcing citizen SMS reports to estimate earthquake damage
Tracking Health-Related Behaviour Change:
Mining Twitter messages
Cholera in Haiti
H1N1 epidemic in the US
Tracking Health-Related Behaviour Change
Mining Google searches
Volume of real-time searches for symptoms predicts the official number of dengue cases in Brazil
Section II: Challenges
Data Privacy
1. Digital Data Privacy as a Human Right
• Data acquisition
• Storage
• Retention
• Use
• Presentation
2. Privacy Risks in Big Data
• Awareness of consent to collect
• Reuse of public content
• Re-identification
Section II: Challenges
Data Access
Private sector barriers to sharing Big Data:
• Legal constraints
• Reputational risk
• Competitive advantage
• Culture of secrecy
• Lack of incentives
• Technical complexity
• Level of effort
Data Philanthropy!
Section II: Challenges
Analysis
Getting the picture right with user-generated data
• Falsification, deliberate distortion
• Sensor network distribution
• Perceptions vs. facts: Flu Trends detects influenza-like illness (ILI), not influenza.
• Sentiment Analysis: sarcasm, irony, hyperbole, humor, and the elusiveness of intent.
• Expressed vs. actual intentions
• Text mining: context and significance
Map of tweets in Jakarta
Section II: Challenges
Analysis
Interpreting behavioral data
• Selection bias: income, education, age, gender, technical aptitude, service provider
• Media coverage drives behaviour change
• Apophenia: correlation is not causality
Section II: Challenges
Analysis
Detecting and defining anomalies in human ecosystems
• Establishing a baseline: how stringent is your model?
• Sensitivity vs. specificity: false positives undermine credibility; false negatives reduce relevance.
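The sensitivity vs. specificity tradeoff can be made concrete with a small Python sketch. It assumes a hypothetical indicator series, a hypothetical record of which periods were real events, and two made-up alert thresholds; the numbers are illustrative only.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for two equal-length boolean lists.

    Assumes `actual` contains at least one True and at least one False.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)              # true alarms
    tn = sum((not p) and (not a) for p, a in pairs)  # correct silences
    fp = sum(p and (not a) for p, a in pairs)        # false alarms
    fn = sum((not p) and a for p, a in pairs)        # missed events
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical indicator values and the periods that were real events.
values = [1, 2, 8, 3, 9, 7, 2, 6]
actual = [False, False, True, False, True, False, False, True]

# A lenient model (low threshold) and a stringent one (high threshold).
loose = sensitivity_specificity([v > 5 for v in values], actual)
strict = sensitivity_specificity([v > 7.5 for v in values], actual)
print("loose: ", loose)
print("strict:", strict)
```

In this toy data the lenient threshold catches every event but raises one false alarm, while the stringent threshold raises no false alarms but misses one event, which mirrors the credibility vs. relevance tension described above.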
Section III: Application
What New Data Streams Bring to the Table
Know your data!
• Big Data is… just data. However…
• News organizations have developed verification methodologies
• Perceptual data is useful for detecting events
• False perceptions drive population behavior
• Selection bias can be an advantage: in developing countries, online inflation may precede offline inflation
Section III: Application
What New Data Streams Bring to the Table
• Sometimes correlation suffices: proxy indicators
• Accuracy vs. speed, cost, scale
• Real-time data saves lives
Applications of Big Data for Development
“Even if all you have got is a contemporaneous correlation, you’ve got a 6-week lead on the reported values. The hope is that as you take the economic pulse in real time, you will be able to respond to anomalies more quickly.” - Hal Varian, Chief Economist, Google
USGS Twitter Earthquake Detector
Section III: Application
What New Data Streams Bring to the Table
Global Pulse research: real-time proxy indicators
Tweets about the price of rice vs. official food prices in Indonesia
Section III: Application
What New Data Streams Bring to the Table
Global Pulse research: real-time proxy indicators
Correlation of mood changes and emerging topics in social media with official unemployment figures in the US and Ireland
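One simple way to test whether a real-time signal behaves as a proxy indicator is to correlate it with the official series at several lags and see where the relationship is strongest. The Python sketch below does this with a hand-written Pearson correlation. The two series are invented so that the official figure trails the proxy by one period; they are not Global Pulse data, and the helper names `pearson` and `lagged_correlation` are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def lagged_correlation(proxy, official, lag):
    """Correlate proxy[t] with official[t + lag] for a non-negative lag."""
    if lag > 0:
        return pearson(proxy[:-lag], official[lag:])
    return pearson(proxy, official)


# Hypothetical series: a real-time proxy (e.g. weekly tweet counts) and an
# official statistic constructed to follow the proxy one period later.
proxy = [10, 12, 15, 14, 18, 22, 21, 25]
official = [5.0, 5.0, 6.0, 7.5, 7.0, 9.0, 11.0, 10.5]

correlations = {lag: lagged_correlation(proxy, official, lag) for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, correlations)
```

A strong correlation at a positive lag suggests the proxy moves ahead of the official figure. With series this short, such a result is only a prompt for further validation, not evidence of a causal link.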
Section III: Application
What New Data Streams Bring to the Table
A threefold opportunity for development
1. Early warning: Faster detection of anomalies at the onset of a crisis allows more agile responses to prevent harm.
2. Real-time awareness: A fine-grained and current representation of reality informs better design and targeting of programmes and policies.
3. Real-time feedback: Continuous monitoring for behaviour changes following programme implementation enables a more adaptive approach to development, in which rapid adjustments may be made until results are achieved.
Section III: Application
Making Big Data Work for Development
Contextualization is key
1. Data context: Indicators should not be interpreted in isolation. Monitor for constellations of anomalies, triangulating across data sources.
2. Cultural context: Local knowledge of what is “normal” in a given population is a prerequisite for recognizing anomalies. Cultural practices and norms vary widely the world over and these differences certainly extend to the use of digital services. There is a deeply ethnographic dimension to using Big Data for development.
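Monitoring for constellations of anomalies can be sketched in Python as follows. Each source is compared against its own local history of "normal", and an alert is raised only when several sources deviate at once. The source names, histories, z-score threshold, and `min_sources` setting are all hypothetical assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=2.0):
    """Flag `latest` if it lies more than `z_threshold` standard deviations
    from the mean of `history` (the locally established baseline)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


def constellation_alert(sources, min_sources=2):
    """Alert only when at least `min_sources` independent sources are anomalous."""
    flagged = [
        name
        for name, (history, latest) in sources.items()
        if is_anomalous(history, latest)
    ]
    return len(flagged) >= min_sources, flagged


# Hypothetical sources: (recent history, latest observation).
sources = {
    "airtime_topups": ([100, 102, 98, 101, 99], 80),
    "job_searches": ([50, 52, 48, 51, 49], 70),
    "rice_price_tweets": ([20, 22, 18, 21, 19], 21),
}

alert, flagged = constellation_alert(sources)
print(alert, flagged)
```

Requiring agreement across sources reduces the chance that a quirk in one data stream triggers a response. The baseline history stands in for the local knowledge of what is normal, which in practice would need input from people who know the population.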
Section III: Application
Making Big Data Work for Development
Becoming sophisticated users of information
Example: FEMA tracking 2011 US tornado impacts through Twitter
1. “We aren’t making widgets”: Navigating the tradeoff between speed and accuracy.
2. Focus on changing outcomes. How can we leverage the real-time nature of the data to save lives?
“Disasters are like horseshoes, hand grenades and thermal nuclear devices, you just need to be close—preferably more than less.” – Craig Fugate, Administrator, US Federal Emergency Management Agency
Conclusion
How can Big Data fulfill its potential as a public good?
1. Institutional and financial support from public sector actors
2. Creating incentives for corporations to share data
3. Creating opportunities for academic researchers to collaborate
4. Developing new models, technologies and policies for safe and responsible sharing and reuse of data for the public good
5. New types of partnerships
UN Global Pulse
www.unglobalpulse.org
@unglobalpulse
Image credit: Aaron Koblin
24 hours of AT&T phone calls and Internet traffic flowing through New York City