Post on 26-Aug-2018
Deep Learning for Infrastructure Assessment in Africa using Remote Sensing Data
Pascaline Dupas, Department of Economics, Stanford University
Data for Development Initiative @ Stanford Center on Global Poverty and Development
Goals:
1. Use satellite imagery to identify basic measures of physical infrastructure and provision of public goods
2. Use these measures of physical infrastructure as “dependent variables” in economic analyses
Introduction: Why measure infrastructure access?
● Better understand quality of life and its spatial distribution
● Effectively plan & distribute resources
● Keep leaders aware and accountable
● Support developing regions
Background / Related Work
1. Using satellite images to predict land use
● Albert et al. (2017) used state-of-the-art deep convolutional neural nets (VGG-16 & ResNet) to analyze patterns in land use in urban settings with large-scale satellite data. Prediction accuracy ranged from 0.7 to 0.8.
2. Using other data sources to detect infrastructure
● Mnih and Hinton (2011) fed images into a Restricted Boltzmann Machine architecture to predict whether a small block of pixels was a road or not, reaching around 0.87 test accuracy.
3. Using night lights to proxy for development (Economics)
Afrobarometer Round 6 (2014-2015)
● Field surveys
● 36 African countries
● 7022 enumeration areas (EAs)
○ surveyor-assessed measures of
access to basic infrastructure
(piped water, sewerage, etc.)
[Figure: axis "long"; legend "eapipedwater"]
Satellite Imagery
satellite | Landsat 8 (l8) | Sentinel 1 (s1)
# bands | 6 | 5
resolution | 30m | 15m
original image size | 500 x 500 pixels | 500 x 500 pixels
interpretation | reflectance | backscatter
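As a quick sanity check on the table above, the ground footprint of each image follows from multiplying the resolution by the image size. The sketch below uses only the figures listed in the table; the function name `footprint_km` is illustrative and not part of the original project.

```python
def footprint_km(pixels: int, resolution_m: float) -> float:
    """Side length, in kilometres, of a square image with `pixels` pixels per side."""
    return pixels * resolution_m / 1000.0


# Figures taken from the table above.
landsat8_km = footprint_km(500, 30)   # 500 pixels at 30 m per pixel
sentinel1_km = footprint_km(500, 15)  # 500 pixels at 15 m per pixel
print(landsat8_km, sentinel1_km)
```

Under these figures, a Landsat 8 image covers a square roughly 15 km on a side, and a Sentinel 1 image roughly 7.5 km on a side.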
6 Band Landsat 8 Results
Value | Balance | Accuracy | F1 | ROC
Sewerage | 0.33 | 0.83 | 0.74 | 0.89
Electricity | 0.67 | 0.82 | 0.86 | 0.85
Piped Water | 0.58 | 0.78 | 0.81 | 0.83
Road | 0.54 | 0.74 | 0.76 | 0.78
Post Office | 0.24 | 0.79 | 0.49 | 0.76
Bank | 0.25 | 0.78 | 0.48 | 0.76
● Meaningful predictions, far surpassing random chance, with good ROC scores.
● Best performance on sewerage, electricity, and piped water access.
● Weak performance on measures that are hard to detect from imagery (post offices, banks).
● On par with state of the art
classification results (Albert et al 2017,
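One way to make "surpassing random chance" concrete is to compare each accuracy with a majority-class baseline. The sketch below assumes that the Balance column is the share of positive examples for each measure, which the slides do not state explicitly; under that assumption, always guessing the more common class scores max(balance, 1 - balance). The names `results` and `majority_baseline` are illustrative.

```python
# (balance, accuracy) per measure, copied from the 6 Band Landsat 8 results table.
results = {
    "Sewerage": (0.33, 0.83),
    "Electricity": (0.67, 0.82),
    "Piped Water": (0.58, 0.78),
    "Road": (0.54, 0.74),
    "Post Office": (0.24, 0.79),
    "Bank": (0.25, 0.78),
}


def majority_baseline(balance: float) -> float:
    """Accuracy of always predicting the more common class (assumes balance = share of positives)."""
    return max(balance, 1.0 - balance)


# Accuracy gain of the model over the majority-class baseline, rounded to two decimals.
gains = {
    name: round(accuracy - majority_baseline(balance), 2)
    for name, (balance, accuracy) in results.items()
}
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

Read this way, the accuracy margin is largest for sewerage, electricity, piped water, and roads, and only a few points for post offices and banks, which is consistent with the weaker F1 scores on those two measures.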
Step 2: Using the new measures to fight poverty
● Apply trained CNN on all inhabited pixels on the African continent
● Generate predictions
● Study distribution
○ Targeting -- Areas lagging behind?
○ Determinants of infrastructure placement, patronage, ethnic politics
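The steps above could be sketched roughly as follows. This is a minimal illustration and not the project's pipeline: the tile indexing, the `population` lookup used to decide which tiles count as inhabited, and the `classify` callable standing in for the trained CNN are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Tile = Tuple[int, int]  # hypothetical (row, col) index of an image tile


def predict_inhabited(
    tiles: Iterable[Tile],
    population: Dict[Tile, float],
    classify: Callable[[Tile], float],
    min_population: float = 1.0,
) -> Dict[Tile, float]:
    """Score only tiles whose population estimate reaches `min_population`."""
    return {
        tile: classify(tile)
        for tile in tiles
        if population.get(tile, 0.0) >= min_population
    }


def lagging_tiles(predictions: Dict[Tile, float], threshold: float = 0.5) -> List[Tile]:
    """Tiles whose predicted probability of access falls below `threshold`."""
    return sorted(tile for tile, p in predictions.items() if p < threshold)


# Toy usage with a stand-in classifier; a real run would call the trained CNN.
tiles = [(0, 0), (0, 1), (1, 0), (1, 1)]
population = {(0, 0): 120.0, (0, 1): 0.0, (1, 0): 35.0, (1, 1): 800.0}
fake_scores = {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.7, (1, 1): 0.4}
predictions = predict_inhabited(tiles, population, fake_scores.__getitem__)
print(lagging_tiles(predictions))
```

Filtering on population before classification keeps the prediction step from spending time on uninhabited areas, and the resulting map of predictions is what a targeting analysis would start from.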
Relevant Metrics
● F1-score (F1): the harmonic mean of precision and recall
● Area under ROC curve (ROC): the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example
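Both metrics can be computed directly from their definitions. The sketch below is a plain-Python illustration, not the evaluation code behind the reported numbers: `roc_auc` counts, over all positive/negative pairs, how often the positive example receives the higher score (ties count as half), and the toy labels and scores are made up.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration).
y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

The pairwise form of the AUC is quadratic in the number of examples, so it suits small illustrations; a rank-based implementation would be the usual choice at scale.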
Comparing to Baselines: OSM
Model
Value | Balance | Accuracy | F1 | ROC
Sewerage | 0.33 | 0.83 | 0.74 | 0.89
Electricity | 0.67 | 0.82 | 0.86 | 0.85
Piped Water | 0.58 | 0.78 | 0.81 | 0.83
OSM Baseline
Value | Balance | Accuracy | F1 | ROC
Sewerage | 0.32 | 0.74 | 0.73 | 0.77
Electricity | 0.67 | 0.68 | 0.66 | 0.73
Piped Water | 0.61 | 0.67 | 0.67 | 0.73
● The model surpasses the OSM baseline on all three of its most successful measures.
Comparing to Baselines: Nightlights
Model
Value | Balance | Accuracy | F1 | ROC
Sewerage | 0.33 | 0.83 | 0.74 | 0.89
Electricity | 0.67 | 0.82 | 0.86 | 0.85
Piped Water | 0.58 | 0.78 | 0.81 | 0.83
Nightlights Baseline
Value | Balance | Accuracy | F1 | ROC
Sewerage | 0.32 | 0.79 | 0.64 | 0.74
Electricity | 0.67 | 0.75 | 0.79 | 0.78
Piped Water | 0.61 | 0.72 | 0.74 | 0.73
● The model surpasses nightlights, even on electricity.
Comparing to Baselines: Oracle
Model
Value | Balance | Accuracy | F1 | ROC
Sewerage | 0.33 | 0.83 | 0.74 | 0.89
Electricity | 0.67 | 0.82 | 0.86 | 0.85
Piped Water | 0.58 | 0.78 | 0.81 | 0.83
Oracle
Value | Balance | Accuracy | F1 | ROC
Sewerage | 0.33 | 0.82 | 0.82 | 0.89
Electricity | 0.67 | 0.81 | 0.80 | 0.89
Piped Water | 0.58 | 0.81 | 0.80 | 0.89
● The model is on par with the Oracle, demonstrating that it is finding almost as much signal as it can.
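To read the three baseline comparisons side by side, the ROC gaps can be tabulated with a few lines of arithmetic. The sketch below only restates ROC values from the comparison tables above; the dictionary names are illustrative.

```python
# ROC values copied from the Model, OSM, Nightlights, and Oracle tables above.
model_roc = {"Sewerage": 0.89, "Electricity": 0.85, "Piped Water": 0.83}
baseline_roc = {
    "OSM": {"Sewerage": 0.77, "Electricity": 0.73, "Piped Water": 0.73},
    "Nightlights": {"Sewerage": 0.74, "Electricity": 0.78, "Piped Water": 0.73},
    "Oracle": {"Sewerage": 0.89, "Electricity": 0.89, "Piped Water": 0.89},
}

# Model ROC minus baseline ROC, rounded to two decimals; positive means the model is ahead.
gaps = {
    baseline: {measure: round(model_roc[measure] - roc, 2) for measure, roc in rocs.items()}
    for baseline, rocs in baseline_roc.items()
}
for baseline, per_measure in gaps.items():
    print(baseline, per_measure)
```

On these figures the model leads the OSM and nightlights baselines by 0.07 to 0.15 ROC on every measure, and trails the Oracle by at most 0.06.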
Goals
● Inclusion of previous Afrobarometer Rounds
● Scaling project with OSM Data
● Model interpretability
● Experiments for the Paper
Afrobarometer
Tasks:
1. Improve base model with previous rounds of the Afrobarometer dataset
2. Predict previous time spans from future time spans (predict rounds 1-3 with
rounds 4-6)
3. Test for temporal aspects in repeat areas (if there are any)
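Tasks 2 and 3 could start from something like the sketch below. It assumes each survey record is a dictionary carrying a `round` number and an `ea_id` identifier; both field names are hypothetical, and the records shown are made up. Here `split_by_round` separates rounds 4-6 (training) from rounds 1-3 (evaluation), and `repeat_areas` lists enumeration areas that appear in more than one round.

```python
from collections import defaultdict


def split_by_round(records, train_rounds, test_rounds):
    """Partition records into training and evaluation sets by survey round."""
    train = [r for r in records if r["round"] in train_rounds]
    test = [r for r in records if r["round"] in test_rounds]
    return train, test


def repeat_areas(records):
    """Map each enumeration area seen in more than one round to its sorted rounds."""
    rounds_by_ea = defaultdict(set)
    for r in records:
        rounds_by_ea[r["ea_id"]].add(r["round"])
    return {ea: sorted(rounds) for ea, rounds in rounds_by_ea.items() if len(rounds) > 1}


# Made-up records for illustration only.
records = [
    {"ea_id": "A", "round": 2, "piped_water": 0},
    {"ea_id": "A", "round": 5, "piped_water": 1},
    {"ea_id": "B", "round": 6, "piped_water": 1},
    {"ea_id": "C", "round": 1, "piped_water": 0},
]
train, test = split_by_round(records, train_rounds={4, 5, 6}, test_rounds={1, 2, 3})
print(len(train), len(test), repeat_areas(records))
```

Splitting strictly by round keeps later surveys out of the evaluation set, and the repeat-area lookup shows whether there are enough revisited areas to study change over time at all.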
DeepOSM for Infrastructure
Premise:
● Afrobarometer dataset remains limited and noisy (quality is subjective)
● OSM might be the best chance to scale this project (infrastructure is a huge category and we should leverage all existing sources)
● Google Static Maps API (25,000 free images / day) has satellite images at all scales
Proposal
● Choose the most relevant tags in OSM related to infrastructure
● Align tags with satellite imagery
● Use R-CNN to detect tags
Then, use all Afrobarometer rounds as validation data
Open question: how to relate trained OSM model to Afrobarometer prediction
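For the "align tags with satellite imagery" step in the proposal, one possible starting point is to convert a tag's latitude and longitude into pixel coordinates inside an image. The sketch below assumes the imagery uses the standard Web Mercator scheme with 256-pixel tiles and that each image is requested by center point and zoom level; whether that matches the project's imagery is an assumption, and the function names and coordinates are illustrative.

```python
import math

TILE_SIZE = 256  # pixels per tile side at zoom 0 in the Web Mercator scheme


def world_pixel(lat, lon, zoom):
    """Global Web Mercator pixel coordinates of a latitude/longitude at `zoom`."""
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    lat_rad = math.radians(lat)
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * scale
    return x, y


def tag_to_image_pixel(tag_lat, tag_lon, center_lat, center_lon, zoom, width, height):
    """Pixel position of a tag inside an image centered at (center_lat, center_lon)."""
    tx, ty = world_pixel(tag_lat, tag_lon, zoom)
    cx, cy = world_pixel(center_lat, center_lon, zoom)
    return tx - cx + width / 2.0, ty - cy + height / 2.0


# Illustrative coordinates: a tag slightly north-east of the image center.
px, py = tag_to_image_pixel(-1.28, 36.83, -1.29, 36.82, zoom=15, width=500, height=500)
print(px, py)
```

Pixel positions computed this way could serve as the centers of bounding boxes for an R-CNN-style detector, with tags that fall outside the image bounds discarded.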