2010 Census: Uses of the Census and Census Outreach Libby Starling Metropolitan Council.
Development of Dynamic Census -...
Transcript of Development of Dynamic Census -...
Development of Dynamic Census:�Estimating demographics and trajectories of actual populations in Bangladesh using CDR data �
University of Tokyo Shibasaki & Sekimoto Lab.
Dynamic Census Development Team Ayumi Arai*
Apichon Witayangkurn Hiroshi Kanasugi
Zipei Fan Ryosuke Shibasaki
��
What is CDR data? �
��
Localization Trajectory
How the data look like Starting time of calls
Location of antenna
CDR data can provide partial views of large-scale human mobility and distribution
Call Detail Records = CDR data
��
Motivation �¡ Popula'onsta's'csareimportantforac'vi'esbothinprivateandpublicsectors.Butaretheseenoughforunderstandinghumanac'vity?
¡ CDRdataareusefulforunderstandinghumanmobility.But..¡ Interpreta'onofanalysisresultsmaybemisleadingifCDRscan
representlimitedpartofsociety(James&Versteeg,2007;Tatem&Smith,2010)
¡ Difficulttoexaminetheimpactofrepresenta'vebiaswithoutknowingwhichpartofsocietyCDRsdepict(Wesolowskietal.,2013).
Canwedevelophumantrajectorydata,whicharelabeledwithdemographicaSributesandrepresentactualpopula'onsusingCDR?
��
Advantages and challenges of CDR data �¡ Advantages
1. Poten'allyhighpopula'oncoverage2. Nearreal-'mehumanmobility3. Rou'nelycollectedbythemobilenetworkoperator(MNO)
¡ Challenges
1. Recordedatirregularintervals2. Spa'alresolu'ondependingoncellantennaloca'ons3. Anonymized4. Representa'venessbias
A novel data set “Dynamic Census” is developed by addressing these challenges
��
What is “Dynamic Census”? �
¡ Trajectoriesanddemographicsofactualpopula3oninareascoveredbyCDRdata
¡ Griddedpopula3onsta3s3csonkeydemographicaSributesathourlybasis,e.g.workingmale,housewife,student,andother
Working males’ 24 hour population distribution in Dhaka
CDRs �Census � Dynamic Census�x�
y �
Years �
x�
y �
x�
y �
Hours� Hours�
T0 �
T0+5 �
t0 �
t0+1 �
t0+2
t0+3 �
t0 �
t0+3 �
�
Impacts and Uniqueness of Dynamic Census �
CancaptureBOPwhichhasnon-marginalimpactsoneconomy.Difficult-to-reachpopula'onforfieldsurveycanbealsocaptured. �
95% World’s cellular
network coverage�
Applicableanywherecoveredwithcellularnetworks �
�1% Cost necessary for
developing Dynamic Census�
40% Those who belong to Base of Pyramid �
Timeandfinancialcostsaremuchlowerthanconduc'ngconven'onalcensus�
�
CDR data �
Interpola3on
Spa3aldisaggrega3on&routeinterpola3on
Es3ma3onoftheunobservablepopula3on
How to address challenges in CDR data�
Dynamic Census�
Loca3onlabeling
Irregularrecordinterval
Non-uniformresolu3on
Anonymized
Challenges in CDR data
DemographicaDributees3ma3on
Representa3veness
① �
④ �
② �
③ �
Fieldsurveydata�mobilephoneusers�
Fieldsurveydata&buildingdata
(users&non-users)
Supplement data
Buildingmapdata&Roadnetworkdata
��
CDR data �
Interpola3on
Spa3aldisaggrega3on&routeinterpola3on
Es3ma3onoftheunobservablepopula3on
Dynamic Census�
Loca3onlabeling
Irregularrecordinterval
Non-uniformresolu3on
Anonymized
Challenges in CDR data
DemographicaDributees3ma3on
Representa3veness
① �
④ �
② �
③ �
Fieldsurveydata�mobilephoneusers�
Fieldsurveydata&buildingdata
(users&non-users)
Supplement data
Buildingmapdata&Roadnetworkdata
���
1. Irregular record interval�
Home
Office
Calledat8:40am
Calledat3:15pm
Interpolate CDRs based on the routine observed from longer-term data
H
W
Time� Place�8:00~8:59 � H �9:00~9:59 � W�
10:00~10:59 � W�11:00~11:59 � W�12:00~12:59 � W�13:00~13:59 � W�14:00~14:59 � W�15:00~15:59 � W�
Interpola3on
Time� Place�8:00~8:59 � H �9:00~9:59 �
10:00~10:59 �
11:00~11:59 �
12:00~12:59 �
13:00~13:59 �
14:00~14:59 �
15:00~15:59 � W�
CDR data Actual behavior
8:00 Departure�
8:45 Arrival�
Noinforma'oninCDRdata
���
Interpolation �¡ Extracting routine patterns
¡ Atopicmodelisemployed
¡ Rou'nepaSernisexpressedastheprobabilitydistribu'onofkeyloca'ons(Home,Work,andOther)
¡ Spatiotemporal interpolation
¡ HiddenMarkovModelisemployed('mingoftransi'onisiden'fied)
Topic model Hidden Markov Model
Collaborative filtering approach
+
���
CDR data �
Interpola3on
Spa3aldisaggrega3on&routeinterpola3on
Es3ma3onoftheunobservablepopula3on
Dynamic Census�
Loca3onlabeling
Irregularrecordinterval
Non-uniformresolu3on
Anonymized
Challenges in CDR data
DemographicaDributees3ma3on
Representa3veness
① �
④ �
② �
③ �
Fieldsurveydata�mobilephoneusers�
Fieldsurveydata&buildingdata
(users&non-users)
Supplement data
Buildingmapdata&Roadnetworkdata
���
2. No demographic attribute info�
Popula'onsinCDRsdonotalwaysrepresentthepopula'onunderstudy
Canspecifythepopula'onunderstudy
DemographicaDributesarees3mated
Target population group�
���
Demographic attribute estimation�¡ Approach
¡ RandomForestisemployedforbuildinganes'ma'onmodel
¡ One-month-call-recordsfrom58volunteersareusedastrainingdata
¡ One-day-call-recordsfrom922mobilephoneusersareusedforexaminingrela'onshipbetweencallingbehavioranddemographicaSributes
¡ Estimated features
¡ Workingmale,housewife,student,andother
¡ Incomelevel(individual)andAgegroup(-20/21-35/36-60/61-)←Resultstobeimproved
Class � Accuracy � Precision� Recall�Working male� 0.79� 0.63� 0.70�
Housewife� 0.67� 0.47� 0.82�
Student� 0.89� 0.40� 0.22�
Other� 0.63� 0.20� 0.07�
Estimation results
���
Calling behavior survey to relate demographic attributes and CDR data�¡ Purpose
¡ Relatecallingbehavior(callrecords)anddemographicaSributes¡ Surveyed area and population
¡ 15Wardsarechosenbasedonlanduse.ForeachWard,18HHseacharechosenfrom3incomegroupsinGreaterDhaka(Two-stagestra'fiedsampling)
¡ Allmembersareinterviewed¡ InterviewedondemographicaSribute,travel-ac'vity,andmobilephoneuse
¡ Key of this survey ¡ Incomelevelisdeterminedbasedonthetypeofbuildings
Interviewataslumhousehold �
Interviewatahighincomehousehold �
���
CDR data �
Interpola3on
Spa3aldisaggrega3on&routeinterpola3on
Es3ma3onoftheunobservablepopula3on
Dynamic Census�
Loca3onlabeling
Irregularrecordinterval
Non-uniformresolu3on
Anonymized
Challenges in CDR data
DemographicaDributees3ma3on
Representa3veness
① �
④ �
② �
③ �
Fieldsurveydata�mobilephoneusers�
Fieldsurveydata&buildingdata
(users&non-users)
Supplement data
Buildingmapdata&Roadnetworkdata
��
3. Non-uniform spatial resolution�
Spatial disaggregation
Loca'oninCDRdataisatantennalevel
Disaggregatedbasedonthedistribu'onofbuildings
Disaggregated
��
Stay point reallocation �¡ Modifying spatial resolution
¡ Stay points are reallocated to building POIs ¡ Antennabasisloca'onsarereallocatedtobuildingPOIswithinvoronoi¡ Eachvoronoicellisconsideredtobeanareacoveredbyanantenna
¡ Allocation probability is based on the area size of building
¡ Types of buildings are used as the proxy of the income level
��2016.03.11�
Distribution of POIs�
Stay point (antenna) �
Voronoi cell generated from antenna location �
POIs within voronoi�
���
CDR data �
Interpola3on
Spa3aldisaggrega3on&routeinterpola3on
Es3ma3onoftheunobservablepopula3on
Dynamic Census�
Loca3onlabeling
Irregularrecordinterval
Non-uniformresolu3on
Anonymized
Challenges in CDR data
DemographicaDributees3ma3on
Representa3veness
① �
④ �
② �
③ �
Fieldsurveydata�mobilephoneusers�
Fieldsurveydata&buildingdata
(users&non-users)
Supplement data
Buildingmapdata&Roadnetworkdata
���
4. Representativeness�
mobile users �
Populations in CDR data
Suppose you have CDRs from Dhaka … �
Those who are not
included in CDRs �
Unobservable in CDR data
Entire population in Dhaka
Scalingfactorsarecomputedtoes'mate
thispart�
Entire living populations �
(B) People in HHs which do NOT include any GP users Unobservables�
(A) People in HHs which include GP users GP users + unobservables�
Understanding population covered by CDRs on household basis
���
Estimation of the unobservable �¡ Scaling factor
¡ Approx.householdnumberiscalculatedbasedonthenumberofbuildings
¡ ScalingfactoriscomputedfromthetypicalHHstructure,obtainedthroughfieldsurvey
Real population of areas covered by CDR data Non users �
Users�
Non users (Not appear in CDR
data)�
Users�
Non users �
User
x scaling factor �
(a) HHs including users
User
x scaling factor �
(b) HHs consisting of no users alone User
x (scaling factor �+�)
���
Purpose • Inves'gatethepopula'onstructureforeachincomelevel• Obtaindatatocalculatescalingfactorstocomputethenumberof
popula'onsfromthedistribu'onofbuildingbyincomelevel
Surveyed Voronoi area
Surveyed area and population • En'repopula'onsinaVoronoicellwere
surveyedinDecember2014• 2,839HHsconsis'ngof11,521people
from366buildings(outof367buildings)
Key of SCC • Incomelevelisdeterminedbasedonthe
typeofbuildings
Small-scale census survey (SSC) to see population structure for each income level
Average HH structures obtained from survey
HHs not including GP users HHs including GP users
���
Type of building and income level
Contents of the map data • Approx.650,000buildings(withthetypeofbuildings)• Residen'albuildingsareclassifiedintofourgroupsbytheheightof
buildings
Sample of the map
Criteria of the type of buildings • High(Sevenormorestories)• Middle(Morethantwostories)• Low(Onetotwostories)• Slum(Onestory)
:High
:Middle
:Low
:Slum
Legend of the type of building
���
Future work �
CDR data �
Dynamic Census�
Spa3otemporalinterpola3on
DemographicaDributees3ma3on
Es3ma3onoftheunobservablepopula3on
Staypointextrac3on&loca3onlabeling
Spa3aldisaggrega3on&routeinterpola3on
Willbeimprovedwiththeuseofsmartphones
Resultsdependondataquality→Compara3vestudies
Buildingmapdataiscostly→Satelliteimageprocessing
Varietyinlife-styles→Compara3vestudies
Anyques'ons/sugges'[email protected]