Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21


Transcript of Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Page 1: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Geodemographic modelling collaboration

Alex Voss, Andy Turner

Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Page 2: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Overview

• Background
• MoSeS
  – Population Reconstruction
  – Dynamic Simulation
• Summary of on-going projects/effort
• Future Work
• Next Steps
• Acknowledgements

Page 3: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Background

• Social science does not traditionally use advanced ICTs, but the emergence of new analytical methods is driven by:
  – Increased availability of data about social phenomena
  – Issues with data management and integration
  – Challenges to analyse social phenomena at scale
  – Challenges to inform practical policy and decision making (e.g., evidence-based policy making)
• National Centre for e-Social Science (NCeSS) in the UK is investigating ways to respond to these challenges.
• EUAsiaGrid is supporting e-Social Science amongst other application domains

Page 4: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

MoSeS

• MoSeS develops modelling and simulation approaches for social science
  – First phase research node of NCeSS, now continued through second round node GENeSIS
• Contemporary demographic modelling of the UK based on UK census data and other datasets
• Using agent-based simulation to project population forward in time by 25 years
• Simulate the impact of distinct demographic processes such as mortality, fertility, health status, household formation, migration
• E.g., to inform policy making: what impact do policy decisions have?

Page 5: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Population Reconstruction

• Generation of individual-level population data for the UK
  – Based on 2001 census data
  – Works with ‘public release’ versions of the census, which are restricted:
    • Census Aggregate Statistics (CAS) at Output Area level
    • 1% of population (anonymisation)
  – Reconstructed data has the same attributes as the real population and the same number of individuals but is still anonymised
  – Uses a genetic algorithm to select a well-fitting set of records from the Sample of Anonymised Records (SAR) to assign to an output area
  – Attributes in the SAR need to be matched with those in the CAS
    • This is often complicated because of different categories
  – Aggregation to a lowest common categorisation
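The category-matching step can be illustrated with a small sketch. The age bands and both mappings below are hypothetical and are not the actual SAR or CAS categories; the point is only that both sources are recoded to the coarsest categorisation they share before they are compared.

```python
from collections import Counter

# Hypothetical categories: the real SAR and CAS variables differ from these.
SAR_TO_COMMON = {"0-4": "0-15", "5-15": "0-15", "16-24": "16-64",
                 "25-64": "16-64", "65+": "65+"}
CAS_TO_COMMON = {"0-15": "0-15", "16-44": "16-64", "45-64": "16-64",
                 "65-74": "65+", "75+": "65+"}

def recode_records(records, mapping, attribute="age"):
    """Return copies of individual records with one attribute recoded."""
    return [{**r, attribute: mapping[r[attribute]]} for r in records]

def aggregate_counts(counts, mapping):
    """Sum aggregate counts into the common categories."""
    out = Counter()
    for category, n in counts.items():
        out[mapping[category]] += n
    return dict(out)

# Made-up example data for one output area.
sar_sample = [{"id": 1, "age": "5-15"},
              {"id": 2, "age": "25-64"},
              {"id": 3, "age": "65+"}]
cas_counts = {"0-15": 40, "16-44": 90, "45-64": 60, "65-74": 20, "75+": 10}

sar_common = recode_records(sar_sample, SAR_TO_COMMON)
cas_common = aggregate_counts(cas_counts, CAS_TO_COMMON)
```

After recoding, individual records and aggregate counts use the same three age bands, so counts derived from a candidate set of records can be compared directly with the aggregate statistics.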

Page 6: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Population Reconstruction (II)

[Diagram labels: Disclosure Control, Validation*]

Page 7: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Population Reconstruction (III)

• If the error was 0, we would have a dataset that happens to match the raw census

• We are looking for a synthetic population with a ‘reasonable’ error

• How can we check if the error is ‘reasonable’?

Page 8: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Input Data

• Example: Individual-level Samples of Anonymised Records (ISARs)
• Downloadable for UK academics from http://www.ccsr.ac.uk/sars/2001/indiv/
• Click-through End-User License
• Disclosure control through:
  – Selection
  – Aggregation
  – Permutation
• No additional output checking required

Page 9: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Output Variables

• Demographics
• Employment
• Health status
• Household composition
• Two types of constraints:
  – Control constraints (have to be met)
  – Optimisation constraints (used in fitness function)
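The interplay of the two constraint types can be sketched as a small genetic algorithm. This is a minimal illustration under stated assumptions, not the MoSeS implementation: the control constraint is taken to be the number of individuals in the output area, the fitness is assumed to be the total absolute error against aggregate counts, records are drawn with replacement, and all names (`fitness`, `reconstruct`, `pool`) are hypothetical.

```python
import random
from collections import Counter

def fitness(selection, sample, targets):
    """Total absolute error against the optimisation constraints (0 = perfect fit)."""
    error = 0
    for attribute, target_counts in targets.items():
        counts = Counter(sample[i][attribute] for i in selection)
        for category in set(counts) | set(target_counts):
            error += abs(counts[category] - target_counts.get(category, 0))
    return error

def reconstruct(sample, area_size, targets, generations=200, pool=30, seed=0):
    """Pick area_size sample records (by index) that fit the targets well.

    Assumes area_size >= 2 and pool >= 4.
    """
    rng = random.Random(seed)
    # Control constraint: every candidate holds exactly area_size records.
    candidates = [[rng.randrange(len(sample)) for _ in range(area_size)]
                  for _ in range(pool)]
    for _ in range(generations):
        candidates.sort(key=lambda s: fitness(s, sample, targets))
        survivors = candidates[: pool // 2]      # keep the better half
        children = []
        while len(survivors) + len(children) < pool:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, area_size)
            child = a[:cut] + b[cut:]            # crossover keeps the length fixed
            child[rng.randrange(area_size)] = rng.randrange(len(sample))  # mutation
            children.append(child)
        candidates = survivors + children
    best = min(candidates, key=lambda s: fitness(s, sample, targets))
    return best, fitness(best, sample, targets)

# Made-up sample records and aggregate targets for one output area.
sample = [{"sex": "F", "employed": True}, {"sex": "F", "employed": False},
          {"sex": "M", "employed": True}, {"sex": "M", "employed": False}]
targets = {"sex": {"F": 6, "M": 4}, "employed": {True: 7, False: 3}}
selection, error = reconstruct(sample, area_size=10, targets=targets)
```

Because crossover and mutation never change the length of a candidate, the control constraint holds by construction, while the optimisation constraints only influence which candidates survive.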

Page 10: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Control Constraint

Page 11: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Optimisation constraint

Page 12: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Running Pop. Reconstruction

• Geographic segmentation allows each output area to be computed in parallel
• Good example of high-throughput computing using a master-slave architecture
• Each output area takes about 5 minutes to compute (on average)
• The UK contains about 223,060 output areas, so a complete run would take about 775 days on a single processor
• Different control constraints lead to different runtime behaviour for each output area
• The runtime can be reduced by applying, e.g., 128 processors, which brings it down to about 6 days
• Errors are more than likely in such a large compute job, so robust error recovery is needed
• Realistically, it takes 2 weeks to compute a complete output
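The runtime figures above follow from simple arithmetic, and the farming-out pattern with error recovery can be sketched in a few lines. The code below is an illustration only: a local thread pool stands in for grid worker nodes, and `estimated_days`, `run_all`, `compute_area` and `max_attempts` are hypothetical names rather than part of the MoSeS code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def estimated_days(n_areas, minutes_per_area, processors=1):
    """Idealised wall-clock estimate, ignoring scheduling overhead and failures."""
    return n_areas * minutes_per_area / processors / (60 * 24)

def run_all(area_ids, compute_area, workers=4, max_attempts=3):
    """Farm out one task per output area; resubmit failed tasks up to max_attempts."""
    results, failed = {}, {}
    attempts = {area: 0 for area in area_ids}
    pending = list(area_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            futures = {pool.submit(compute_area, area): area for area in pending}
            pending = []
            for future in as_completed(futures):
                area = futures[future]
                attempts[area] += 1
                try:
                    results[area] = future.result()
                except Exception as exc:
                    if attempts[area] < max_attempts:
                        pending.append(area)   # retry in the next round
                    else:
                        failed[area] = exc     # give up and report the error

    return results, failed

print(round(estimated_days(223_060, 5)))        # 775 (single processor)
print(round(estimated_days(223_060, 5, 128)))   # 6 (128 processors)
```

The gap between the idealised 6 days and the 2 weeks reported in practice is what the retry loop hints at: failed or slow tasks have to be detected and resubmitted, which the simple division does not account for.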

Page 13: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Application Porting

Currently porting population reconstruction code to EGEE, investigating TW data assets and exploring other links, e.g., with healthcare research

Setting up e-Social Science VO

Page 14: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Demo…

Page 15: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

P-GRADE Portal

Page 16: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

P-GRADE Portal

Page 17: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

P-GRADE Portal

Page 18: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Experiences

• Integrating existing code into grid environment required some changes to source code
  – management of input arguments
  – code scalability
  – log management
  – error handling
• Finding the right input size and parameters for testing to keep execution times low
• Making sense of execution failures
  – lack of ways to debug code in distributed environments

Page 19: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Experiences II

• Step-wise process works well
  – ensures we encounter problems piece by piece
  – allows us to comply with data protection / licensing
• Population reconstruction is resource intensive
  – may run up against limits on wall clock time
• Importance of ‘at elbow’ support
  – but hindered by data protection/licensing issues
• Licensing means we need to limit execution to UK resources
• No e-Social Science VO for EGEE porting (yet)
  – Needed to get support from other VOs in the meantime

Page 20: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Dynamic Modelling

• Daily activity modelling
  – Commuting
  – Retail modelling
  – Transportation
• Population Forecasting
  – Annual time step
  – Birth
  – Death
  – Migration
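The population forecasting loop with an annual time step can be sketched as follows. This is a toy agent-based illustration, not the MoSeS model: the rates, the childbearing age range, the migrant age range and all function names are assumptions chosen only to show how birth, death and migration are applied once per simulated year.

```python
import random

def step_year(population, rates, rng):
    """Advance the population by one annual time step."""
    next_population = []
    for person in population:
        if rng.random() < rates["death"]:
            continue                                  # death
        if rng.random() < rates["out_migration"]:
            continue                                  # leaves the study area
        next_population.append({**person, "age": person["age"] + 1})
        # Assumed childbearing range of 15-44; real models use age-specific rates.
        if (person["sex"] == "F" and 15 <= person["age"] <= 44
                and rng.random() < rates["birth"]):
            next_population.append({"age": 0, "sex": rng.choice("FM")})
    for _ in range(rates["in_migrants"]):             # fixed yearly inflow (assumed)
        next_population.append({"age": rng.randint(18, 40),
                                "sex": rng.choice("FM")})
    return next_population

def project(population, rates, years=25, seed=0):
    """Run the annual step repeatedly and record the population size per year."""
    rng = random.Random(seed)
    sizes = [len(population)]
    for _ in range(years):
        population = step_year(population, rates, rng)
        sizes.append(len(population))
    return population, sizes

# Made-up starting population and rates.
start = [{"age": 30, "sex": "F"}, {"age": 50, "sex": "M"}]
rates = {"death": 0.01, "out_migration": 0.02, "birth": 0.08, "in_migrants": 1}
final_population, sizes = project(start, rates, years=25)
```

The 25-year horizon matches the projection period mentioned for MoSeS; everything else in the example is placeholder data.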

Page 21: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Future Work

• Next steps until code runs in Taiwan with Taiwanese data
  – Proof of concept execution on Quanta cluster at ASGC
  – Definition of data outputs from
• Modularising computation so it can exploit multiple NGS nodes or EGEE CEs
• Improving data and code staging
• Moving from population reconstruction to supporting the simulation process
• Integration into ‘science gateway’ for the social sciences and developing a repository for models

Page 22: Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21

Acknowledgements

• National Centre for e-Social Science
  – MoSeS Node: Mark Birkin (PI)
  – GENeSIS Node: Mike Batty (PI)
  – NCeSS Hub: Peter Halfpenny and Rob Procter
• EUAsiaGrid Consortium
  – Marco Paganoni (Project Director)
• CPC at Westminster University
  – Gabor Szmetanko
  – Gabor Terstyanszky
  – Tamas Kiss
• GridPP
  – Jens Jensen and Jeremy Coles
• National Grid Service
  – Jason Lander and Shiv Kaushal (Leeds), Steven Young (Oxford), Mike Jones (Manchester)