Re-thinking Modelling: a Call for the Use of Data Mining in Data-driven Social Simulation

Samer Hassan, Javier Arroyo, Celia Gutiérrez

Universidad Complutense de Madrid

Samer Hassan SS@IJCAI 2009 2

Contents

Data-driven ABM

DM-assisted Methodology

Case Study: Mentat

Application

Conclusions

Research Aim

Data-driven approach: non-KISS, empirical validation, specific (case study), expressive

Theoretical approach: KISS ("keep it simple, stupid"), structural validation, abstract, general

Classical Logic of Simulation

Data-Driven Logic

Data-driven Approach

• Complexity
• Large amounts of data
• Auxiliary AI techniques: fuzzy logic, ontologies, evolutionary computation, data mining

Data Mining

Data mining: extracting patterns and relevant information from large amounts of data
• Pre-processing of empirical data: cluster finding, discovery of hidden patterns, locating redundancies
• Post-processing of simulation output
  • Clustering: discovery of hidden patterns, validation of clusters, locating inconsistencies
  • Classification: cluster matching

Methodology for DM-assisted ABM

Data collection
• Initial point
• Validation points: necessarily different from the initial data
• Type: explicit or externalised (empirical distributions, secondary sources)
• Methods
  • Quantitative, e.g. surveys
  • Qualitative, e.g. interviews

Methodology for DM-assisted ABM

Analysis: pre-processing of empirical data
• Roles
  • Domain expert: guides the DM exploration and interprets the results
  • DM expert: helps confirm or refine theories

Methodology for DM-assisted ABM

Selection of relevant data
• Filtering
• Adaptation of data: normalisation, discretisation
• Domain expert: theory
• DM: redundancies, overlooked independent variables
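The "DM: redundancies" item can be illustrated with a simple correlation screen. The sketch below is a minimal example, assuming each variable is a list of numbers of equal length and using an arbitrary cut-off of |r| ≥ 0.8; the function names and the toy data are illustrative, not taken from the methodology.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable carries no correlation information
    return cov / (sx * sy)


def redundant_pairs(columns, threshold=0.8):
    """List (var_a, var_b, r) for every variable pair with |r| >= threshold.

    `columns` maps each variable name to its list of numeric values.
    The 0.8 threshold is an illustrative assumption.
    """
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Invented toy data: "economy" closely tracks "education".
toy = {
    "education": [8, 10, 12, 14, 16, 18],
    "economy": [1.0, 1.3, 1.5, 1.9, 2.1, 2.4],
    "ideology": [5, 3, 7, 4, 6, 5],
}
print(redundant_pairs(toy))  # → [('economy', 'education', 0.997)]
```

A flagged pair is only a candidate for removal; deciding which variable, if any, to drop remains a call for the domain expert.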

Methodology for DM-assisted ABM

Data analysis
• Large data collections
• Guided by theory
• Types: cluster analysis, principal component analysis, time series methods, association rules

Methodology for DM-assisted ABM

Interpretation of results
• Theory expert: relates the results to theory
• New findings are added to the findings base

Methodology for DM-assisted ABM

ABM building
• Based on the findings
• Role: modeller
• Steps: formalisation, data-driven design, implementation, initialisation
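As an illustration of the initialisation step, the sketch below builds each agent from a survey respondent, drawing respondents with replacement until the target population is reached. This is one possible data-driven initialisation, not necessarily the one used in Mentat; the `Agent` fields, `initialise_agents` and the respondent records are hypothetical. The population of 3000 matches the size given for Mentat's world.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical attribute set; a real model would mirror its survey variables.
    agent_id: int
    age: int
    ideology: int
    religious: bool


def initialise_agents(survey_rows, n_agents, seed=0):
    """Create n_agents, each copying the attributes of a randomly drawn respondent.

    Drawing respondents with replacement (a bootstrap) reproduces the joint
    distribution of the sampled attributes up to sampling noise, so
    correlations present in the survey carry over to the initial population.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [Agent(agent_id=i, **rng.choice(survey_rows)) for i in range(n_agents)]


# Invented respondents standing in for a real survey sample.
survey = [
    {"age": 25, "ideology": 3, "religious": False},
    {"age": 61, "ideology": 7, "religious": True},
    {"age": 44, "ideology": 5, "religious": True},
]
population = initialise_agents(survey, n_agents=3000, seed=42)
print(len(population), population[0])
```

Copying whole respondents, rather than sampling each attribute independently, is what keeps combinations such as "older and religious" at their empirical frequencies.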

Methodology for DM-assisted ABM

Simulation
• Fine-tuning the ABM: sensitivity analysis, intensive testing
• Output: record the agent trace
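Recording the agent trace means keeping a per-step snapshot of every agent's attributes so that the simulation output can later be mined like the empirical data. A minimal sketch, assuming agents are plain dictionaries and the trace is exported as CSV; `record_trace`, `trace_to_csv` and the toy ageing rule are hypothetical stand-ins for a real model.

```python
import csv
import io


def record_trace(trace, step, agents):
    """Append one snapshot row per agent; dict unpacking copies the current values."""
    for agent in agents:
        trace.append({"step": step, **agent})


def trace_to_csv(trace):
    """Serialise the trace to CSV text that a data mining tool can load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(trace[0]))
    writer.writeheader()
    writer.writerows(trace)
    return buffer.getvalue()


def run_toy_simulation(steps):
    """Placeholder ABM loop: agents only age, standing in for real model rules."""
    agents = [
        {"id": 0, "age": 30, "ideology": 4},
        {"id": 1, "age": 52, "ideology": 7},
    ]
    trace = []
    for step in range(steps):
        for agent in agents:
            agent["age"] += 1  # hypothetical dynamics
        record_trace(trace, step, agents)
    return trace


trace = run_toy_simulation(steps=3)
print(trace_to_csv(trace))
```

Keeping one row per agent per step (a "long" table) lets the same clustering code run on any single step or on the whole run.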

Methodology for DM-assisted ABM

Validation: analysis of the results
• Empirical validation
• Theoretical consistency
• Roles
  • DM expert: analyses the data
  • Domain expert: extracts conclusions
• Iterative cycle

The Problem

Aim: simulate the process of change in social values within a society over a given period

Many factors are involved

Inertia of generational change: to what extent do demographic dynamics explain the change in mentality?

Inter-generational approach
• Agent characteristics remain constant
• The macro aggregation evolves

Mentat: architecture

Agent
• Mental state attributes
• Life-cycle patterns
• Demographic micro-evolution: couples, reproduction, inheritance

Mentat: architecture

World
• 3000 agents
• 100x100 grid
• Demographic model with 8 independent parameters

Social network: communication with
• Moore neighbourhood
• Friends network
• Family network
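The Moore neighbourhood of a cell is the ring of cells around it: every cell one step away horizontally, vertically or diagonally. The sketch below assumes the 100x100 grid given for Mentat, a bounded (non-wrapping) grid by default, and agent positions stored in a dictionary; `moore_neighbours` and `agents_in_neighbourhood` are hypothetical helpers, and whether Mentat's grid wraps around is not stated in the slides.

```python
def moore_neighbours(x, y, width=100, height=100, radius=1, torus=False):
    """Cells in the Moore neighbourhood of (x, y), excluding (x, y) itself.

    The Moore neighbourhood holds every cell within Chebyshev distance
    `radius`: 8 cells for radius 1 on an interior cell. Whether the grid
    wraps around (torus=True) is an assumption; the slides do not say.
    """
    cells = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue  # skip the centre cell
            nx, ny = x + dx, y + dy
            if torus:
                cells.append((nx % width, ny % height))
            elif 0 <= nx < width and 0 <= ny < height:
                cells.append((nx, ny))
    return cells


def agents_in_neighbourhood(pos, occupancy, **grid):
    """Agents standing on the Moore-neighbour cells of pos.

    `occupancy` maps an (x, y) cell to the list of agent ids on it
    (a hypothetical layout); `grid` is forwarded to moore_neighbours.
    """
    x, y = pos
    return [a for cell in moore_neighbours(x, y, **grid) for a in occupancy.get(cell, [])]


occupancy = {(1, 1): ["ana"], (2, 2): ["luis"], (99, 99): ["eva"]}
print(agents_in_neighbourhood((0, 0), occupancy))              # ['ana']
print(agents_in_neighbourhood((0, 0), occupancy, torus=True))  # ['eva', 'ana']
```

On a bounded grid, corner and edge cells have fewer neighbours (3 and 5), which slightly reduces communication opportunities there; a torus avoids that edge effect.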

Data Collection in Mentat

Initial data: EVS-1980
• Representative sample of Spain
• Qualitative information
• Empirically grounded demographic equations

Validation data: EVS-1990, EVS-1999

Analysis in Mentat

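Selection of relevant data from EVS-1980, 1990 and 1999
• Options:
  1. An algorithm that finds the best subset of variables
  2. Rely on the domain expert
• Domain knowledge had been tested: option (2) chosen
• Variable adaptation: normalisation

Name            Type          Range
gender          categorical   -
age             numeric       ≥ 18
studies         numeric       ≥ 5
civil state     categorical   -
economy         numeric       real
ideology        ordinal       1-10
conf. church    ordinal       1-4
church att.     ordinal       1-7
relig. person   categorical   -

(conf. church = confidence in the Church; church att. = church attendance; relig. person = whether the respondent considers him/herself a religious person)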
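A minimal sketch of the normalisation step, assuming min-max scaling to [0, 1]: the ordinal variables use the fixed ranges from the variable table, while the open-ended numeric ones are scaled by the observed minimum and maximum. Min-max scaling is an assumed choice (the slides only say "normalisation"), and the snake_case variable names are hypothetical identifiers.

```python
# Fixed ranges of the ordinal variables, taken from the variable table.
# "conf_church" and "church_att" are hypothetical names for
# "conf. church" and "church att.".
FIXED_RANGES = {"ideology": (1, 10), "conf_church": (1, 4), "church_att": (1, 7)}

# Open-ended numeric variables (age >= 18, studies >= 5, economy real)
# are scaled with the min/max observed in the data instead.
OPEN_ENDED = ("age", "studies", "economy")


def min_max(value, lo, hi):
    """Linearly map value from [lo, hi] onto [0, 1]."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def normalise(rows):
    """Return copies of rows with numeric and ordinal variables scaled to [0, 1].

    Categorical variables (gender, civil state, relig. person) are copied
    unchanged; how to encode them, e.g. one-hot, is left open here.
    """
    ranges = dict(FIXED_RANGES)
    for var in OPEN_ENDED:
        values = [row[var] for row in rows if var in row]
        if values:
            ranges[var] = (min(values), max(values))
    result = []
    for row in rows:
        scaled = dict(row)  # copy so the input rows stay untouched
        for var, (lo, hi) in ranges.items():
            if var in scaled:
                scaled[var] = min_max(scaled[var], lo, hi)
        result.append(scaled)
    return result


sample = [  # invented respondents
    {"gender": "female", "age": 18, "ideology": 1},
    {"gender": "male", "age": 43, "ideology": 4},
    {"gender": "female", "age": 68, "ideology": 10},
]
for row in normalise(sample):
    print(row)
```

Putting all variables on the same [0, 1] scale matters for distance-based clustering such as k-means, where otherwise a wide-range variable like age would dominate a 1-4 ordinal one.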

Analysis in Mentat

Data analysis
• Algorithm selection: wrapped k-means, exploring different values of k (number of clusters)
• Discarded variables
  • Gender and age cause irrelevant clusters to appear (e.g. widowed women)
  • Economy is redundant: highly correlated with education
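The slides do not describe how the wrapped k-means works, so the sketch below shows only the simpler idea of exploring different values of k: plain k-means (Lloyd's algorithm) with a deterministic farthest-first seeding, reporting the within-cluster sum of squared errors (SSE) for each k. It is an illustration under those assumptions, not the authors' algorithm; the toy points are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding: start from points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, sse)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def explore_k(points, k_values):
    """Within-cluster SSE for each candidate number of clusters."""
    return {k: kmeans(points, k)[2] for k in k_values}


# Invented, well-separated toy data with three groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for k, sse in explore_k(points, range(1, 5)).items():
    print(f"k={k}: SSE={sse:.2f}")  # SSE drops sharply up to k=3
```

A sharp drop in SSE followed by a flattening (the "elbow") is one common heuristic for picking k; in the methodology the final choice also rests on the domain expert's sociological reading of the clusters.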

Analysis in Mentat

Interpretation: sociological research
• Religious typology (RLGTYPE)
  • Based on 3 variables
  • Ecclesiastical, low-intensity, alternatives and non-religious
• Clusters found (1980, 1999)
  • Based on the 9 - 3 = 6 remaining variables
  • 5 clusters with sociological meaning
  • Consistent with RLGTYPE
• Theoretical observations of the pattern evolution
  • Religiosity strength falls
  • The ideological spectrum shifts to the left (related to education and economy)
  • The newest type of religiosity, “alternatives”, rises (among young people)

Validation in Mentat

• Mentat rebuilt and its simulation explored
• Mentat output clustered
  • The same 5 clusters found
  • Similar evolution trends
  • The 3 theoretical observations also appear
• Inconsistencies detected
  • The percentages of the liberal clusters do not match individually, although their aggregate does
  • Graphics show fewer young people
  • Liberal clusters deeply affected
• These inconsistencies guide the re-design
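Finding "the same clusters" in the simulation output requires matching simulated clusters to empirical ones. A minimal sketch, assuming both clusterings are summarised by centroids in the same normalised variable space: it tries every one-to-one assignment (feasible for 5 clusters, 5! = 120 options) and then compares cluster shares individually and for a group. The helper names, the 5-point tolerance and all numbers are invented for illustration; three clusters keep the example short.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two centroids."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_clusters(empirical, simulated):
    """Map each empirical cluster index to a simulated cluster index.

    Brute force over all one-to-one assignments, minimising the total squared
    distance between matched centroids; larger k would call for the
    Hungarian method instead.
    """
    k = len(empirical)
    best = min(
        permutations(range(k)),
        key=lambda perm: sum(sq_dist(empirical[i], simulated[perm[i]]) for i in range(k)),
    )
    return dict(enumerate(best))


def share_gaps(emp_shares, sim_shares, mapping, tolerance=0.05):
    """{empirical cluster: simulated minus empirical share} for gaps above tolerance."""
    gaps = {}
    for i, j in mapping.items():
        diff = sim_shares[j] - emp_shares[i]
        if abs(diff) > tolerance:
            gaps[i] = round(diff, 3)
    return gaps


def group_gap(emp_shares, sim_shares, mapping, group):
    """Share difference for a group of empirical clusters taken together."""
    emp = sum(emp_shares[i] for i in group)
    sim = sum(sim_shares[mapping[i]] for i in group)
    return sim - emp


empirical = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
simulated = [[0.88, 0.12], [0.12, 0.88], [0.52, 0.48]]
emp_shares = [0.30, 0.40, 0.30]
sim_shares = [0.31, 0.20, 0.49]

mapping = match_clusters(empirical, simulated)
print(mapping)                                                   # {0: 1, 1: 2, 2: 0}
print(share_gaps(emp_shares, sim_shares, mapping))               # {0: -0.1, 1: 0.09}
print(round(group_gap(emp_shares, sim_shares, mapping, (0, 1)), 3))  # -0.01
```

With these invented numbers, clusters 0 and 1 are individually off by about 10 and 9 points while their combined share differs by only 1 point: the kind of pattern that per-cluster comparison reveals but aggregates hide.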

Conclusions

DM-assisted ABM methodology
• Suitable for data-driven ABM (DDABM): complexity, large amounts of data
• Limitations: KISS models, qualitative sources
• Uses
  • Building new ABMs
  • Re-thinking existing DDABMs: revealing hidden facts, detecting inconsistencies

Thanks for your attention!

Samer Hassan

Universidad Complutense de Madrid

Contents License

This presentation is licensed under a Creative Commons Attribution 3.0 licence: http://creativecommons.org/licenses/by/3.0/

You are free to copy, modify and distribute it as long as the original work and author are cited
