Posted on 18-Jan-2016
Re-thinking Modelling: a Call for the Use of Data Mining in Data-driven Social Simulation
Samer Hassan, Javier Arroyo, Celia Gutiérrez
Universidad Complutense de Madrid
Samer Hassan SS@IJCAI 2009 2
Contents
Data-driven ABM
DM-assisted Methodology
Case Study: Mentat
Application
Conclusions
Research Aim
• Data-driven: non-KISS, empirical validation, specific (case study), expressive
• Theoretical: KISS, structural validation, abstract, general
Classical Logic of Simulation
Data-Driven Logic
Data-driven Approach
• Complexity
• Large amounts of data
• Auxiliary AI techniques: fuzzy logic, ontologies, evolutionary computation, data mining
Data Mining
Extracting patterns and relevant information from large amounts of data
• Pre-processing of empirical data: cluster finding, discovery of hidden patterns, location of redundancies
• Post-processing of simulation output:
  • Clustering: discovery of hidden patterns, validation of clusters, location of inconsistencies
  • Classification: cluster matching
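Cluster matching by classification can be sketched as a nearest-centroid rule: each cluster found in the simulation output is assigned to the closest cluster found in the empirical data. This is a minimal sketch, assuming both sets of clusters are summarised by centroids over the same normalised variables and compared with Euclidean distance; `match_clusters`, the cluster labels and the centroid values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_clusters(simulated, empirical):
    """Map each simulated cluster label to the nearest empirical cluster label.

    Assumption: both arguments are dicts {label: centroid}, with centroids
    expressed in the same (already normalised) variable space.
    """
    return {
        sim_label: min(empirical, key=lambda emp: euclidean(centroid, empirical[emp]))
        for sim_label, centroid in simulated.items()
    }

# Hypothetical centroids over two normalised variables.
empirical = {"religious": (0.9, 0.8), "non-religious": (0.1, 0.2)}
simulated = {"c0": (0.15, 0.25), "c1": (0.85, 0.7)}
print(match_clusters(simulated, empirical))  # {'c0': 'non-religious', 'c1': 'religious'}
```

A nearest-centroid rule is the simplest classifier for this purpose; any trained classifier could take its place.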
Contents
Data-driven ABM
DM-assisted Methodology
Case Study: Mentat
Application
Conclusions
Methodology for DM-assisted ABM
Methodology for DM-assisted ABM
Data Collection
• Initial point
• Validation points: necessarily different from the initial data
• Type: explicit; externalised (empirical distributions, secondary sources)
• Methods: quantitative (e.g. surveys); qualitative (e.g. interviews)
Methodology for DM-assisted ABM
Analysis: pre-processing of empirical data
Roles:
• Domain expert: guide the DM exploration, interpretation
• DM expert: confirm or refine theories
Methodology for DM-assisted ABM
Selection of Relevant Data
• Filtering
• Adaptation of data: normalisation, discretisation
• Domain expert: selection guided by theory
• DM: detection of redundancies and of overlooked independent variables
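One simple way for the DM step to flag redundant variables is a pairwise correlation screen: variable pairs whose absolute Pearson correlation exceeds a threshold become candidates for removal. This is a minimal sketch, assuming numeric columns with no missing values; the 0.8 threshold, the function `redundant_pairs` and the toy data are illustrative assumptions, not values from the study.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def redundant_pairs(columns, threshold=0.8):
    """Return variable pairs whose |correlation| exceeds the threshold.

    `columns` maps a variable name to its list of values (one per respondent).
    Assumption: 0.8 is an illustrative cut-off, not taken from the study.
    """
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    ]

# Hypothetical toy data: 'economy' moves almost in lockstep with 'studies'.
columns = {
    "studies": [8, 10, 12, 14, 16, 18],
    "economy": [1.0, 1.4, 1.9, 2.2, 2.8, 3.1],
    "ideology": [3, 7, 2, 8, 4, 6],
}
print(redundant_pairs(columns))  # [('studies', 'economy')]
```

A flagged pair is only a candidate: the domain expert still decides which of the two variables to keep.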
Methodology for DM-assisted ABM
Data Analysis
• Large data collections, guided by theory
• Types: cluster analysis, principal component analysis (PCA), time series methods, association rules
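As an illustration of the last technique listed, the two basic measures behind association rules, support and confidence, can be computed directly from records treated as sets of attribute values. This is a minimal sketch; the item names and records are hypothetical, and a real analysis would use a rule-mining algorithm such as Apriori to enumerate candidate rules rather than scoring a single rule by hand.

```python
def support(records, itemset):
    """Fraction of records containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)

def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)

# Hypothetical survey records encoded as sets of "attribute=value" items.
records = [
    {"church_att=weekly", "conf_church=high"},
    {"church_att=weekly", "conf_church=high"},
    {"church_att=weekly", "conf_church=low"},
    {"church_att=never", "conf_church=low"},
]
print(support(records, {"church_att=weekly"}))  # 0.75
# About 0.667: 2 of the 3 weekly attenders report high confidence.
print(confidence(records, {"church_att=weekly"}, {"conf_church=high"}))
```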
Methodology for DM-assisted ABM
Interpretation of Results
• Theory expert: relate the results to theory
• New findings are added to the findings base
Methodology for DM-assisted ABM
ABM Building
• Based on the findings, carried out by the modeller
• Steps: formalisation, data-driven design, implementation, initialisation
Methodology for DM-assisted ABM
Simulation
• Fine-tuning the ABM: sensitivity analysis, intensive testing
• Output: record the agent trace
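Recording an agent trace means storing, at every time step, a snapshot of each agent's attributes, so that the output can later be mined with the same techniques applied to the survey data. This is a minimal sketch, assuming agents are plain dicts of attributes and that the trace is written as CSV rows (step, agent id, attributes); the helper `record_step` and the attribute names are hypothetical.

```python
import csv
import io

def record_step(writer, step, agents, fields):
    """Append one CSV row per agent for the given simulation step.

    Assumption: each agent is a dict holding at least the keys in `fields`.
    """
    for agent_id, agent in enumerate(agents):
        writer.writerow([step, agent_id] + [agent[f] for f in fields])

fields = ["age", "ideology"]
agents = [{"age": 30, "ideology": 4}, {"age": 55, "ideology": 7}]

buffer = io.StringIO()  # stands in for an output file
writer = csv.writer(buffer)
writer.writerow(["step", "agent"] + fields)
for step in range(2):
    record_step(writer, step, agents, fields)
    for agent in agents:  # toy dynamics: every agent ages one year per step
        agent["age"] += 1

print(buffer.getvalue())
```

Writing one row per agent and step keeps the trace in the same flat, respondent-per-row shape as survey data.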
Methodology for DM-assisted ABM
Validation
• Analysis of the results: empirical validation, theoretical consistency
• Roles:
  • DM expert: analyse the data
  • Domain expert: extract conclusions
• Iterative cycle
Contents
Data-driven ABM
DM-assisted Methodology
Case Study: Mentat
Application
Conclusions
The Problem
• Aim: simulate the change of social values in a society over a given period
• Many factors are involved
• Inertia of generational change: to what extent do demographic dynamics explain the change in mentality?
• Inter-generational: agent characteristics remain constant, while the macro aggregation evolves
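The inter-generational hypothesis can be illustrated with a toy cohort-replacement model: no agent ever changes its own value, yet the population mean drifts because older agents die and are replaced by newborns holding different values. This is a deliberately simplified sketch, not Mentat's demographic model; the fixed lifespan, the one-for-one replacement rule and the value given to newborns are all assumptions.

```python
def simulate(agents, steps, lifespan=80, newborn_value=0.2):
    """Cohort replacement with constant individual values.

    Each agent is an (age, value) pair. Agents age one year per step; an agent
    reaching `lifespan` is replaced by a newborn (age 0) holding `newborn_value`.
    Assumptions: fixed lifespan and one-for-one replacement (toy model only).
    Returns the population mean value after each step.
    """
    means = []
    for _ in range(steps):
        agents = [
            (0, newborn_value) if age + 1 >= lifespan else (age + 1, value)
            for age, value in agents
        ]
        means.append(sum(value for _, value in agents) / len(agents))
    return means

# Hypothetical starting population: older agents hold higher values.
population = [(age, 0.8 if age >= 40 else 0.5) for age in range(0, 80, 10)]
means = simulate(population, steps=40)
print(round(means[0], 3), round(means[-1], 3))
```

The macro mean falls from 0.65 to 0.35 even though every individual value is constant; only the composition of the population changes.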
Mentat: architecture
Agent:
• Mental state attributes
• Life-cycle patterns
• Demographic micro-evolution: couples, reproduction, inheritance
Mentat: architecture
World: 3000 agents
• Grid 100x100
• Demographic model: 8 independent parameters
• Social network: communication with the Moore neighbourhood, the friends network and the family network
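The Moore neighbourhood of a cell is the set of eight cells surrounding it, diagonals included. A minimal sketch for the 100x100 grid; whether Mentat's grid wraps around at its edges is not stated in the slides, so the `torus` flag is an assumption that lets either behaviour be chosen, and `moore_neighbourhood` is a hypothetical name.

```python
def moore_neighbourhood(x, y, width=100, height=100, radius=1, torus=False):
    """Return the coordinates of the Moore neighbourhood of cell (x, y).

    Assumption: the slides do not say whether the grid wraps around.
    With torus=True the grid wraps at its edges; otherwise cells outside
    the grid are simply dropped.
    """
    cells = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            if dx == 0 and dy == 0:
                continue  # the cell itself is not its own neighbour
            nx, ny = x + dx, y + dy
            if torus:
                cells.append((nx % width, ny % height))
            elif 0 <= nx < width and 0 <= ny < height:
                cells.append((nx, ny))
    return cells

print(len(moore_neighbourhood(50, 50)))              # 8: interior cell
print(len(moore_neighbourhood(0, 0)))                # 3: corner cell, no wrapping
print(len(moore_neighbourhood(0, 0, torus=True)))    # 8: corner cell on a torus
```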
Contents
Data-driven ABM
DM-assisted Methodology
Case Study: Mentat
Application
Conclusions
Data Collection in Mentat
• Initial data: EVS-1980 (European Values Study)
  • Representative sample of Spain
  • Qualitative information
  • Empirically grounded demographic equations
• Validation data: EVS-1990, EVS-1999
Analysis in Mentat
Selection of relevant data (EVS-1980, 1990, 1999)
• Options:
  1. An algorithm for the best subset of variables
  2. Rely on the domain expert
• Domain knowledge already tested, so option (2) was chosen
• Variables adaptation: normalisation
Name            Type         Range
gender          categorical  -
age             numeric      ≥ 18
studies         numeric      ≥ 5
civil state     categorical  -
economy         numeric      real
ideology        ordinal      1-10
conf. church    ordinal      1-4
church att.     ordinal      1-7
relig. person   categorical  -
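Normalising the ordinal variables can be sketched as min-max scaling onto [0, 1] using the ranges listed in the variable table. This is a minimal sketch, assuming min-max scaling (the slide only says "normalisation") and treating ordinal codes as equally spaced; `RANGES` and `normalise` are hypothetical names.

```python
# Ranges of the ordinal variables, as listed in the variable table.
RANGES = {
    "ideology": (1, 10),
    "conf. church": (1, 4),
    "church att.": (1, 7),
}

def normalise(variable, value):
    """Min-max scale an ordinal value onto [0, 1] using its declared range.

    Assumption: min-max scaling with equally spaced ordinal codes.
    """
    low, high = RANGES[variable]
    if not low <= value <= high:
        raise ValueError(f"{variable}={value} outside range {low}-{high}")
    return (value - low) / (high - low)

respondent = {"ideology": 4, "conf. church": 2, "church att.": 7}
print({k: round(normalise(k, v), 3) for k, v in respondent.items()})
# {'ideology': 0.333, 'conf. church': 0.333, 'church att.': 1.0}
```

Putting every variable on the same scale keeps distance-based clustering from being dominated by the variable with the widest range.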
Analysis in Mentat
Data Analysis
• Algorithm selection: wrapped k-means, exploring different k (number of clusters)
• Discarded variables:
  • Gender and age: they cause irrelevant clusters to appear (e.g. widowed women)
  • Economy: redundant, highly correlated with education
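Exploring different numbers of clusters can be sketched as running k-means for each candidate k and recording the within-cluster sum of squares (WSS), which the analyst then weighs against the sociological meaning of each partition. This is a minimal pure-Python sketch: the exact "wrapped" k-means variant used in the study is not described in the slides, so this shows plain k-means (Lloyd's algorithm) inside a loop over k, seeded with the first k points for reproducibility; the function names and data points are hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm), not the study's 'wrapped' variant.

    Seeds with the first k points (an assumption for reproducibility).
    Returns (centroids, labels).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

def wss(points, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical normalised respondents forming three loose groups.
points = [(0.1, 0.1), (0.9, 0.9), (0.5, 0.5), (0.12, 0.08), (0.88, 0.92), (0.52, 0.49)]
for k in range(1, 5):
    centroids, labels = kmeans(points, k)
    print(k, round(wss(points, centroids, labels), 4))
```

WSS always tends to shrink as k grows, so it is a guide for choosing k, not a criterion on its own.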
Analysis in Mentat
Interpretation
• Sociological research: religious typology (RLGTYPE)
  • Based on 3 variables
  • Types: ecclesiastical, low-intensity, alternatives, non-religious
• Clusters found (1980, 1999)
  • Based on the 9 - 3 = 6 remaining variables (gender, age and economy discarded)
  • 5 clusters with sociological meaning
  • Consistent with RLGTYPE
• Theoretical observations of the pattern evolution:
  • Religiosity strength falls
  • The ideological spectrum shifts to the left
    • Education & economy
  • The newest type of religiosity, the "alternatives", rises
    • Youngsters
Analysis in Mentat
Validation in Mentat
• Mentat rebuilt and its simulations explored
• Mentat output clustered:
  • The same 5 clusters found
  • Similar evolution trends
  • The 3 theoretical observations reproduced
• Inconsistencies detected:
  • Liberal cluster percentages do not match, although their aggregate does
  • Graphs show fewer youngsters
  • Liberal clusters deeply affected
• Guide to re-design
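The inconsistency check can be sketched as comparing, cluster by cluster, the percentage of individuals in each cluster of the survey data and of the simulation output, and flagging differences above a tolerance. This is a minimal sketch; the cluster labels, counts and the 5-percentage-point tolerance are hypothetical, chosen only to reproduce the kind of situation described (two liberal clusters that mismatch individually but agree once aggregated).

```python
from collections import Counter

def shares(labels):
    """Percentage of individuals in each cluster."""
    counts = Counter(labels)
    return {c: 100 * n / len(labels) for c, n in counts.items()}

def mismatches(empirical, simulated, tolerance=5.0):
    """Clusters whose percentage differs by more than `tolerance` points.

    Assumption: 5 percentage points is an illustrative tolerance.
    """
    emp, sim = shares(empirical), shares(simulated)
    clusters = set(emp) | set(sim)
    return sorted(c for c in clusters if abs(emp.get(c, 0) - sim.get(c, 0)) > tolerance)

def merge_liberal(labels):
    """Aggregate all hypothetical 'liberal_*' clusters into one."""
    return ["liberal" if c.startswith("liberal") else c for c in labels]

# Hypothetical cluster labels for 100 respondents and 100 simulated agents.
empirical = ["liberal_a"] * 20 + ["liberal_b"] * 10 + ["religious"] * 70
simulated = ["liberal_a"] * 10 + ["liberal_b"] * 20 + ["religious"] * 70
print(mismatches(empirical, simulated))                                # ['liberal_a', 'liberal_b']
print(mismatches(merge_liberal(empirical), merge_liberal(simulated)))  # []
```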
Contents
Data-driven ABM
DM-assisted Methodology
Case Study: Mentat
Application
Conclusions
Conclusions
• DM-assisted ABM methodology: suitable for data-driven ABM (DDABM), which involves
  • Complexity
  • Large amounts of data
• Limitations: KISS models, qualitative sources
• Uses: building new ABMs; re-thinking existing DDABMs
  • Revealing hidden facts
  • Detecting inconsistencies
Thanks for your attention!
Samer Hassan, samer@fdi.ucm.es
Universidad Complutense de Madrid
Contents License
This presentation is licensed under a Creative Commons Attribution 3.0 licence: http://creativecommons.org/licenses/by/3.0/
You are free to copy, modify and distribute it as long as the original work and author are cited