A microsimulation model for forecasting education
Niels Erik Kaaber Rasmussen, DREAM
European Meeting of the International Microsimulation Association, Maastricht, 23-24 October
Why?
• Education is important
• Rich educational data on individual level
• Full population data for current and forthcoming generations – exogenous
• Using a machine learning technique to handle transition probabilities
• Light multi-threaded setup for fast computation
Part of the DREAM-system
[Diagram: Population, Education, Labor market, DREAM (Economy)]
What answers are we looking for?
• What will be the general level of education in Denmark in 2050?
• What if behaviour changes?
– Increase in drop-out rate
– Decrease in enrollment
– …
• How does this affect long-term fiscal sustainability?
Model characteristics
• Dynamic microsimulation
• Full Danish population
• Longitudinal model
• Unit-wise updating
• Closed model
• Stochastic
• Discrete time
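The characteristics above can be illustrated with a minimal sketch. It assumes that "longitudinal" and "unit-wise" mean each person's whole path is simulated before moving on to the next person, with one stochastic draw per yearly time step. The state names and probabilities in `TRANSITIONS` are hypothetical and are not taken from the model.

```python
import random

# Hypothetical yearly transition probabilities between education states.
# In the real model these would come from the estimated transition probabilities.
TRANSITIONS = {
    "not_enrolled": {"not_enrolled": 0.9, "enrolled": 0.1},
    "enrolled": {"enrolled": 0.7, "graduated": 0.2, "not_enrolled": 0.1},
    "graduated": {"graduated": 1.0},  # absorbing state in this toy example
}


def next_state(state, rng):
    """Draw next year's state from the current state's probabilities (stochastic)."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()))[0]


def simulate_person(state, start_year, end_year, rng):
    """Simulate one person's full path, one discrete time step per year."""
    path = []
    for _year in range(start_year, end_year + 1):
        state = next_state(state, rng)
        path.append(state)
    return path


def simulate(initial_states, start_year, end_year, seed=0):
    """Unit-wise updating: finish one person's path before starting the next."""
    rng = random.Random(seed)
    return [simulate_person(s, start_year, end_year, rng) for s in initial_states]


paths = simulate(["not_enrolled", "enrolled", "graduated"], 2014, 2020)
```

Because each person's path is computed independently in this sketch, the outer loop could be split across threads, which is one plausible reading of the "light multi-threaded setup" mentioned earlier.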
[Diagram of model components: Program, Probabilities, Statistics, output, Population, Education, Person]
Full Danish population from 2014 to 2130 (total of 18.8 million people) in 2.5 minutes
Simulation
[Show visualisation]
Transition probabilities
• Smoothing
• Extrapolation
• Grouping with decision trees
Smoothing transition probabilities
Extrapolation of trends
Raw transition probabilities
• Transition probability = historical frequency
• Behaviour depends on many characteristics
• Data is too sparse
• Too much noise
• Transition probabilities depend on: gender, origin, age, highest education, current participation in education (2 × 5 × 50 × 12 × 12 = 72,000 combinations). And more to come…
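To make the sparsity problem concrete, here is a small sketch of raw transition probabilities computed as historical frequencies per cell. The number of levels per characteristic follows the product above; the cell tuples, outcome names and records are hypothetical illustration data.

```python
from collections import Counter, defaultdict
from math import prod

# Number of levels per characteristic, as in 2 x 5 x 50 x 12 x 12.
LEVELS = {
    "gender": 2,
    "origin": 5,
    "age": 50,
    "highest_education": 12,
    "current_education": 12,
}
n_cells = prod(LEVELS.values())  # number of distinct cells to estimate


def raw_probabilities(records):
    """Historical frequency of each outcome within each cell.

    `records` is an iterable of (cell, outcome) pairs, where `cell` is a
    tuple of characteristics.
    """
    counts = defaultdict(Counter)
    for cell, outcome in records:
        counts[cell][outcome] += 1
    return {
        cell: {outcome: c / sum(ctr.values()) for outcome, c in ctr.items()}
        for cell, ctr in counts.items()
    }


# Hypothetical observations: one cell with three persons, one with a single person.
cell_a = ("female", "danish", 20, "upper_secondary", "bachelor")
cell_b = ("male", "western_immigrant", 47, "vocational", "master")
records = [
    (cell_a, "continue"),
    (cell_a, "continue"),
    (cell_a, "drop_out"),
    (cell_b, "drop_out"),
]
probs = raw_probabilities(records)
```

In this toy data the single observation in `cell_b` yields a drop-out probability of 1.0, which is the kind of noise that sparse cells produce when the raw frequency is used directly.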
Conditional inference trees
• Decision tree
• Groups observations in a way so that there’s a:
– minimum of variation within a group
– maximum of variation across groups
• Data-mining approach
• Based on statistical tests
[Example split: Origin = Immigrant from western country? Yes / No]
CTREE algorithm
1. Test for independence between any of the explanatory variables and the response
   a) Stop if p > 0.05
2. Select the input variable with strongest association to the response.
3. Find best binary split point for the selected input variable.
4. Recursively repeat from step 1 until a stop criterion is reached.
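The four steps above can be sketched in a few lines. This is a simplified illustration, not the actual CTREE implementation: it assumes categorical explanatory variables and a 0/1 response, uses a Monte Carlo permutation test on the Pearson chi-square statistic as the independence test, and applies no multiplicity adjustment. All function names, the `min_size` stop criterion and the toy data are hypothetical.

```python
import random
from collections import Counter


def chi2_stat(xs, ys):
    """Pearson chi-square statistic for two categorical sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n
            stat += (cxy[(a, b)] - expected) ** 2 / expected
    return stat


def perm_pvalue(xs, ys, rng, n_perm=200):
    """Monte Carlo permutation p-value for independence of xs and ys."""
    observed = chi2_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi2_stat(xs, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1), observed


def grow(rows, response, features, rng, alpha=0.05, min_size=20):
    ys = [r[response] for r in rows]
    leaf = {"n": len(rows), "p_yes": sum(ys) / len(rows)}
    if len(rows) < 2 * min_size or len(set(ys)) < 2:
        return leaf
    # Step 1: test each explanatory variable for independence of the response.
    tests = []
    for f in features:
        xs = [r[f] for r in rows]
        if len(set(xs)) > 1:
            p, stat = perm_pvalue(xs, ys, rng)
            tests.append((p, -stat, f))
    if not tests or min(tests)[0] > alpha:
        return leaf  # Step 1a: stop if no variable is significant
    # Step 2: select the variable with the strongest association (smallest p).
    best = min(tests)[2]
    # Step 3: best binary split "value == level" versus the rest.
    levels = sorted({r[best] for r in rows})
    level = max(levels, key=lambda v: chi2_stat([r[best] == v for r in rows], ys))
    yes = [r for r in rows if r[best] == level]
    no = [r for r in rows if r[best] != level]
    if min(len(yes), len(no)) < min_size:
        return leaf
    # Step 4: recurse on both groups.
    return {
        "split": (best, level),
        "yes": grow(yes, response, features, rng, alpha, min_size),
        "no": grow(no, response, features, rng, alpha, min_size),
    }


# Hypothetical toy data: enrollment depends on origin only, not on gender.
rows = []
for origin, n_enrolled in [("western", 80), ("danish", 30), ("non_western", 30)]:
    for gender in ("female", "male"):
        for i in range(100):
            rows.append({"origin": origin, "gender": gender, "enrolled": int(i < n_enrolled)})

tree = grow(rows, "enrolled", ["origin", "gender"], random.Random(0))
```

On this toy data the root split separates western immigrants from everyone else, and neither branch is split further because no remaining variable is associated with the response.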
Current participation in education
17-65 years olds by highest level of education
Further developments
• Spatial dimension: 98 municipalities
• Social inheritance: Educational level of parents
• Stronger path-dependencies?
• In SMILE DK-context: Relate educational events to moving patterns, demography, labour market behaviour and more…