StatMine, visual exploration of output data

Post on 30-Apr-2015

Presentation given at OECD conference, Seoul 2012.

StatMine – prototype

StatMine, an exploration of dissemination data

Edwin de Jonge

Statistics Netherlands

25 September 2012, Seoul

StatMine, from numbers to analysis

Why StatMine?

• The mission of Statistics Netherlands (SN) is to produce relevant information for:
  • Policy makers
  • Journalists
  • Citizens
  • Enterprises
  • Economists
  • Social scientists
  • Etc.

Numbers ≠ Information

StatLine is SN’s online database (over 1 billion figures)

We know from a user study that:

1. Many interesting patterns in StatLine are not spotted by users

2. Many important topics in StatLine are scattered across multiple tables

Example of problem 2

• A policymaker is interested in patients with diabetes:
  • Visits to medical doctor
  • Hospital admissions
  • Mortality
  • Medication consumption (insulin)
  • Obesity

These are all different statistical products (from different sources)!

Data analysis = Data insight

The goal of the StatMine research project is to provide data insight by:

• (I) Using data visualisation
• (II) Combining data table fragments
• (III) Deriving variables

All hypotheses are (or will be) tested with a prototype, with internal and external users.

(I) has been tested and was successful.

(II), (III), … are work in progress.

Chart types

• Bar chart: comparison
• Line chart: development
• Mosaic chart: structure
• Bubble/scatter chart: correlation

Chart type – bar chart

Chart type – line chart

Chart type – mosaic chart

Chart type – bubble chart

Small multiples

• Split a chart into different subpopulations
• Goal: compare subpopulations
• Very few tools offer this functionality!
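As a rough illustration of the idea, the sketch below splits a set of records into one panel per subpopulation and computes a single y-axis range shared by all panels, so the panels stay comparable. The record layout, the field names (`year`, `gender`, `value`) and the figures are made up for illustration; this is not the prototype's code.

```python
from collections import defaultdict

# Hypothetical records: one row per (year, gender) cell of a table.
records = [
    {"year": 2010, "gender": "men", "value": 71.0},
    {"year": 2011, "gender": "men", "value": 70.5},
    {"year": 2010, "gender": "women", "value": 60.2},
    {"year": 2011, "gender": "women", "value": 61.0},
]


def small_multiples(rows, split_by, x, y):
    """Group rows into one panel per value of `split_by`.

    Returns (panels, y_range): `panels` maps each subpopulation to its
    sorted list of (x, y) points; `y_range` is shared by all panels.
    """
    panels = defaultdict(list)
    for row in rows:
        panels[row[split_by]].append((row[x], row[y]))
    for points in panels.values():
        points.sort()
    values = [row[y] for row in rows]
    return dict(panels), (min(values), max(values))


panels, y_range = small_multiples(records, "gender", "year", "value")
```

Each entry in `panels` would be drawn as its own small chart; using the same `y_range` for every panel is what makes the subpopulations directly comparable.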

Small multiples

Composing a chart

Example:

• Categorical variables / dimensions: Year x Region x Gender x Age
• Numeric variables / topics: Count, Mean income, Employment
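The split into dimensions and topics can be sketched as a small data structure. The example below is a minimal sketch, assuming a chart is composed by putting one dimension on the x-axis, one topic on the y-axis, and optionally one dimension on small multiples. The names `TableSchema`, `ChartSpec`, `compose` and `unused_dimensions` are hypothetical and do not come from the prototype.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TableSchema:
    dimensions: Tuple[str, ...]  # categorical variables
    topics: Tuple[str, ...]      # numeric variables


@dataclass(frozen=True)
class ChartSpec:
    chart_type: str
    x: str                          # dimension on the x-axis
    y: str                          # topic on the y-axis
    split_by: Optional[str] = None  # dimension used for small multiples


def compose(schema, chart_type, x, y, split_by=None):
    """Build a chart spec, rejecting roles the schema does not allow."""
    if y not in schema.topics:
        raise ValueError(f"{y!r} is not a numeric topic")
    for dim in (x, split_by):
        if dim is not None and dim not in schema.dimensions:
            raise ValueError(f"{dim!r} is not a dimension")
    return ChartSpec(chart_type, x, y, split_by)


def unused_dimensions(schema, spec):
    """Dimensions not shown in the chart; these still need a fixed category."""
    used = {spec.x, spec.split_by}
    return [d for d in schema.dimensions if d not in used]


schema = TableSchema(
    dimensions=("Year", "Region", "Gender", "Age"),
    topics=("Count", "Mean income", "Employment"),
)
spec = compose(schema, "line", x="Year", y="Mean income", split_by="Gender")
```

Here `unused_dimensions(schema, spec)` lists `Region` and `Age`, the dimensions a user would still have to pin to a single category (or a total) before the chart can be drawn. A bubble chart would need more than one topic, which this simplified spec does not model.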

Prototype

• Built in PHP and JavaScript (D3)
• Imported 10 StatLine example tables
• Complex tables, e.g.
  • Labor participation x gender x cohorts
  • Labor market flow per quarter (employed/unemployed)
  • Enterprise birth, death and growth x economic activity x quarter
• Tested on:
  • Internal users
  • Owners of data

Demo

Evaluation

• Part I: very successful
  • Owners of data want the prototype to check their own data
  • Provides insights
  • Easy detection of anomalies

Work in progress

• II, Combination of different fragments
  • Testing with policymakers (end of this year)
  • Or “How to glue statistical tables?”
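One simple reading of "gluing" is a join of two table fragments on the dimensions they share. The sketch below is only that: an inner join over made-up rows, assuming both fragments use identical codes for `year` and `region` and that each key occurs once in the right-hand fragment. The function name `glue`, the field names and the figures are hypothetical. Real tables often differ in classifications and periods, which this sketch ignores.

```python
def glue(left, right, keys):
    """Inner-join two table fragments on their shared dimension columns."""
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            # Merge the topic columns of both fragments into one row.
            joined.append({**row, **match})
    return joined


# Made-up fragments from two different statistical products.
admissions = [
    {"year": 2010, "region": "North", "hospital_admissions": 120},
    {"year": 2011, "region": "North", "hospital_admissions": 130},
]
mortality = [
    {"year": 2010, "region": "North", "deaths": 8},
    {"year": 2012, "region": "North", "deaths": 9},
]

combined = glue(admissions, mortality, keys=("year", "region"))
```

Only the 2010 row survives, because it is the only cell present in both fragments. Deciding what to do with the cells that do not match is part of what makes the question hard.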

• III, Derive variables + analysis
  • Absolute vs relative (per population unit)
  • Turnover / # employees
  • Etc.
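Both derivations named above are ratios of two existing topics, so they can be sketched with one helper. The function `derive_ratio`, the field names and the figures below are assumptions made for illustration, not part of StatMine.

```python
def derive_ratio(rows, numerator, denominator, name, scale=1.0):
    """Return copies of `rows` with `name` = scale * numerator / denominator.

    Rows with a missing numerator, or a zero or missing denominator,
    get None instead of a value.
    """
    out = []
    for row in rows:
        num = row.get(numerator)
        den = row.get(denominator)
        value = None if num is None or not den else scale * num / den
        out.append({**row, name: value})
    return out


# Made-up regional figures.
rows = [
    {"region": "North", "turnover": 500_000.0, "employees": 25,
     "enterprises": 40, "population": 10_000},
    {"region": "South", "turnover": 90_000.0, "employees": 0,
     "enterprises": 12, "population": 6_000},
]

# Turnover / # employees
rows = derive_ratio(rows, "turnover", "employees", "turnover_per_employee")
# Absolute vs relative: enterprises per 1,000 inhabitants
rows = derive_ratio(rows, "enterprises", "population",
                    "enterprises_per_1000", scale=1000)
```

Returning `None` for a zero denominator keeps undefined ratios visible as gaps rather than raising an error or inventing a value.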

Questions?