StatMine, visual exploration of output data

StatMine – prototype. StatMine, an exploration of dissemination data. Edwin de Jonge, Statistics Netherlands, 25 September 2012, Seoul

description

Presentation given at OECD conference, Seoul 2012.

Transcript of StatMine, visual exploration of output data

Page 1: StatMine, visual exploration of output data

StatMine – prototype

StatMine, an exploration of dissemination data

Edwin de Jonge

Statistics Netherlands

25 September 2012, Seoul

Page 2: StatMine, visual exploration of output data

an exploration of dissemination data: StatMine 2

Page 3: StatMine, visual exploration of output data

an exploration of dissemination data: StatMine 3

Page 4: StatMine, visual exploration of output data

StatMine, from numbers to analysis

Page 5: StatMine, visual exploration of output data

Why StatMine?

• The mission of Statistics Netherlands (SN) is to produce relevant information for:
  • Policy makers
  • Journalists
  • Citizens
  • Enterprises
  • Economists
  • Social scientists
  • Etc.

Page 6: StatMine, visual exploration of output data

Numbers ≠ Information

StatLine is SN’s online database (over 1 billion figures)

We know from a user study that:

1. Many interesting patterns in StatLine are not spotted by users

2. Many important topics in StatLine are scattered across multiple tables

Page 7: StatMine, visual exploration of output data

Example of problem 2

• A policymaker interested in patients with diabetes needs figures on:
  • Visits to medical doctor
  • Hospital admissions
  • Mortality
  • Medication consumption (insulin)
  • Obesity

These are all different statistical products (from different sources)!

Page 8: StatMine, visual exploration of output data

Data analysis = Data insight

The goal of the StatMine research project is to provide data insight by:

• (I) Using data visualisation
• (II) Combining data table fragments
• (III) Deriving variables

All hypotheses are (or will be) tested with a prototype, with internal and external users.

(I): tested and successful

(II, III, …): work in progress

Page 9: StatMine, visual exploration of output data

Chart types

• Bar chart: comparison
• Line chart: development
• Mosaic chart: structure
• Bubble/scatter chart: correlation

Page 10: StatMine, visual exploration of output data

Chart type – bar chart

Page 11: StatMine, visual exploration of output data

Chart type – line chart

Page 12: StatMine, visual exploration of output data

Chart type – mosaic chart

Page 13: StatMine, visual exploration of output data

Chart type – bubble chart

Page 14: StatMine, visual exploration of output data

Small multiples

• Split a chart into different subpopulations
• Goal: compare subpopulations
• Very few tools offer this functionality!
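The splitting step behind small multiples can be sketched in a few lines. This is only an illustration in Python, not the prototype's code (which the slides say is built in PHP and JavaScript); the function name `small_multiples`, the field names, and the figures are all made up for the example.

```python
from collections import defaultdict


def small_multiples(records, split_by):
    """Group records into one panel per value of the `split_by` dimension."""
    panels = defaultdict(list)
    for record in records:
        panels[record[split_by]].append(record)
    return dict(panels)


# Made-up illustrative figures, not StatLine data.
records = [
    {"year": 2010, "gender": "men", "employment": 70},
    {"year": 2011, "gender": "men", "employment": 71},
    {"year": 2010, "gender": "women", "employment": 60},
    {"year": 2011, "gender": "women", "employment": 62},
]

panels = small_multiples(records, "gender")
for name, rows in panels.items():
    # Each panel would be drawn as its own small chart with shared axes.
    print(name, [(r["year"], r["employment"]) for r in rows])
```

Each panel holds the same variables for one subpopulation, so drawing them side by side on shared axes makes the subpopulations directly comparable.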

Page 15: StatMine, visual exploration of output data

Small multiples

Page 16: StatMine, visual exploration of output data

Composing a chart

Example:

• Categorical variables / dimensions: Year x Region x Gender x Age

• Numeric variables / topics: Count, Mean income, Employment
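One way to read this slide: a chart is composed by putting one dimension on the horizontal axis, one topic on the vertical axis, and fixing the remaining dimensions. The sketch below assumes that reading; `ChartSpec`, `compose`, and the sample rows are hypothetical names and made-up values, not part of StatMine.

```python
from dataclasses import dataclass, field


@dataclass
class ChartSpec:
    x: str                                        # dimension on the horizontal axis
    y: str                                        # topic on the vertical axis
    filters: dict = field(default_factory=dict)   # fixed values for other dimensions


def compose(rows, spec):
    """Return (x, y) points for the rows that match every filter in the spec."""
    return [
        (row[spec.x], row[spec.y])
        for row in rows
        if all(row[dim] == value for dim, value in spec.filters.items())
    ]


# Made-up illustrative rows, not StatLine data.
rows = [
    {"year": 2010, "region": "North", "gender": "men", "count": 100},
    {"year": 2011, "region": "North", "gender": "men", "count": 110},
    {"year": 2010, "region": "South", "gender": "men", "count": 90},
    {"year": 2010, "region": "North", "gender": "women", "count": 105},
]

spec = ChartSpec(x="year", y="count", filters={"region": "North", "gender": "men"})
points = compose(rows, spec)
print(points)
```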

Page 17: StatMine, visual exploration of output data

Prototype

• Built in PHP and JavaScript (D3)
• Imported 10 StatLine example tables

• Complex tables, e.g.
  • Labor participation x gender x cohorts
  • Labor market flow per quarter (employed/unemployed)
  • Enterprise birth, death and growth x economic activity x quarter

• Tested on:
  • Internal users
  • Owners of data

Page 18: StatMine, visual exploration of output data

Demo

Page 19: StatMine, visual exploration of output data

Evaluation

• Part I: very successful
  • Owners of data want the prototype to check their own data
  • Provides insights
  • Easy detection of anomalies

Page 20: StatMine, visual exploration of output data

Work in progress

• II, Combination of different fragments
  • Testing with policymakers (end this year)
  • Or “How to glue statistical tables?”
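The slides leave the gluing method open. One plausible reading, shown here purely as an assumption, is an inner join of two table fragments on the dimensions they share. The function `glue`, the column names, and the figures are invented for illustration.

```python
def glue(left, right, dimensions):
    """Inner-join two lists of row dicts on the shared dimension columns."""
    index = {tuple(row[d] for d in dimensions): row for row in right}
    glued = []
    for row in left:
        key = tuple(row[d] for d in dimensions)
        if key in index:
            # Merge the topics of both fragments for the matching dimension values.
            glued.append({**row, **index[key]})
    return glued


# Made-up illustrative fragments, not StatLine data.
admissions = [
    {"year": 2010, "admissions": 50},
    {"year": 2011, "admissions": 55},
]
mortality = [
    {"year": 2010, "deaths": 5},
    {"year": 2012, "deaths": 6},
]

combined = glue(admissions, mortality, ["year"])
print(combined)
```

An inner join keeps only the dimension values present in both fragments. Whether unmatched rows should be kept instead is a design choice the slides do not settle.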

• III, Derive variables + analysis
  • Absolute vs relative (per population unit)
  • Turnover / # employees
  • Etc.
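Both derived variables named above are ratios of two existing topics, which can be sketched as follows. The helper `derive`, the column names, and the numbers are hypothetical and serve only to illustrate the idea.

```python
def derive(rows, name, numerator, denominator, unit=1):
    """Add a derived variable `name` = numerator * unit / denominator to each row."""
    return [
        {**row, name: row[numerator] * unit / row[denominator]}
        for row in rows
    ]


# Made-up illustrative rows, not StatLine data.
health = [{"region": "North", "admissions": 50, "population": 10000}]
business = [{"activity": "Retail", "turnover": 500, "employees": 10}]

# Absolute vs relative: admissions per 1000 inhabitants.
relative = derive(health, "admissions_per_1000", "admissions", "population", unit=1000)

# Turnover / # employees.
productivity = derive(business, "turnover_per_employee", "turnover", "employees")

print(relative[0]["admissions_per_1000"])
print(productivity[0]["turnover_per_employee"])
```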

Page 21: StatMine, visual exploration of output data

Questions?