StatMine, visual exploration of output data

Transcript
Page 1: StatMine, visual exploration of output data

StatMine – prototype

StatMine, an exploration of dissemination data

Edwin de Jonge

Statistics Netherlands

25 September 2012, Seoul

Page 2: StatMine, visual exploration of output data

an exploration of dissemination data: StatMine 2

Page 3: StatMine, visual exploration of output data

Page 4: StatMine, visual exploration of output data

StatMine, from numbers to analysis

Page 5: StatMine, visual exploration of output data

Why StatMine?

• The mission of Statistics Netherlands (SN) is to produce relevant information for:
  • Policy makers
  • Journalists
  • Citizens
  • Enterprises
  • Economists
  • Social scientists
  • Etc.

Page 6: StatMine, visual exploration of output data

Numbers ≠ Information

StatLine is SN’s online database (over 1 billion figures)

We know from a user study that:

1. Many interesting patterns in StatLine are not spotted by users

2. Many important topics in StatLine are scattered across multiple tables

Page 7: StatMine, visual exploration of output data

Example of problem 2

• Policymaker interested in patients with diabetes:
  • Visits to medical doctor
  • Hospital admissions
  • Mortality
  • Medication consumption (insulin)
  • Obesity

These are all different statistical products (from different sources)!

Page 8: StatMine, visual exploration of output data

Data analysis = Data insight

The goal of the StatMine research project is to provide data insight by:

• (I) Using data visualisation
• (II) Combining data table fragments
• (III) Deriving variables

All hypotheses are (or will be) tested with a prototype, with internal and external users.

(I) has been tested and was successful.

(II, III, …) are work in progress.

Page 9: StatMine, visual exploration of output data

Chart types

• Bar chart: comparison
• Line chart: development
• Mosaic chart: structure
• Bubble/scatter chart: correlation
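
The slide pairs each chart type with one analytic purpose. The slides do not say how StatMine chooses a chart, so the Python sketch below is only a hypothetical rule of thumb that maps a user's selection onto that pairing; the names `Purpose` and `suggest_chart` and all three inputs are assumptions, not StatMine's code.

```python
from enum import Enum


class Purpose(Enum):
    """Analytic purpose from the slide; each value is its paired chart type."""
    COMPARISON = "bar chart"
    DEVELOPMENT = "line chart"
    STRUCTURE = "mosaic chart"
    CORRELATION = "bubble/scatter chart"


def suggest_chart(n_topics: int, x_is_time: bool, shows_shares: bool) -> Purpose:
    """Hypothetical heuristic, not StatMine's actual rule.

    n_topics     -- number of numeric variables (topics) selected
    x_is_time    -- True if the x-axis dimension is a period such as Year
    shows_shares -- True if the user wants the composition of a total
    """
    if n_topics >= 2:
        # Two or more topics: plot them against each other.
        return Purpose.CORRELATION
    if x_is_time:
        # One topic along a time dimension: show its development.
        return Purpose.DEVELOPMENT
    if shows_shares:
        # One topic split into parts of a whole.
        return Purpose.STRUCTURE
    # Default: compare one topic across the categories of a dimension.
    return Purpose.COMPARISON


print(suggest_chart(1, x_is_time=True, shows_shares=False).value)  # line chart
```

The order of the checks is a design choice in this sketch: as soon as two topics are selected, correlation wins, even if one dimension is time.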

Page 10: StatMine, visual exploration of output data

Chart type – bar chart

Page 11: StatMine, visual exploration of output data

Chart type – line chart

Page 12: StatMine, visual exploration of output data

Chart type – mosaic chart

Page 13: StatMine, visual exploration of output data

Chart type – bubble chart

Page 14: StatMine, visual exploration of output data

Small multiples

• Split a chart into different subpopulations
• Goal: compare subpopulations
• Very few tools offer this functionality!
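
Small multiples repeat one chart per subpopulation on a shared scale, so that panels can be compared at a glance. The slides do not show StatMine's D3 code; the Python sketch below only illustrates the data step, assuming rows are plain dicts. The function `small_multiples`, its parameters and the toy values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[object, float]


def small_multiples(rows: List[dict], split_by: str, x: str, y: str):
    """Split rows into one panel per subpopulation (value of `split_by`).

    Returns (panels, y_range). All panels share y_range so that their
    heights can be compared directly.
    """
    if not rows:
        raise ValueError("no rows to plot")
    panels: Dict[object, List[Point]] = defaultdict(list)
    for row in rows:
        panels[row[split_by]].append((row[x], float(row[y])))
    for points in panels.values():
        points.sort()  # order each panel along the x-axis
    all_y = [value for points in panels.values() for _, value in points]
    return dict(panels), (min(all_y), max(all_y))


# Toy rows with made-up values, not StatLine figures.
rows = [
    {"Gender": "Men", "Year": 2011, "Employment": 10.0},
    {"Gender": "Men", "Year": 2010, "Employment": 9.0},
    {"Gender": "Women", "Year": 2010, "Employment": 7.0},
    {"Gender": "Women", "Year": 2011, "Employment": 8.0},
]
panels, y_range = small_multiples(rows, split_by="Gender", x="Year", y="Employment")
print(panels["Men"])  # [(2010, 9.0), (2011, 10.0)]
print(y_range)        # (7.0, 10.0)
```

Sharing one y-range across panels is what makes the comparison fair; giving each panel its own scale would exaggerate small differences.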

Page 15: StatMine, visual exploration of output data

Small multiples

Page 16: StatMine, visual exploration of output data

Composing a chart

Example:

• Categorical variables / dimensions: Year x Region x Gender x Age
• Numeric variables / topics: Count, Mean income, Employment
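
The split into dimensions and topics determines what a chart may show. A minimal Python sketch, assuming each table row is a dict with one key per dimension and topic: a hypothetical `ChartSpec` puts one topic on the y-axis and one dimension on the x-axis, and must fix every other dimension to a single category (in the toy data, an assumed "Total" category). `ChartSpec`, `compose` and the values are illustrative, not StatMine's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Roles from the example above; a real table would declare its own.
DIMENSIONS = ("Year", "Region", "Gender", "Age")  # categorical variables
TOPICS = ("Count", "Mean income", "Employment")   # numeric variables


@dataclass
class ChartSpec:
    """Hypothetical chart definition: one topic against one dimension."""
    topic: str
    x: str
    filters: Dict[str, object] = field(default_factory=dict)

    def validate(self) -> None:
        if self.topic not in TOPICS:
            raise ValueError(f"{self.topic!r} is not a topic")
        if self.x not in DIMENSIONS:
            raise ValueError(f"{self.x!r} is not a dimension")
        unknown = set(self.filters) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        # Every dimension not on the x-axis must be fixed to one category,
        # otherwise a single x-value would map to several numbers.
        free = set(DIMENSIONS) - {self.x} - set(self.filters)
        if free:
            raise ValueError(f"unfixed dimensions: {sorted(free)}")


def compose(rows: List[dict], spec: ChartSpec) -> List[Tuple[object, float]]:
    """Return the sorted (x, topic) points selected by the spec."""
    spec.validate()
    return sorted(
        (row[spec.x], float(row[spec.topic]))
        for row in rows
        if all(row[dim] == value for dim, value in spec.filters.items())
    )


# Toy rows with made-up values, not StatLine figures.
rows = [
    {"Year": 2011, "Region": "North", "Gender": "Total", "Age": "Total", "Mean income": 31.0},
    {"Year": 2010, "Region": "North", "Gender": "Total", "Age": "Total", "Mean income": 30.0},
    {"Year": 2011, "Region": "South", "Gender": "Total", "Age": "Total", "Mean income": 28.0},
]
spec = ChartSpec(topic="Mean income", x="Year",
                 filters={"Region": "North", "Gender": "Total", "Age": "Total"})
print(compose(rows, spec))  # [(2010, 30.0), (2011, 31.0)]
```

Adding a second dimension as a panel split instead of a filter turns the same selection into small multiples.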

Page 17: StatMine, visual exploration of output data

Prototype

• Built in PHP and JavaScript (D3)
• Imported 10 StatLine example tables
• Complex tables, e.g.:
  • Labor participation x gender x cohorts
  • Labor market flow per quarter (employed/unemployed)
  • Enterprise birth, death and growth x economic activity x quarter
• Tested on:
  • Internal users
  • Owners of data

Page 18: StatMine, visual exploration of output data

Demo

Page 19: StatMine, visual exploration of output data

Evaluation

• Part I: very successful
  • Owners of data want the prototype to check their own data
  • Provides insights
  • Easy detection of anomalies

Page 20: StatMine, visual exploration of output data

Work in progress

• II, Combination of different fragments
  • Testing with policymakers (end of this year)
  • Or “How to glue statistical tables?”

• III, Deriving variables + analysis
  • Absolute vs. relative (per population unit)
  • Turnover / # employees
  • Etc.
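
Items II and III can be illustrated together: glue two fragments that share the dimensions Year x Region, then derive a relative variable per population unit. The slides describe this as work in progress, so the Python sketch below is not StatMine's method; it assumes a simple inner join, and `glue`, `per_unit` and the toy values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def glue(left: List[dict], right: List[dict], on: Tuple[str, ...]) -> List[dict]:
    """Inner-join two table fragments on their shared dimensions `on`.

    Assumes each key occurs at most once in `right`; rows of `left`
    without a matching key are dropped.
    """
    index: Dict[tuple, dict] = {tuple(r[d] for d in on): r for r in right}
    glued = []
    for row in left:
        match = index.get(tuple(row[d] for d in on))
        if match is not None:
            glued.append({**row, **match})
    return glued


def per_unit(rows: List[dict], numerator: str, denominator: str,
             name: str, scale: float = 1.0) -> List[dict]:
    """Derive a relative variable: numerator * scale / denominator."""
    derived = []
    for row in rows:
        denom = row[denominator]
        # A zero or missing denominator yields None instead of an error.
        value: Optional[float] = row[numerator] * scale / denom if denom else None
        derived.append({**row, name: value})
    return derived


# Toy fragments with made-up values, not StatLine figures.
admissions = [
    {"Year": 2011, "Region": "North", "Admissions": 50},
    {"Year": 2011, "Region": "South", "Admissions": 30},
]
population = [
    {"Year": 2011, "Region": "North", "Population": 10000},
    {"Year": 2011, "Region": "South", "Population": 20000},
]
glued = glue(admissions, population, on=("Year", "Region"))
rates = per_unit(glued, "Admissions", "Population", "Admissions per 1,000", scale=1000)
print([r["Admissions per 1,000"] for r in rates])  # [5.0, 1.5]
```

The same `per_unit` call would cover turnover per employee (numerator "Turnover", denominator "Employees"). A real implementation would likely also need to check that both fragments use the same category codes and units before joining them.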

Page 21: StatMine, visual exploration of output data

Questions?