Combining Data from Different Sources and Modes

Introduction

Tbilisi, Georgia - 22-26 October 2018



(Agenda to be added later)


Why are we here?

• Continuous professional development!

– To provide a coherent conceptual framework

– To kindle an awareness of methodological issues and challenges

– To fortify a scientific attitude to statistical production

– Understanding of potential errors and statistical uncertainty

– Ability to apply relevant concepts in practice

– Appreciation of future opportunities and obstacles


Official statistics: A statistical system!


The Statistics Act


Statistics and analyses for the benefit of society

Official statistics are the nation’s shared factual basis and are essential for a living democracy. Statistics are vital to effective planning, evaluation, debate and research.

Official statistics are a public good that everyone shall have equal access to.

Society’s needs are of central importance to the content of the statistics. The production of statistics is increasingly governed by EU legislation …

… an obligation to adhere to the European Statistics Code of Practice, which sets out principles for how the statistics should be compiled and disseminated.


European Statistics Code of Practice


Handbook for Quality Reports & the Quality Assurance Framework of the European Statistical System


The Generic Statistical Business Process Model (UNECE)


“Found” data and data collection: A distinction


• “Found” data

– Administrative data

• Reusing data from other official/government institutions

– End-user systems

– Transaction data from private sector

– Published prices, etc.

– Big data (organic data)


Administrative data

• Administrative data are collected primarily for non-statistical purposes and then adopted for producing statistics

• They have been summarized for centuries…


Administrative data (Nordbotten 2010)

• Application in present statistical production

– Controlling the processing of statistical data and evaluating the quality of final products

– Producing new statistical products, either separately or in combination with data from multiple sources

– Preparing improved collection frames for sample surveys and censuses
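
The second application above, producing a product in combination with data from multiple sources, can be sketched as a join on a shared identifier. This is only an illustration: the two registers, the `person_id` key, the field names and all values are made up, and the sketch assumes the identifier is unique within each source.

```python
# Hypothetical records; identifiers, field names and values are invented.
population_register = [
    {"person_id": "P001", "municipality": "Oslo"},
    {"person_id": "P002", "municipality": "Bergen"},
    {"person_id": "P003", "municipality": "Tromsø"},
]
tax_register = [
    {"person_id": "P001", "income": 520000},
    {"person_id": "P003", "income": 410000},
    {"person_id": "P004", "income": 380000},
]


def link_by_id(left, right, key):
    """Join two lists of records on a shared identifier.

    Returns (linked, left_only, right_only) so unmatched units stay visible.
    Assumes the key is unique within each list.
    """
    right_by_key = {rec[key]: rec for rec in right}
    linked, left_only = [], []
    for rec in left:
        match = right_by_key.pop(rec[key], None)
        if match is None:
            left_only.append(rec)
        else:
            linked.append({**rec, **match})
    # Whatever was never popped exists only in the right-hand source.
    right_only = list(right_by_key.values())
    return linked, left_only, right_only


linked, pop_only, tax_only = link_by_id(
    population_register, tax_register, "person_id"
)
```

Returning the unmatched records from both sides, rather than silently dropping them, makes coverage differences between the sources visible before any statistic is computed.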


Administrative data (Nordbotten 2010)

• Examples of present application

– Census statistics

– Population statistics

– Foreign trade statistics

– Income statistics (taxation data)

– Social statistics (registration of public services)

– Employment statistics (unemployment registration)

– Education statistics (registration of students)

– Health statistics (medical registration)

– Criminal statistics (judicial registration)

– Business statistics (enterprise registers)


The Statistical archive system (Nordbotten 2010)


End user system data capture of price data

[Diagram labels: Chain store 1 to Chain store n; Data reception SSB; Production system SSB]

• Statistics Norway creates a questionnaire and an XSD (XML Schema Definition) in SERES (Semantics Register for Electronic Services).

• This solution gives the enterprise's system immediate feedback on any errors in the data.

• Because the staff in the enterprise use their own local and familiar computer systems, little instruction is needed before use.
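
The idea of immediate feedback on errors can be illustrated with a small check run when a report is received. This is a simplified stand-in, not the SERES solution: Python's standard library has no XSD validator, so a few hand-coded rules play the role of the schema, and the element names (`priceReport`, `item`, `ean`, `price`, `quantity`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for an XSD: required child elements of <item>
# and the type each value must parse as.
REQUIRED_FIELDS = {"ean": str, "price": float, "quantity": int}


def check_price_report(xml_text):
    """Return a list of error messages; an empty list means the report passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    for pos, item in enumerate(root.iter("item"), start=1):
        for name, caster in REQUIRED_FIELDS.items():
            value = item.findtext(name)
            if value is None or not value.strip():
                errors.append(f"item {pos}: missing <{name}>")
                continue
            try:
                caster(value.strip())
            except ValueError:
                errors.append(
                    f"item {pos}: <{name}> has invalid value {value.strip()!r}"
                )
    return errors


# Made-up report: the second item has a non-numeric price and no quantity.
report = """<priceReport>
  <item><ean>7038010000012</ean><price>24.90</price><quantity>3</quantity></item>
  <item><ean>7038010000029</ean><price>abc</price></item>
</priceReport>"""

feedback = check_price_report(report)
for message in feedback:
    print(message)
```

Because the messages are produced at reception, they could be sent straight back to the reporting system instead of being discovered later during editing.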


Transaction data from the private sector (Amdam 2017)

[Diagram labels: Transaction data; Purchase transaction data; Payment transaction data; Barcode data; Receipt data; Card transactions; Account transactions]


Published prices – manual capture or using web scraping

https://www.norwegian.no/booking/fly/velgflyvning/?
D_City=OSLALL&
A_City=LPA&
TripType=1&
D_Day=04&
D_Month=201711&
D_SelectedDay=04&
R_Day=04&
R_Month=201711&
R_SelectedDay=04&
AgreementCodeFK=-1&
CurrencyCode=NOK&
rnd=50997&
processid=59704&
mode=ab

https://www.norwegian.no/booking/fly/velgflyvning/?D_City=OSLALL&A_City=LPA&TripType=1&D_Day=04&D_Month=201711&D_SelectedDay=04&R_Day=04&R_Month=201711&R_SelectedDay=04&AgreementCodeFK=-1&CurrencyCode=NOK&rnd=50997&processid=59704&mode=ab
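
A scraper typically starts by treating such a URL as a base address plus a set of query parameters that can be varied. The sketch below takes the booking URL above apart with Python's standard library and rebuilds it for another departure day, without fetching anything. The helper names are hypothetical, and reading `D_` as departure and `R_` as return is an assumption based only on the parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

URL = (
    "https://www.norwegian.no/booking/fly/velgflyvning/?D_City=OSLALL&A_City=LPA"
    "&TripType=1&D_Day=04&D_Month=201711&D_SelectedDay=04&R_Day=04&R_Month=201711"
    "&R_SelectedDay=04&AgreementCodeFK=-1&CurrencyCode=NOK&rnd=50997"
    "&processid=59704&mode=ab"
)

parts = urlsplit(URL)
# Query string as an ordered mapping, e.g. params["D_City"] is "OSLALL".
params = dict(parse_qsl(parts.query))


def with_departure(params, day, year_month):
    """Copy the parameters with a new departure day (assumed meaning of D_*)."""
    updated = dict(params)
    updated.update({"D_Day": day, "D_SelectedDay": day, "D_Month": year_month})
    return updated


def build_url(parts, params):
    """Reassemble the URL with a new query string."""
    return urlunsplit(parts._replace(query=urlencode(params)))


new_url = build_url(parts, with_departure(params, "11", "201711"))
print(new_url)
```

Looping `with_departure` over a list of dates would give one URL per price observation; whether values such as `rnd` and `processid` can be reused between requests is not known from the slide.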


Big/organic/whatever data


“Found” data

• Legal aspect

– Who owns the data you have “found”?

– Is it legal to reuse them for another purpose?

• Unique and permanent identifier?

– Is there one unique and permanent identifier used throughout society, e.g. for persons and businesses?

• Political aspect

– Is it safe to reuse data, or can they be misused?

– “What if Hitler had more registers…”

– Example of misuse in Norway during WW2: Jews were identified.


“Found” data

• Response burden (chain store internal systems)

– If the NSI imposes an extra variable not really needed by respondents themselves, it can significantly increase the burden on businesses and institutions.

• Quality aspect

– Relevance is the main problem

• The data often measure something slightly different from what we really need in official statistics

– The NSI has little control over how the data are produced and processed


Data collection

• Statistical data are collected for the purpose of preparing statistics and are in general not available for any other purpose

• Direct collection by the NSI

– Surveys

– Census


Data collection modes:

• Face to face

• Telephone

• Paper and pencil

• Web


• Joint reporting system for all governmental institutions in Norway

[Diagram labels: Metadata; Management; Editing]

BLUE-ETS final Conference, 8 March 2013, Brussels


Data capture replacing data collection? (Nordbotten 2010)


«It is only recently that our eyes have opened to the fruitful idea of using basic survey methods in combination. We have been too blinded from looking upon them as exclusive alternatives to observe that they may be applied as supplementary parts of single investigations»

Stanley Payne 1964


• “Found” data

– Administrative data

• Reusing data from other official/government institutions

– End-user systems (SFTP, XML)

– Transaction data from private sector

– Published prices, etc.

– Big data, organic data

• Data collection

– Direct collection by the NSI

• Face to face interview

• Telephone interview

• Self-completion surveys by paper

• Self-completion surveys by web


1. Which of these do you use in your own production of statistics?

2. If we meet again in 10 years, which will be the most prominent? Justify your answer.