Combining Data from Different
Sources and Modes
Introduction
Tbilisi, Georgia - 22-26 October 2018
(Agenda to be added later)
Why are we here?
• Continuous professional development!
– To provide a coherent conceptual framework
– To kindle an awareness of methodological issues and challenges
– To fortify a scientific attitude to statistical production
– Understanding of potential errors and statistical uncertainty
– Ability to apply relevant concepts in practice
– Appreciation of future opportunities and obstacles
Official statistics:
A statistical system!
The Statistics Act
Statistics and analyses for the benefit of society
Official statistics are the nation’s shared factual basis and are essential for a living democracy. Statistics are vital to effective planning, evaluation, debate and research.
Official statistics are a public good that everyone shall have equal access to.
Society’s needs are of central importance to the content of the statistics. The production of statistics is increasingly governed by EU legislation …
… an obligation to adhere to the European Statistics Code of Practice, which sets out principles for how the statistics should be compiled and disseminated.
European Statistics
Code of Practice
Handbook for Quality Reports & the Quality Assurance Framework of the European Statistical System
The Generic Statistical Business Process Model (UNECE)
“Found” data and data collection:
A distinction
• “Found” data
– Administrative data
• Reusing data from other official/government institutions
– End-user systems
– Transaction data from private sector
– Published prices etc
– Big data (organic data)
Administrative data
• Administrative data are collected primarily for non-statistical purposes and subsequently adopted for producing statistics
• Have been summarized for centuries…
Administrative data (Nordbotten 2010)
• Application in present statistical production
– Controlling the processing of statistical data and evaluating the quality of final products
– Producing new statistical products, either separately or in combination with data from multiple sources
– Preparing improved collection frames for sample surveys and censuses
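Combining an administrative source with survey data usually depends on an identifier that both sources share. The sketch below illustrates the idea only; the records, the field names (`person_id`, `municipality`, `employed`) and the `link` helper are all hypothetical and not taken from Nordbotten (2010).

```python
# Hypothetical records; field names and values are invented for illustration.
register = [
    {"person_id": "A1", "municipality": "Oslo"},
    {"person_id": "B2", "municipality": "Bergen"},
    {"person_id": "C3", "municipality": "Tromsø"},
]
survey = [
    {"person_id": "B2", "employed": True},
    {"person_id": "C3", "employed": False},
    {"person_id": "D4", "employed": True},  # not present in the register
]


def link(register, survey, key="person_id"):
    """Attach survey answers to register units that share the same identifier."""
    by_key = {row[key]: row for row in survey}
    linked = []
    for unit in register:
        match = by_key.get(unit[key], {})
        # Register units without a survey answer are kept and flagged.
        linked.append({**unit, **match, "in_survey": unit[key] in by_key})
    return linked


linked = link(register, survey)

# Survey respondents missing from the register point to frame problems.
unmatched = sorted(
    {row["person_id"] for row in survey} - {row["person_id"] for row in register}
)
```

Units that appear in only one source are kept visible rather than silently dropped, since they are exactly the cases that matter for frame improvement and quality evaluation.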
Administrative data (Nordbotten 2010)
• Examples of present application
– Census statistics
– Population statistics
– Foreign trade statistics
– Income statistics (taxation data)
– Social statistics (registration of public services)
– Employment statistics (unemployment registration)
– Education statistics (registration of students)
– Health statistics (medical registration)
– Criminal statistics (judicial registration)
– Business statistics (enterprise registers)
The Statistical archive system (Nordbotten 2010)
End user system data capture of price data
[Diagram: Chain store 1 … Chain store n → Data reception (SSB) → Production system (SSB)]
• Statistics Norway creates a questionnaire and an XSD (XML Schema Definition) in SERES (Semantics Register for Electronic Services).
• This solution gives the enterprise's system immediate feedback on any errors in the data.
• Because the staff in the enterprise use their own local and familiar computer systems, little instruction is needed before use.
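To illustrate what "immediate feedback" on a submitted file could look like, here is a minimal sketch. It does not perform XSD validation (the Python standard library has no XSD validator); it uses hand-written checks as a stand-in. The report layout, the element names (`observation`, `store_id`, `item_code`, `price`) and the `check_report` function are assumptions made for the example, not the actual SERES schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; a real XSD would define these formally.
REQUIRED = ("store_id", "item_code", "price")


def check_report(xml_text):
    """Return a list of error messages; an empty list means the report passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    errors = []
    for i, obs in enumerate(root.findall("observation"), start=1):
        # Every required element must be present and non-empty.
        for field in REQUIRED:
            node = obs.find(field)
            if node is None or not (node.text or "").strip():
                errors.append(f"observation {i}: missing {field}")
        # A price, when given, must be a positive number.
        price = obs.findtext("price", default="").strip()
        if price:
            try:
                if float(price) <= 0:
                    errors.append(f"observation {i}: price must be positive")
            except ValueError:
                errors.append(f"observation {i}: price is not a number")
    return errors


good = (
    "<report><observation><store_id>1</store_id>"
    "<item_code>7038010000010</item_code><price>24.90</price>"
    "</observation></report>"
)
bad = (
    "<report><observation><store_id>1</store_id>"
    "<price>-5</price></observation></report>"
)
```

Returning the full list of problems, rather than stopping at the first one, lets the sending system correct a file in a single round trip.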
Transaction data from the private sector (Amdam 2017)
• Transaction data
– Purchase transaction data
• Barcode data
• Receipt data
– Payment transaction data
• Card transactions
• Account transactions
Published prices – manual capture or using web scraping
https://www.norwegian.no/booking/fly/velgflyvning/?
D_City=OSLALL&
A_City=LPA&
TripType=1&
D_Day=04&
D_Month=201711&
D_SelectedDay=04&
R_Day=04&
R_Month=201711&
R_SelectedDay=04&
AgreementCodeFK=-1&
CurrencyCode=NOK&
rnd=50997&
processid=59704&
mode=ab
https://www.norwegian.no/booking/fly/velgflyvning/?D_City=OSLALL&A_City=LPA&TripType=1&D_Day=04&D_Month=201711&D_SelectedDay=04&R_Day=04&R_Month=201711&R_SelectedDay=04&AgreementCodeFK=-1&CurrencyCode=NOK&rnd=50997&processid=59704&mode=ab
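The breakdown above shows that a price query is simply a URL with named parameters, which is what makes systematic capture feasible. As a minimal sketch using only the Python standard library, the URL can be taken apart and rebuilt with a different date. The helper names `query_params` and `with_params` are invented for this example, the reading of `D_Day` as the departure day is an assumption, and nothing is fetched from the website.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qs

URL = (
    "https://www.norwegian.no/booking/fly/velgflyvning/?D_City=OSLALL&A_City=LPA"
    "&TripType=1&D_Day=04&D_Month=201711&D_SelectedDay=04&R_Day=04&R_Month=201711"
    "&R_SelectedDay=04&AgreementCodeFK=-1&CurrencyCode=NOK&rnd=50997"
    "&processid=59704&mode=ab"
)


def query_params(url):
    """Split the query string of a URL into a {name: value} dictionary."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}


def with_params(url, **changes):
    """Return the same URL with selected query parameters replaced."""
    parts = urlsplit(url)
    params = query_params(url)
    params.update(changes)
    return urlunsplit(parts._replace(query=urlencode(params)))


params = query_params(URL)

# Assumed meaning: D_Day / D_SelectedDay hold the departure day of the month.
next_day = with_params(URL, D_Day="05", D_SelectedDay="05")
```

Generating one URL per route and date in this way is the step that turns manual price lookup into a repeatable collection routine.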
Big/organic/whatever data
“Found” data
• Legal aspect
– Who owns the data you have “found”?
– Is it legal to reuse them for another purpose?
• Unique and permanent identifier?
– Is there one unique and permanent identifier used throughout society, e.g. for persons and businesses?
• Political aspect
– Is it safe to reuse data, or can they be misused?
– “What if Hitler had more registers…”
– Example of misuse in Norway during WW2: Jews were identified.
“Found” data
• Response burden (chain store internal systems)
– If the NSI imposes an extra variable not really needed by respondents themselves, it can significantly increase the burden on businesses and institutions.
• Quality aspect
– Relevance is the main problem
• The data often measure something slightly different from what we really need in official statistics
– The NSI has little control over how the data are produced and processed
Data collection
• Statistical data are collected for the purpose of preparing statistics and are in general not available for any other purpose
• Direct collection by the NSI
– Surveys
– Census
Data collection modes:
• Face to face
• Telephone
• Paper and pencil
• Web
BLUE-ETS final Conference, 8 March 2013, Brussels
Metadata
Management Editing
• Joint reporting system for all governmental institutions in Norway
Data capture replacing data collection? (Nordbotten 2010)
«It is only recently that our eyes have opened to the fruitful idea of using basic survey methods in combination. We have been too blinded from looking upon them as exclusive alternatives to observe that they may be applied as supplementary parts of single investigations»
Stanley Payne 1964
• “Found” data
– Administrative data
• Reusing data from other official/government institutions
– End-user systems (SFTP, XML)
– Transaction data from private sector
– Published prices etc
– Big data, organic data
• Data collection
– Direct collection by the NSI
• Face to face interview
• Telephone interview
• Self-completion surveys by paper
• Self-completion surveys by web
1. Which of these do you use in your own
production of statistics?
2. If we meet again in 10 years, which will
be the most prominent? Justify your
answer.