ESSnet ON MICRO DATA LINKING AND DATA WAREHOUSING IN STATISTICAL PRODUCTION
RESULTS OF STOCKTAKING, CONCLUSIONS OF FIRST YEAR
*Pieter Vlag
Senior Statistical Researcher – Statistics [email protected]
ESSnet DWH: Main conclusions first year 2
Contents
• Answers on questionnaire
• Results of visit to Statistics Finland
• Results of visit to CSO-Ireland
• Conclusions of the ESSnet DWH group
• Implications for work in 2012/2013
Questionnaire
• Sent to all National Statistical Institutes (NSIs) of the ESS and Switzerland
• 24 NSIs responded
• Response is representative (no specific group of countries missing)
• In the interpretation, a distinction is made between questions on
- opportunities/barriers
- implementation
- definition of a DataWareHouse
Answers on questionnaire (opportunities/barriers)
• Do you think that the results of this ESSnet are useful for your work ?
Answers on questionnaire (opportunities/barriers)
• What do/did you see as the main motivation to start DWH in your business statistics systems ?
(More than one answer per NSI was possible.)
Answers on questionnaire (opportunities/barriers)
• What do you see as the main general methodological barriers to implementing an integrated system ?
(More than one answer per NSI was possible.)
Answers on questionnaire (opportunities/barriers)
• What do you see as the main technical methodological barriers to implementing an integrated system ?
(More than one answer per NSI was possible.)
Answers on questionnaire (opportunities/barriers)
• What do you see as the main IT barriers to implementing an integrated system ?
(More than one answer per NSI was possible.)
Answers on questionnaire (implementation)
No NSI answers ‘YES’ to all four of these questions:
- Do you have a single coherent system which covers most of your data in the production of business statistics ?
- Is your metadata currently integrated into your data systems ?
- Is your data input for current needs integrated into your data systems ?
- Are your current output requirements integrated into your data systems ?
CONCLUSION: No NSI has a finished DWH system
Answers on questionnaire (implementation)
On the other hand, the answers suggest that all responding NSIs are at the stage of
• either considering the development of an integrated data warehouse system
• or developing a data warehouse system
• or implementing parts of a (prototype) data warehouse system
Answers on questionnaire (1st conclusions)
NSIs
- recognise the opportunities of DWH systems
- consider the high investments, or investment-related issues, as the most important barrier
- are considering or developing DWH systems
- mention similar methodological and IT issues
- expect “sharing knowledge and experiences” as an outcome of this ESSnet.
Hence, there is a business case for this ESSnet.
Answers on questionnaire (definition of a DWH)
The questionnaire presented two extremes:
- Data model
- Process model
Questionnaire (‘process model’ DWH)
In the “process model” perspective, the DWH is primarily a set of databases to store the data between the statistical data-processing steps. Statistical processing (weighting, consistency) is done outside the DWH. The DWH system is not primarily designed to produce flexible output, but is intended more to harmonise the statistical processes.
[Figure: The ‘process model’ perspective. Labels: Input 1, Input 2 (known inputs); production processes (known production processes, exploiting synergies or experience); data warehouse; Output 1, Output 2, Output 3 (known outputs); metadata attached to production process.]
Questionnaire (‘data model’ DWH)
[Figure: The ‘data model’ perspective. Labels: surveys, admin data, register data, registers (heterogeneous (unknown?) inputs; non-standard definitions in multiple formats, generated by heterogeneous processes); cleaned coherent data sources (standard format, standard process); data warehouse (store & process, coherency work; standard format, standard process, standard variables); common extraction process (tailored production of micro and aggregate data); aggregate statistics, microdata, time series (heterogeneous (unknown?) outputs); metadata attached to data items.]
In the “data model” perspective, the DWH is primarily a unit for storing, processing and linking all available data, irrespective of where they have come from or where they are going to. Data acquisition is driven by availability of sources; output production is driven by availability of data in the store. Business registers and metadata are even more important in this model than in regular statistical processes, because they are essential for storing, processing and linking the data and for producing flexible output.
Answers on questionnaire (definition of a DWH)
- How would you describe your single conceptual approach ?
Answers on questionnaire (definition of a DWH)
But the answers to this question conflict with
- follow-up inquiries
- follow-up visits
HENCE,
- the presented models were open to multiple interpretations
- a clearer definition of a statistical DWH system was needed.
Main conclusion from visit to Statistics Finland (figure)
[Figure: Input I, Input II, etc.; processing base; actual DWH (integrated stat. data); Output I, Output II, etc.]
Main conclusion from visit to Statistics Finland (in words)
The Statistical DataWareHouse consists of two parts:
• A processing (data)base in which all used input data are processed and integrated.
• A publication (data)base, used for (micro)analyses and calculation of the aggregates (for publication).
* Data are transferred to the publication base after they have been approved in the processing database.
In contrast to the DataWareHouse concept at commercial enterprises, the processing part is much more emphasized at NSIs.
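The two-part structure can be illustrated with a minimal Python sketch. It is not Statistics Finland's implementation: the `Record` class, the `approved` flag and the `transfer_approved` function are hypothetical names, and the sketch assumes that approved records are copied (not moved) to the publication base.

```python
from dataclasses import dataclass


@dataclass
class Record:
    unit_id: str
    turnover: float
    approved: bool = False  # hypothetical approval flag set in the processing base


def transfer_approved(processing_base, publication_base):
    """Copy approved records that are not yet published; return how many were copied."""
    published_ids = {r.unit_id for r in publication_base}
    new = [r for r in processing_base
           if r.approved and r.unit_id not in published_ids]
    publication_base.extend(new)
    return len(new)


processing_base = [
    Record("A", 100.0, approved=True),
    Record("B", 250.0),  # still being processed, so not transferred
]
publication_base = []
moved = transfer_approved(processing_base, publication_base)
```

Only record "A" reaches the publication base here; record "B" stays in the processing base until it is approved.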
Main conclusion from visit to CSO-Ireland (figure)
[Figure: Input I, Input II, etc.; three processing bases (Proc. base); actual DWH (integrated stat. data); Output I, Output II, etc. Caption: Architecture for data processing (depending on data ?)]
Main conclusion from visit to CSO-Ireland (in words)
CSO has two integrated processing systems:
- An older one, in which data are stored after each processing step. This system is used for survey data.
- A newer one (to be implemented), in which admin data are stored once, after all processing steps have been performed.
A reason for reducing the number of storage points might be that admin data need less extensive data cleaning. Hence, the nature of the data (survey or admin data) might be a factor when defining a business architecture for the integrated processing system.
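The difference between the two storage strategies can be sketched as follows. This is an illustration only, not CSO's system: `run_pipeline`, the `store_each_step` flag and the two processing steps are assumed names and rules.

```python
def drop_missing(values):
    """Hypothetical processing step: remove missing values."""
    return [v for v in values if v is not None]


def round_values(values):
    """Hypothetical processing step: round to whole numbers."""
    return [round(v) for v in values]


def run_pipeline(data, steps, store_each_step):
    """Apply the steps in order and return the list of stored snapshots.

    store_each_step=True  -> one snapshot after every step (older, survey-style system)
    store_each_step=False -> one snapshot after all steps (newer, admin-style system)
    """
    storage = []
    current = list(data)
    for name, step in steps:
        current = step(current)
        if store_each_step:
            storage.append((name, list(current)))
    if not store_each_step:
        storage.append(("final", list(current)))
    return storage


steps = [("drop_missing", drop_missing), ("round", round_values)]
raw = [10.4, None, 20.6]

survey_storage = run_pipeline(raw, steps, store_each_step=True)
admin_storage = run_pipeline(raw, steps, store_each_step=False)
```

Both runs end with the same processed data; they differ only in how many intermediate versions are kept.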
Main conclusion of the ESSnet DWH group (in figure)
[Figure: Input I, Input II, etc. (out of scope); integrated systems: Stat BR (pop. frame), cleaning, Imp. + aggr. (processing issues); actual DWH with integrated data (DWH, confidentiality issues); Output I, Output II, etc. (out of scope).]
General conclusion of the ESSnet DWH group (in words)
A Statistical DataWareHouse consists of two parts:
Part I
• A processing phase in which statistical input data are
- at a 1st stage linked to the Business Register
- at a 2nd stage cleaned (between data sources)
- at a 3rd stage made consistent between the sources by imputing missing data and correcting for inconsistencies between the sources
before being transferred to the actual DataWareHouse.
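The three stages of Part I can be sketched in Python as below. This is a toy illustration under stated assumptions, not the ESSnet's methodology: the register contents, the cleaning rule (negative turnover is treated as missing) and the consistency rule (survey values are preferred, admin values fill gaps) are all hypothetical.

```python
# Hypothetical Business Register: unit identifier -> register attributes
business_register = {"BR001": {"nace": "C10"}, "BR002": {"nace": "G47"}}

survey = [
    {"unit": "BR001", "turnover": 120.0},
    {"unit": "BR002", "turnover": None},   # missing in the survey
    {"unit": "XX999", "turnover": 50.0},   # not in the register
]
admin = [
    {"unit": "BR001", "turnover": 118.0},
    {"unit": "BR002", "turnover": 300.0},
]


def link_to_register(records, register):
    """Stage 1: keep units found in the register and attach register attributes."""
    return [{**r, **register[r["unit"]]} for r in records if r["unit"] in register]


def clean(records):
    """Stage 2: assumed cleaning rule, negative turnover becomes missing."""
    cleaned = []
    for r in records:
        value = r["turnover"]
        if value is not None and value < 0:
            value = None
        cleaned.append({**r, "turnover": value})
    return cleaned


def integrate(survey_records, admin_records):
    """Stage 3: assumed rule, prefer the survey value and impute gaps from admin data."""
    admin_by_unit = {r["unit"]: r["turnover"] for r in admin_records}
    integrated = {}
    for r in survey_records:
        value = r["turnover"]
        if value is None:
            value = admin_by_unit.get(r["unit"])
        integrated[r["unit"]] = {"nace": r["nace"], "turnover": value}
    return integrated


survey_ready = clean(link_to_register(survey, business_register))
admin_ready = clean(link_to_register(admin, business_register))
integrated = integrate(survey_ready, admin_ready)
```

In this sketch, unit XX999 is dropped at the linking stage, and the missing survey turnover of BR002 is imputed from the admin source. Units that appear only in the admin data are ignored to keep the example short.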
General conclusion of the ESSnet DWH group (in words)
A Statistical DataWareHouse consists of two parts:
Part II
• An actual DataWareHouse from which flexible aggregated data and microdata, meant for output, can be generated. These generated aggregated data and microdata themselves do not belong to the Statistical DataWareHouse system.
The data in this DataWareHouse are completely integrated: interpretation of (the quality of) these data should theoretically be independent of the input source.
Part II is more recognisable for commercial enterprises.
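What "flexible" output from the actual DataWareHouse could mean is sketched below: the same integrated microdata are aggregated along whichever dimension an output requires. The records, the variable names and the `aggregate` function are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical integrated microdata held in the actual DataWareHouse
warehouse = [
    {"unit": "BR001", "nace": "C10", "region": "North", "turnover": 120.0},
    {"unit": "BR002", "nace": "G47", "region": "North", "turnover": 300.0},
    {"unit": "BR003", "nace": "C10", "region": "South", "turnover": 80.0},
]


def aggregate(records, by, variable):
    """Sum `variable` over the records, grouped by the classification `by`."""
    totals = defaultdict(float)
    for r in records:
        totals[r[by]] += r[variable]
    return dict(totals)


# Two different outputs generated from the same integrated data
by_nace = aggregate(warehouse, "nace", "turnover")
by_region = aggregate(warehouse, "region", "turnover")
```

The aggregates `by_nace` and `by_region` are outputs generated from the warehouse; in line with the text above, they are not themselves part of it.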
Main conclusion of the ESSnet DWH group (static SBR or SBR integral part of DWH)
[Figure: Input I, Input II, etc.; SBR; cleaning; Imp. + aggr.; actual DWH (integrated data); Output I, Output II, etc.; feedback.]
SBR preferably an integral part: feedback from other sources, but with moderation.
Main conclusion of the ESSnet DWH group (metadata)
[Figure: Input I, Input II, etc.; SBR; cleaning; Imp. + aggr.; actual DWH (integrated data); Output I, Output II, etc.; confidentiality issues. Metadata labels: input descr.; process (step) descr.; (output) var. descr.]
Relationship with DWH-Architectural models (e.g. Kimball)
[Figure: layered DWH architecture. Layers: Sources Layer; Integration Layer; Data Access Layer; Interpretation and data analysis layer; Meta Data alongside the layers. Labels: Revenue Agency, Chambers of Commerce, Survey 1 to Survey N, SBR, Customs Agency, Employees Data; Staging Data, SBR, Domains, Estimation, Universe/Census, Primary Micro Data; Alimentation (Extraction, Transformation, Loading); Data Marts; Institutional Output, Dashboards, Analysis, Reporting, Data Mining. Annotations: independent process; integrated systems; actual DWH.]
Implications for work 2012/2013
Metadata
- Fitting the statistical DWH into current metadata models.
- Keep it manageable !
Methodology
- Fitting current (ESSnet) methodology into the stat. DWH
1. data-linking & feedback to BR.
2. (selective ?) editing + (repeated) weighting
3. data confidentiality
IT and Architecture
- Fitting ‘methodology’ into an ‘adapted’ GSBPM model
- Relating the ‘adapted’ GSBPM to the Stat. DWH architecture
Summary
• Business case for ESSnet DWH present
• Questionnaire: sorry for the confusing DWH-model extremes.
• Visits to Finland and Ireland useful for feedback/ideas etc.
• Statistical DWH model developed, consisting of
- part 1: integrated systems
- part 2: actual DataWareHouse
• Statistical DWH <> ‘commercial’ DWH, as more emphasis is placed on part 1
• Actions defined for 2012/2013 on
- metadata
- methodology
- IT and Architecture
Thank you for your attention!
Questions?