Eurostat
November 2015
Eurostat Unit B3 – IT and standards for data and metadata exchange
Jean-Francois LEBLANC
Christian SEBASTIAN
SDMX IT Tools
Introduction
Table of contents
1. Where are we?
2. Standardization
3. Why do we need a model?
4. GSBPM Generic Statistical Business Process Model
   1. Phases
   2. Key features
   3. Other uses
5. Standards – Relations
6. GSIM Generic Statistical Information Model
7. SDMX & DDI
8. SDMX
   1. Why?
   2. Benefits
   3. Costs
   4. Opportunities
   5. Impacts
   6. From 1.0 to 2.1
   7. The SDMX components
   8. SDMX in practice
9. Summary
1. Where are we?
• Dramatic changes in the environment of official statistics producers (e.g. data deluge)
• Modernization of statistical information systems is seen as a question of survival for the official statistics sector
• Standardization viewed as a key enabler for modernization
• “Standards-based” industrialization of statistical production
2. Standardization
• Why is it necessary?
  • Harmonization
  • Reusability and interoperability
  • Shared solutions across statistical institutes
• What does it imply?
  • Common processes
  • Common tools
  • Common methodologies
2. Standardization
• Industry Standards
  • GSBPM - Generic Statistical Business Process Model
  • GSIM - Generic Statistical Information Model
  • SDMX - Statistical Data and Metadata eXchange
  • DDI - Data Documentation Initiative
• Other major standards
  • RDF - Resource Description Framework
  • LOD - Linked Open Data
  • JSON - JavaScript Object Notation
  • XBRL - eXtensible Business Reporting Language
3. Why do we need a model?
• To define and describe statistical processes in a coherent way
• To standardize process terminology
• To compare and benchmark processes within and between organisations
• To identify synergies between processes
• To inform decisions on systems architectures and organisation of resources
4. GSBPM Generic Statistical Business Process Model
• Applicable to all activities undertaken by producers of official statistics that result in data outputs
• Used by national and international statistical organisations
• Independent of data source, can be used for:
  • Surveys / censuses
  • Administrative sources / register-based statistics
  • Mixed sources
4.1 GSBPM - Phases
4.2 GSBPM – Key features
Not a linear model
• Sub-processes do not have to be followed in a strict order
• It is a matrix with many possible paths, including iterative loops within and between phases
• Some iterations of a regular process may skip certain sub-processes
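The non-linear character described above can be illustrated with a minimal Python sketch. It works at phase level rather than sub-process level for brevity. The phase names are an assumption taken from GSBPM version 5.0 (the slides do not list them), and `is_valid_path` and `skipped_phases` are hypothetical helpers, not part of any GSBPM tooling.

```python
# Phase names assumed from GSBPM v5.0; the slides themselves do not list them.
PHASES = [
    "Specify Needs", "Design", "Build", "Collect",
    "Process", "Analyse", "Disseminate", "Evaluate",
]


def is_valid_path(path):
    """Return True if the path is non-empty and every step is a known phase.

    No ordering is enforced: steps may repeat (iterative loops),
    go backwards, or be left out entirely.
    """
    return len(path) > 0 and all(step in PHASES for step in path)


def skipped_phases(path):
    """List the phases a given iteration did not visit, in model order."""
    visited = set(path)
    return [phase for phase in PHASES if phase not in visited]


# A regular (e.g. monthly) iteration that reuses an existing design,
# loops between Process and Analyse, and skips the early phases.
monthly_run = ["Collect", "Process", "Analyse", "Process", "Analyse", "Disseminate"]

print(is_valid_path(monthly_run))
print(skipped_phases(monthly_run))
```

The point of the sketch is that validity depends only on membership in the model, not on position: a path is a route through a matrix, not a fixed sequence.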
4.3 GSBPM – Other uses
• Harmonizing statistical computing systems
• Facilitating sharing of statistical software
• Framework for process quality management
• Structure for storage of documents
• Measuring operational costs
5. Standards - Relations
[Figure: relations between standards in statistics production. Conceptual level: GSBPM, GSIM. Practical level: Methods, Technology, with SDMX, DDI, RDF, ISO-11179, … Labels: information concepts, statistical concepts, statistical how-to, production how-to.]
6. GSIM Generic Statistical Information Model
[Figure: GSIM as the conceptual model; DDI, SDMX and other standards as implementation standards.]
7. SDMX & DDI
• DDI offers a very rich model for the documentation of micro-data
• SDMX offers a very integrated exchange platform for statistical outputs (IT architectures, tools, web services)
The combined use of both standards could allow a higher level of integration of the complete production process
8. SDMX Statistical Data and Metadata eXchange
[Figure: logos of SDMX sponsoring organisations, including the World Bank and UNSD.]
8.1 SDMX – Why?
• The exchange of statistical data and metadata is complex, resource intensive and expensive
• In the past, national and international organisations each developed their own specific approaches and solutions
• New technologies for machine-to-machine exchange, e.g. XML and web services, were emerging and brought both opportunities and challenges
SDMX is the global answer to this.
8.2 SDMX - Benefits
• Efficiency
• Reduced burden after low investment
• Consistent and comparable data and metadata messages produced by different organizations
• Harmonized statistical processes, offering new ways of data and metadata exchange (such as data hubs)
• Web-based dissemination formats are provided that are computer “readable” and easier to update.
8.3 SDMX - Costs
• Development/maintenance of the SDMX standards and guidelines done by the international sponsoring institutions (supported by NSIs)
• Standards are public and open source
• IT tools are created by sponsoring or other organizations and made freely available
• Capacity building by individual sponsoring institutions
• User community input by means of open process
• Low investment cost – gradual implementation
8.4 SDMX - Opportunities
• Across domains
• Across organizations
Simplification
• Streamline data flows
• Central management (SDMX Registry)
Standardization
• Software tools
• Data sharing
• Data structures
Harmonization
• Concepts
• Code lists
8.5 SDMX - Impacts
• Reduced reporting burden via common formats adopted by international organizations for data and metadata exchange
• User-friendly access when publishing national data and metadata on the web via global standards for data formats, catalogs/registries and associated services
• Improved management and analysis of data via global guidelines for metadata vocabularies and repositories in common formats
• Replicable models and tools for statistical information systems at national levels
8.6 SDMX – From 1.0 to 2.1
[Figure: SDMX timeline]
• September 2004: Version 1.0 (GESMES/TS)
• November 2005: Version 2.0 (SDMX-EDI, SDMX-ML, SDMX Registry)
• February 2008: SDMX accepted at UN level; recognised and supported as the preferred standard
• April 2011: Version 2.1
8.7 The SDMX Components
• Describe statistics in a standard way: objects and their relationships (Data Structure Definition (DSD), Concepts, Code List)
• Central management and standard access: SDMX Registry, SDMX Web Services
• Cross Domain Concepts, Cross Domain Code Lists, Statistical Domains, Metadata Common Vocabulary
• Push: provider generates and sends file to receiver
• Pull: provider opens web service to data; receiver downloads regularly
• Hub: special case of pull; receiver downloads on end user request
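How a Data Structure Definition ties concepts to code lists can be shown with a minimal Python sketch. This is an illustration under stated assumptions, not the SDMX information model itself: the class names, the `validate_key` helper and the identifier `DEMO_DSD` are hypothetical, and the code lists contain only a small illustrative subset of codes.

```python
from dataclasses import dataclass


@dataclass
class CodeList:
    id: str
    codes: dict  # code -> human-readable label


@dataclass
class Dimension:
    concept: str         # the concept this dimension represents
    code_list: CodeList  # the allowed values for that concept


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: list

    def validate_key(self, key):
        """Return a list of problems found in a series key (empty if valid)."""
        problems = []
        known = {dim.concept for dim in self.dimensions}
        for dim in self.dimensions:
            value = key.get(dim.concept)
            if value is None:
                problems.append(f"missing dimension {dim.concept}")
            elif value not in dim.code_list.codes:
                problems.append(f"{value!r} is not in code list {dim.code_list.id}")
        for concept in key:
            if concept not in known:
                problems.append(f"unknown dimension {concept}")
        return problems


# Illustrative subset of codes only; identifiers are assumptions for this sketch.
freq = CodeList("CL_FREQ", {"A": "Annual", "Q": "Quarterly", "M": "Monthly"})
area = CodeList("CL_AREA", {"DE": "Germany", "FR": "France"})
dsd = DataStructureDefinition(
    "DEMO_DSD", [Dimension("FREQ", freq), Dimension("REF_AREA", area)]
)

print(dsd.validate_key({"FREQ": "A", "REF_AREA": "DE"}))
print(dsd.validate_key({"FREQ": "W", "REF_AREA": "DE"}))
```

Because sender and receiver share the same structure definition and code lists, a receiver can check an incoming key mechanically, whichever of the push, pull or hub patterns delivered it.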
9. Summary
• Standards are the key to enabling modernized statistical production
• Standards at different levels are being used in an increasingly coherent way
• GSBPM and GSIM provide conceptual models and facilitate communication
• SDMX, DDI and other standards provide implementation models which can be used in a coordinated way
• There are now more technologies than just GESMES and XML: a coherent overall model is critical