Slide 1 Eurostat Unit B3 – Statistical Information Technologies SDMX Training for users 29 November 2007 SDMX training session on basic principles, data structure definitions and data file implementation 29 November 2007


Slide 2

A - Introduction

Slide 3

Purpose of the training session

• Provide understanding of the basic SDMX principles (DSD and Dataset Implementation)

• Provide knowledge of the SDMX standard and its XML implementation

• Present ESTAT tools as case studies illustrating their scope and usage

Slide 4

Current practices

Current practices on data and metadata exchange:

– Legal framework (Commission Regulations, Council Regulations, etc.)

– Data and metadata files, questionnaires, quality reports, etc.

– Format (paper form, EDIFACT, XML, structured files, etc.)

– Media (email, file upload, web form, removable media, dial-up, etc.)

Slide 5

The need for a standard…

• Enhance electronic data and metadata exchange

• Enhance availability of statistical data and metadata information for the users

• Promote interoperability between different systems

• Improve the quality of transmitted data (Timeliness & Punctuality, Accessibility & Clarity, Accuracy, Comparability)

Slide 6

SDMX (Statistical Data and Metadata eXchange)

Initiative on the standardisation of the statistical data and metadata exchange process.

• 7 Sponsors (BIS, ECB, ESTAT, IMF, OECD, UN, WB)

• “Push” and “pull” mode

• Use of XML technologies to promote interoperability

• Basic principles: Data Structure Definitions (DSD) & Metadata Structure Definitions (MSD)

SDMX registries

Data on the WEB using SDMX

Slide 7

SDMX (cont.)

• Exchange and sharing of statistical information

– Statistical data

– Statistical metadata: structural metadata and reference metadata

• Emphasis on macro-data (aggregated statistics)

• Promotes a “data sharing” model

– low cost

– high quality of transmitted data

– interoperability between (otherwise) incompatible systems

Slide 8

B – SDMX Core Elements

Slide 9

Example dataset 1

Table 1. Deflated turnover index (on volume of sales) for retail trade for Greece (no adjustment). Reference period: January 2002 to March 2003. (monthly data - Base year: 2000)

Year  Month      Turnover index  Status       Confidentiality
2002  January    84.5            actual       free
2002  February   85.6            actual       free
2002  March      95.4            actual       free
2002  April      106.2           actual       free
2002  May        98.0            actual       free
2002  June       95.3            actual       free
2002  July       105.4           actual       free
2002  August     107.1           actual       free
2002  September  105.2           actual       free
2002  October    109.4           actual       free
2002  November   104.5           actual       free
2002  December   111.9           actual       free
2003  January    89.1            provisional  free
2003  February   88.3            provisional  free
2003  March      96.1            provisional  free

Source: National Statistical Service of Greece

Data prepared to be transmitted to the European Commission (including EUROSTAT)

Slide 10

Example dataset 2

Demography Rapid Questionnaire_Table RQFI05V1. Data for Finland. Reference period: January to December 2005 (annual, provisional data - 1st revision).

Demographic characteristic   Male     Female   Total
Number of persons
  Statistical adjustment     131      35       166
  Deaths                     24057    23871    47928
  Births                     29400    28345    57745
  Net migration              4799     4187     8986
  Population on 01/01/2006   2572350  2683230  5255580
  Population on 01/01/2005   2562077  2674534  5236611
  Deaths under 1 year                          174
  Births outside marriage                      23319
  Immigrants                 10837    10581    21418
  Emigrants                  6038     6331     12369
Number
  Divorces                                     13383
  Marriages                                    29283
Rate
  Total fertility rate                         1.8
Years
  Life expectancy at birth   82.3     75.5     78.3

Data prepared to be transmitted to the European Commission (including EUROSTAT)

Slide 11

SDMX Information Model

• The SDMX Information Model (SDMX-IM) is a conceptual model from which syntax specific implementations are developed.

• The SDMX-IM provides for the structuring not only of data, but also of “reference” metadata!

• The model is constructed as a set of structures which assist in the understanding, re-use and maintenance of the model.

– Data Structure Definition and Metadata Structure Definition

– Dataflows - Datasets

– Data Provisioning

– …

Slide 12

Structures in the SDMX-IM

Structure                        Components
Concept Scheme                   Concept
Code List                        Code
Category Scheme                  Category
Organisation Scheme              Organisation; Organisation Role (DataProvider, DataConsumer, MaintenanceAgency)
Data Structure Definition (DSD)  Dimensions; Attributes; Measures; Groups

Slide 13

Structures in the SDMX-IM (cont.)

Fundamental parts:

1. Structural metadata (DSD, concepts, code lists)

2. Observational data (organised set of numeric observations)

3. Reference metadata

Definitions:

• Data Structure Definition (DSD): set of structural metadata needed to understand the dataset structure

• Dataflow Definition: a description of the dataset which identifies, categorises and constrains the allowable content of the dataset

• Dataset:

– an organised collection of statistical data

– the ‘container’ of a Data Flow Definition for an instance of the data.

Slide 14

Structures in the SDMX-IM (cont.)

• Code lists – Codes: list of predefined values to be used within the DSD

– Code lists enumerate a set of values to be used in the representation of several structural components of SDMX.

• Concept Schemes – Concepts: a statistical characteristic used within a DSD

– Additional properties can be defined for concepts:

  • Provide Name/Description in various locales

  • Assign default representation (coded or uncoded)

  • Define semantic hierarchies of concepts

• Category Schemes – Categories: Category schemes are made up of a hierarchy of categories (subject matter domains), which in SDMX may include any type of useful classification for the organization of data and metadata

– A Dataflow may be linked to many Categories

Slide 15

DSD components

• Dimension (e.g. frequency, reference area):

– Classificatory variable used for identification of subsets or single observations

– Definition of the key descriptor for reporting Datasets

• Attribute (e.g. title, observation status):

– Add additional metadata about the observations

– Can be attached at four possible levels (Observation, Time Series / Cross-Sectional data, Group, Data Set)

• Measure (e.g. turnover index, # of births, # of deaths):

– Data (uncoded / unclassified) that can be reported (the observation value)

– Primary (Time Series) or Cross-Sectional (Cross-sectional data)

• Groups:

– Grouping of dimensions in order to attach group attributes (e.g. sibling group)
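The component roles listed above can be sketched as a small Python data model. This is an illustrative sketch only and not part of any SDMX library: the class names, the `AttachmentLevel` enum, the `EXAMPLE_DSD` identifier and the attachment level chosen for `TITLE` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class AttachmentLevel(Enum):
    """The four levels at which an attribute can be attached."""
    OBSERVATION = "observation"
    SERIES = "series"  # time series or cross-sectional data
    GROUP = "group"
    DATASET = "dataset"


@dataclass(frozen=True)
class Dimension:
    concept_id: str
    codelist: Optional[str] = None  # None means an uncoded representation


@dataclass(frozen=True)
class Attribute:
    concept_id: str
    level: AttachmentLevel
    codelist: Optional[str] = None


@dataclass(frozen=True)
class Measure:
    concept_id: str


@dataclass
class DataStructureDefinition:
    dsd_id: str
    dimensions: List[Dimension] = field(default_factory=list)
    attributes: List[Attribute] = field(default_factory=list)
    measures: List[Measure] = field(default_factory=list)

    def key_descriptor(self) -> List[str]:
        """Ordered dimension concept ids that identify a series or observation."""
        return [d.concept_id for d in self.dimensions]

    def attributes_at(self, level: AttachmentLevel) -> List[str]:
        """Concept ids of the attributes attached at the given level."""
        return [a.concept_id for a in self.attributes if a.level is level]


# Hypothetical DSD using concept names mentioned on the slides.
dsd = DataStructureDefinition(
    dsd_id="EXAMPLE_DSD",
    dimensions=[Dimension("FREQ", "CL_FREQ"), Dimension("REF_AREA", "CL_AREA_EE")],
    attributes=[
        Attribute("TITLE", AttachmentLevel.DATASET),  # level assumed
        Attribute("OBS_STATUS", AttachmentLevel.OBSERVATION, "CL_OBS_STATUS"),
    ],
    measures=[Measure("OBS_VALUE")],
)

print(dsd.key_descriptor())
print(dsd.attributes_at(AttachmentLevel.OBSERVATION))
```

Keeping dimensions in an ordered list reflects their role as the key descriptor, while attributes carry their attachment level so that a writer can decide where in a message each one belongs.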

Slide 16

Data Structure Definition

Examples:

– Time Series dataset

  • STS domain: Turnover Index for Retail Trade and repair DSD

– Cross-Sectional dataset

  • Demography domain: Rapid questionnaire DSD

Slide 17

STS Sample Dataset

Table 1. Deflated turnover index (on volume of sales) for retail trade for Greece (no adjustment). Reference period: January 2002 to March 2003. (monthly data - Base year: 2000)

Year  Month      Turnover index  Status       Confidentiality
2002  January    84.5            actual       free
2002  February   85.6            actual       free
2002  March      95.4            actual       free
2002  April      106.2           actual       free
2002  May        98.0            actual       free
2002  June       95.3            actual       free
2002  July       105.4           actual       free
2002  August     107.1           actual       free
2002  September  105.2           actual       free
2002  October    109.4           actual       free
2002  November   104.5           actual       free
2002  December   111.9           actual       free
2003  January    89.1            provisional  free
2003  February   88.3            provisional  free
2003  March      96.1            provisional  free

Source: National Statistical Service of Greece

Data prepared to be transmitted to the European Commission (including EUROSTAT)

Annotation labels on the slide: Dimensions, Measure, Attributes, Dimensions

Slide 18

STS DSD components

Dataflow: STSRTD_TURN_M

Concept            Concept ID     Code List         Value
Dimensions
  reference period   TIME_PERIOD                      Month/Year
  reporting country  REF_AREA       CL_AREA_EE        EL - Greece
  base year          STS_BASE_YEAR  CL_STS_BASE_YEAR  2000
  type of index      STS_INDICATOR  CL_STS_INDICATOR  TOVV - Turnover deflated (volume of sales)
  activity           STS_ACTIVITY   CL_STS_ACTIVITY   Retail trade
  adjustment         ADJUSTMENT     CL_ADJUSTMENT     No (neither seasonally nor working day adjusted)
  frequency          FREQ           CL_FREQ           monthly data
Attributes
  title              TITLE                            Title of the exchanged dataset
  status             OBS_STATUS     CL_OBS_STATUS     actual/provisional data
  confidentiality    OBS_CONF       CL_OBS_CONF       Free (free of publication data)
  decimals           DECIMALS       CL_DECIMALS       1 - One
Measures
  Turnover index     OBS_VALUE                        observations
Groups
  Time series                                         Set of ordered monthly data (01/02-12/02)
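To illustrate how the dimensions of this DSD identify a single time series, the following minimal sketch joins one code per dimension into a series key and rejects codes that are not in the relevant code list. The concept IDs come from the slide; the dot-separated key layout, the function name and every code other than `EL`, `TOVV` and `2000` are assumptions made for the example.

```python
from typing import Dict, List, Set

# Series-level dimensions in the order they appear on the slide
# (TIME_PERIOD is omitted because it varies per observation).
SERIES_DIMENSIONS: List[str] = [
    "REF_AREA", "STS_BASE_YEAR", "STS_INDICATOR",
    "STS_ACTIVITY", "ADJUSTMENT", "FREQ",
]

# Tiny stand-ins for the real code lists.
CODE_LISTS: Dict[str, Set[str]] = {
    "REF_AREA": {"EL"},
    "STS_BASE_YEAR": {"2000"},
    "STS_INDICATOR": {"TOVV"},
    "STS_ACTIVITY": {"RETAIL"},  # hypothetical code
    "ADJUSTMENT": {"N"},         # hypothetical code
    "FREQ": {"M"},               # hypothetical code
}


def build_series_key(values: Dict[str, str]) -> str:
    """Validate one code per dimension and join them into a series key."""
    parts = []
    for concept in SERIES_DIMENSIONS:
        if concept not in values:
            raise ValueError(f"missing dimension {concept}")
        code = values[concept]
        if code not in CODE_LISTS[concept]:
            raise ValueError(f"{code!r} is not an allowed code for {concept}")
        parts.append(code)
    return ".".join(parts)


key = build_series_key({
    "REF_AREA": "EL",
    "STS_BASE_YEAR": "2000",
    "STS_INDICATOR": "TOVV",
    "STS_ACTIVITY": "RETAIL",
    "ADJUSTMENT": "N",
    "FREQ": "M",
})
print(key)  # EL.2000.TOVV.RETAIL.N.M
```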

Slide 19

Demography Sample Dataset

Demography Rapid Questionnaire_Table RQFI05V1. Data for Finland. Reference period: January to December 2005 (annual provisional data - 1st revision).

Demographic characteristic   Male     Female   Total
Number of persons
  Statistical adjustment     131      35       166
  Deaths                     24057    23871    47928
  Births                     29400    28345    57745
  Net migration              4799     4187     8986
  Population on 01/01/2006   2572350  2683230  5255580
  Population on 01/01/2005   2562077  2674534  5236611
  Deaths under 1 year                          174
  Births outside marriage                      23319
  Immigrants                 10837    10581    21418
  Emigrants                  6038     6331     12369
Number
  Divorces                                     13383
  Marriages                                    29283
Rate
  Total fertility rate                         1.8
Years
  Life expectancy at birth   82.3     75.5     78.3

Data prepared to be transmitted to the European Commission (including EUROSTAT)

Annotation labels on the slide: Measures, Dimensions, Attributes

Slide 20

Demography DSD components

Dataflow: DEMOGRAPHY_RQ

Concept                     Concept ID   Code List      Values
Dimensions
  reference period            TIME_PERIOD                 01-2005 to 12-2005
  reporting country           COUNTRY      CL_COUNTRY     Finland
  sex                         SEX          CL_SEX         male / female
  demographic characteristic  DEMO         CL_DEMO        # of births, # of deaths etc.
  frequency                   FREQ         CL_FREQ        annual data
Attributes
  title                       TITLE                       Title of the exchanged dataset
  status                      OBS_STATUS   CL_OBS_STATUS  provisional data
  reference table             TAB_NUM                     RQFI05V1
  version                     REV_NUM                     1st revision
Measures
  statistical adjustment      ADJT                        number of persons
  deaths                      DEATHST                     number of persons
  births                      LBIRTHST                    number of persons
  net migration               NETMT                       number of persons
  population on 01/01/06      PJAN1T                      number of persons
  population on 01/01/05      PJANT                       number of persons
  deaths under 1 year         DEATHUN1                    number of persons
  births outside marriage     LBIRTHOUT                   number of persons
  immigrants                  IMMIT                       number of persons
  emigrants                   EMIGT                       number of persons
  divorces                    DIV                         pure number
  marriages                   MAR                         pure number
  total fertility rate        TFRNSI                      decimal index
  life expectancy at birth    LEXPNSIT                    number of years
Groups
  Section                                                 Set of annual demographic characteristics from FI (01/05-12/05)

Slide 21

Data Provisioning

• A Data Provider can provide data/metadata for many Dataflows using an agreed data structure.

• Dataflows may incorporate data coming from more than one Data Provider.

• A Provision Agreement specifies which data providers supply what data to which Dataflows.

• The Dataflow may be linked to one or more Categories (subject matter domains) from different Category Schemes.
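The many-to-many relationship between Data Providers and Dataflows can be pictured as a list of provision agreements that is indexed in both directions. The sketch below is only an illustration: the provider identifiers `NSI_EL` and `NSI_FI`, the class and the helper functions are hypothetical, and only the two Dataflow IDs come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ProvisionAgreement:
    """Links one data provider to one dataflow."""
    provider: str
    dataflow: str


def dataflows_by_provider(agreements: Iterable[ProvisionAgreement]) -> Dict[str, List[str]]:
    """A provider can supply data for many dataflows."""
    index = defaultdict(list)
    for agreement in agreements:
        index[agreement.provider].append(agreement.dataflow)
    return {provider: sorted(flows) for provider, flows in index.items()}


def providers_by_dataflow(agreements: Iterable[ProvisionAgreement]) -> Dict[str, List[str]]:
    """A dataflow can receive data from many providers."""
    index = defaultdict(list)
    for agreement in agreements:
        index[agreement.dataflow].append(agreement.provider)
    return {flow: sorted(providers) for flow, providers in index.items()}


# Hypothetical provider ids; dataflow ids as used elsewhere in the slides.
agreements = [
    ProvisionAgreement("NSI_EL", "STSRTD_TURN_M"),
    ProvisionAgreement("NSI_FI", "STSRTD_TURN_M"),
    ProvisionAgreement("NSI_FI", "DEMOGRAPHY_RQ"),
]

print(dataflows_by_provider(agreements))
print(providers_by_dataflow(agreements))
```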

Slide 22

Identification, Versioning & Maintenance

• Identification: every structural element must have a semantic identifier (e.g. CL_UNIT)

• Versioning: a specific element may have different versions (updates of the element)

• Maintenance: some structures must be maintained by an organisation

– Unique identification: id+version+agency

  • id: CL_UNIT version: 1.0 agency: ESTAT

  • id: CL_UNIT version: 1.0 agency: ECB

• Internationalization: the use of multiple languages for describing any element

• SDMX-IM covers aggregate data and metadata in all domains (not domain-specific)
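The id+version+agency rule means that two artefacts sharing an id and a version are still distinct when different agencies maintain them. A minimal sketch of that idea, assuming a plain in-memory dictionary as the store, is shown below; the `ArtefactRef` class and the language-tagged names are hypothetical and only illustrate the identification and internationalization points.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ArtefactRef:
    """Unique identification of a maintained structure: id + version + agency."""
    id: str
    version: str
    agency: str


# Each artefact carries names in several languages (hypothetical texts).
store: Dict[ArtefactRef, Dict[str, str]] = {
    ArtefactRef("CL_UNIT", "1.0", "ESTAT"): {"en": "Unit", "fr": "Unité"},
    ArtefactRef("CL_UNIT", "1.0", "ECB"): {"en": "Unit"},
}


def name_of(ref: ArtefactRef, language: str, fallback: str = "en") -> str:
    """Return the name in the requested language, falling back to English."""
    names = store[ref]
    return names.get(language, names[fallback])


estat_ref = ArtefactRef("CL_UNIT", "1.0", "ESTAT")
ecb_ref = ArtefactRef("CL_UNIT", "1.0", "ECB")

print(estat_ref == ecb_ref)    # False: same id and version, different agency
print(name_of(estat_ref, "fr"))
print(name_of(ecb_ref, "fr"))  # falls back to the English name
```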

Slide 23

SDMX High level View

[Diagram. Boxes and relationship labels recovered from the slide:]

• Category Scheme: comprises subject or reporting categories

• Category: can have child categories

• Data or Metadata Flow:

– uses specific data/metadata structure (Data or Metadata Structure Definition)

– can be linked to categories in multiple category schemes

– can get data from multiple data providers

• Data Provider: can provide data or metadata for many data or metadata flows using agreed data or metadata structure

• Provision Agreement

• Data or Metadata Set: conforms to business rules of the data/metadata flow

• Registered Data or Metadata Set: is registered for

Slide 24

Tools Demonstration

Slide 25

SDMX Registry

• A repository for keeping

– Structural metadata (e.g. CodeLists, ConceptSchemes, DSDs)

– Provisioning information (e.g. Dataflows, Provision agreements)

• Repository is accessible via a Web Service accepting SDMX-ML messages

• Graphical User Interface (GUI) for user interaction over the Web

Slide 26

Data Structure Wizard

• DSW – “standalone” application (replacing AccessDB tool)

• Main functionalities

– Manage data structures (create, modify, delete, query)

– Import/Export SDMX-ML structures (validate structure messages)

– Import/Export GESMES/TS structure files

– Create Data messages

– Query SDMX Registry

– Submit data structures to SDMX Registry

Slide 27

Example - DSD creation using the DSW

Slide 28

Example

• Dimensions

– Frequency (CL_FREQ)

– Reference Area (CL_AREA_EE)

– Time period

– Product (CL_PRODUCT)

• Attributes

– Compilation (uncoded, @group)

– Confidentiality (CL_OBS_CONF, @observation)

– Status (CL_OBS_STATUS, @observation)

– Availability (CL_AVAILABILITY, @series)

• Group

Slide 29

C – SDMX-ML Data sets

Slide 30

Syntaxes for SDMX data

• Based on a common Information Model

– SDMX-EDI (GESMES/TS)

  • EDIFACT syntax

  • Time series oriented

  • One format for Data Sets

– SDMX-ML

  • XML syntax

  • Four different formats for Data Sets

  • Easier validation (XML based)

• Tools enable us to use the desired format

Slide 31

SDMX-ML Data Messages

Equivalent representations for reporting Datasets:

– Generic message: one schema, not domain-specific

– Compact message: format for large-volume exchange of data, schema is specific to a DSD

– Utility message: format for advanced validation, schema is specific to a DSD

– Cross-Sectional message: format for non-time-series data, schema is specific to a DSD
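The difference between a DSD-independent layout and a DSD-specific one can be seen by writing the same two observations in both styles. The sketch below is a simplified illustration built with Python's `xml.etree.ElementTree`: namespaces, headers and schema details are omitted, so the output is not a valid SDMX-ML message, and the element layout, the `M` frequency code and the `A` status code are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Two observations from the turnover index example; codes are placeholders.
series_key: Dict[str, str] = {"FREQ": "M", "REF_AREA": "EL"}
observations: List[Tuple[str, str, str]] = [
    ("2002-01", "84.5", "A"),
    ("2002-02", "85.6", "A"),
]


def to_generic_style(key: Dict[str, str], obs: List[Tuple[str, str, str]]) -> ET.Element:
    """Concepts appear as data (concept/value pairs), so one schema fits any DSD."""
    series = ET.Element("Series")
    key_el = ET.SubElement(series, "SeriesKey")
    for concept, value in key.items():
        ET.SubElement(key_el, "Value", {"concept": concept, "value": value})
    for period, value, status in obs:
        obs_el = ET.SubElement(series, "Obs")
        ET.SubElement(obs_el, "Time").text = period
        ET.SubElement(obs_el, "ObsValue", {"value": value})
        attrs = ET.SubElement(obs_el, "Attributes")
        ET.SubElement(attrs, "Value", {"concept": "OBS_STATUS", "value": status})
    return series


def to_compact_style(key: Dict[str, str], obs: List[Tuple[str, str, str]]) -> ET.Element:
    """Concepts appear as XML attribute names, so the schema depends on the DSD."""
    series = ET.Element("Series", dict(key))
    for period, value, status in obs:
        ET.SubElement(series, "Obs", {
            "TIME_PERIOD": period,
            "OBS_VALUE": value,
            "OBS_STATUS": status,
        })
    return series


generic_xml = ET.tostring(to_generic_style(series_key, observations), encoding="unicode")
compact_xml = ET.tostring(to_compact_style(series_key, observations), encoding="unicode")

print(generic_xml)
print(compact_xml)
print(len(generic_xml), len(compact_xml))
```

Because the compact style turns each concept into an attribute name, it is noticeably shorter for the same content, which is consistent with its use for large-volume exchange.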

Slide 32

The SDMX-ML Time-Series format

• Used for representing time-series data

• Contains related metadata as defined in DSDs

• Three different (equivalent) representations available

– Generic message

– Compact message

– Utility message

Slide 33

Generic Dataset

Table 1. Deflated turnover index for retail trade and repair based on volume of sales for Greece (no adjustment). Reference period: January 2002 to March 2003. (Base year: 2000)

Time     Data   Status
2002-01  84.5   a
2002-02  85.6   a
2002-03  95.4   a
2002-04  106.2  a
2002-05  98     a
2002-06  95.3   a
2002-07  105.4  a
2002-08  107.1  a
2002-09  105.2  a
2002-10  109.4  a
2002-11  104.5  a
2002-12  111.9  a
2003-01  89.1   p
2003-02  88.3   p
2003-03  96.1   p

a: actual

p: provisional

Slide 34

Compact Dataset

Table 1. Deflated turnover index for retail trade and repair based on volume of sales for Greece (no adjustment). Reference period: January 2002 to March 2003. (Base year: 2000)

Time     Data   Status
2002-01  84.5   a
2002-02  85.6   a
2002-03  95.4   a
2002-04  106.2  a
2002-05  98     a
2002-06  95.3   a
2002-07  105.4  a
2002-08  107.1  a
2002-09  105.2  a
2002-10  109.4  a
2002-11  104.5  a
2002-12  111.9  a
2003-01  89.1   p
2003-02  88.3   p
2003-03  96.1   p

a: actual

p: provisional

Slide 35

Utility Dataset

Table 1. Deflated turnover index for retail trade and repair based on volume of sales for Greece (no adjustment). Reference period: January 2002 to March 2003. (Base year: 2000)

Time     Data   Status
2002-01  84.5   a
2002-02  85.6   a
2002-03  95.4   a
2002-04  106.2  a
2002-05  98     a
2002-06  95.3   a
2002-07  105.4  a
2002-08  107.1  a
2002-09  105.2  a
2002-10  109.4  a
2002-11  104.5  a
2002-12  111.9  a
2003-01  89.1   p
2003-02  88.3   p
2003-03  96.1   p

a: actual

p: provisional

Slide 36

The SDMX-ML Cross-Sectional data format

• Used for representing non-time-series data

• Contains related metadata as defined in DSDs

• Two different representations available

– Generic message

– Cross-Sectional message

Slide 37

Cross-Sectional Dataset

Demography Rapid Questionnaire_Table RQFI05V1. Data for Finland. Reference period: January to December 2005 (revised annual provisional data).

Topic                     Male     Female   Total
Statistical adjustment    131      35       166
Deaths                    24057    23871    47928
Births                    29400    28345    57745
Net migration             4799     4187     8986
Population on 01/01/2006  2572350  2683230  5255580
Population on 01/01/2005  2562077  2674534  5236611
Deaths under 1 year                         174
Births outside marriage                     23319
Immigrants                10837    10581    21418
Emigrants                 6038     6331     12369
Divorces                                    13383
Marriages                                   29283
Total fertility rate                        1.8
Life expectancy at birth  82.3     75.5     78.3

Slide 38

Conversions

• Equivalent formats

– Can convert from any SDMX-ML format to another

– Based on the same IM

– Exceptions:

  • If a Cross-Sectional DSD does NOT contain time dimension

– Conversions:

  • Between the SDMX-ML formats

  • Can be expanded to other formats (e.g. CSV, GESMES)

Slide 39

D – Producing SDMX-ML Data sets

Slide 40

Reporting and Dissemination Guidelines

• Define and classify all the underlying concepts of a dataset

• Provide the specification of the DSD:

– Name & identifier

– List of statistical concepts

– List of metadata concepts

– List of code lists

• Provide the related Dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ)

• List the Mandatory attributes (e.g. reference area, frequency), and the Conditional ones

Slide 41

Message Implementation Guidelines (MIG)

• Comprises:

– DSD details (id, version, agencyID)

– Dimensions (concepts, representations, dimension types such as frequency, entity or count, attachment level)

– Measure (primary or cross-sectional)

– Attributes (concept, representation, assignment status (mandatory or conditional), attachment level, attribute type, attachment measure)

– Groups (subset of dimensions)

Slide 42

Structure of a MIG document

1. DSD table

2. Dataflows table

3. Referenced concept schemes

4. Referenced Code Lists

5. Detailed explanation of the Generic SDMX-ML sample dataset

6. Detailed explanation of the Compact (or Cross-Sectional) SDMX-ML sample dataset

Slide 43

Example - Data Set creation using the DSW

Slide 44

SDMX Converter

• Main functionality

– Reading the input message

  • parsing of the message

  • populating the data model of the tool (based on the SDMX v2.0 information model)

– Writing the converted message

  • uses the data model to write the output message in the required target format

• Information retrieved from the Registry

– The data flow ID is used to retrieve the data flow definition from the Registry.

– The DSD reference is retrieved from the data flow definition and is used to acquire the DSD.

Slide 45

SDMX Converter (cont.)

• Tool utility:

– You may already have data in a format other than SDMX-ML (e.g. CSV, GESMES/TS)

  • CSV → Compact SDMX-ML

– You may want further validation of your data

  • Compact SDMX-ML → Utility SDMX-ML

• Conversions:

– From CSV to any type

– From SDMX-ML to any type

– From SDMX-EDI to any type
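The read, populate and write flow of the converter can be sketched with a neutral in-memory model that sits between format-specific readers and writers. This is not the SDMX Converter's actual code: the function names, the reader and writer tables, the CSV column layout and the `A` status code are assumptions, and the "compact" output is a simplified, namespace-free stand-in for Compact SDMX-ML.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Observation = Dict[str, str]  # the tool's neutral data model: one dict per observation


def read_csv(text: str) -> List[Observation]:
    """Parse CSV text (header row = concept ids) into the data model."""
    return list(csv.DictReader(io.StringIO(text)))


def write_csv(rows: List[Observation]) -> str:
    """Write the data model back out as CSV."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_compact_style(rows: List[Observation]) -> str:
    """Write each observation as an element whose attributes are the concepts."""
    series = ET.Element("Series")
    for row in rows:
        ET.SubElement(series, "Obs", row)
    return ET.tostring(series, encoding="unicode")


READERS: Dict[str, Callable[[str], List[Observation]]] = {"csv": read_csv}
WRITERS: Dict[str, Callable[[List[Observation]], str]] = {
    "csv": write_csv,
    "compact": write_compact_style,
}


def convert(text: str, source: str, target: str) -> str:
    """Read the input into the data model, then write it in the target format."""
    rows = READERS[source](text)
    return WRITERS[target](rows)


csv_input = "TIME_PERIOD,OBS_VALUE,OBS_STATUS\n2002-01,84.5,A\n2002-02,85.6,A\n"
print(convert(csv_input, "csv", "compact"))
```

Routing every conversion through one model means that supporting a new format needs only one new reader or writer, rather than a converter for every pair of formats.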

Slide 46

Conversion Example