DELIVERABLE D7.1 Metadata Standards usage and · PDF file · 2014-03-22Project...


Project N°: 262608

ACRONYM: Data without Boundaries

DELIVERABLE D7.1

Metadata Standards – usage and needs in NSIs and Data Archives

WORK PACKAGE 7

Standards Development

REPORTING PERIOD: From: Month 18 To: Month 36

PROJECT START DATE: 1st May 2011 DURATION: 48 Months

DATE OF ISSUE OF DELIVERABLE: 23rd July 2013

DOCUMENT PREPARED BY: 7, 2, 5, 6, 9, 15, 26 UGOT, UTA, GESIS, NSD, IAB, Destatis, SCB

Combination of CP & CSA project funded by the European Community Under the programme “FP7 - SP4 Capacities”

Priority 1.1.3: European Social Science Data Archives and remote access to Official Statistics


The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 262608 (DwB - Data without Boundaries).

This document has been prepared by: Merja Karjalainen (UGOT-SND), Mari Kleemola (UTA-FSD) and Uwe Jensen (GESIS); with the valuable contributions of Iris Alfredsson (UGOT-SND), Maurice Brandt (DESTATIS), Michelle Coldrey (UGOT-SND), Claus-Göran Hjelm (SCB), Ørnulf Risnes (NSD) and David Schiller (IAB).


Table of Contents

INTRODUCTION

1. BACKGROUND INFORMATION
  1.1 Data without Boundaries and Metadata
    1.1.1 Integration of Heterogeneous Data Resources
    1.1.2 Using Common Metadata for Harmonisation for Data Integration
    1.1.3 Metadata at DAs and CESSDA – Inspiration for an OS Microdata Search Portal?
  1.2 European Organisations Influence over the NSIs and DAs
    1.2.1 The Research Community
    1.2.2 CESSDA and the Data Archives
    1.2.3 ESS/Eurostat and the NSIs
  1.3 Definition of Metadata
  1.4 Semantic Integration and Conceptual Models

2. METADATA STANDARDS
  2.1 Metadata Usage at NSIs & DAs: Historical Overview & Status Quo
  2.2 SDMX - Statistical Data and Metadata eXchange
    2.2.1 The SDMX Community
    2.2.2 About SDMX – Scope and Content
    2.2.3 Euro-SDMX Metadata Structure (ESMS)
    2.2.4 SDMX Registry
  2.3 DDI - Data Documentation Initiative
    2.3.1 The DDI Alliance
    2.3.2 DDI-Codebook
    2.3.3 DDI-Lifecycle
    2.3.4 DDI Moving Forward - Future Developments
  2.4 Other Metadata Standards and Standards
    2.4.1 PREMIS – PREservation Metadata: Implementation Strategies
    2.4.2 METS – Metadata Encoding and Transmission Standard
    2.4.3 DCMI - Dublin Core Metadata Initiative
    2.4.4 INSPIRE Metadata Regulation
    2.4.5 TEI – Text Encoding Initiative
  2.5 ISO/IEC 11179 – Metadata Registries
    2.5.1 ISO/IEC 11179 and Metadata Standards
  2.6 Metadata Related to Persistent Identifiers
  2.7 Case Studies
    2.7.1 SCB – Statistics Sweden
    2.7.2 IAB – Institute for Employment Research

3. METADATA STANDARDS IN CO-OPERATION
  3.1 The SDMX-DDI Dialogue
  3.2 Frameworks and Standards for Statistical Modernisation
  3.3 DDI and SDMX: Overlaps and Gaps


  3.4 DDI and SDMX at Australian Bureau of Statistics (ABS)

4. VOCABULARIES AND CODING SCHEMES
  4.1 Beneficial Vocabularies and Coding Schemes
  4.2 Classifications used by NSIs
    4.2.1 The NUTS Classification
    4.2.2 NACE - Statistical Classification of Economic Activities in the European Community
    4.2.3 Neuchâtel Model – Classifications and Variables
    4.2.4 SDMX - Content-Oriented Guidelines and Metadata Common Vocabulary
  4.3 Vocabularies used by the Data Archives
    4.3.1 The European Language Social Science Thesaurus (ELSST)
    4.3.2 CESSDA Topic Classification
    4.3.3 DDI Controlled Vocabularies
  4.4 Examples on Classifications Relevant in Social Sciences
    4.4.1 ISCO - International Standard Classification of Occupations
    4.4.2 ISCED - International Standard Classification of Education
    4.4.3 Coding Schemes for Standards of Socio-demographic Characteristics
    4.4.4 Classifications on Geography, Countries and Languages

5. FRAMEWORKS AND MODELS
  5.1 Reference Models and Reference Architectures
  5.2 CMF – Common Metadata Framework
  5.3 GSBPM - Generic Statistical Business Process Model
  5.4 GSIM – Generic Statistical Information Model
  5.5 CORA & CORE - Common Reference Architecture and Common Reference Environment
  5.6 OAIS – Open Archival Information System
    5.6.1 The Model
    5.6.2 OAIS and Social Science Data Archives
    5.6.3 Self-assessment of Archives
  5.7 DQAF - Data Quality Assurance Framework

6. THE STATE OF THE ART
  6.1 The NSI Community
    6.1.1 Eurostat Monitoring of National Metadata Systems - Phase 1
    6.1.2 DwB WP8 Survey: Questionnaire on Metadata Data (at NSIs)
  6.2 The Data Archive Community
    6.2.1 Metadata Standards Usage at the DAs
    6.2.2 Controlled Vocabularies at Data Archives
    6.2.3 Co-operation between NSIs and Data Archives
  6.3 NSIs and DAs in Co-operation
    6.3.1 France
    6.3.2 United Kingdom
    6.3.3 Norway
    6.3.4 Germany


  6.4 Discussion
    6.4.1 NSI Current Needs and used Metadata Standards, Classifications & Coding Schemes
    6.4.2 DAs Current Needs and used Metadata standards, Classifications & Coding Schemes
  6.5 Concluding Remarks and Summaries
  6.6 Further Relevant Facets of the Metadata Agenda for NSIs & DAs

APPENDIX 1: INTERNATIONAL COOPERATION

APPENDIX 2: THE NATIONAL STATISTICAL INSTITUTES AND THE DATA ARCHIVES

APPENDIX 3: WORK PACKAGE 7 - SURVEY OF (CESSDA) DATA ARCHIVES

REFERENCE LIST

GLOSSARY


INTRODUCTION

The Data without Boundaries (DwB) project is an EU project within the 7th Framework Programme (FP7) that aims to make official statistics (OS) microdata from European countries available to researchers within the European Union [1]. The motivation for this effort is that OS microdata repositories are currently underutilised resources in research, for example in the social sciences, both nationally in many countries and internationally. Large databases containing rich microdata on individuals and firms enable researchers to gain a deeper understanding of society and the economy, and subsequently make it possible to design and evaluate policies. For example, micro-econometrics, a branch of economics that unites economic theory with statistical methods, has contributed substantially to scientific policy evaluations (Heckman, 2004). European National Statistical Institutes (NSIs), Data Archives (DAs) that are members of the Council of European Social Science Data Archives [2] (CESSDA, see Section 1.2.2), and researchers are involved in the DwB project in order to facilitate better access to OS microdata [3]. Support for searching and locating OS microdata at NSIs is far less developed and coordinated at a European level than it is for research data at DAs. No structured database exists that presents national OS microdata from the NSIs at a European and international level, even though there is some degree of integration via Eurostat and the European Statistical System (ESS) [4], through which datasets from different countries are integrated by gathering results from common European surveys coordinated by Eurostat/ESS. One of the key elements in the success of making the DAs' research data searchable and accessible at a European level is that the data are intended for use in research from the beginning. Building infrastructures that support searching by researchers has therefore been a more obvious step than in the case of OS microdata.

DwB has the mission to develop an infrastructure that makes OS microdata at European NSIs searchable at the European level, aligned with the CESSDA portal, and to develop secure ways of remote access to OS microdata. A starting point for aligning functionality between different systems in a decentralised environment is to find ways to make those systems interoperable. On both the organisational and technical levels, one of the most important issues is to agree on common standards and protocols. To achieve interoperability between the different systems that are involved with OS microdata and the planned search services, DwB will encourage cooperation between the European Statistical System (ESS) led by Eurostat, the NSIs, and other stakeholders such as the central banks and the DAs in CESSDA. Currently, the degree of cooperation between NSIs, or other OS microdata providers, and DAs varies between countries. In some countries they cooperate closely [5], while in others cooperation is sparse (see Section 6.2.3).

[1] http://www.dwbproject.org

[2] http://www.cessda.org/

[3] http://www.cessda.org/accessing/catalogue/

[4] http://epp.eurostat.ec.europa.eu/portal/page/portal/pgp_ess/about_ess

[5] E.g. the Norwegian Social Science Data Services (NSD) and Statistics Norway (SSB)


The DwB project is organised in twelve Work Packages (WPs), each focusing on certain aspects of the subjects mentioned above. The aim of WP7 is to create a common platform for lasting cooperation between NSIs and DAs. Several work packages deal with metadata issues. WP5 will provide structured metadata for two purposes: creating a study-level catalogue of official statistics microdata, and working with the actual data files. WP8 focuses on the search portal and the metadata needed for data discovery, and WP12 aims to implement a one-stop shop for data discovery. These three work packages take a very practical approach to metadata issues. WP7 complements their work by giving a state-of-the-art review of metadata standards usage and by promoting collaboration within standards as well as discussion about metadata. To accomplish this, WP7 focuses on charting compatible metadata standards for use within NSIs and DAs. There are six deliverables (D7.1-D7.6) in WP7:

D7.1 – Metadata standards - usage and needs in national statistical institutes and data archives

D7.2 – Standards with future relevance for European Social Science data infrastructure needs and key areas

D7.3 – Metadata standard selection and usage - Rules and best practices

D7.4 – Software development and metadata standards

D7.5 – DDI and SDMX

D7.6 – Metadata standards and practices in related disciplines and standards for linking different sources

This report is the first deliverable (D7.1) in WP7. D7.1 results from the work in Task (T) 7.1 (and in part T7.2), in which the state of the art in metadata usage in NSIs and DAs has been reviewed. The report is based in part on responses to surveys that were carried out to collect national and international requirements and constraints for metadata usage within the NSI and DA communities. Chapter 1 introduces background information, such as basic concepts and important actors in the area of work of WP7 (in DwB), and Chapter 2 contains an overview of metadata standards and frameworks that are in use in these communities. Chapter 3 covers co-operability between different metadata standards and how those standards can be combined. Chapter 4 introduces controlled vocabularies and coding schemes and describes their importance in metadata harmonisation. Chapter 5 introduces frameworks and reference models that are relevant to the work in WP7 (DwB), and Chapter 6 presents the state of the art in metadata usage and the use of controlled vocabularies at the different NSIs and DAs. Chapter 6 also concludes the findings.


1. BACKGROUND INFORMATION

Chapter 1 is organised as follows. Section 1.1 describes the role of metadata within the DwB project. Section 1.2 describes the governing and umbrella organisations for OS microdata and research data repositories, and Section 1.3 focuses on the concept of metadata. Section 1.4 describes two important processes, semantic integration and conceptual modelling, that are at the heart of harmonising metadata from heterogeneous resources.

1.1 Data without Boundaries and Metadata

This Section explains why the use of metadata and metadata standards is fundamental to the outcome of the DwB project. The concept of data integration is briefly introduced to set the scene (Section 1.1.1). Metadata harmonisation, one approach to data integration that plays a central role in many current systems for integrating data from heterogeneous data resources, is described in Section 1.1.2. Section 1.1.3 briefly discusses how the experience gained from developing the CESSDA portal can be used in the DwB project.

1.1.1 Integration of Heterogeneous Data Resources

The DwB project has the ambition to integrate data from distributed, heterogeneous resources. Within computer science, the integration of data from distributed, heterogeneous resources has been an area of research since the 1970s (Sheth and Larson, 1990; Hammer and McLeod, 1993). Data integration is nowadays a natural ingredient in a wide spectrum of data-intensive research areas that need to compare and survey data held in distributed and often heterogeneous data resources. Whatever the systems or surrounding technology, some elements are fundamental to all kinds of data integration. When building systems that make heterogeneous data resources interoperable, there must be agreement on how to cooperate: at what level the resources should be interoperable, which technical standards to use, how much autonomy the different resources retain, and so on. Here, autonomy is used in a general sense: the degree to which participating in a particular data integration restricts the resources in their daily work and in the development of their internal systems. A system in which distributed resources are made interoperable can be called a federation of distributed data resources (Sheth and Larson, 1990; Hammer and McLeod, 1993). The impact of the data integration process on local work with the data resources varies along a gradient, from very limited autonomy (e.g. a data warehouse with a completely centralised data management system) to complete autonomy (e.g. a peer-to-peer system) (Batini and Scannapieco, 2006). Wherever a system sits on this scale, the resources have to agree on how their data, and the structural and semantic information about the data, are accessed. Resolving the semantic and technical heterogeneity between the data resources to be integrated is a prerequisite for any integration system.
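The idea of a federation, in which autonomous resources agree only on a shared way of being queried, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Description` fields, the resource names and the holdings are hypothetical and are not taken from DwB, CESSDA or any existing system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Description:
    """A minimal description of one dataset (hypothetical fields)."""
    resource: str
    title: str
    year: int


# The only thing the resources agree on is this search signature.
# How each resource stores its holdings internally is left to the resource.
SearchFn = Callable[[str], List[Description]]


def make_resource(name: str, holdings: Dict[str, int]) -> SearchFn:
    """Wrap a resource's internal holdings behind the agreed search interface."""
    def search(term: str) -> List[Description]:
        return [
            Description(resource=name, title=title, year=year)
            for title, year in holdings.items()
            if term.lower() in title.lower()
        ]
    return search


def federated_search(resources: List[SearchFn], term: str) -> List[Description]:
    """Query every resource through the shared interface and merge the hits."""
    hits: List[Description] = []
    for search in resources:
        hits.extend(search(term))
    # Sort the merged hits by year, then by resource name.
    return sorted(hits, key=lambda d: (d.year, d.resource))


# Two hypothetical, independently managed resources.
archive_a = make_resource("archive_a", {"Labour Force Survey": 2010, "Election Study": 2009})
archive_b = make_resource("archive_b", {"Labour Market Panel": 2008})

results = federated_search([archive_a, archive_b], "labour")
for hit in results:
    print(hit.resource, hit.title, hit.year)
```

In this sketch the resources keep full control over their internal storage, which corresponds to the high-autonomy end of the gradient described above. A data warehouse would instead copy all holdings into one central store.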


The CESSDA DAs have, through CESSDA, agreed on a level of integration that works for the resources at the European DAs [6] (European Commission, 2007). Researchers can explore descriptions of the various DAs' data holdings through a common portal and find out where the data are located. The CESSDA portal is a federation in which the repositories have largely retained their autonomy, except for the requirement that they supply CESSDA's integration system with the descriptions of their data that are needed for exploration, expressed in common standards (Alvheim, 2009; European Commission, 2007).

1.1.2 Using Common Metadata for Harmonisation for Data Integration

Definitions of the terms metadata and metadata standards are given in Section 1.3. In the terms of that discussion, the metadata discussed in this Section is descriptive metadata. The focus in DwB is on an approach in which data integration is based on the harmonisation of descriptive metadata from different resources (as described in Quandt, 2010). Harmonisation refers to a process in which the resources are described in a common metadata standard, using a common terminology that has the same semantic meaning for all resources (semantic integration, see Section 1.4). Data providers that harmonise their metadata do not necessarily have to adopt the common metadata standard internally. Instead, each data provider can map its internal metadata onto an agreed common metadata standard/model and schema. The harmonised metadata that describe the data in the resources are then “served” to a system that integrates the metadata in a common search platform. The integration system can, for example, access the harmonised metadata through a process called harvesting (for more details, see D8.3 in DwB). Controlled vocabularies (see Chapter 4) and other constraints can be used to strengthen the metadata harmonisation semantically and structurally. The CESSDA portal is an example of integrating data held in heterogeneous, autonomous resources (data archives) by using harmonised descriptive metadata represented in a common metadata standard, together with controlled vocabularies and coding schemes. The DAs harmonise their metadata and make the harmonised metadata available on local servers for harvesting and for presentation in the CESSDA portal. This approach has been shown to be successful [7] (European Commission, 2007). One of the most valuable benefits of a common interoperable platform based on an agreed metadata standard is that the resources/DAs retain their autonomy to a large extent.
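The harmonisation steps described above (mapping internal metadata onto a common schema, constraining values with a controlled vocabulary, and harvesting the result) might be sketched in Python as follows. This is only an illustration under stated assumptions: the common field names, the topic vocabulary, the provider names and their internal field names are all hypothetical, and none of them are elements of DDI, SDMX or the CESSDA portal.

```python
from typing import Dict, List

# Hypothetical common schema; these are illustrative names, not DDI elements.
COMMON_FIELDS = ("title", "topic", "country")

# Hypothetical controlled vocabulary: local topic terms -> agreed topic codes.
TOPIC_VOCABULARY = {
    "employment": "LABOUR",
    "work": "LABOUR",
    "elections": "POLITICS",
}

# Each provider keeps its internal field names and only declares a mapping
# onto the common schema.
PROVIDER_MAPPINGS: Dict[str, Dict[str, str]] = {
    "provider_a": {"name": "title", "subject": "topic", "nation": "country"},
    "provider_b": {"titel": "title", "thema": "topic", "land": "country"},
}


def harmonise(provider: str, record: Dict[str, str]) -> Dict[str, str]:
    """Map one internal record onto the common schema and vocabulary."""
    mapping = PROVIDER_MAPPINGS[provider]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    missing = [field for field in COMMON_FIELDS if field not in common]
    if missing:
        raise ValueError(f"{provider} record lacks common fields: {missing}")
    topic = common["topic"].strip().lower()
    if topic not in TOPIC_VOCABULARY:
        raise ValueError(f"topic {topic!r} is not in the controlled vocabulary")
    common["topic"] = TOPIC_VOCABULARY[topic]
    return common


def harvest(sources: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Collect harmonised records from every provider into one catalogue."""
    catalogue: List[Dict[str, str]] = []
    for provider, records in sources.items():
        for record in records:
            harmonised = harmonise(provider, record)
            harmonised["source"] = provider  # remember where the data are held
            catalogue.append(harmonised)
    return catalogue


sources = {
    "provider_a": [{"name": "Labour Force Survey", "subject": "Employment", "nation": "SE"}],
    "provider_b": [{"titel": "Wahlstudie", "thema": "elections", "land": "DE"}],
}
catalogue = harvest(sources)
for entry in catalogue:
    print(entry)
```

The providers in this sketch never change their internal records. Only the mapping and the vocabulary are shared, which is how the approach leaves the autonomy of the resources largely intact.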

1.1.3 Metadata at DAs and CESSDA – Inspiration for an OS Microdata Search Portal?

Many lessons were learned when building the CESSDA portal, including from the work on agreements on metadata standards [8] (European Commission, 2007). In WP7 (and to some extent WP8), metadata usage within NSIs and DAs is reviewed, with the goal of promoting a compatible metadata standard. The overall challenge and ultimate objective of the DwB project is to find a solution for a common European portal for the providers of OS microdata (NSIs), analogous to the CESSDA portal for the DAs. In WP12 of DwB, the work aims to align the functions of a search portal for OS microdata with those of the CESSDA portal, so that data from the NSIs and the DAs can be searched simultaneously [9]. WP7 will explore metadata standards that are compatible with the current metadata practices of both the NSIs and the DAs.

Even though the DAs and NSIs have many similarities in their data and metadata production and in their data distribution and preservation processes, they use different metadata standards. One of the main reasons for this is that NSIs and DAs have different focuses and scopes as data producers and data repositories, and therefore emphasise different phases of these processes [10] (UNECE secretariat, 2009). An overview of the metadata standards that are used within DAs and NSIs is given in Chapter 2.

An important issue is that documentation for European OS microdata at the NSIs is not yet streamlined. However, there are examples of OS data being streamlined for specific needs. For example, Eurostat/ESS requires that statistics are compiled on the basis of common standards, i.e. data have to be presented and described uniformly with respect to scope, definitions, units and classifications in the different surveys and sources [11].

To get an overview of the current usage of metadata standards at European NSIs and DAs for the work in the DwB project, WP7 and WP8 have carried out surveys [12] to collect information on the subject. A presentation and an analysis of the survey responses can be found in Chapter 6. Communities involved in developing metadata standards that are well established among NSIs and DAs have, independently of the DwB project, started to investigate the possibility of using a combination of the standards. That work is presented in Chapter 3 and is of great interest for the work in the DwB project.

[6] www.madiera.org

[7] www.madiera.org

[8] www.madiera.org

1.2 European Organisations Influence over the NSIs and DAs

There are organisations and agents at the European level that play important roles in the work towards an infrastructure for searching and disseminating OS microdata. These organisations have influence over their members in standardisation and cooperation issues.

1.2.1 The Research Community

One of a Data Archive's tasks is to provide the research community with support and services to preserve, organise, maintain and make data material available. This support includes investigating and providing information on the legal and ethical aspects of archiving, managing and disseminating data. An important task is to provide support and guidance to researchers regarding documentation, data descriptions, and easy access to many types of data, as well as to support and encourage secondary analysis of data.

The primary goal of the National Statistical Institutes is to produce official statistics. However, an increasing number of NSIs are being tasked with making their vast data collections available to a broader range of users, including researchers. Researchers, in turn, want easier search capabilities and easier access to OS microdata for scientific purposes. The aims of the research community, the data archives and the NSIs are therefore highly congruent. Researchers' goal is not only to gain access to a wide variety of microdata from both statistical institutes and data archives, but also to understand the data, and to use data from various sources together in a meaningful and effective way. The problem is that data are documented in different ways, using different metadata standards, formats and structures. This often makes it difficult to decipher what each piece of metadata actually means and how one standard compares with another. In addition, it makes automating the exchange and sharing of metadata more complex, which in turn complicates the development of services that would benefit the research community, such as common catalogues. WP7 in the DwB project will give an overview of, and a starting point for, the requirements on metadata standards to support the development of a search portal for OS microdata that meets the needs of the research community, e.g. transparency in the user interface regarding technical details about the harmonisation of the metadata.

[9] http://www.dwbproject.org/about/wps.html#wp12

[10] http://www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Model

[11] http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/code_of_practice/compliance/principle_14

[12] The results from the WP7 survey are available in Appendix 3

1.2.2 CESSDA and the Data Archives

Data archives became a part of the social science research scene in the late 1950s and early 1960s. Social scientists were using more and more computerized data, thus creating a need to store and manage the data (including the metadata). From the beginning, the possibilities to re-use data and to verify results were important motives, as well as the realization that research data has long-term value (Doorn and Tjalsma, 2007). In addition to the preservation task (in the literal sense), data archives were mandated to select, acquire, receive, manage, describe and provide access to research data (several DA also have agreements with NSIs and other government bodies and also provide access to OS microdata). Data archives are most often affiliated with universities and/or national research organisations. CESSDA is an umbrella organisation for the social science data archives in Europe and since the 1970s the members have worked together to improve access to data for researchers and students (see Figure 1.1). CESSDA research and development projects and Expert Seminars enhance exchange of data and technologies among data organisations. The primary goals of CESSDA are (Mochmann, 2002):

To collect, store and distribute numeric data for use in secondary analysis

To facilitate easy and quick access to data for scientific analysis13

To promote projects and procedures for enhancing exchange of data and technologies among data organisations

To stimulate development and use of these procedures throughout the world

To encourage new data organisations to further these objectives

13 The CESSDA Catalogue enables users to locate datasets, as well as questions or variables within datasets, stored at CESSDA archives throughout Europe. Data collections include sociological surveys, election studies, longitudinal studies, opinion polls, household and business surveys as well as censuses produced by NSIs in several cases. More information can be found at http://www.cessda.org

All members of CESSDA have membership obligations14, a selection listed below, thus displaying the influence it has over its members.

1. To support inter-archival data transfer

2. To circulate reports of current activities, archive holdings, services, and plans for new activities

3. To support training and information exchange in methods and techniques of data service and secondary analysis

4. To adhere to the CESSDA Trans-border Data Access Agreement

See Appendix 1 for more information about international cooperation among data archives.

Figure 1.1. Data archives that are members of CESSDA. The figure is taken from CESSDA's homepage and might change due to the current process of setting up a new legal entity for CESSDA15.

CESSDA ERIC

Presently, CESSDA is shifting into a new organisation known as CESSDA ERIC (European Research Infrastructure Consortium)16, 17. This is primarily to meet the challenge of archiving social science data optimally and ensuring access for researchers across national borders (see Appendix 2 for more information).

14 CESSDA constitution: http://www.cessda.org/doc/cessdaconstitution20040402.pdf

15 www.cessda.org/about/members

1.2.3 ESS/Eurostat and the NSIs

The European Statistical System (ESS) is an organisation whose mission is to develop, produce and disseminate European statistics. The ESS does not operate in a political vacuum, and for many EU Member States the relationship between the Commission and national government is very important18. Its partners are:

NSIs and NSAs of the EU's 28 Member States, 4 EFTA (European Free Trade Association) countries and 5 candidate countries

Eurostat

European Central Bank (ECB) and CBs

International Organisations – UNECE (United Nations Economic Commission for Europe) Statistical Division

ESS is led by Eurostat, which is the statistical office of the European Union situated in Luxembourg. Its task is to provide the European Union with statistics at European level that enable comparisons between countries and regions19. Eurostat is part of the European Commission and needs to work closely with NSIs in order to achieve its objectives.

1.3 Definition of Metadata

A commonly used definition of the term metadata is “data about data”. Metadata have a descriptive role in data documentation in various contexts, e.g. to describe the contents and context of a file containing microdata, or to describe the contents of a database (NISO, 2004).20 Metadata can also be used as data in their own right. For example, MARC21 records describe bibliographic items and thus are generally considered to be metadata, but researchers wishing to test Lotka's law can use MARC records as their data22. With this in mind, a more comprehensive definition could be the one given by Bargmeyer and Gillman (2000, p. 1): “Metadata is data that is used to describe other data, so the usage turns it into metadata”. For a more detailed introduction to the metadata concept, see for example NISO (2004).

16 http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=eric

17 http://www.cessda.org/about/research/index.html#cessda_ri

18 http://www.ons.gov.uk/ons/about-ons/what-we-do/relationships-abroad/european-statistical-system--ess-/index.html

19 http://epp.eurostat.ec.europa.eu/portal/page/portal/about_eurostat/introduction

20 This definition is a bit ambiguous, since metadata can also describe real-world physical objects, e.g. archeological objects.

21 MARC (MAchine-Readable Cataloguing) is a standard for the storage and exchange of bibliographic records and related information in machine-readable form. http://www.loc.gov/marc/

22 Lotka's law describes and predicts the frequency of publication by authors in a given subject field. It says that there is an inverse relation between the number of publications and the number of authors producing these publications. See for example Pao (1986).


Metadata is usually categorized into three types: descriptive, structural/technical and administrative metadata. In the social sciences, one more type of metadata is recognized: paradata (Gregory et al., 2009). Descriptive metadata is sometimes also referred to as reference metadata. It consists of a range of information that can be used to discover the data and assess their quality, for example title, abstract, methodology and author. Structural metadata 23 describes the structure of datasets and relationships between items. Administrative metadata is information about administrative processes like file management and rights management; version numbers, format information, user rights and event dates are typical administrative metadata. Paradata (sometimes also called behavioural metadata) is information about the data collection process and the reactions of respondents during data collection, for example the start and end time of the interview, delays in responding and the interviewer's observations. It is worth noting that although these categories may be helpful when thinking about what metadata means, it is not always simple, unambiguous or even necessary to divide metadata elements or items into these categories.

When talking about metadata standards usage and needs, our focus is mostly on descriptive metadata, although the other metadata categories are not forgotten. At DAs, metadata are usually descriptions of datasets that are collected in research projects, and at NSIs metadata are descriptions of statistical tables and/or OS microdata. Naturally, those DAs that have agreements with NSIs also provide descriptions of OS microdata. Metadata adds a description layer to the digital raw data files. The task of the description layer is to conceptualize and add substance to the numbers and bytes in the data file. Without appropriate metadata, the raw data would be hard or impossible to interpret and analyse, and this problem would be amplified when other researchers want to reuse the data. This report also covers metadata that are essential for data archiving, preservation, dissemination and exchange (see Chapter 2).

Sometimes the term documentation is used to describe all information that is needed to interpret, understand and use data 24, 25. Documentation for a dataset can include questionnaires, codebooks, syntaxes, methodology reports and quality reports. There is an important difference in the use of the terms documentation and metadata, since they are used in different contexts for describing datasets. Documentation is meant to be read by humans, but metadata, and especially metadata stored according to standards, can be encoded to be machine-actionable and thereby enable the implementation and usage of metadata-driven systems. In short, metadata can be seen as a subset of data documentation, providing standardised, structured information explaining the data. Modern systems built for storing metadata for datasets often have functionality for deriving human-readable codebooks from the metadata (Gregory et al., 2009).
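The difference between human-readable documentation and machine-actionable metadata can be illustrated with a minimal Python sketch that derives a plain-text codebook from structured variable descriptions. The `Variable` class, its fields and the example variables are hypothetical and chosen only for illustration; they do not follow DDI or any other metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical structured description of one variable in a dataset."""
    name: str
    label: str
    # Maps each code stored in the data file to its human-readable meaning.
    categories: dict[int, str] = field(default_factory=dict)


def render_codebook(title: str, variables: list[Variable]) -> str:
    """Derive a plain-text codebook from structured metadata."""
    lines = [f"Codebook: {title}", ""]
    for var in variables:
        lines.append(f"{var.name}: {var.label}")
        for code, text in sorted(var.categories.items()):
            lines.append(f"    {code} = {text}")
    return "\n".join(lines)


# Invented example variables, not taken from any real study.
example_variables = [
    Variable("sex", "Sex of respondent", {1: "Male", 2: "Female"}),
    Variable("age", "Age in years"),
]
codebook = render_codebook("Example survey", example_variables)
print(codebook)
```

Because the metadata are structured, the same `Variable` objects could equally feed a search index or a data entry form, which is what is meant by a metadata-driven system.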

23 In digital library community usage, structural metadata is defined as describing the “intellectual or physical elements of a digital object” (http://www.digitizationguidelines.gov/term.php?term=metadatastructural).

24 http://data-archive.ac.uk/create-manage/document/overview [Accessed 2013-03-22]

25 http://datalib.edina.ac.uk/mantra [Accessed 2012-11-30]


Standards are used to promote interoperability between, for example, organisations and systems (Bargmeyer and Gillman, 2000). Metadata standards are developed and maintained within communities to give the best possible and controlled descriptions of data for their specific needs. The expression metadata standard is often also used for well-established metadata schema specifications. The Data Documentation Initiative (DDI) Alliance develops specifications for metadata that describe social science data. The term “DDI metadata standards” in this report refers to the DDI metadata schema specifications for describing social science research data.

1.4 Semantic Integration and Conceptual Models

Semantic integration is a process of interrelating information from different sources (Doan and Halevy, 2005; Elmasri and Navathe, 2007). A conceptual model describes concepts and relationships between those concepts within a domain of interest, independently of design and implementation details (Elmasri and Navathe, 2007). Conceptual models thus express how the concepts in the domain(s) of interest are represented, and the relationships between those concepts. Conceptual models of resources are useful when semantically mapping concepts in heterogeneous resources (like data archives, NSIs and other data producers) to each other, because distracting technical details need not be taken into consideration. The actual mapping to achieve semantic integration can be accomplished using different approaches, e.g. by first agreeing on a metadata schema (relevant for that particular integration) that contains common concepts and the relationships between them, and then mapping the different resources’ concepts against that common schema. Different approaches for semantic integration are described in D8.3 of the DwB project.

The use of conceptual models is emphasized in a standard (ISO/IEC 11179) for describing the contents and the structure of metadata registries that are designed to be easily integrated with other resources (see Section 2.5). The contents are interpreted into concepts and relationships that are represented in a conceptual model of the metadata. In the process of semantically integrating heterogeneous metadata registries, the use of conceptual models promotes a shared understanding of metadata among experts working in different domains, and decreases the risk of problems arising from different interpretations of terms and concepts. In harmonisation processes (see Section 1.1.2) the semantic integration is focused on definitions of concepts rather than names of metadata items in certain versions of standards.

There are reasons other than the integration perspective for having conceptual models for metadata registries. The technology for managing the contents (the metadata and data) might change, but the contents stay the same. Conceptual models of the metadata represent the contents, while metadata schemas or different versions of metadata standards are explicit technologies used for implementation. Currently, applications utilizing metadata standards are generally coded against specific versions of the standards, which means fairly high maintenance costs when new versions of standards are introduced and need to be supported26. Moreover, the technologies used to express, handle, share, publish and use the contents are prone to change, and changes within technology are likely to be rapid and even unexpected. It is also important to note that metadata registries are databases, and classical database development procedures apply to the design process and maintenance of those databases as well, including using conceptual modeling on top of logical descriptions and the physical structure of the database. This is to ensure consistency and facilitate maintenance (Elmasri and Navathe, 2007).
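The mapping approach mentioned in this section, in which resources first agree on a common schema and then map their own concepts against it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the common fields, the two resources (`archive`, `nsi`) and all element names are hypothetical and are not taken from DDI, SDMX or D8.3.

```python
# Hypothetical common schema agreed on for one particular integration.
COMMON_FIELDS = ("title", "producer", "collection_period")

# Hypothetical mappings from each resource's own element names
# to the concepts of the common schema.
MAPPINGS = {
    "archive": {
        "study_title": "title",
        "data_producer": "producer",
        "fieldwork_dates": "collection_period",
    },
    "nsi": {
        "survey_name": "title",
        "institute": "producer",
        "reference_period": "collection_period",
    },
}


def to_common(source: str, record: dict) -> dict:
    """Translate one resource-specific record into the common schema.

    Concepts the record does not supply are set to None; local elements
    without a mapping are ignored.
    """
    mapping = MAPPINGS[source]
    common = {name: None for name in COMMON_FIELDS}
    for local_name, value in record.items():
        target = mapping.get(local_name)
        if target is not None:
            common[target] = value
    return common


archive_record = to_common(
    "archive", {"study_title": "Example Study", "data_producer": "Example Archive"}
)
nsi_record = to_common(
    "nsi", {"survey_name": "Example Survey", "reference_period": "2012", "internal_id": 7}
)
print(archive_record)
print(nsi_record)
```

Note that the mapping is defined on the meaning of each element rather than on its name, which is the point made above about harmonisation focusing on definitions of concepts.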

26 One solution to this problem can be called "standards agnosticism", where standards themselves are represented as metadata objects within a registry, and every metadata object describes which versions of the standards are supported. This way, introducing new standards or new versions of standards has a minimal impact on existing applications (Gregory, 2011).


2. METADATA STANDARDS

Metadata standards are developed in order to support the usage of uniform descriptions of data (metadata) that are managed within specific communities. A community can, for example, be centred around a certain research area or a specific usage of the data. For example, international governmental institutions for official statistics, such as Eurostat/ESS (see Section 1.2.3), have developed metadata standards to support the process of integrating results from European NSIs. Those metadata standards are focused on handling statistical data and results of analysis, e.g. to disseminate results and related metadata.

One of the aims of the DwB project is to find metadata standards that can be used to support the work of making European and national OS microdata resources searchable for researchers. None of the currently existing metadata standards is comprehensive enough on its own for that purpose. Instead, efforts will be made to find a way for different metadata standards with complementary features to work together. In Chapter 3, the cooperation between different metadata standards is discussed, and the features that are important when developing metadata standards intended to be integrated with other metadata standards are highlighted.

In this Chapter, metadata standards and other standards that are used at European NSIs and CESSDA DAs are introduced. A historical overview of metadata usage at NSIs and DAs is given in Section 2.1. In Sections 2.2 and 2.3, the Statistical Data and Metadata eXchange (SDMX) metadata standard and the Data Documentation Initiative (DDI) metadata schema specification are presented. In Section 2.4, metadata standards and standards for dissemination, exchange and archiving are introduced. In Section 2.5, a metadata registry standard, ISO/IEC 11179, is introduced; it supports the development of metadata registries for a common understanding of data. Other examples of metadata standards are also presented that are important in their area of usage and that are followed within NSIs and DAs when describing specific data. In Section 2.6, metadata related to Persistent Identifiers (PIDs) are discussed. In Section 2.7, case studies of metadata management within two different organisations are included. A more detailed analysis of the extent to which metadata standards and other standards are used within NSIs and DAs is presented in Chapter 6.

2.1 Metadata Usage at NSIs & DAs: Historical Overview & Status Quo

Metadata standards that are used within NSIs and DAs mirror the procedures that are in focus in the different reference frameworks and models (see Chapter 5) followed within NSIs and DAs, and they thereby have built-in similarities and dissimilarities. The reference frameworks result from efforts to describe organisations that traditionally have different purposes. Two metadata standards are followed by a majority of the European NSIs and CESSDA DAs; they are therefore described in more depth in this Chapter and will be in focus in the rest of this report (D7.1). These are the SDMX metadata standard and the DDI metadata schema specification. There are various reasons why those two metadata standards are widely accepted. The historical background that explains the driving forces for their development is described briefly here. However, it has to be pointed out that many NSIs and DAs use those two standards only for specific purposes, such as exchanging results and integrating international studies, while in their own internal systems they have other ways to manage and organize their data and data processing. Chapter 6 presents results from surveys, and an analysis, of the metadata usage and reference frameworks that are followed in European NSIs and DAs.

Historically, standardised cross-national statistics have been a topic of interest since the late 1920s. In the 1930s, during the depression, a need for cooperation between different countries’ economies became apparent (BIS et al., 2002). This led to a need for macroeconomic data that were comparable across different countries and described in standardised ways (e.g. who produced the data). In the 1940s, the UN and IMF provided standard definitions of statistical concepts27, 28. Those standards have been developed continuously through the years and are still being modified, with a growing set of other statistical topics, e.g. the System of National Accounts (SNA) that partly consists of The United Nations Classifications Registry and Eurostat's metadata registry (RAMON) (described in Section 4.2), respectively. In the 1950s, advances in computing technologies led to internal standards for coding statistical data. From the 1960s to the 1980s, standards for electronic exchange of information were developed (mainly driven by commercial interests for use in banking systems for transactions), and in the late 1990s Eurostat, the IMF and their member countries implemented a standard business practice for electronic exchange of statistical data.
In 2002, the SDMX (Statistical Data and Metadata eXchange) initiative was founded as a joint effort of the Bank for International Settlements (BIS), the European Central Bank (ECB), Eurostat, the International Monetary Fund (IMF), the Organisation for Economic Cooperation and Development (OECD) and the United Nations Statistics Division (BIS et al., 2002). The focus of the SDMX initiative was on business practices in the field of statistical information (exchange and sharing of data and metadata), and on exploring common e-standards and other standardisations. The SDMX initiative is still active, and has extended the member list to include the World Bank. The work of the SDMX initiative has resulted in technical and statistical standards and guidelines, and IT infrastructure, for efficient exchange and sharing of statistical data and metadata. In Section 2.2, the SDMX metadata standard is introduced.

This brief history of the development of standards that are used within many NSIs, e.g. the SDMX metadata standard, shows that it has been an authority-driven development with the aim of integrating results. In contrast, metadata used at DAs were originally developed to describe research data for deposit in archives after being created and used in research (Mochmann, 2002). In the middle of the 20th century, techniques were developed and spread for carrying out empirical social science in mass-scale investigations of the attitudes, behavior and values of average citizens of societies, e.g. statistical analysis of representatives of whole populations (Mochmann, 2002). The data that were collected were hard to access for researchers other than the principal investigators. The need for replication and verification of findings from earlier research, in combination with the nature of collections of social science data, which are time dependent and unique29, were the driving forces for founding social science data archives. The first social science archive, the Roper Center30, was founded in 1947 in the United States, and the first European archive was founded in 1960 at the University of Cologne. The planning for cross-national comparative research was initiated during the 1960s, supported by the International Social Science Council (ISSC) (Mochmann, 2002) and, in 1962, by the establishment of the North American Inter-University Consortium for Political and Social Research (ICPSR). In 1976 the Council of European Social Science Data Archives (CESSDA) was formed, with the aim of promoting the acquisition, archiving and distribution of data for the social sciences. The development of standards for study description was emphasized, and in the 1990s a CESSDA portal for data discovery was developed. The CESSDA portal had problems that could not be solved with the infrastructure solution that was used. In the early 2000s, CESSDA decided to move the portal to an infrastructure based on technologies that better met its requirements31 (Ryssevik and Musgrave, 2001). One of those technologies was a metadata schema specification developed by the Data Documentation Initiative (DDI) specifically for describing research data in the social, behavioral and economic sciences. This led to wide acceptance of the DDI metadata schema specification as the internal metadata schema among European social science data archives. The international DDI committee was established by ICPSR, with a member list that included several data archives. DDI is a standard that has evolved from requirements defined by national data archives and is used by them. The DDI metadata schema specification is introduced in Section 2.3.

Since the establishment of the DwB work plan in January 2011, a great deal has happened as far as metadata standards development is concerned. When talking about the current use of metadata standards, the focus will be on the two “major players”, the SDMX metadata standard and the DDI metadata schema specification. Besides these, there are some other metadata standards worth mentioning (see Section 2.4).

27 Measurement of National Income and the Construction of Social Accounts, 1947. Scanned original publication at: http://unstats.un.org/unsd/nationalaccount/docs/1947NAreport.pdf

28 UN Balance of Payments Manual, 1948, IMF.

2.2 SDMX - Statistical Data and Metadata eXchange

2.2.1 The SDMX Community

The Statistical Data and Metadata eXchange (SDMX) initiative is committed to developing and promoting technical standards that are suitable for the electronic exchange of statistical information both within the Member States and internationally (see Section 2.1). Since setting up an agreement between seven sponsoring organisations, there has been considerable progress in all aspects of SDMX.

29 Since questions and other measurements are made at a certain time, and give people’s opinions and experiences that can never be recorded again (Mochmann, 2002).

30 http://www.ropercenter.uconn.edu/

31 www.madiera.org


The creation in 2011 of two SDMX working groups, the SDMX Technical Standards Working Group and the SDMX Statistical Working Group, with participants from all over the globe, has helped to ensure fast and reliable development of standardised file formats for data and metadata, and standardised contents of these files32. As this is a requirement for automated production, processing and exchange of SDMX data and metadata files between national and international statistical organisations, SDMX can aptly be described as a standard that “springs from the world of official statistics” (Gregory and Heus, 2007, p. 17). At the time of writing, SDMX has been adopted by a growing number of statistical organisations. In January 2013, SDMX was published as “International Standard” (IS) 17369 by the International Organization for Standardization33.

2.2.2 About SDMX – Scope and Content

The Statistical Data and Metadata eXchange (SDMX) format is a standard primarily used by the official statistics community for the exchange of time series data (Vardigan et al., 2008; Vale, 2010). Therefore, the focus is on structural metadata, i.e. information that describes and identifies statistical data and metadata. Structural metadata can be, for instance, the name of a statistical table or the dimensions of a statistical cube. Usually structural metadata is associated with specific observations or series of data; that is, each data set has a set of structural metadata. In SDMX, these descriptions are referred to as Data Structure Definitions (DSDs). DSDs contain information on how concepts are associated with the measures, dimensions and attributes of a data cube. In addition, a reference metadata set also has a set of structural metadata that describes how it is organised, referred to as a Metadata Structure Definition (SDMX Initiative, 2011a). Another focus of SDMX is the dissemination of data and metadata. Linking data and metadata together makes the comprehension and further processing of the data easier (SDMX Initiative, 2011a). The current standard is SDMX 2.1. The technical specification consists of several sections (SDMX Initiative, 2011a):

SDMX Framework Document

SDMX Information Model

SDMX-EDI - the UN/EDIFACT format for exchange of SDMX-structured data and metadata.

SDMX-ML - the XML format for the exchange of SDMX-structured data and metadata

The SDMX Registry Specification

The SDMX Technical Notes

Web Services Guidelines

Being a standard that has a central role in integrating different systems, the SDMX standard has been aligned with the ISO/IEC 11179 standard34 (see Section 2.5.1).
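The role of a Data Structure Definition described in this section, associating concepts with the dimensions, measures and attributes of a data cube, can be made concrete with a small Python sketch. It is a simplified stand-in and not the SDMX Information Model: the class `DataStructureDefinition`, its `validate` method and the concept identifiers used in the example are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    """Simplified, hypothetical stand-in for an SDMX Data Structure Definition."""
    dimensions: tuple[str, ...]       # concepts that identify an observation
    measure: str                      # concept that holds the observed value
    attributes: tuple[str, ...] = ()  # optional concepts that qualify the value

    def validate(self, observation: dict) -> list[str]:
        """Return the problems found in one observation (empty list if none)."""
        problems = [
            f"missing dimension: {d}" for d in self.dimensions if d not in observation
        ]
        if self.measure not in observation:
            problems.append(f"missing measure: {self.measure}")
        allowed = set(self.dimensions) | {self.measure} | set(self.attributes)
        problems += [f"unknown concept: {k}" for k in observation if k not in allowed]
        return problems


# Illustrative concept identifiers, assumed for this example.
dsd = DataStructureDefinition(
    dimensions=("FREQ", "REF_AREA", "TIME_PERIOD"),
    measure="OBS_VALUE",
    attributes=("UNIT_MEASURE",),
)
ok = dsd.validate(
    {"FREQ": "A", "REF_AREA": "SE", "TIME_PERIOD": "2012", "OBS_VALUE": 9.5}
)
bad = dsd.validate({"FREQ": "A", "OBS_VALUE": 9.5, "COLOUR": "red"})
print(ok)
print(bad)
```

Because sender and receiver share the same structure definition, a receiving system can check and interpret incoming observations without any dataset-specific code, which is what makes automated exchange feasible.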

32 http://sdmx.org/?page_id=6

33 http://www.iso.org/iso/catalogue_detail.htm?csnumber=52500

34 http://sdmx.org/wp-content/uploads/2009/01/04_sdmx_cog_annex_4_mcv_2009.pdf


2.2.3 Euro-SDMX Metadata Structure (ESMS)

In order to facilitate and improve metadata exchange within the European Statistical System (ESS), a Europe-wide standard named Euro-SDMX Metadata Structure (ESMS) 35 was implemented at Eurostat as well as at national level. It was strictly derived from the list of cross-domain concepts defined in the SDMX Content Oriented Guidelines (2009) and uses 21 high-level concepts (see Figure 2.1) with a limited breakdown into sub-items. ESMS is well suited to statistical production processes as far as both quality and the documentation of methodologies are concerned. According to the Monitoring Report dated September 2011, use of ESMS for the national production of reference metadata has become more widespread at NSIs. Out of the 33 responding NSIs, 10 already apply the ESMS, while 13 of the remaining 23 intend to adopt the standard structure in the future (Eurostat, 2011).
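Because ESMS fixes a common list of concepts, a reference metadata report can be checked mechanically against that list. The Python sketch below illustrates the idea with a handful of the high-level concept names from Figure 2.1. Treating these concepts as mandatory, and representing a report as a plain dictionary, are assumptions made for this example and are not rules stated by ESMS.

```python
# A few of the 21 high-level ESMS concepts from Figure 2.1 (subset for brevity).
# Treating them as mandatory is an assumption made for this illustration.
CHECKED_CONCEPTS = [
    "Contact",
    "Metadata update",
    "Statistical presentation",
    "Unit of measure",
    "Reference period",
]


def missing_concepts(report: dict[str, str]) -> list[str]:
    """Return the checked concepts that are absent or blank in a report."""
    return [c for c in CHECKED_CONCEPTS if not report.get(c, "").strip()]


# Invented report content, for illustration only.
report = {
    "Contact": "Example NSI, dissemination unit",
    "Unit of measure": "Persons",
    "Reference period": "  ",
}
gaps = missing_concepts(report)
print(gaps)
```

A check of this kind only works because every report uses the same concept names, which is the practical benefit of a shared metadata structure.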

2.2.4 SDMX Registry

Eurostat has developed and put into operation an SDMX Registry36. An SDMX Registry typically provides machine-readable interfaces/web services around a central and potentially normalized, harmonized and sometimes authoritative index of concepts, codes, definitions and so on. The Registry can be used by any other application in the network with sufficient access privileges. It can be seen as the index of a distributed database or metadata repository, which is made up of all the data providers’ data sets and reference metadata sets within a statistical community.
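The idea of a registry as the index of a distributed repository can be sketched as follows. This toy in-memory class only illustrates the principle that the registry stores where data sets are located, not the data themselves. The class `ToyRegistry`, its methods, the structure identifier and the URLs are hypothetical and do not reflect the interfaces of the SDMX Registry Specification.

```python
from collections import defaultdict


class ToyRegistry:
    """In-memory stand-in for a registry index (hypothetical, not the SDMX API)."""

    def __init__(self) -> None:
        # Maps a structure identifier to the locations of matching data sets.
        self._index: dict[str, list[str]] = defaultdict(list)

    def register(self, structure_id: str, location: str) -> None:
        """Record that a provider offers data for the given structure."""
        self._index[structure_id].append(location)

    def locate(self, structure_id: str) -> list[str]:
        """Return all known locations for a structure (empty if unknown)."""
        return list(self._index.get(structure_id, []))


registry = ToyRegistry()
# Placeholder identifier and URLs, for illustration only.
registry.register("LABOUR_FORCE", "https://nsi-a.example/data/lfs")
registry.register("LABOUR_FORCE", "https://nsi-b.example/data/lfs")
locations = registry.locate("LABOUR_FORCE")
print(locations)
```

An application would first ask the registry where relevant data sets are held and then retrieve them directly from each provider.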

35 http://epp.eurostat.ec.europa.eu/cache/ITY_SDDS/Annexes/ESMS_Structure.xls

36 https://webgate.ec.europa.eu/fpfis/mwikis/sdmx/index.php/SDMX_registry


1 Contact

•1.1 Contact organisation

•1.2 Contact organisation unit

•1.3 Contact name

•1.4 Contact person function

•1.5 Contact mail address

•1.6 Contact email address

•1.7 Contact phone number

•1.8 Contact fax number

2 Metadata update

•2.1 Metadata last certified

•2.2 Metadata last posted

•2.3 Metadata last update

3 Statistical presentation

•3.1 Data description

•3.2 Classification system

•3.3 Sector coverage

•3.4 Statistical concepts and definitions

•3.5 Statistical unit

•3.6 Statistical population

•3.7 Reference area

•3.8 Time coverage

•3.9 Base period

4 Unit of measure

5 Reference period

6 Institutional mandate

•6.1 Legal acts and other agreements

•6.2 Data sharing

7 Confidentiality

•7.1 Confidentiality - policy

•7.2 Confidentiality - data treatment

8 Release policy

•8.1 Release calendar

•8.2 Release calendar access

•8.3 User access

9 Frequency of dissemination

10 Dissemination format

•10.1 News release

•10.2 Publications

•10.3 On-line database

•10.4 Micro-data access

•10.5 Other

11 Accessibility of documentation

•11.1 Documentation on methodology

•11.2 Quality documentation

12 Quality management

•12.1 Quality assurance

•12.2 Quality assessment

13 Relevance

•13.1 User needs

•13.2 User satisfaction

•13.3 Completeness

14 Accuracy and reliability

•14.1 Overall accuracy

•14.2 Sampling error

•14.3 Non-sampling error

15 Timeliness and punctuality

•15.1 Timeliness

•15.2 Punctuality

16 Comparability

•16.1 Comparability - geographical

•16.2 Comparability - over time

17 Coherence

•17.1 Coherence - cross domain

•17.2 Coherence - internal

18 Cost and burden

19 Data revision

•19.1 Data revision - policy

•19.2 Data revision - practice

20 Statistical processing

•20.1 Source data

•20.2 Frequency of data collection

•20.3 Data collection

•20.4 Data validation

•20.5 Data compilation

•20.6 Adjustment

21 Comment

Figure 2.1. High-level concepts in the ESMS standard (compiled from the EURO-SDMX Metadata Structure, see Section 2.2.3).
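The concept list in Figure 2.1 can be used as a simple completeness checklist for a reference metadata report. The following is a minimal sketch in Python, assuming a report is held as a mapping from top-level concept name to free text; the function name `missing_concepts` and the report layout are illustrative assumptions, not part of ESMS or SDMX.

```python
# Top-level ESMS concepts as listed in Figure 2.1 (sub-concepts omitted for brevity).
ESMS_CONCEPTS = {
    1: "Contact",
    2: "Metadata update",
    3: "Statistical presentation",
    4: "Unit of measure",
    5: "Reference period",
    6: "Institutional mandate",
    7: "Confidentiality",
    8: "Release policy",
    9: "Frequency of dissemination",
    10: "Dissemination format",
    11: "Accessibility of documentation",
    12: "Quality management",
    13: "Relevance",
    14: "Accuracy and reliability",
    15: "Timeliness and punctuality",
    16: "Comparability",
    17: "Coherence",
    18: "Cost and burden",
    19: "Data revision",
    20: "Statistical processing",
    21: "Comment",
}


def missing_concepts(report):
    """Return the top-level concepts for which the report has no non-empty text.

    `report` is a hypothetical dict mapping concept name -> free-text content.
    """
    return [
        name
        for name in ESMS_CONCEPTS.values()
        if not report.get(name, "").strip()
    ]
```

A report that only fills in "Contact" would thus be flagged as lacking the remaining twenty concepts, which is one way an NSI could monitor documentation coverage across data sets.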


2.3 DDI - Data Documentation Initiative

2.3.1 The DDI Alliance

The Data Documentation Initiative (DDI)37 is an international effort to create a standard for metadata describing data resources from the social, behavioural and economic sciences. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving. The DDI Alliance is a self-sustaining membership organization that develops and promotes the DDI specification and associated tools, education, and outreach programs. Membership in the DDI Alliance is open to for-profit or not-for-profit educational, commercial, or governmental organizations that want to have a voice in the decision-making process for the standard. Members are expected to contribute to the work of the Expert Working Groups that undertake the development of the standard. At the time of writing, there are ten Working Groups: Administrative Data Group, Controlled Vocabularies Working Group, Governance Task Force, Qualitative Data Exchange Working Group, RDF Vocabularies Working Group, Survey Design and Implementation Working Group, Technical Implementation Committee (TIC), Tools Catalogue Group, Web Site Maintenance Group, and the DDI Developers Community. DDI is branched into two separate development lines. DDI-Codebook is intended primarily to document simple survey data. DDI-Lifecycle is designed to document and manage data across the entire life cycle, from conceptualization to data publication and analysis and beyond.

2.3.2 DDI-Codebook

Traditionally, in archiving social science research data, the metadata have been document-centric, based on a codebook approach designed for documenting single surveys after archiving (Mochmann, 2002). DDI-Codebook38 is the more lightweight version of the DDI standard. It is intended and used primarily to document simple survey data, or concrete files or products coming out of the social science data production process. DDI-Codebook was the first version of the DDI specification to be published; version 1 was released in 2000. Its origins are in data archiving, and there was originally a one-to-one relationship between a DDI instance and the physical data product (or data file). DDI-Codebook is widely used for documentation of data for archiving purposes and is well suited for generating codebooks or "data dictionaries". DDI-Codebook 2.0 was released in 2003, with 2.1 following two years later, adding coverage of aggregate data and geography. The most recent version is 2.5, published in 2012. It is backward compatible with version 2.1, incorporates new substantive elements requested by the community, and is designed to make migration of documents from DDI-Codebook to DDI-Lifecycle easier.
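To illustrate the codebook approach, the following Python sketch builds a very small DDI-Codebook-style document with one study title and a list of variables with value categories. It uses element names from DDI-Codebook 2.5 (`codeBook`, `stdyDscr`, `dataDscr`, `var`, `catgry`), but it is only an illustration: the function `build_codebook` and its input layout are assumptions, and the output is not claimed to validate against the full schema.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"  # target namespace of DDI-Codebook 2.5


def build_codebook(title, variables):
    """Build a minimal codebook tree.

    `variables` is a hypothetical layout:
    {variable name: (variable label, {code: category label})}
    """
    def q(tag):
        # Qualify a tag name with the DDI-Codebook namespace.
        return f"{{{DDI_NS}}}{tag}"

    root = ET.Element(q("codeBook"))

    # Study-level description: here only the title.
    study = ET.SubElement(root, q("stdyDscr"))
    citation = ET.SubElement(study, q("citation"))
    title_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(title_stmt, q("titl")).text = title

    # Variable-level description: one <var> per variable, one <catgry> per code.
    data = ET.SubElement(root, q("dataDscr"))
    for name, (label, categories) in variables.items():
        var = ET.SubElement(data, q("var"), {"name": name})
        ET.SubElement(var, q("labl")).text = label
        for code, category_label in categories.items():
            category = ET.SubElement(var, q("catgry"))
            ET.SubElement(category, q("catValu")).text = str(code)
            ET.SubElement(category, q("labl")).text = category_label
    return root
```

Calling `ET.tostring(build_codebook(...), encoding="unicode")` serialises the tree. The single tree per data file reflects the one-to-one relationship between a DDI instance and a data product described above.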

37 http://www.ddialliance.org/

38 http://www.ddialliance.org/Specification/DDI-Codebook/


2.3.3 DDI-Lifecycle

Social science research data are complex, and the traditional document-centric approaches limit the opportunities for comparative and interdisciplinary use of research data. The complexity of managing longitudinal and panel data over time, over geography, and across multiple languages and various releases and versions of datasets is often a barrier to efficient data access, accurate analysis and informed re-use. This has led to the further development of metadata, and controlled vocabularies, for social science research data that support re-use and comparability of data (Thomas, 2005). Despite the strengths of the DDI-Codebook metadata schema specification, it has several limitations. It was built as a digital equivalent of a paper-form codebook and thus does not support modularity. In addition, there is no proper mechanism to add local extensions. To solve these problems, the DDI Alliance developed the DDI-Lifecycle (DDI-L) model, based on a survey lifecycle model (see Figure 2.2). The implementation of the DDI-L model, the DDI-L metadata schema specification, encompasses all of the DDI-Codebook specification and extends it. Expressed in XML Schemas, the DDI-Lifecycle metadata schema is modular and extensible.

Figure 2.2. The DDI-Lifecycle Model is a combined model fostering metadata reuse. The figure is based on the DDI-L model figure at the DDI Alliance webpage39.

The DDI-L metadata schema specification is designed to document and manage data across the entire life cycle, from conceptualization to data publication and analysis and beyond. It is designed to describe series or groups of studies and to support extensive reuse of metadata. Different types of metadata are organized into packages and various metadata modules are related to each step of the data lifecycle. As a result, the DDI-L version is a highly flexible schema specification that can be used for different purposes by all actors in the data lifecycle. (Gregory et al., 2010.) DDI-Lifecycle provides alignment with other metadata standards such as Dublin Core40, MARC41, ISO1117942 (see Section 2.5.1), SDMX43, and geographic standards such as FGDC44 and ISO1911545, and can work together with PREMIS46 and METS47 (Vardigan et al., 2008).

39 www.ddialliance.org/what [Accessed: 2013-06-26]

40 http://dublincore.org/

41 http://www.loc.gov/marc/

42 http://metadata-stds.org/11179/

43 http://www.sdmx.org/

44 http://www.fgdc.gov/standards/

45 http://www.iso.org/iso/catalogue_detail.htm?csnumber=26020

[Figure 2.2: the lifecycle stages shown are Concept, Collection, Processing, Archiving, Distribution, Discovery, Analysis and Repurposing.]


The current version of the DDI-Lifecycle Specification is version 3.1. It was published in October 2009, superseding DDI 3.0 (published in April 2008). Version 3.2 is expected to be published during 2013. One important change that benefits users is that version 3.2 has fewer mandatory fields than the previous version. It thus becomes easier to create valid DDI documentation, even for organisations lacking certain metadata elements. Due to the complexity of DDI-Lifecycle, it is not yet widely used in the data archiving community (see Section 6.2.1), although several data archives have been exploring the possibilities of using it, and new tools and better tool support are being developed.

2.3.4 DDI Moving Forward - Future Developments

New directions for the DDI metadata model were discussed in October 2012 at the Dagstuhl workshop “DDI Lifecycle: Moving Forward”48. In the process of becoming a metadata standard for the social, behavioural, and economic sciences, DDI has continued to add new coverage and functionality to respond to new user requirements. At the moment, DDI is growing beyond social science, for example into the official statistics and medical research communities. Instead of concentrating on surveys, blood pressure gauges and magnetic resonance imaging (MRI) scans can, for instance, be viewed as new types of instruments that capture and export data. There is also a growing emphasis on data from administrative registers and various Internet sources. What is needed is a smart and economical approach to metadata modelling. The new version of the DDI specification should be based on a Unified Modeling Language (UML) data model that can then be expressed in XML Schema, RDF/OWL ontology, relational database schema, and other languages. This will make it easier to interact with other disciplines and other standards, to understand the specification, to develop and maintain it in a consistent and structured way, and to enable software development that is less dependent on specific versions of the DDI. The DDI Alliance is currently working on an RDF vocabulary for data discovery (the DDI Discovery vocabulary) that is set to work with both DDI-Codebook and DDI-Lifecycle. Open linked data represented as RDF is spreading, but most of the datasets are missing relevant documentation (see, for example, data.gov). DDI-RDF is a good candidate to enhance the documentation of open linked data. At the time of writing, deliverables from the Dagstuhl workshop are not yet available. They will include drafts of a re-envisioned model-driven DDI specification and its documentation, and mappings to other standards.
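The idea of one model with several technical expressions can be illustrated with a small Python sketch. A single class definition stands in for the UML model, and three functions derive an XML fragment, RDF-like triples and a relational table definition from it. The class `Variable`, its attributes and the `http://example.org/model#` namespace are hypothetical and are not taken from any DDI release.

```python
from dataclasses import dataclass, fields
from xml.sax.saxutils import escape


@dataclass
class Variable:
    """Hypothetical model class standing in for one class of a UML model."""
    identifier: str
    label: str
    unit: str


def to_xml(obj):
    """Express an instance as an XML fragment derived from the model."""
    name = type(obj).__name__
    body = "".join(
        f"<{f.name}>{escape(str(getattr(obj, f.name)))}</{f.name}>"
        for f in fields(obj)
    )
    return f"<{name}>{body}</{name}>"


def to_triples(obj, base="http://example.org/model#"):
    """Express an instance as (subject, predicate, object) triples."""
    subject = f"{base}{obj.identifier}"
    return [
        (subject, f"{base}{f.name}", str(getattr(obj, f.name)))
        for f in fields(obj)
    ]


def to_sql_ddl(cls):
    """Express the model class itself as a relational table definition."""
    columns = ", ".join(f"{f.name} TEXT" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__} ({columns});"
```

Because all three outputs are generated from the same class, adding an attribute to the model changes every expression consistently, which is the maintenance benefit the model-driven approach aims for.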

46 http://www.loc.gov/standards/premis/

47 http://www.loc.gov/standards/mets/

48 http://www.dagstuhl.de/12432

2.4 Other Metadata Standards and Standards


This section describes additional technical standards that are commonly used at NSIs and DAs, e.g. for supporting dissemination processes and for describing archival information.

2.4.1 PREMIS – PREservation Metadata: Implementation Strategies

PREMIS is an international working group concerned with developing metadata for use in digital preservation. In May 2005, PREMIS released the Data Dictionary for Preservation Metadata49, the international standard for metadata that supports the preservation of digital objects and ensures their long-term usability. PREMIS is implemented in digital preservation projects around the world and focuses on metadata supporting the functions of maintaining viability, renderability, understandability, authenticity, and identity in a preservation context. PREMIS builds on the OAIS reference model (see Section 5.6), and can be viewed as an elaboration of it, explicated through the mapping of preservation metadata to that conceptual structure. PREMIS metadata is expressed in XML format.

2.4.2 METS – Metadata Encoding and Transmission Standard

The METS50 schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library/archive. METS is used for packaging a set of related digital objects, and a METS document could be used in the role of Submission Information Package (SIP), Archival Information Package (AIP), or Dissemination Information Package (DIP) within the OAIS Model (see Section 5.6). METS documents consist of seven major sections: METS Header, Descriptive Metadata, Administrative Metadata, File Section, Structural Map, Structural Links and Behaviour. The descriptive and administrative metadata sections of METS may point to external metadata, for example to DDI or SDMX metadata. METS metadata is expressed in XML format.
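The seven sections and the pointer to external metadata can be sketched as follows. This Python example builds an empty METS skeleton whose descriptive metadata section references an external DDI file through an `mdRef` element. The function name `build_mets`, the identifier `DMD1` and the example URL are assumptions; the skeleton is deliberately incomplete and would not validate as it stands (for instance, a real structural map needs content).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_mets(ddi_url):
    """Build a bare METS skeleton that points to external DDI metadata."""
    def q(tag):
        # Qualify a tag name with the METS namespace.
        return f"{{{METS_NS}}}{tag}"

    mets = ET.Element(q("mets"))
    ET.SubElement(mets, q("metsHdr"))                  # METS Header

    dmd = ET.SubElement(mets, q("dmdSec"), {"ID": "DMD1"})  # Descriptive Metadata
    ET.SubElement(
        dmd,
        q("mdRef"),
        {
            "LOCTYPE": "URL",
            "MDTYPE": "DDI",                           # type of the external metadata
            f"{{{XLINK_NS}}}href": ddi_url,
        },
    )

    ET.SubElement(mets, q("amdSec"))                   # Administrative Metadata
    ET.SubElement(mets, q("fileSec"))                  # File Section
    ET.SubElement(mets, q("structMap"))                # Structural Map
    ET.SubElement(mets, q("structLink"))               # Structural Links
    ET.SubElement(mets, q("behaviorSec"))              # Behaviour
    return mets
```

Referencing the DDI file rather than embedding it keeps the study documentation in one place while the METS document handles packaging, for example as a SIP, AIP or DIP.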

2.4.3 DCMI - Dublin Core Metadata Initiative

The Dublin Core Metadata Initiative (DCMI)51 maintains Dublin Core, a metadata standard used to describe resources for the purpose of discovery, which facilitates quick and easy searching. Dublin Core can be used to describe digital resources such as web pages, physical resources such as books, and objects like artworks. The DCMI provides guidelines for implementing Dublin Core metadata applications using XML as well as guidance on the use of non-DC metadata within DC metadata applications. The "classic" fifteen-element Dublin Core Metadata Element Set has been standardised as ISO Standard 15836:2009.
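As an illustration of the fifteen-element set, the following Python sketch creates a simple Dublin Core description in XML. The element names and namespace are those of the Dublin Core Metadata Element Set; the wrapper element `record` and the function `dc_record` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the "classic" Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_record(values):
    """Build a record from a dict {element name: value}.

    All elements are optional; names outside the fifteen-element set are rejected.
    """
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")

    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = values[name]
    return record
```

The small, flat element set is what makes Dublin Core suitable for discovery across very different kinds of resources, at the cost of the detail that DDI or SDMX provide.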

2.4.4 INSPIRE Metadata Regulation

INSPIRE (Infrastructure for Spatial Information in the European Community) is “an EU initiative to establish an infrastructure for spatial information in Europe that will help to make spatial or geographical information more accessible and interoperable for a wide range of purposes supporting sustainable development”52. The INSPIRE Directive53 entered into force in May 2007, resulting in the establishment of an infrastructure for spatial information in Europe to support Community environmental policies, and policies or activities which may have an impact on the environment. INSPIRE seeks to ensure that the spatial data infrastructures of the EU member countries are compatible, and sets in place mechanisms for sharing spatial data across Europe. Metadata is one of these mechanisms: it enables discovery and acts as an information source about spatial data resources. The INSPIRE Metadata regulation was adopted in December 2008.

49 http://www.loc.gov/standards/premis/

50 http://www.loc.gov/standards/mets/

51 http://dublincore.org/

52 http://inspire.jrc.ec.europa.eu/reports/registration_form.pdf

53 http://inspire.jrc.ec.europa.eu

2.4.5 TEI – Text Encoding Initiative

The Text Encoding Initiative (TEI)54 is a consortium, which collectively develops and maintains a standard for the representation of texts in digital form, mostly in the humanities, social sciences and linguistics. TEI has produced a set of guidelines (a primarily semantic XML format) that specify encoding methods for machine-readable texts, used widely by libraries, museums, university projects, scholars etc. to present texts for online research, teaching and preservation.

2.5 ISO/IEC 11179 – Metadata Registries

ISO/IEC 11179 is a standard that specifies a framework with recommendations for how to design and describe a metadata registry (MR in the rest of this section) in such a way that it can be integrated with other MR systems.

The term metamodel, as used in the documentation of ISO/IEC 11179, can be seen as a reference model (see Section 5.1) for the conceptual model of an MR (see Section 1.4). A conceptual model is independent of implementation technologies and is used to describe a domain of interest; in this case the domain of interest is an MR55. The metamodel in ISO/IEC 11179 describes what information is important to have in an MR, and how the different pieces of information relate to each other.

The basic semantic unit in ISO/IEC 11179 is a concept. The standard is written in 6 parts. Their ISO number, name, and publication date are as follows:

Part 1 - ISO/IEC 11179-1 Framework 2004

Part 2 - ISO/IEC 11179-2 Classification 2005

Part 3 - ISO/IEC 11179-3 Metamodel and basic attributes 2003

Part 4 - ISO/IEC 11179-4 Formulation of data definitions 2004

Part 5 - ISO/IEC 11179-5 Naming and identification principles 2005

Part 6 - ISO/IEC 11179-6 Registration 2005

The attributes for describing the meaning and representation are contained in Part 3. The management of these descriptions is done through a registry, the procedure for which is described in Part 6. ISO/IEC 11179 has an additional framework for attaching meaning, through the use of classification, which is defined in Part 2. The attributes for describing the associations between a classification scheme and data descriptions are in Part 2 and in Part 3. Parts 4 and 5 give rules, guidelines, and principles for naming, identification, and forming definitions. Finally, Part 1 is an overall framework description and justification of the other parts.

54 www.tei-c.org

55 As seen in Chapter 3, when developing metadata standards with the intention of integrating them with other metadata standards, the use of a conceptual model is also an advantage when defining the structure of a metadata standard.

2.5.1 ISO/IEC 11179 and Metadata Standards

How and why does the ISO/IEC 11179 standard relate to and influence the development and design of metadata standards such as SDMX and DDI? The developer of an MR who intends to integrate the registry with other registries, and wants to use an established metadata standard, should choose a metadata standard that is aligned with ISO/IEC 11179. Consequently, any metadata standard that is developed for use in environments that require integration capabilities should follow the recommendations given to MRs in the ISO/IEC 11179 standard. Both DDI and SDMX are partly aligned with ISO/IEC 11179. For example, ISO/IEC 11179 describes the concept of Classifications as an important part of an MR, together with a metamodel that describes how Classifications should be represented in an MR. Taking one step closer to the real world, Classifications can, with some loosened restrictions, also include, for example, controlled vocabularies and categories. Code schemes are category schemes with values, so they may be a little too specific to be placed in the Classification group, but they are very important in the world of NSIs and DAs. Chapter 4 presents classifications, category/code schemes and controlled vocabularies at NSIs and DAs. Another example of how the DDI metadata schema specification and SDMX are aligned with ISO/IEC 11179 is that they support and encourage MRs to use concepts to describe the contents of their registries (Askitas et al., 2009; SDMX Initiative, 2011a), including cross-domain concepts for comparing data from different resources. Part 6 of ISO/IEC 11179 sets out the procedure for assigning internationally unique identifiers to administered items in MRs, such as the ones described in Section 2.6. The DDI metadata schema specification supports assigning globally unique identifiers to DDI objects (instantiations of identifiable classes in DDI) (Thomas et al., 2009).
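The statement that code schemes are category schemes with values can be made concrete with a small Python sketch. The class names `CategoryScheme` and `CodeScheme` are illustrative only and do not reproduce the ISO/IEC 11179 metamodel or the DDI schema; the sketch simply shows a list of categories and a second structure that attaches one unique code value to each category.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CategoryScheme:
    """A named list of categories without code values (hypothetical class)."""
    name: str
    categories: Tuple[str, ...]


@dataclass(frozen=True)
class CodeScheme:
    """A category scheme plus one code value per category (hypothetical class)."""
    scheme: CategoryScheme
    codes: Tuple[str, ...]

    def __post_init__(self):
        # Every category needs exactly one code, and codes must be unique.
        if len(self.codes) != len(self.scheme.categories):
            raise ValueError("one code is required per category")
        if len(set(self.codes)) != len(self.codes):
            raise ValueError("codes must be unique")

    def label(self, code):
        """Return the category that a code value stands for."""
        return self.scheme.categories[self.codes.index(code)]
```

Keeping the categories separate from the codes allows the same category scheme to be reused with different code values, for example when two data producers code the same categories differently.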

2.6 Metadata Related to Persistent Identifiers

Persistent identifiers (PIDs) are used to uniquely identify objects, and thereby support citation of the object. Data citation standards allow for increased recognition of and rewards for scientific work on research data, and support the growth of online published datasets. An effective means to identify and access relevant data, and to realise their potential for significant re-use in the long run, is to promote and establish persistent identifier systems both as a stable citation method and as the necessary technical infrastructure. Leading systems for persistent identifiers are Handle (used in several digital libraries and institutional repositories)56, URN:NBN (for digital collections in national libraries) and the DOI (Digital Object Identifier) system57, which is widely used to cite scientific articles and, with growing prominence, research data sets, including data stored at data archives. One of the major initiatives that promote citation standards for online published data sets is DataCite58, founded in 2009. DataCite is a DOI registration agency. When a DOI is registered, metadata for the object is stored in the DataCite metadata registry. One of the services DataCite offers its users is search functionality for related metadata in the DataCite Metadata Store. Persistent identifiers are currently not used extensively at the NSIs, but should be used to support reusability in research and unique identification of OS microdata.59

56 http://handle.net/

57 http://www.doi.org/
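A DOI consists of a prefix beginning with "10." and a suffix chosen by the registrant, separated by a slash. The Python sketch below splits a DOI into these two parts and builds a resolver link. The regular expression is a common heuristic rather than the formal DOI syntax, the use of `https://doi.org/` as resolver address is an assumption of this example, and the DOI shown in the usage note is hypothetical.

```python
import re
from urllib.parse import quote

# Heuristic pattern: "10." + registrant code of 4 to 9 digits, "/", then a suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")


def parse_doi(doi):
    """Split a DOI into (prefix, suffix); raise ValueError if it does not match."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return match.group(1), match.group(2)


def resolver_url(doi):
    """Return a link that a DOI resolver can redirect to the object's landing page."""
    prefix, suffix = parse_doi(doi)
    # Percent-encode characters in the suffix that are not safe in a URL path.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"
```

For a hypothetical identifier such as `10.1234/example-dataset`, `resolver_url` yields a stable link that can be used in a data citation even if the data set later moves to another server.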

2.7 Case Studies

To give a flavour of how metadata is managed in practice, this section presents case studies of two organisations. Statistics Sweden (SCB) has been standardising its processes intensively and is among the leaders in data quality and metadata work. SCB is also a forerunner in developing register-based systems; for example, the Swedish census is based on information from different administrative sources. SCB and its work with its own metadata schema are presented in Section 2.7.1. Section 2.7.2 describes the German Institute for Employment Research (IAB) and its collaboration with the Research Data Centre (FDZ, Das Forschungsdatenzentrum) in making OS microdata available to researchers.

2.7.1 SCB – Statistics Sweden

Sweden’s National Statistical Institute, Statistics Sweden, has included metadata development in its common data architectural strategy. SCB is a partner within the PC-axis Nordic cooperation60 as well as of the GSIM development team (see Section 5.4) and has also participated in the Neuchâtel group61. By now, several metadata systems and templates containing metadata have been developed. The creation and implementation of the SCBDOK metadata model62 dates back to the late 1980s and early 1990s, with the intention that it be used to cover the whole production process. Since the introduction of a major reorganisation programme in 2007, emphasis has been put on process-oriented production and an all-embracing strategy with a focus on customers as well as on efficiency and standardisation issues (Blomqvist, n.d.). The challenge is to make SCB’s production, comprising data collection, processing, storage and dissemination, more efficient, and at the same time to achieve increasing quality. SCB envisions a data warehouse that ensures data quality and consistency by keeping the data store in good order and well documented, and by minimising redundancy. It shall be possible to compile all data in SCB’s data store in any desired way, resulting in clear, consistent, relevant and accurate statistics (Lundell, 2009).

58 http://www.datacite.org/

59 http://search.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=STD/CSTAT/MICRO(2012)14&docLanguage=En

60 http://www1.unece.org/stat/platform/display/metis/Nordic+Metamodel+for+PC-Axis

61 http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=14319930

62 http://www.scb.se/Statistik/BE/BE0401/_dokument/BE0401_DO_20122060.pdf

Metadata strategy at SCB

Several organisational bodies handle metadata within SCB and there is currently no distinct strategy for metadata management. The Process department deals with classifications and documentation. Process managers are responsible for metadata that relates to processes. The Communication department is in charge of the metadata regarding publication in the statistical databases. Developing the business architecture and its encompassing metadata lies in the realm of the Research and development department. There is a long-term goal of common metadata for the whole organisation, collected when it is created in the production system (Blomqvist, n.d.). SCB’s data warehouse and register architecture vision comprises a strategy where metadata is used to drive and deliver information between different process steps, as illustrated in Figures 2.3-2.5.

Figure 2.3. Conceptual view of the data warehouse of Statistics Sweden (DUR vision). Figure is taken from (Blomqvist, n.d.).

Figure 2.3 depicts the main flow of data and the successive processing of raw data into statistics. In addition, there are always iterations involving information flows in the opposite direction, e.g. when an error detected at a late stage of the process has been caused by deficiencies at an earlier stage (Lundell, 2009). The data warehouse and register development program (DUR) is a major re-development project at SCB. One project within the DUR program is to examine and analyse needs and requirements for one common metadata registry/repository. Currently a feasibility study is being undertaken within the DUR program on tying metadata in the TRITON63 platform to the MetaPlus system. One metadata system for the whole production process will probably not mean one system as such, but rather that the different tools are connected and reuse metadata when possible. Furthermore, MetaPlus development, a project for improving MetaPlus functionality, is planned for release during 2013. This includes multi-language functionality and improved export and administration functionality (Blomqvist, n.d.).

Figure 2.4. Conceptual view of the data warehouse of Statistics Sweden. Figure taken from (Blomqvist, n.d.).

Presently, work on an all-inclusive metadata strategy and development of a generic metadata repository is an on-going process.

Statistical business process model at SCB

Being based on the New Zealand model, SCB’s business process model has a close relationship with the METIS model, except for providing an archive on the process level. Altogether 9 processes can be distinguished (see Figure 2.5): 1. Specify needs, 2. Design and plan, 3. Build and test, 4. Collect, 5. Process, 6. Analyse, 7. Disseminate and communicate. Moreover, there are two overarching phases (not included in Figure 2.5): 8. Evaluate and feedback and 9. Support and infrastructure. Figure 2.5 shows the different processes together with their sub-processes, and for each sub-process activities are defined.

63 Triton is a statistical production platform for integration of common tools for the whole production process. Step 1, which is in production, is aimed at supporting data collection integrated with editing tools. It contains metadata for data collection and provides possibilities for metadata-driven production.


Figure 2.5. Statistics Sweden’s statistical business process model. The figure is taken from (Blomqvist, n.d.).

The business process model is also the foundation for SCB’s present organisation. For example, SCB’s work schedule is based on common processes with process owners for each of the sub-processes, and all development projects at SCB are based on the process model (Blomqvist, n.d.).

MetaPlus and classifications at SCB

MetaPlus was implemented and introduced in 2007 in order to improve overview issues. MetaPlus can be described as SCB’s data dictionary. It is an ISO/IEC 11179 (see Section 2.5) based model used with a Neuchâtel-based classification database64 that we can look upon as a subsystem of MetaPlus. Users can thus browse hierarchies within the SCB microdata documentation system on the Statistics Sweden website. The current MetaPlus model is available in English. Version 2.0 of the MetaPlus application is going to be available in 2013, with multilingual support (Blomqvist, n.d.). The primary development goal for MetaPlus is documentation of final observation registers that describe microdata in the data collection phase. However, it is general and can be used for all stages of the production process, also to describe aggregated data. Furthermore, MetaPlus can serve to get variable definitions and construct samples in the design phase. (Blomqvist, n.d.)

Experiences Summarized from the Metadata System Implementation at SCB

Statistics Sweden has formulated the following points to stress the lessons learned, when dealing with the design and implementation of metadata systems:

64 https://www.h2.scb.se/metadata/klassdb.aspx


1. Different types of users should be involved at an early stage. Diverse groups were formed, consisting of methodologists, subject-matter statisticians and system developers.

2. The project should be content-driven, not IT-driven.

3. Small-scale, simple prototypes should be used from the beginning in order to get feedback and input from users at an early stage. The exclusive use of written requirements tends to be too abstract. People participating in the project need to be aware of the complexity of the task and should be allowed to spend sufficient time on it to reach a successful conclusion. (Blomqvist, n.d.)

At present, several metadata systems that are not formally connected are in use at Statistics Sweden. “This has a lot of drawbacks as the same information has to be loaded manually into several systems. The vision for the future is to tie the different parts together into an integrated system of systems. This does not mean one giant system but subsystems that are linked to one another.” (Blomqvist, n.d., p. 6) Furthermore, the explicit long-term goal is to be able to produce common metadata for the whole organisation. According to future plans, a production system will be introduced that follows the principle that metadata should be collected when it is created. (Blomqvist, n.d.)

2.7.2 IAB – Institute for Employment Research

The Institute for Employment Research (IAB, Institut für Arbeitsmarkt- und Berufsforschung) is the research institute of the German Federal Employment Agency (BA, Bundesagentur für Arbeit). The data deriving from the IAB and the BA is prepared by and provided for the research community by the Research Data Centre (FDZ, Das Forschungsdatenzentrum) of the BA at the IAB. The goal of the FDZ is mainly to facilitate access to BA and IAB research microdata for non-commercial empirical research using standardised and transparent access rules. The FDZ thereby mediates between data producers and external users and controls for compliance with data protection regulations. The special challenge thereby is that the FDZ has to deal with survey data as well as with register data65.

Data sources for FDZ data products - IAB and BA

The data products provided by FDZ are collected in two different ways. First, the IAB carries out its own surveys. While the surveys are led by the IAB, interviews, data collection and parts of data editing are done by data collection agencies. These surveys reflect the “normal” way of data collection: social scientists create theory-guided questionnaires that should produce rich data products for scientific research. The second data source provided by the FDZ is register data. Such data is not collected for scientific research. It is generated during administrative processes or for other purposes, like controlling activities. For a clearer picture, the procedures of data collection for process data of the German Federal Employment Agency are described in more detail below.

65 The expressions register data, administrative data and process (generated) data are seen as one and the same.


The BA has a network of approximately 700 “job centres” located in almost every bigger German city. People that have to deal with unemployment issues, training courses and so on, visit those centres. Process relevant information is put into forms by the staff of the job centres. The forms are fed into different software products resulting in the following sources: (1) the benefit recipient history, (2) participation-in-measures history file, (3) the unemployment benefit II recipient history, and (4) the jobseeker history. In addition, the BA is receiving data from the social security notification system. This is information about every employee in Germany on a daily basis (only people participating in the statutory social security system are included; i.e. most self-employed are missing) that, depending on the data source, reaches back until the year 1975. All those data is transferred from the distributed data collection locations and different software solutions into the data warehouse of the BA on a daily or monthly basis. At this point none of the data is collected for scientific research. It is only used for internal processes (calculation of pensions) and for controlling purposes of the BA.

To serve these different tasks, different data marts are created within the DWH. One data mart is dedicated to scientific research. This data mart does not exist because of internal administrative needs of the BA; rather, German law obliges institutions like the BA to make their data available for scientific research as well. The extracts from the DWH are exported into large SAS files (because SAS can handle large amounts of data). These files capture the whole population (employees in Germany, participants in training courses). The SAS files are used to create the standard data products offered by the FDZ and to address specific research questions in the different departments of the IAB. At this point it is obvious that any data documentation already done was not created with scientific research in mind. The real data documentation for scientific research starts now and is done by the FDZ staff. The data collection process described above is depicted in Figure 2.6.

Figure 2.6. Overview of data sources at FDZ. The figure is taken from the FDZ website66.

66 http://fdz.iab.de/en/FDZ_Overview_of_Data.aspx

Preparation of research data at FDZ

In order to create research data for scientific purposes, sub-samples of the SAS files are now put into Stata files. The final data products are research data files stored in Stata format (occasionally also transferred into the formats of other statistical packages such as SPSS). These are fixed data sets that are sometimes organised in modules (a number of data sets organised according to topics such as demographic information, the employee structure of an establishment etc.) and that can be enriched by additional variables (once again stored in additional data sets). A data set is thereby a two-dimensional file, with the variables in columns and the cases in rows. Depending on the format of the statistical package used (e.g. Stata), additional information is connected to the variables within the data files, e.g. variable names, value labels etc. The size of the data sets varies and can reach 1.4 gigabytes for Stata files and 2.8 gigabytes for SPSS files. In total, the FDZ currently offers 10 standard data products for the scientific community: 3 establishment data sets, 5 individual/household data sets, and 2 integrated establishment and individual data sets. During the editing process different procedures have to be carried out, of which only some examples are mentioned here. Variables need to be organised in valid answer schemes with valid value labels; due to disclosure control issues, anonymisation procedures have to be carried out; and above all, inconsistencies within the data have to be found and resolved. When editing survey data, resolving inconsistencies is hard work, but the editing staff is at least aware of the data collection process. This is not the case for process data. A very close look at, and research into, the collection process is therefore crucial. In the case of the BA data, for example, information about earnings is very precise because these data are important for the calculation of pensions; data on education, on the other hand, are quite poor because this information is not important within the original data collection process. In addition, the social security data are collected continuously, and occasionally new information cannot be linked to the older information. Such issues are very important when it comes to the documentation of research data.
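As an illustration only, the structure described above (a two-dimensional table of cases and variables, with metadata such as names and value labels attached to the variables) can be sketched in a few lines of Python. The class names, the example variables and the codes below are hypothetical and are not taken from any FDZ data product or from the Stata or SPSS file formats.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Metadata attached to one variable of a data set."""
    name: str                                          # short variable name
    label: str                                         # human-readable description
    value_labels: dict = field(default_factory=dict)   # code -> meaning


@dataclass
class DataSet:
    """A two-dimensional data set: one record per case, one value per variable."""
    variables: list
    cases: list

    def decode(self, case_index):
        """Return one case as {variable name: labelled value}."""
        case = self.cases[case_index]
        return {
            var.name: var.value_labels.get(value, value)
            for var, value in zip(self.variables, case)
        }


# Hypothetical example with two variables and three cases.
sex = Variable("sex", "Sex of respondent", {1: "male", 2: "female"})
wage = Variable("wage", "Daily wage in euro")
data = DataSet(variables=[sex, wage], cases=[(1, 95.0), (2, 102.5), (2, 88.0)])

print(data.decode(1))
```

Keeping the value labels with the variable rather than with each case mirrors the idea that this information is stored once in the data file and applied to every case.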

Documentation of research data at IAB, BA and FDZ

Research data without useful and easy-to-understand documentation is a more or less meaningless collection of numbers. The surveys carried out by the IAB are organised by different departments, which are also responsible for the documentation of the data. Besides the required information, such as sample size, weighting, variable descriptions and data quality, standardisation of the documentation is crucial so that users can easily and quickly understand the data and decide whether the data can be used to answer their research questions. Documentation standards like DDI (Data Documentation Initiative) are built to serve the needs of data documentation for survey data. For register data, the data collection process is not yet supported to the same extent by data documentation standards. Important topics that must be covered by the data documentation are: the origin of the data, the circumstances and original purpose of the data collection, data quality, the population captured, the steps and modifications the data have gone through before becoming published research data, etc.

Increasingly complex data structures, such as longitudinal data, linked data sources and complex scales, are a big challenge for the documentation of data. The special challenge for register data lies in the fact that the people working on the documentation cannot easily access all information about the data collection process. They have to research that issue themselves. In addition, register data run through processes that survey data do not. In summary, data documentation that really informs users about the data they are working with is even more important for register data than for survey data.

Experiences about research data documentation at IAB, BA and FDZ

The need for data documentation and the overall goal of this work task can be summarised briefly: documentation is hard and sophisticated work; at the same time it is the crucial foundation for scientific research. For data-providing organisations it is important to have a structured, standardised and easy documentation workflow. It is equally important to have a structured, standardised and easy dissemination workflow. Both topics touch on the need to economise resources when documenting data. For the recipients, an easy and fast understanding of the data sources is important; easy access to the documentation is therefore important as well. Finally, cooperation between different data providers needs to be supported by standardised data documentation. Crucial features are interoperability between different IT systems and the possibility to export and import data documentation smoothly. This last topic is especially important for data providers like the FDZ of the BA at the IAB. Register data have the advantage that no money has to be spent on data collection, only on data editing and documentation. In addition, register data generally capture larger populations than survey data, even if the list of variables is normally shorter. The logical consequence is to merge survey data with register data in order to generate richer data products that are cheaper to create. Currently many surveys are about to merge their data with the register data provided by the FDZ. Research data can only be used extensively when they are documented in a good and useful way. Such data documentation must fulfil a minimum set of demands. It must be standardised. It has to provide a basic documentation of issues that are relevant for different kinds of data (survey data, register data etc.) and data providers (NSIs, archives, universities etc.). Apart from the basic documentation, the standard might also offer support for more specialised documentation needs, but an agreed basic documentation remains crucial. In addition, the standard must support IT tools for documentation, archiving and dissemination. To meet all these requirements, further research and collaboration are still needed (collaboration is discussed in 6.2.3).

3. METADATA STANDARDS IN CO-OPERATION

In the DwB project, the work on enhancing one metadata standard to enable search and dissemination capabilities for OS microdata (see Section 1.1.3) has in practice turned into an effort to combine the functionality of two of the most used metadata standards at DAs and NSIs: the DDI Metadata Schema Specification and the SDMX metadata standard (introduced in Section 2.2 and Section 2.3). The reason for this is that the two standards work well for their purposes and will continue to evolve within their own communities. The communities are currently working together to identify how best to use the complementary features of the metadata schemas/standards. Here we introduce two on-going initiatives: the SDMX/DDI Dialogue (Section 3.1) and the Frameworks and Standards for Statistical Modernisation (Section 3.2). It is worth noting that both the work of these initiatives and the collaboration of the communities are evolving all the time. The High-Level Group for the Modernisation of Statistical Production and Services (HLG) oversees and coordinates international work relating to the development of enterprise architectures within statistical organisations67. The DDI Alliance, in turn, has endorsed the collaboration68 and outreach to the community, specifically the NSIs (Vardigan, 2013). Within the afore-mentioned initiatives, the GSBPM (see Section 5.3) and GSIM (see Section 5.4) models have been used to explore the overlapping and complementary features of the standards. The goal has been to identify issues affecting the coherence of the standards, for example to find ways to relate the DDI metadata to the structural SDMX metadata that is used for dissemination (to describe tables and quality reporting). The descriptive DDI metadata could be used to harmonise metadata at the NSIs in order to build a search portal with enhanced features for the processing and dissemination of OS microdata (see D8.4 in DwB).

3.1 The SDMX-DDI Dialogue

Communities that develop and manage the SDMX metadata standard and the DDI metadata schema specifications have observed possible benefits in using the complementary features of the two metadata schemas/standards, and have started fruitful projects to find out how to make the two standards cooperate (Gregory, 2011b). This Section summarises informal meetings with members of the SDMX and DDI communities, referred to as the SDMX/DDI Dialogue. Collaboration between the SDMX and DDI communities, and with other like-minded interest groups, is widely believed to result in benefits for all organisations wishing to use the standards together. Dating back to at least 2007, implementation experts have been working on the idea that the SDMX and DDI standards are complementary and not competing standards, and that they can be combined in powerful ways. The standards could, for example, be used together as a way of supporting statistical production as represented by the Generic Statistical Business Process Model (Gregory 2007; Vardigan et al., 2008; Vale, 2010).

67 http://www1.unece.org/stat/platform/display/hlgbas/High-Level+Group+for+the+Modernisation+of+Statistical+Production+and+Services

68 http://www1.unece.org/stat/platform/download/attachments/59015399/DDI-SDMX+Collaboration+Response+2011-03-24.doc?version=1&modificationDate=1301130595819

In December 2010, a few months before the DwB project kick-off, the SDMX Secretariat and the DDI Alliance, together with other stakeholders, started a dialogue to find ways to make the two standards work together on both a near-term and a long-term basis. Participants in the Dialogue are specialists with an interest in data management and statistical information management standards. A core aim of the Dialogue was to influence the two standards, and the way they work together, in order to meet the business needs of producers of official statistics more completely and consistently. The Dialogue has identified three potential objectives for collaboration69:

1. To avoid duplication of effort by the two standards, and thus avoid confusion about which standards should be used for specific types of applications

2. To provide reassurance to the user communities of both DDI and SDMX that the end-to-end statistical process can be managed, and that the standards bodies are both considering the needs of users in this area

3. To provide specific technical guidance about the use cases and implementation of the standards for specific purposes

These three objectives need to be understood from the user perspective at both the technical and the business level and should not only be interpreted as being technical markers. Since the beginning of the Dialogue, there has been growing interest in the use of SDMX and DDI together. As a result, in October 2011 well over 50 people were attending the Dialogue. In order to perform the work more effectively, the Dialogue was reorganised into a Core Group and three Task Teams, each with a specific action. The Task Teams were:

1. Task Team 1: Vocabularies

2. Task Team 2: Business Case

3. Task Team 3: Access to Microdata

As part of the SDMX-DDI Dialogue, there is an on-going effort to produce a common vocabulary of terms for DDI and SDMX, to help these standards bodies to coordinate and, as a consequence, to better serve the users (Pellegrino, 2011). Of these task teams, DwB WP7 joined Task Team 3: Access to Microdata. This Task Team had three priorities: to discuss the microdata use case; to collect information about related initiatives and develop active contacts with the relevant people; and to consider the content of potential communication materials. The microdata use case scenario focused on opportunities for microdata described using DDI to be used as the data source for applications that allow end users to specify and generate aggregate tabulations using SDMX. The aggregate tabulations should be specific to the user’s needs and guaranteed to be appropriately confidentialised. This user-driven, dynamic approach to generating tabulations would offer the NSIs and their users significant advantages compared with the dissemination of “pre-packaged” tabulations. A key challenge is relating the structural and reference metadata associated with the resulting tabulation back to the description of the microdata.

69 http://www1.unece.org/stat/platform/display/metis/SDMX+DDI+Dialogue+-+Overview+Page

Task Team 3 identified several related initiatives. The initiatives discussed include, for instance:

- Data without Boundaries project;

- StatCan project about a new internal dissemination model where the goal is to harmonize output on StatCan's website and where DDI will feed an SDMX tool70;

- Finnish MIDRAS project (The Micro Data Remote Access System) that investigated how register-based research can be supported and the availability of register-based data sets enhanced, and ended up recommending DDI Lifecycle71;

- ABS (Australian Bureau of Statistics) REEM (Remote Execution Environment for Microdata) project that can be seen as an existing example of describing microdata using DDI and obtaining tabulated outputs in SDMX format72 (see Section 3.4);

- The proposal to establish an OECD Expert Group for International Collaboration on Microdata Access that would formalise an existing International working group on Microdata Access73;

- Generic Statistical Information Model (GSIM)74 (see Section 5.4).
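The microdata use case discussed by Task Team 3 (user-specified aggregate tabulations generated from microdata, with confidentiality protection) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of any of the initiatives listed above: the records, the dimension names and the minimum cell count of 3 are hypothetical, small cells are simply suppressed, and the result is a plain Python dictionary rather than an SDMX message.

```python
from collections import Counter

# Hypothetical disclosure rule: cells with fewer cases than this are suppressed.
MIN_CELL_COUNT = 3


def tabulate(records, dimensions, min_count=MIN_CELL_COUNT):
    """Count cases per combination of dimension values.

    Cells below min_count are returned as None (suppressed) so that
    small groups cannot be identified in the published table.
    """
    counts = Counter(tuple(record[dim] for dim in dimensions) for record in records)
    return {
        cell: (count if count >= min_count else None)
        for cell, count in counts.items()
    }


# Hypothetical microdata: one dictionary per case.
microdata = [
    {"region": "North", "status": "employed"},
    {"region": "North", "status": "employed"},
    {"region": "North", "status": "employed"},
    {"region": "North", "status": "unemployed"},
    {"region": "South", "status": "employed"},
    {"region": "South", "status": "employed"},
    {"region": "South", "status": "employed"},
    {"region": "South", "status": "employed"},
]

# The user chooses the dimensions; the table is generated on request.
table = tabulate(microdata, ["region", "status"])
for cell, count in sorted(table.items()):
    print(cell, "suppressed" if count is None else count)
```

In a real service the suppression rule would follow the data provider's disclosure control policy, and the structural metadata of the resulting table would still need to be related back to the description of the microdata.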

The work of the SDMX-DDI Dialogue has been complementary and parallel to the work of other initiatives and projects, for example the work with SDMX and DDI at ABS that is described briefly in Section 3.4. In the future, the work of the SDMX-DDI Dialogue will be integrated with other working groups75. The latest information about the SDMX-DDI Dialogue is available at the Dialogue Wiki Pages, hosted by UNECE76.

3.2 Frameworks and Standards for Statistical Modernisation

Frameworks and Standards for Statistical Modernisation77,78 (FSFSM) is a project that focuses on the implementation and integration of the models and frameworks that concern the statistical infrastructure and facilitate the management of statistical data and metadata (e.g. GSBPM and GSIM), and includes liaison with relevant standards bodies in the wider data industry where appropriate, for example the DDI Alliance (Project Outline79).

70 http://www.statcan.gc.ca/consult/2012/access-acces-eng.htm

71 http://www.csc.fi/sivut/e-infra/midras/en

72 http://www.abs.gov.au/AUSSTATS/[email protected]/Lookup/1504.0Main+Features3Sep+2010

73 http://www1.unece.org/stat/platform/display/msis/OECD+Microdata+Access+Group

74 http://www1.unece.org/stat/platform/display/metis/Generic+Statistical+Information+Model+%28GSIM%29

75 Notes of Core Group teleconference on 19 March 2013: http://www1.unece.org/stat/platform/download/attachments/59015399/DDI_SDMX_Dialogue_20130319.doc?version=1&modificationDate=1369207783458

76 http://www1.unece.org/stat/platform/display/metis/SDMX+DDI+Dialogue+-+Overview+Page

77 http://www1.unece.org/stat/platform/display/SFSP/Frameworks+and+Standards+Project+Home

78 http://www1.unece.org/stat/platform/download/attachments/80609989/2+June+2013.docx?version=1&modificationDate=1370605545083

79 Mari Kleemola (a representative of DwB WP7) participated in discussions about GSIM/DDI mapping in The Common Metadata Framework Project (SFSP), June 2, 2013.

One of the goals of the project is to create a detailed mapping between the information objects in the GSIM and those in DDI and SDMX. The exercise will identify gaps and overlaps (see also Section 3.3 below) and propose solutions where possible. The project is scheduled for completion by the end of 2013 and is expected to result in successful implementations of modernised statistical production processes as well as enhanced versions of the standards.

3.3 DDI and SDMX: Overlaps and Gaps

SDMX and DDI Lifecycle have been designed to be complementary rather than competing (Gregory and Heus, 2007; Gregory, 2011b). The complementary nature of these standards is reflected, for example, in the way the standards support different parts of the GSBPM (Vale 2010). An important similarity is that both standards are based on a conceptual model, which in turn forms the basis of the XML representation. The SDMX Information Model is a metamodel for supporting data and metadata reporting, dissemination and exchange in the field of aggregate statistics and related metadata (SDMX Standards 2011). The DDI Lifecycle model is a more specific metadata model, designed to document and manage data across the entire life cycle, from conceptualization to data publication and analysis and beyond (DDI Alliance website). These models have many similarities, in part because both are aligned with ISO/IEC 11179 (see Section 2.5)80, but also because they were authored by a team that had a significant degree of cross-membership in both initiatives (Gregory and Heus, 2007). The SDMX standard has specified the SDMX Information Model using the Unified Modelling Language (SDMX Initiative, 2011b). There is no similar model designed for DDI currently, but the Alliance has recognised the need, and the modeling work is a key objective of the Alliance (DDI Moving Forward, 2013; Vardigan 2013). The DDI Lifecycle specification itself is modular (see Figure 2.2) and the specification can document different stages of the data lifecycle. However, in this context, it is important to distinguish between the model and the metadata, since the model is an abstract description of the data processing procedures and the relation between them, and the metadata is an implementation, or a technical specification, of the model. Much of the reasoning about similarities and dissimilarities of DDI and SDMX is done at the model level. 
In addition, there are many common and fairly well aligned metadata components between DDI and SDMX, and both standards use similar schemes for maintenance and identification. The alignments at the technical level are intentional (Gregory, 2011b). The relationship between SDMX and DDI has been discussed by Gregory (2011a and 2011b). It is worth noting that the focus of the standards is very different. SDMX has a strong focus on reporting, collection and dissemination, but is not intended to be used in internal statistical production systems. DDI, on the other hand, is designed to support metadata-driven survey design and microdata production and management. However, there is an overlap at the point of creating aggregate data or tabulations, and this overlap makes it possible to transform a dataset described using DDI into an SDMX dataset with related metadata.

There are also things that are not covered well by either DDI or SDMX. This is true particularly of classification management, where the Neuchâtel Model for classifications and variables81 provides a better standard. Another area not covered is business processes, where standards such as Business Process Modeling Notation (BPMN)82 and Business Process Execution Language (BPEL)83 could be used (Gregory, 2011b).

It is also worth noting that the DDI Lifecycle model and the GSBPM model are substantially similar at the top level, and both emphasise the reuse of metadata. The major difference is that the GSBPM has a much greater focus on repeated data collection, because NSIs collect data at fixed intervals (monthly, quarterly or yearly), whereas in the social science world data are partly collected when needed and when the required funding is available. However, there are a number of longitudinal surveys with regular waves produced by research institutions as well. Because DDI comes out of the domain of social science research, its terminology is sometimes unfamiliar to those working in NSIs, and vice versa. A typical example is the use of the word “study” in DDI to refer to something that would be called a data collection cycle, or survey, in the NSI world. GSIM also has many correspondences with the DDI Lifecycle. Most intersections with DDI are in the GSIM Concepts and Structures areas, which include information about questions, concepts and variables. In addition, the machine-actionable capacity of DDI complements the GSIM Production area. Less correspondence can be found in the Business area (Vardigan, 2013).

80 ISO/IEC11179 is a widely accepted standard concerning semantics and metadata registries. http://metadata-standards.org/11179/

In forums such as the UNECE's METIS84, Frameworks and Standards Project85 and elsewhere there are on-going efforts to map DDI and SDMX to each other and to the GSIM model, and to chart the similarities and gaps between the standards:

- The GSIM process has taken into account DDI and SDMX information models, and will continue to ensure alignment between them and the GSIM, identifying in more detail where there are gaps and overlaps.86

- There is an effort as part of the SDMX-DDI Dialogue to produce a common vocabulary of terms, describing similarities and differences.

- The SDMX Action Plan contains an item on the interoperability of SDMX and DDI87.

- Working with the NSIs and the SDMX community, as well as supporting the development of the GSIM, are among the priorities of the DDI Alliance (Vardigan, 2013).

- Data without Boundaries project, Deliverable D8.2 "Metadata model: Metadata model for all metadata to be managed" defines the metadata objects (as a UML model) needed to support search and discovery of microdata, and provides a mapping between DDI and SDMX and the model.

81 http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=14319930

82 http://www.bpmn.org/

83 http://bpel.xml.org/

84 http://www1.unece.org/stat/platform/display/metis/

85 http://www1.unece.org/stat/platform/display/SFSP/Frameworks+and+Standards+Project+Home

86 SDMX/DDI Dialogue Core Group, teleconference 2 May 2012, meeting notes: http://www1.unece.org/stat/platform/download/attachments/65373645/DDI_SDMX_Dialogue_20120502.doc

87 http://sdmx.org/wp-content/uploads/2011/10/SDMX-Action-Plan-2011_2015.pdf

3.4 DDI and SDMX at Australian Bureau of Statistics (ABS)

The Information Management Transformation Program (IMTP) is an effort currently being undertaken by the Australian Bureau of Statistics (ABS)88 with the aim of fundamentally reshaping the policies and strategies for metadata management that have been used over the past two decades. For IMTP, metadata management is the main prerequisite, with DDI and SDMX being the key standards that will be applied. This combination is especially interesting because it makes use of the complementary qualities of the SDMX and DDI standards. See Figure 3.2 for a graphical overview of the future environment at ABS. For statistical processes, the joint use of existing statistical information standards is fit for purpose because the existing implementation allows the description of microdata, tabulations and other artefacts associated with the statistical production processes. SDMX has to be a part of supporting integrated, end-to-end statistical production processes. For NSIs, whose inputs are predominantly microdata, SDMX needs to work together with the DDI-L standard, which applies to earlier phases of the production process (see Figure 3.1). From the ABS point of view, and seen from the angle of IMTP, DDI is not ideal in every aspect at a detailed level, and the identification of open and weak points is still in progress. Some weaknesses have been identified: DDI is very weak in the area of survey processing, such as editing and derivations, and there is no support for confidentiality analysis or time series processing. ABS, together with the DDI community, will do further work on how to adjust the current DDI schema specification for use in the IMTP work until a version of the DDI schema specification is published that addresses the identified problems.

88 http://www1.unece.org/stat/platform/display/metis/ABS+IMTP

Figure 3.1. Standards working together, a GSBPM view. Figure is taken from a description of IMTP work at ABS on the UNECE web site89.

Figure 3.2. Indicative illustration of the future environment at ABS. Figure is taken from a description of IMTP work at ABS on the UNECE web site90.

89 http://www1.unece.org/stat/platform/display/metis/ABS+IMTP

4. VOCABULARIES AND CODING SCHEMES

4.1 Beneficial Vocabularies and Coding Schemes

Having a metadata standard for the description of data is essential, but generally it is not enough when harmonising metadata from different resources. Metadata standards are used to represent a structure for the description of resources. The structure supports the mapping of different resources by describing which metadata, or concepts, are comparable across the resources. The problem is that metadata standards do not always specify the values, or terms, that are used to describe the resources. In other words, metadata standards do not restrict the values, or terms, with which the metadata elements are instantiated. When each metadata producer instantiates the elements without any agreement about the values, the result is inconsistent use of terms for metadata, or concepts, which in turn leads to misunderstandings and drastically complicates machine-actionability and interoperability. Furthermore, it is hardly cost-efficient if each organisation develops its own (inconsistent) sets of terms and definitions. For example, if we have a dataset about a system in which employers, workers and their representatives, and the government interact to set the ground rules for the governance of work relationships, we might describe the scope of that data using either the term labour relations or the term industrial relations. A human being might know that both terms refer to the same thing; a machine has no way to make such a connection. Ideally, different national and international organisations should express the terms using common terminology. When sharing or combining metadata and/or data from different sources, we need to make sure that the meaning of the terms is understandable and explicit. Controlled vocabularies enable this. A controlled vocabulary is a formally managed set of terms that are used in a specific community to represent concepts. The terms must be accepted, defined and managed using agreed-upon procedures.
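The labour relations example can be made concrete with a small Python sketch of a flat controlled vocabulary that maps variant terms to one preferred term. The vocabulary content, the choice of preferred term and the function name are assumptions made for illustration; they are not taken from any published thesaurus.

```python
# Hypothetical controlled vocabulary: each accepted variant points to one
# preferred term, so that equivalent descriptions become comparable.
PREFERRED_TERM = {
    "labour relations": "LABOUR RELATIONS",
    "industrial relations": "LABOUR RELATIONS",
    "unemployment": "UNEMPLOYMENT",
}


def normalise(term):
    """Return the preferred term, or raise if the term is not in the vocabulary."""
    key = " ".join(term.lower().split())  # ignore case and extra whitespace
    try:
        return PREFERRED_TERM[key]
    except KeyError:
        raise ValueError(f"'{term}' is not a term of the controlled vocabulary") from None


# Two archives describe the same topic with different words.
archive_a = normalise("Labour relations")
archive_b = normalise("industrial  relations")
print(archive_a == archive_b)
```

Rejecting unknown terms, rather than passing them through, reflects the idea that terms have to be accepted and managed before they can be used; a production system might instead route such terms to a human curator.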
There are three broad categories of controlled vocabularies, in increasing level of complexity: flat, multi-level and relational. A coding scheme is a type of flat controlled vocabulary consisting of a set of codes and their meanings. Thesauri and ontologies are both examples of relational controlled vocabularies. Relational controlled vocabularies are the most automated and machine-actionable vocabularies (Neiswender, 2009). Classifications can be viewed as a special kind of controlled vocabulary. Classifications are usually hierarchical and thus resemble thesauri, but the difference is that classifications are more rigid in structure and assign a (numerical) code to each term. Generally, a classification consists of terms, the definitions of the terms, and the codes given to the terms. The main purpose of classifications is to organise resources. Statistical classifications represent a subset of classifications that are meant to organise, describe and present statistics (Hoffman and Chamie, 1999).

90 http://www1.unece.org/stat/platform/display/metis/ABS+IMTP

By using controlled vocabularies, data can be described in a consistent, accurate, machine-actionable and easily manageable way. Controlled vocabularies also simplify the description process and provide opportunities for automation, resulting in cost-effectiveness in producing the metadata. An on-going project by the UK Data Archive tests the capacity and quality of automatic indexing using the HASSET thesaurus. Automatic indexing is being applied to the UKDA’s collection and will provide a ranked list of candidate keywords to the human expert for final decision-making (El-Haj, 2012). From the users' point of view, controlled vocabularies offer possibilities to enhance searches and data discovery. Both the data archives and the national statistical institutes use various kinds of vocabularies, for example definitions of key concepts and statistical domain-specific variables. These might be global, European, national or local. Examples of global vocabularies include the OECD Glossary of Statistical Terms91, the UN Glossary of Classification Terms92 and the Metadata Common Vocabulary (MCV)93 (see also 4.2.4 about MCV). European examples include Eurostat's Concepts and Definitions Database (CODED)94 and the multilingual European Language Social Science Thesaurus (ELSST)95. Suitable vocabularies are essential tools for producing good-quality metadata, but they are not sufficient. Without proper instructions and guidelines on how to use them, the costs of producing, updating and maintaining vocabularies might become greater than the benefits. As the OECD puts it:

"Irrespective of the tool(s) adopted (glossaries, thesauri), there is still the need for senior management within an organization to ensure that appropriate practices and principles involving the use consistent terminology are developed and adopted across the organization" (OECD, 2007).

The following three sections describe typical classifications and vocabularies used by NSIs (4.2) and by the Data Archives (4.3; 4.4). The usage of controlled vocabularies (including classifications etc.) is discussed in Chapter 6, which focuses on the state of the art in metadata usage at the NSIs and DAs. Section 6.1 summarizes the monitored metadata usage at NSIs and their co-operation with archives. Section 6.2 presents the results of a survey of DAs regarding their usage of metadata standards and controlled vocabularies and their co-operation with the NSIs in their countries.

[91] http://stats.oecd.org/glossary/

[92] http://unstats.un.org/unsd/class/family/glossary_short.asp

[93] http://www1.unece.org/stat/platform/display/metis/SDMX+-+Metadata+Common+Vocabulary

[94] http://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM_DTL_GLOSSARY&StrNom=CODED2&StrLanguageCode=EN

[95] http://elsst.esds.ac.uk/

4.2 Classifications used by NSIs

Classifications are part of the metadata that are vital for both end users and producers of statistical data. The use of international concepts, classifications and methods promotes the consistency and efficiency of statistical systems. Practice has shown that different organisations often have similar priorities and that this is especially true for the development of, among others, statistical classifications (UNECE 2009).

The United Nations Classifications Registry [96] contains updated information on statistical classifications maintained by the United Nations Statistics Division (UNSD). The UN website holds information about both international work on classifications and national classifications. For example, the international family of economic and social classifications comprises reference classifications on such matters as economics, demographics, labour, health, education, social welfare, geography, environment and tourism. The UN site also lists national classifications and country practices in classification. For example, the table on the use of activity, product, expenditure and national accounts classifications in European countries [97] shows that a variety of classifications are in use. The results of a new survey conducted in 2012 are forthcoming and will be of interest to the DwB project.

For European NSIs and European users of statistical data, RAMON [98], Eurostat's metadata server, is an important source of information about metadata and classifications. The main objective of RAMON is to make available both past and present classifications and metadata to help users in the analysis of statistical data. RAMON is organised into six broad categories: concepts and definitions; classifications; standard code lists; legislation and methodology; glossaries and thesauri; and national methodologies. Currently the classifications category of RAMON contains about 138 classifications and their descriptions. The RAMON service also includes a list of websites that contain further information about classifications used in the European national statistical institutes [99].

4.2.1 The NUTS Classification

The NUTS classification, or Nomenclature of territorial units for statistics, is a hierarchical system for dividing up the economic territory of the EU for the purpose of the collection, development and harmonisation of EU regional statistics, socio-economic analysis of the regions, and framing EU regional policies [100]. As such, it is of great importance to both statistical institutes and researchers. One problem with the NUTS classification, however, is that it is based on countries' administrative divisions and therefore cannot provide fully comparable units.

[96] http://unstats.un.org/unsd/cr/registry/regct.asp?Lg=1

[97] http://unstats.un.org/unsd/cr/ctryreg/ctrylist2.asp?rg=7

[98] http://ec.europa.eu/eurostat/ramon/

[99] http://ec.europa.eu/eurostat/ramon/contact_points/index.cfm?TargetUrl=DSP_WEB_PAGES

[100] http://epp.eurostat.ec.europa.eu/portal/page/portal/nuts_nomenclature/introduction

The NUTS classification is developed by the European Union and covers the member states in detail. The NUTS regions are based on the existing administrative regions or units. The three NUTS levels are:

NUTS1: major socio-economic regions

NUTS2: basic regions for the application of regional policies

NUTS3: small regions for specific diagnoses

The current NUTS classification, valid from 1 January 2012 until 31 December 2014, lists 97 regions at NUTS 1 level, 270 regions at NUTS 2 level, and 1294 regions at NUTS 3 level.
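Because the levels nest, a region's position in the hierarchy can be read off its code. The following Python sketch assumes the usual NUTS code layout (a two-letter country code followed by one further character per level), which is not spelled out in this report. The function names and the code "XX123" are hypothetical, and the functions only inspect the string; they do not check that a region actually exists.

```python
def nuts_level(code: str) -> int:
    """Return the level implied by the code length (0 = country, 1-3 = NUTS 1-3)."""
    code = code.strip().upper()
    # Assumed layout: 2 letters for the country, then one character per level.
    if not (2 <= len(code) <= 5) or not code[:2].isalpha() or not code.isalnum():
        raise ValueError(f"not a plausible NUTS code: {code!r}")
    return len(code) - 2


def nuts_ancestors(code: str) -> list:
    """List the enclosing units, from the country down to the direct parent."""
    code = code.strip().upper()
    level = nuts_level(code)  # also validates the code
    return [code[: 2 + i] for i in range(level)]


# "XX123" is a made-up NUTS 3 code used purely for illustration.
print(nuts_level("XX123"), nuts_ancestors("XX123"))
```

A lookup of this kind is what allows NUTS 3 figures to be aggregated to NUTS 2 or NUTS 1 without a separate mapping table.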

4.2.2 NACE - Statistical Classification of Economic Activities in the European Community

NACE (Nomenclature générale des Activités économiques dans les Communautés Européennes) has been developed in the European Union since 1970. NACE provides a framework for collecting and presenting a large range of statistical data according to economic activity in the fields of economic statistics (e.g. production, employment, national accounts) and in other statistical domains. Statistics produced on the basis of NACE are comparable at national, European and world level because NACE is part of an integrated system of statistical classifications, developed mainly under the auspices of the United Nations Statistics Division. The integrated system also allows statistics produced in different statistical domains to be compared. Consequently, for instance, statistics on the production of goods (reported in the EU according to Prodcom surveys) can be compared with statistics on trade (in the EU produced according to CN) (NACE Rev. 2, 2008).

4.2.3 Neuchâtel Model – Classifications and Variables

The Neuchâtel model was developed in order to make communication between IT-experts and statistical experts easier. Prior to the Neuchâtel model, only IT-experts understood the metamodels, and a terminology model was needed in order to bridge the gap between the two groups. The Neuchâtel group was the first to use a model for classifications and related concepts (Karge, n.d.).

The Neuchâtel model is a terminology model, meaning that it records and presents expert knowledge in a certain area (Karge, 2005). The Neuchâtel Model was developed in two stages. The purpose of the first stage was to create a terminology for a common language and a common perception of the structure of classifications and the links between them. In the second stage, variables and related objects were added to the model (Willeboordse et al., 2006). NSIs use the Neuchâtel model, or other similar models, to create classification databases. The model is general and it is not tied to specific IT software or platforms (Netterstrøm et al., 2004).

A general description of a classification is that it is a system for categorising statistical objects into groups (categories); classifications are used to define and organise data [101]. The definition of classification in the Neuchâtel model is “(o)ne particular named set of structured lists of mutually exclusive categories, which are consecutive over time and describe the possible values of the same variable.” (Netterstrøm et al., 2004). Variables and related objects were added to the Neuchâtel model in the second stage. The model now specifies the object types and lists the attributes associated with each object type (Willeboordse et al., 2006). The Neuchâtel model has been used to bridge the gap between IT-experts and statistical experts, and a number of NSIs have used the model when building metadata systems (Karge, 2005). Of the 33 NSIs studied in the Eurostat 2009 Monitoring, 14 answered that they use the Neuchâtel model for classifications and nine of them answered that they also use the model for variables (for details see section 6.1.1).

[101] http://www.scb.se/Pages/List____259939.aspx, 2012-11-14
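The Neuchâtel definition of a classification quoted above can be pictured as a small data structure. The Python sketch below is a simplification made for illustration only: the class and attribute names are hypothetical and do not reproduce the object types or attributes of the Neuchâtel model. It captures two ideas from the definition: versions that follow each other over time, and categories that are mutually exclusive (here approximated by each code mapping to exactly one label).

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Version:
    name: str
    valid_from: date
    valid_to: Optional[date]  # None means the version is still in force
    # One label per code: a dict cannot hold the same code twice.
    categories: Dict[str, str] = field(default_factory=dict)


@dataclass
class Classification:
    name: str
    versions: List[Version] = field(default_factory=list)

    def add_version(self, version: Version) -> None:
        """Append a version, rejecting any overlap with the previous one."""
        if self.versions:
            last = self.versions[-1]
            if last.valid_to is None or version.valid_from <= last.valid_to:
                raise ValueError("versions must follow each other without overlap")
        self.versions.append(version)

    def version_at(self, day: date) -> Version:
        """Return the version that was valid on the given day."""
        for v in self.versions:
            if v.valid_from <= day and (v.valid_to is None or day <= v.valid_to):
                return v
        raise LookupError(f"no version of {self.name} is valid on {day}")
```

Keeping the validity period on the version rather than on the classification is what lets a metadata system answer the question "which code list applied when these data were collected?".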

4.2.4 SDMX - Content-Oriented Guidelines and Metadata Common Vocabulary

“The SDMX Content-Oriented Guidelines recommend practices for creating interoperable data and metadata sets using the SDMX technical standards. They are intended to be applicable to all statistical subject-matter domains. The Guidelines focus on harmonising specific concepts and terminology that are common to a large number of statistical domains. Such harmonisation is useful for achieving an even more efficient exchange of comparable data and metadata, and builds on the experience gained in implementations to date.” [102]

“The Metadata Common Vocabulary (MCV) contains concepts and related definitions that are normally used for building and understanding metadata systems and SDMX data exchange arrangements of international organizations and national data producing agencies. The MCV is part of the SDMX Content-oriented Guidelines and covers a selected range of metadata concepts:

General metadata concepts, useful for providing a general context to metadata (e.g. classification, metadata registry, statistical metadata, statistical production);

Metadata terms describing statistical methodologies and data quality (e.g. frequency, data collection method, data revision, source, adjustment, accuracy, timeliness);

Terms referring to data and metadata exchange (e.g. bilateral exchange, gateway exchange).” [103]

“MCV is closely linked to the Cross-Domain Concepts as it also contains all these concepts, stating their definitions and context descriptions. MCV provides ISO/IEC 11179 compliant definitions for a number of statistical metadata terms. CODED (Eurostat concepts and definitions database) and the OECD Glossary of Statistical Terms contain the MCV terms.” [104]

Beyond the SDMX models, controlled vocabularies are also implicitly incorporated in other standards, for instance in the SDDS (Special Data Dissemination Standard) initiated by the International Monetary Fund (see Chapter 2).

[102] http://sdmx.org/?page_id=11

[103] http://www1.unece.org/stat/platform/display/metis/SDMX+-+Metadata+Common+Vocabulary

[104] http://www1.unece.org/stat/platform/display/metis/SDMX+-+Metadata+Common+Vocabulary

4.3 Vocabularies used by the Data Archives

Retrieval and reuse of data require thesauri, controlled vocabularies and data citation. The CESSDA Catalogue [105] enables users to locate datasets, as well as questions or variables within datasets, stored at CESSDA archives throughout Europe. To provide an overview of, and to simplify searching among, the more than 8,000 studies presented in the catalogue, the CESSDA members constructed a common topical classification scheme to describe the overall coverage of the resources held at individual member archives. The multilingual thesaurus ELSST helps to overcome the problem of studies being described in different languages.

4.3.1 The European Language Social Science Thesaurus (ELSST)

ELSST [106] is a broad-based multilingual thesaurus for the social sciences and is used to aid retrieval in the CESSDA Catalogue. ELSST is based on the British thesaurus HASSET (Humanities and Social Science Electronic Thesaurus) of the UK Data Archive. It has been developed over the years by the members of CESSDA into a multilingual thesaurus for use in an international setting. ELSST is a multidisciplinary thesaurus, which comprises about 3,300 descriptors and about 11,000 synonyms. Coverage is most comprehensive in the core social science disciplines: politics, sociology, economics, education, law, crime, demography, health, employment, and, increasingly, technology. The English thesaurus is the source version and the thesaurus is currently translated into eight other languages: Danish, Finnish, French, German, Greek, Norwegian, Spanish and Swedish.

4.3.2 CESSDA Topic Classification

Two thirds of the CESSDA member archives publish data and/or data descriptions of all or part of their collection in the CESSDA catalogue. To provide a better overview of the close to 8,000 datasets, they are classified at study level using the CESSDA Topic Classification. The classification is constructed in two tiers, with 19 topics in the top tier and 81 sub-topics. Most, but not all, of the archives participating in the CESSDA catalogue use the CESSDA Topic Classification (see section 6.2 for further details).

4.3.3 DDI Controlled Vocabularies

An extensive set of controlled vocabularies is being developed for the Data Documentation Initiative (DDI) metadata standard, to be used to describe specific aspects of a dataset across the data life cycle [107]. The work is carried out by the Controlled Vocabularies Working Group (CVG), which acts as the management team for the vocabularies. The major task of this DDI Alliance working group is to determine which DDI elements require controlled vocabularies and to develop those in an international setting, additionally guided by a dedicated DDI CV versioning policy.

[105] http://www.cessda.org/accessing/catalogue/index.html

[106] http://elsst.esds.ac.uk/

[107] http://www.ddialliance.org/controlled-vocabularies

So far eight Controlled Vocabularies have been developed for the DDI standard:

Analysis Unit,

Character Set,

Commonality Type,

Lifecycle Event Type,

Response Unit,

Software Package,

Summary Statistic Type,

Time Method.

In the paper Controlled Vocabularies for DDI 3: Enhancing Machine-Actionability (Jääskeläinen et al., 2009) the authors list a number of advantages for DDI in connection with the use of controlled vocabularies:

Control of synonyms;

Control of lexical anomalies;

Promotion of consistency and efficiency;

Clearly defined terminology;

Promotion of interoperability; and

Support for machine-actionability.
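Control of synonyms and of lexical anomalies, the first two advantages listed above, can be illustrated with a short Python sketch. The vocabulary, its terms and the function name are invented for illustration and are not taken from any DDI controlled vocabulary. The idea is simply that free-text entries are mapped to one preferred term, and anything unknown is rejected rather than silently accepted.

```python
# Hypothetical vocabulary: preferred term -> accepted synonyms.
PREFERRED_TERMS = {
    "unemployment": {"joblessness", "unemployed"},
    "education": {"schooling"},
}

# Build a reverse lookup so every accepted spelling points to its preferred term.
_LOOKUP = {}
for preferred, synonyms in PREFERRED_TERMS.items():
    _LOOKUP[preferred] = preferred
    for synonym in synonyms:
        _LOOKUP[synonym] = preferred


def normalise(term: str) -> str:
    """Map a free-text keyword to its preferred term, or raise if it is unknown."""
    # Lower-casing and collapsing whitespace removes simple lexical anomalies.
    key = " ".join(term.lower().split())
    try:
        return _LOOKUP[key]
    except KeyError:
        raise ValueError(f"{term!r} is not in the controlled vocabulary") from None


print(normalise("  Joblessness "))
```

Rejecting unknown terms, instead of storing them as typed, is what keeps descriptions consistent across cataloguers and makes the resulting metadata usable by software.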

The vocabularies are recorded as code lists in the standard format Genericode. Genericode is a specification of the OASIS Code List Representation TC that provides a standard model and an XML representation for code lists and controlled vocabularies [108]. Another way of representing controlled vocabularies, from the field of the Semantic Web, is the Simple Knowledge Organization System (SKOS), which was published by the W3C as a new standard in August 2009 [109].

4.4 Examples on Classifications Relevant in Social Sciences

For researchers comparing different countries or studying change over time within a country, it is essential to use standardised national and/or international classification systems. Some examples of classifications that are widely used within social science are described below, in addition to the classifications mentioned in section 4.2 (e.g. NACE).

4.4.1 ISCO - International Standard Classification of Occupations

One of the most important international classifications, used intensively for comparative purposes in the social sciences, is the International Standard Classification of Occupations (ISCO), which is maintained by the International Labour Organization (ILO). The classification describes occupations using the criteria "duties and responsibilities on the job" and structures them through unique parent and child groups.
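One practical consequence of the parent and child structure is that detailed occupation codes can be collapsed to a coarser level for comparison. The Python sketch below assumes the four-digit layout used by ISCO-88 and ISCO-08, in which each additional digit refines the group (major, sub-major, minor, unit group); this layout is not described in the report, and the function name and the sample codes are hypothetical.

```python
from collections import Counter


def aggregate_isco(codes, digits=1):
    """Count occupations per parent group by truncating four-digit codes.

    digits=1 gives major groups, 2 sub-major, 3 minor and 4 unit groups
    (assumed ISCO layout; group titles are not looked up).
    """
    if digits not in (1, 2, 3, 4):
        raise ValueError("digits must be between 1 and 4")
    counts = Counter()
    for code in codes:
        code = str(code).strip()
        if len(code) != 4 or not code.isdigit():
            raise ValueError(f"expected a four-digit code, got {code!r}")
        counts[code[:digits]] += 1
    return counts


# Made-up survey responses, already coded to four digits.
sample = ["2310", "2320", "5122", "2411"]
print(aggregate_isco(sample, digits=1))
print(aggregate_isco(sample, digits=2))
```

Codes are handled as strings rather than integers so that a leading zero, if present, is not lost.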

[108] www.genericode.org

[109] http://www.w3.org/TR/skos-reference/

ISCO belongs to the family of economic and social classifications that have been registered in the United Nations Inventory of Classifications [110]. ISCO aims to provide:

a basis for international reporting, as well as the comparison and exchange of statistical and administrative data on occupations,

a model for the development of national and regional classifications of occupations, and

a system to be used in countries which have not yet developed their own national classifications [111].

Research programs like ESS, EVS, ISSP, PIAAC and PISA use the ISCO schema to standardise surveyed occupations. Furthermore, the classification schema ISCO-88 provides the basis for developing measures of status and prestige, for example (Jensen, 2012):

the International Socio-Economic Index of Occupational Status (ISEI),

Standard International Occupational Prestige Scale (SIOPS).

Completely different classifications of occupations, which are currently under discussion at Eurostat and within the ESS, are:

the classification schema of Erikson, Goldthorpe and Portocarero (EGP), and

the European Socio-economic Classification (ESeC), which is based on EGP.

4.4.2 ISCED - International Standard Classification of Education

National education systems are characterized by a large heterogeneity with respect to training structures and educational content, which makes it difficult to compare national systems with regard to various scientific and political issues. Furthermore, education has to be considered a constantly evolving system. Thus, comparative analyses in this field also need to cover popular extensions of educational programs for very young children as well as changes at the tertiary level of education in Europe because of the Bologna Process (Bachelor, Master and Doctorate). To facilitate comparisons of statistics and indicators on education systems in different countries, UNESCO supports and nurtures the development of the International Standard Classification of Education (ISCED). The classification is based on uniform and internationally agreed definitions, which are developed in the framework of international and regional consultations between education and statistical experts. However, ISCED cannot capture some important differences in the structure of education systems, in particular the extent to which they are organized in different tracks. Though international classifications are useful and widely used, they cannot capture some structural differences between societies.

A revision of ISCED 1997 was officially adopted by UNESCO member states in 2011. The updated version, ISCED 2011, includes substantial adjustments reflecting the significant changes and developments in education systems worldwide over the last decades.

[110] http://unstats.un.org/unsd/class/family/default.asp

[111] http://www.ilo.org/public/english/bureau/stat/isco/index.htm

In the framework of the ISCED review, the Member States of UNESCO also decided to investigate fields of education in a separate process. A panel of experts led by UIS has developed a first proposal on “Fields of Education and Training”. A global consultation on the draft will take place in 2013, with the intention of establishing an independent but related ISCED classification [112].

4.4.3 Coding Schemes for Standards of Socio-demographic Characteristics

Particular classifications are used in social science research that provides figures or aggregated statistics. To capture socio-demographic characteristics, standard coding schemes are applied at national level, such as the “Demographic and Regional Standards (2010)” for Germany. Together with standards like ISCED, ISCO or ESeC, such standards and coding schemes for socio-demographic characteristics are used in fields such as "labour market analysis" or "analysis of social inequality". The aim of the demographic standards is to achieve better comparability between the results of different studies in Germany by harmonizing the socio-structural variables used in related population surveys (surveys of households or individuals). The current 5th edition of the publication has been completely adjusted to the changes in circumstances and conditions in the social environment since the previous version of 2004. To facilitate the application of the Demographic Standards, questionnaires for further use in different types of surveys are provided. Conversion keys allow coded occupations to be transformed between ISCO-88 (COM), as used in the Demographic Standards, and the “Classification of Occupations” of the Federal Statistical Office, as used in the German microcensus [113]. Although developed for research purposes in Germany, the standards are also intended for international use, which is supported by related explanatory notes. A dedicated chapter describes the differences between the Demographic Standards and the Core Social Variables proposed by the Statistical Office of the European Union (Eurostat). The intention of this list of core socio-economic variables is to recommend them as a standard to be included gradually in European survey series of households or individuals, such as the Labour Force Survey (LFS) or the Survey on Income and Living Conditions (EU-SILC) (Eurostat, 2007).

Thus, the gradual integration of common variables and definitions across several surveys at national and/or European level will allow a better description of sub-populations and comparative socio-economic analyses on a more standardized basis.

4.4.4 Classifications on Geography, Countries and Languages

The use of ISO standards or national classification systems for standardized documentation of information about countries or regions, as well as information on languages is a further auxiliary means to support the comparability of data.

[112] http://www.uis.unesco.org/Education/Pages/international-standard-classification-of-education.aspx

[113] https://www.destatis.de/EN/Methods/DemographicStandards/DemograpicStandards.html

ISO 639 [114] is the international standard for coding the names of languages. The system consists of six sub-standards in total, which have been introduced since 1998.

ISO 3166 [115] is the schema for coding existing states (ISO 3166-1), national sub-entities (ISO 3166-2) and formerly existing states (ISO 3166-3).

At European level, NUTS (described in Section 4.2.1) is applied to systematically classify and identify spatial units in the member states.

Related regional information is set up at national level, e.g. by the “Regionalen Standards (2013)” for Germany. They have been developed “to consider, in a comparable form, the regional context of surveys at different regional levels.” Further tools that are used in that national context are the municipality types defined on the basis of the settlement structure (by the Federal Institute for Research on Building, Urban Affairs and Spatial Development, BBSR) or the BIK region size classes (BIK ASCHPURWIS + BEHRENS) [116].

[114] http://www.sil.org/iso639-3/codes.asp and http://de.wikipedia.org/wiki/ISO_639

[115] http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm

[116] https://www.destatis.de/EN/Methods/DemographicStandards/DemograpicStandards.html

5. FRAMEWORKS AND MODELS

In this Chapter, reference frameworks and models that are followed by NSIs and DAs, with regard to both data processing procedures and organisational issues, are presented. The organisational parts of the reference models highlight the different cultures that NSIs and DAs traditionally “grow up in” [117].

All of the introduced frameworks and models in this Chapter are reference models according to the definition in Section 5.1.

The United Nations Economic Commission for Europe (UNECE) has formed a working group on statistical metadata (METIS) [118], [119] to focus on the development of standards and frameworks related to the processes that occur within agencies that produce national official statistics. Sections 5.2-5.4 introduce the work of the METIS group that is central to the DwB project: the Common Metadata Framework (CMF, Section 5.2), the Generic Statistical Business Process Model (GSBPM, Section 5.3) and the Generic Statistical Information Model (GSIM, Section 5.4).

Section 5.6 introduces the Open Archival Information System (OAIS), a reference model for data archiving that is followed by a majority of DAs, and presents protocols for quality self-assessment of the DAs with regard to their data processing procedures and organisational structure (Section 5.6.3). For the NSIs, there is a Data Quality Assurance Framework (DQAF) that is followed by many of those organisations. DQAF is introduced in Section 5.7.

5.1 Reference Models and Reference Architectures

The two terms reference model and reference architecture are used with slightly different meanings in different contexts. In software development, broadly speaking, a reference model is an abstract framework that can be used to describe a domain of interest on an abstract level, using concepts and terminology that are independent of specific technologies or implementations. A reference architecture provides more details about the proposed technical solution, but remains abstract enough that few details are given about the specific technology to use (Ruas de Oliviera et al., 2010). The two terms can analogously be applied to abstract descriptions of organisations (CCSDS, 2002).

[117] This is not always true, since some NSIs and DAs have a close cooperation, and there are NSIs and DAs that have chosen to take influence from the other tradition, respectively.

[118] From http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/bur/2007/mtg1/9.rev.1.e.pdf: METIS is an old abbreviation from the times of the Statistical Computing Project. It signifies “METaInformation Systems”. In the course of years it has become a sort of a trademark for the joint UNECE/Eurostat/OECD activities on statistical metadata Steering Group on Statistical Metadata (METIS).

[119] http://www1.unece.org/stat/platform/display/metis/The+Common+Metadata+Framework/

5.2 CMF – Common Metadata Framework

The Common Metadata Framework (CMF) was collectively developed by national and international statistical organizations, coordinated by the METIS Steering Group and the UNECE secretariat. The aim of the CMF is to provide guidance to national statistical offices in choosing the right standards, models and approaches when developing their metadata systems. This is accomplished through a living repository of knowledge and good practices related to statistical metadata. The repository requires continuous maintenance and improvement so that it remains relevant to the needs of national statistical offices [120].

The CMF is published online via the METIS Wiki [121]. It is divided into four parts (A, B, C and D), each of which concentrates on different practical and theoretical aspects of statistical metadata systems, as follows:

Part A – Statistical Metadata in Corporate Context – describes issues surrounding management and governance of statistical metadata system projects. The aim is metadata advocacy, to explain the importance of a good metadata management system for the efficient production of statistics.

Part B – Metadata Concepts, Standards, Models and Registries – provides information about concepts, international standards and models. It explains the key features of these standards, their applicability and how they relate to each other.

Part C – Metadata and the Statistical Business Process – is intended to explore the role of metadata throughout the statistical production process. The main outputs are the Generic Statistical Business Process Model (GSBPM, see Section 5.3) and the Generic Statistical Information Model (GSIM, see Section 5.4).

Part D – Implementation – comprises a set of case studies from national and international statistical organizations, describing their experiences in developing and implementing statistical metadata systems.

5.3 GSBPM - Generic Statistical Business Process Model

The original aims of the Generic Statistical Business Process Model (GSBPM) [122] were to provide a basis for statistical organizations to agree on standard terminology to aid their discussions on developing statistical metadata systems, and to provide a flexible tool to describe and define the set of business processes that is needed to produce official statistics. The model can also support the harmonization of statistical infrastructures and can facilitate the sharing of software components.

The GSBPM is intended to apply to all activities undertaken by producers of official statistics and can be used for the description and quality assessment of processes based on surveys, administrative records, and other sources. The model can be applied and interpreted flexibly, and the steps need not be followed in a strict order. In GSBPM, nine main processes undertaken by producers of official statistics are identified (see Figure 5.1), and each process is divided into 3-8 sub-processes. The nine processes are: 1) Specify needs, 2) Design, 3) Build, 4) Collect, 5) Process, 6) Analyse, 7) Disseminate, 8) Archive, and 9) Evaluate. There are also several over-arching processes that apply throughout the nine phases. Two of these are quality management and metadata management. Quality management includes quality assessment and control mechanisms. As metadata are generated and processed within each phase, there is a strong requirement for a metadata management system to ensure that the appropriate metadata retain their links with data throughout the GSBPM.

[120] http://www1.unece.org/stat/platform/display/metis/Maintaining+the+Common+Metadata+Framework

[121] http://www1.unece.org/stat/platform/display/metis/METIS-wiki

[122] http://www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Model
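As a compact illustration, the phase structure described above can be written down as data. The Python sketch below is a toy representation and not part of the GSBPM itself: the phase names come from the text, the sub-process counts are read from Figure 5.1, and the variable and function names are hypothetical. It generates sub-process identifiers in the "phase.sub-process" form used in the figure.

```python
# (phase name, number of sub-processes), in the order given in the text.
GSBPM_PHASES = [
    ("Specify needs", 6),
    ("Design", 6),
    ("Build", 6),
    ("Collect", 4),
    ("Process", 8),
    ("Analyse", 5),
    ("Disseminate", 5),
    ("Archive", 4),
    ("Evaluate", 3),
]

# Over-arching processes named in the text; they apply to every phase.
OVERARCHING = ("Quality management", "Metadata management")


def sub_process_ids(phases=GSBPM_PHASES):
    """Map each phase name to its sub-process identifiers, e.g. '5.3'."""
    return {
        name: [f"{number}.{sub}" for sub in range(1, count + 1)]
        for number, (name, count) in enumerate(phases, start=1)
    }


ids = sub_process_ids()
print(ids["Process"])
```

Because the steps need not be followed in a strict order, the list fixes only the numbering of the phases, not a workflow.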

Figure 5.1. The Generic Statistical Business Process Model (GSBPM). The diagram is adapted from the METIS group homepage [123].

GSBPM has been adopted by many national statistics agencies to different degrees (Lalor, 2011). See also Section 3.2, which introduces a newly formed project that will develop, and promote the use of, GSBPM and other relevant standards for official statistics.

5.4 GSIM – Generic Statistical Information Model

The Generic Statistical Information Model (GSIM) [124] is a reference model [125] for OS microdata processing that was recently released [126], and is a further development of GSBPM (see Section 5.3).

123 http://www1.unece.org/stat/platform/download/attachments/57835551/GSBPM+picture+v4_0.doc?version=1&modificationDate=1301978441104

124 http://www1.unece.org/stat/platform/display/metis/Generic+Statistical+Information+Model+%28GSIM%29

125 The GSIM reference model is a wider and much more fully specified model than the term Reference Model as defined in Section 5.1. For example, it is by design aligned with standards such as SDMX and DDI.

Figure 5.1 (contents). GSBPM phases and sub-processes:

Phase 1, Specify Needs: 1.1 Determine needs for information; 1.2 Consult & confirm needs; 1.3 Establish output objectives; 1.4 Identify concepts; 1.5 Check data availability; 1.6 Prepare business case

Phase 2, Design: 2.1 Design outputs; 2.2 Design variable descriptions; 2.3 Design data collection methodology; 2.4 Design frame & sample methodology; 2.5 Design statistical processing methodology; 2.6 Design production systems & workflow

Phase 3, Build: 3.1 Build data collection instrument; 3.2 Build or enhance process components; 3.3 Configure workflows; 3.4 Test production system; 3.5 Test statistical business process; 3.6 Finalize production system

Phase 4, Collect: 4.1 Select sample; 4.2 Set up collection; 4.3 Run collection; 4.4 Finalize collection

Phase 5, Process: 5.1 Integrate data; 5.2 Classify & code; 5.3 Review, validate & edit; 5.4 Impute; 5.5 Derive new variables & statistical units; 5.6 Calculate weights; 5.7 Calculate aggregates; 5.8 Finalize data files

Phase 6, Analyse: 6.1 Prepare draft outputs; 6.2 Validate outputs; 6.3 Scrutinize & explain; 6.4 Apply disclosure control; 6.5 Finalize outputs

Phase 7, Disseminate: 7.1 Update output systems; 7.2 Produce dissemination products; 7.3 Manage release of dissemination products; 7.4 Promote dissemination products; 7.5 Manage user support

Phase 8, Archive: 8.1 Define archive rules; 8.2 Manage archive repository; 8.3 Preserve data and associated metadata; 8.4 Dispose of data & associated metadata

Phase 9, Evaluate: 9.1 Gather evaluation inputs; 9.2 Conduct evaluation; 9.3 Agree action plan

Over-arching processes: Quality Management / Metadata Management


The extended features of GSIM capture parts of the data processing that were not handled by GSBPM but have proved important to formalize in order to complete the description of data processing at the NSIs. The extensions include, for example, the information that flows between the different processes. Such information objects consist of data and metadata, but can also contain rules and parameters needed to run the data production processes. Although all statistical agencies use classifications, create datasets and publish products, there has been no common way to describe this information, which has made it difficult to collaborate, standardize, and share tools and methods. The aim of GSIM is to give the information objects agreed names, define them, specify their essential properties, and indicate their relationships with other information objects. GSIM contains objects that specify information about data and metadata (such as classifications) as well as the rules and parameters needed for production processes to run (such as data editing rules). GSIM identifies around 150 information objects, which are organised into four broad groups (see Figure 5.2).
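What it means to give an information object an agreed name, a definition, essential properties and relationships can be illustrated with a small data structure. This is a minimal sketch, not part of the GSIM specification: the class `InformationObject` and both example objects are hypothetical, and their assignment to groups is an illustrative assumption. Only the four group names are taken from Figure 5.2.

```python
from dataclasses import dataclass, field

# The four high-level group names shown in Figure 5.2.
GSIM_GROUPS = ("Business", "Production", "Structures", "Concepts")


@dataclass
class InformationObject:
    """One information object: agreed name, definition, properties, relationships."""
    name: str
    group: str
    definition: str
    properties: dict = field(default_factory=dict)
    related_to: list = field(default_factory=list)  # names of related objects

    def __post_init__(self):
        # Reject objects that are not placed in one of the four groups.
        if self.group not in GSIM_GROUPS:
            raise ValueError(f"unknown group: {self.group}")


# Hypothetical entries, for illustration only (group assignments are assumptions).
classification = InformationObject(
    name="Classification",
    group="Concepts",
    definition="A structured list of categories used to code data.",
)
editing_rule = InformationObject(
    name="Data editing rule",
    group="Production",
    definition="A rule applied to data during a production process.",
    properties={"expression": "age >= 0"},
    related_to=[classification.name],
)

print(editing_rule.related_to)  # ['Classification']
```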

Figure 5.2. The GSIM High-level Information Object Groups (Modernising Statistics 2012): Business, Production, Structures and Concepts. Figure adapted from the METIS Group’s GSIM homepage127.

GSIM and GSBPM complement each other, and most value can be obtained by applying them together for the production and management of statistical information. Good metadata management is essential for the efficient operation of statistical business processes. In the context of GSBPM, the emphasis of the over-arching process of metadata management is on the creation, updating, use and reuse of statistical metadata. Metadata management strategies and systems vital to the operation of GSBPM are facilitated by GSIM, which in turn supports a consistent approach to metadata (UNECE, 2012a).

Discussions at the 2nd Workshop on Strategic Developments in Business Architecture in Statistics (Geneva, 7-8 November 2012)128 focused on the vision and strategy of the High-Level Group for the Modernisation of Statistical Production and Services (HLG-BAS)129, the future of GSIM, the additional steps to be taken when developing GSIM, and how GSIM can be used in a bigger framework. Metadata standards like DDI and SDMX were seen to have a role in modernising statistical production processes, although GSIM is not directly tied to them, to other standards, or to concrete implementation details. There is clearly a need to map the standards to the GSIM model. Following the workshop, the HLG and the DDI Alliance have started to explore potential collaboration between the DDI Alliance and the HLG-BAS projects (see Section 3.2). GSIM seeks to cover much ground that is already covered by existing standards. This is true for ISO/IEC 11179, the Neuchatel model for classifications, and other standards. The GSIM documentation on the GSIM home page130 goes into depth on this subject, particularly in Annex B - Influence of existing standards.

126 http://www1.unece.org/stat/platform/display/metis/2012/12/31/GSIM+Version+1.0+Released

127 http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=59703371

The latest information about GSIM is available at the GSIM Wiki Pages131, hosted by UNECE. See also Section 3.2, which introduces a newly formed project that will develop, and promote the use of, GSIM and other relevant standards for official statistics.

5.5 CORA & CORE - Common Reference Architecture and Common Reference Environment

The ESSnet132 Common Reference Architecture (CORA) is a reference architecture, based on the GSBPM model, for data processing systems (Scannapenno and Vaccari, 2011). The ESSnet COmmon Reference Environment (CORE)133 is based on CORA, extended with an information model (e-CORA) that takes process modelling into account, with a focus on the definition of sub-processes and communication interfaces. The aim of CORE was to design an environment in which locally developed services provided by different NSIs could be integrated. The CORE project also implemented a prototype software based on the e-CORA reference architecture.

5.6 OAIS – Open Archival Information System

5.6.1 The Model

128 http://www1.unece.org/stat/platform/display/hlgbas/Workshop+on+Strategic+Developments+in+Business+Architecture+in+Statistics

129 http://www1.unece.org/stat/platform/display/hlgbas/High-Level+Group+for+the+Modernisation+of+Statistical+Production+and+Services

130 http://www1.unece.org/stat/platform/pages/viewpage.action?pageId=59703371

131 http://www1.unece.org/stat/platform/display/metis/Generic+Statistical+Information+Model+%28GSIM%29

132 http://www.cros-portal.eu/page/essnet

133 http://www.cros-portal.eu/content/core-0

134 http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=57284

The Open Archival Information System (OAIS) model is a reference model for use when developing an OAIS (CCSDS, 2012), and is a published ISO standard, ISO 14721:2012134. An OAIS is defined as an archive that has the responsibility to preserve information for the long term and make it available to a specific community135. The recommendations for the OAIS model are developed by the Consultative Committee for Space Data Systems (CCSDS, NASA), and the model was originally developed for storing data used in conjunction with space missions. Besides the long-term preservation and dissemination functions, an organisation must meet some mandatory responsibilities to operate as an OAIS, e.g. ensuring that the archived information is documented well enough to be understandable without assistance from its producers (CCSDS, 2012). The OAIS model can guide the development of OAIS-related standards, such as metadata standards. The DDI-L schema specification (see Section 2.3), PREMIS (see Section 2.4.1) and METS (see Section 2.4.2) have been pointed out as especially interesting for defining the metadata needed to develop software and standards for long-term preservation in social science data archives (Jensen, 2010).

With regard to the archived data objects, the OAIS model defines four types of information. These can be used as guidelines when determining what metadata needs to be gathered and preserved. All four information types are Information Objects; thus each consists of a Data Object and its Representation Information. The four types are:

- Content Information is the information that is the original target of preservation. As an Information Object, it consists of the Content Data Object and its associated Representation Information.

- Preservation Description Information (PDI) is information that is needed to preserve, identify and understand the context of the Content Information. The PDI itself is divided into five types of preservation information: Provenance, Context, Reference, Fixity, and Access Rights Information136.

- Packaging Information binds, identifies and relates the Content Information and PDI.

- Descriptive Information is used to discover the Content Information of interest.

An Information Package is defined as a conceptual container of Content Information and Preservation Description Information. These two types of information are encapsulated and identified by the Packaging Information, and the resulting package can be discovered using the Descriptive Information attached to it (see Figure 5.3 for a graphical representation).

There are three types of Information Packages:

- Submission Information Package, SIP. This is the package that is sent to an OAIS by a Producer.

- Archival Information Package, AIP. The OAIS transforms the SIP(s) into one or more AIPs for preservation.

- Dissemination Information Package, DIP. The OAIS provides all or a part of an AIP to a Consumer in the form of a DIP.
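The information types and package types described above can be sketched as plain data structures. This is a minimal illustration, not part of the OAIS recommendation: all class, field and function names are hypothetical, and the one-to-one mapping from SIP to AIP simplifies the "one or more" relationship described in the text.

```python
from dataclasses import dataclass, replace
from enum import Enum


class PackageKind(Enum):
    SIP = "Submission Information Package"
    AIP = "Archival Information Package"
    DIP = "Dissemination Information Package"


@dataclass(frozen=True)
class PreservationDescriptionInformation:
    """The five PDI types named in the text."""
    provenance: str
    context: str
    reference: str
    fixity: str
    access_rights: str


@dataclass(frozen=True)
class InformationPackage:
    kind: PackageKind
    content_data_object: bytes          # Content Information: the data object ...
    representation_information: str     # ... and how to interpret it
    pdi: PreservationDescriptionInformation
    packaging_information: str          # binds Content Information and PDI
    descriptive_information: str        # used for discovery


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a SIP into an AIP (simplified: exactly one AIP per SIP)."""
    if sip.kind is not PackageKind.SIP:
        raise ValueError("only a SIP can be ingested")
    return replace(sip, kind=PackageKind.AIP)


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Provide an AIP to a Consumer as a DIP (simplified: the whole AIP)."""
    if aip.kind is not PackageKind.AIP:
        raise ValueError("only an AIP can be disseminated")
    return replace(aip, kind=PackageKind.DIP)


# Hypothetical example values.
pdi = PreservationDescriptionInformation(
    provenance="deposited by producer", context="survey wave 1",
    reference="study-0001", fixity="checksum placeholder", access_rights="registered users",
)
sip = InformationPackage(PackageKind.SIP, b"data", "CSV, UTF-8", pdi,
                         "zip container", "title and abstract")
dip = disseminate(ingest(sip))
print(dip.kind.value)  # Dissemination Information Package
```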

135 In (CCSDS, 2012), it is explicitly pointed out that the OPEN in Open Archival Information System implies that the Recommendations for the OAIS reference model are developed in open forums, not that access to the archive is unrestricted.

136 Access Rights Information was not in the Blue Book (2003) but was added in the Pink Book.


Figure 5.3. A graphical presentation for an Information Package in the OAIS model. Figure adopted from (CCSDS, 2012, Figure 4-1).

In Figure 5.4, the OAIS model is depicted in a diagram that shows a view of different functions and information flow in the model, including the Information Packages. The OAIS presents a detailed functional model consisting of six main functional entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, and Access. Additionally, an organisation has to provide Common Services, such as various technical support services, which are considered to constitute another functional entity. The Common Services entity is so pervasive, that it is omitted from Figure 5.4.

5.6.2 OAIS and Social Science Data Archives

The CESSDA DAs have the mission to preserve and disseminate digital social science data. Consequently, by definition, they are Open Archival Information Systems (see Section 5.6.1 above and Duşa et al., 2010).

Various data archives have explored the OAIS model and how it maps to the world of research data archiving. An early adopter of the OAIS model is the ICPSR. Their self-assessment of OAIS compliance provides an interpretation of the OAIS model in the context of social science data archives (Vardigan and Whiteman, 2007). The UKDA was the first European data archive to explore the OAIS model. Their report (Beedham et al., 2005) proposes a methodology to map archive activities to the OAIS model and warns that the process is both time-consuming and resource intensive. A preliminary evaluation of FSD’s archival and dissemination procedures was carried out in spring 2010, using the OAIS model and the Trustworthy Digital Repositories Audit & Certification: Criteria and Checklist (TDR) (Kleemola, 2012).

The OAIS model can be used as a reference model by DAs as well as other organisations dealing with digital data. The model defines what kind of metadata is needed for managing digital material during its lifecycle. The terminology and concepts provided by the model are of great value when developing a system for managing digital data and when discussing the requirements for documentation and metadata. The lifecycle events in DDI format are able to reflect the OAIS model, and the GSBPM model includes the archival phase. The OAIS model facilitates comparisons of different organisations’ system architectures.

Figure 5.4. The OAIS model. The diagram is adapted from (CCSDS, 2012, Figure 4-1).

In 2008, the PPP project137 charted the CESSDA member archives’ processes and archival procedures. Despite the diversity of the member archives with respect to their organisational structure, size, legal status and funding sources, they share common characteristics in their ingest, storage and access policies (Duşa et al., 2010). This is not surprising; each CESSDA member organisation has a mission to preserve and disseminate digital social science research data. The report concludes that “CESSDA members should review their operations by mapping their practice to the OAIS minimum requirements” (Duşa et al., 2010).

5.6.3 Self-assessment of Archives


137 PPP, the Preparatory Phase Project for a Major Upgrade of the Council of European Social Science Data Archives (CESSDA) Research Infrastructure, was a two-year (2008-2009) project that focused on tackling and resolving a number of strategic, financial and legal issues in order to ensure that European social science and humanities researchers have access to, and gain support for, the data resources they require to conduct research of the highest quality. The project was funded by the EU 7th Framework Programme (FP7) and was a direct result of the CESSDA RI being identified by the ESFRI Roadmap exercise as a research network of excellence. http://www.cessda.org/project/

138 http://www.cessda.org/about/governance/eric_req.html

[Figure 5.4 labels: Producer, Consumer, Management; Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access; SIP, AIP, DIP, Descriptive Info; queries, result sets, orders]


CESSDA is currently transitioning into the CESSDA European Research Infrastructure Consortium (CESSDA-ERIC) (see Section 1.2.2). The collaboration in CESSDA-ERIC is largely based on trust that different Service Providers have the same understanding of certain activities, including the gathering, production and dissemination of sufficient metadata. During 2013, CESSDA will carry out a self-evaluation project of the member archives to provide information on how the CESSDA members meet the requirements of the CESSDA-ERIC Statutes, and what kind of advice and support the Service Providers need.139 The assessment will be based on OAIS and the Data Seal of Approval (DSA) guidelines.

5.7 DQAF - Data Quality Assessment Framework

The IMF140 Data Quality Assessment Framework (DQAF)141 identifies quality-related features of the governance of statistical systems, statistical processes, and statistical products. It comprises a set of prerequisites for data quality and five quality dimensions: assurances of integrity, methodological soundness, accuracy and reliability, serviceability, and accessibility.

DQAF provides a framework that statistical organisations can follow to assess their existing practices against best practices and internationally accepted methodologies. The Special Data Dissemination Standard (SDDS) is one standard recommended within DQAF for statistics; it was established to guide countries that have access to international capital markets. SDDS142 focuses on the dissemination of economic and financial data to the public. Access is provided to a site143 on which SDDS members can make their data visible144. Each member is responsible for the quality of its data and for submitting metadata quarterly. The metadata are reviewed by the IMF, e.g. for international comparability.

139 http://www.cessda.org/about/governance/eric_req.html

140 International Monetary Fund - www.imf.org

141 http://dsbb.imf.org/images/pdfs/dqrs_factsheet.pdf, http://dsbb.imf.org/images/pdfs/dqrs_nag.pdf

142 http://dsbb.imf.org/Default.aspx

143 http://dsbb.imf.org/Pages/SDDS/CountryList.aspx

144 The technical solution uses PC-Axis, a Windows program that can be linked to a web browser as a helper application. The main module has options to swap stub and heading (pivot function) and to export tables to other software such as MS-Excel; it shows footnotes on different levels, can make simple diagrams, and has a link to the map program PX-Map. PC-Axis works with PC-Axis files.


6. THE STATE OF THE ART

Chapter 6 presents the state of the art in metadata usage at the NSIs and DAs. The usage of controlled vocabularies (including classifications etc.) is also discussed, since those are important for harmonisation processes. The level of co-operation between NSIs and DAs is elaborated as well, since the outcome of the DwB project depends heavily on how the NSIs and DAs within countries work together, both at the level of organisational agreements for co-operation and at the technological level. We first present results from three different surveys: summaries of two surveys that monitor metadata usage at NSIs (Section 6.1), and a survey of the DAs regarding their usage of metadata standards and controlled vocabularies and their co-operation with the NSIs in their countries (Section 6.2).

Section 6.3 discusses the level of co-operation between the NSIs and DAs in the light of the information in this report. Sections 6.4 and 6.5 present an analysis and discussion, including some concluding remarks, of the usage of metadata standards at the European NSIs and CESSDA DAs, based on the results of the surveys and on other information in this report.

6.1 The NSI Community

The assessment of metadata standards usage at the NSIs is based on relevant parts of two surveys conducted by parties other than WP7 members: a Eurostat report from 2009 on monitoring metadata systems at European NSIs (see Section 6.1.1) and a survey conducted by WP8 in DwB (D8.3, see Section 6.1.2).

6.1.1 Eurostat Monitoring of National Metadata Systems - Phase 1

To gain an overall view of the current metadata production and dissemination in the European Statistical System (ESS, see Section 1.2.2), Eurostat has been monitoring national metadata systems in the ESS since 2008. Three phases of assessment have taken place, the first 2008-2009, the second 2009-2010 and the third 2010-2011 (Eurostat, 2010).

For the benefit of this report, we chose to analyse the results from Questions 5-6.1 (see Figure 6.1) in the questionnaire from Phase 1 (Eurostat, 2010). In Table 6.1 the responses for those questions are summarized. The criteria for being compliant to a standard according to this particular survey are described in the survey report (Eurostat, 2010).

Figure 6.1. Questions 5 - 6.1 in the 2009 Eurostat survey - Monitoring of National Metadata Systems Phase 1 (Eurostat, 2010). A summary of the responses for the questions is presented in Table 6.1.

Question 5: Is your metadata production system SDMX-compliant?

Question 6: Is your metadata production system tailored to other standards or templates?

Question 6.1: If YES, please indicate which of the following (DQAF/SDDS, Metadata registry standard *model, Neuchatel variables model, Other) it is tailored to.


The report (Eurostat, 2010) draws several main findings that are relevant for the work of WP7 in DwB. The most widely used metadata standard in the ESS in 2008/2009 was the SDDS standard (see Section 5.7). Half of the NSIs were preparing to use the ESMS standard (a version of the SDMX standard modified to fit European statistics systems, see Section 2.2.3), and a majority of the NSIs were already using ESMS concepts.

Table 6.1. Summary of results of questions 5 and 6 (in Figure 6.1) from Eurostat assessment of metadata production (Eurostat, 2010) at the European NSIs 2009. The columns contain information as follows: column 1 – the monitored NSIs, column 2 – results of question 5 in the questionnaire, columns 3-9 – results of question 6.1. An ‘X’ in columns 3-9 indicates that the NSI uses the standard or template.

Column headers: NSI | SDMX-Compl (2009) | DQAF/SDDS (2009) | ISO/IEC 11179 (2009) | DDI (2009) | Dublin Core (2009) | Neuch. Class (2009) | Neuch. Var (2009) | Other (2009)

Rows: NSI | SDMX-Compl | marks (x) in the standard and template columns
Austria: Statistics Austria | No | x
Belgium: Statistics Belgium | Yes | x x
Bulgaria: National Statistical Institute | No | x x x
Croatia: Croatian Bureau of Statistics | Yes | x x x
Cyprus: Statistical Service of Cyprus | No | x
Czech Republic: Czech Statistical Office | No | x x x x
Denmark: Statistics Denmark | No | x x x x
Estonia: Statistics Estonia | No | x x x
Finland: Statistics Finland | Yes | x x x
France: National Institute of Statistics and Economic Studies | No |
Germany: Destatis | Yes | x x x
Greece: Hellenic statistical authority | No |
Hungary: Hungarian Central Statistical Office | Yes | x x
Iceland: Statistics Iceland | No | x
Ireland: Central Statistics Office | Yes | x x x x
Italy: Italian National Institute of Statistics | Yes | x
Latvia: Central Statistical Bureau of Latvia | Yes | x x
Liechtenstein: Amt für Statistik | No |
Lithuania: Statistics Lithuania | - |
Luxembourg: Service Central de la Statistique et des Etudes | No | x
Malta: National Statistics Office | No | x
Netherlands: Statistics Netherlands | No | x x x x
Norway: Statistics Norway | No | x x x x x x x
Poland: Central Statistical Office | No | x
Portugal: Statistics Portugal | Yes | x x x
Romania: National Institute of Statistics | No | x
Slovak Republic: Statistical Office of the Slovak Republic | No | x x
Slovenia: Statistical Office of the Republic of Slovenia | - | x x
Spain: National Statistics Institute | No | x x
Sweden: Statistics Sweden | No | x x x
Switzerland: Swiss Federal Statistical Office | Yes | x x x x x x
United Kingdom: Office for National Statistics | No | x x
Turkey: Turkish Statistical Institute | - |
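The answers to Question 5 in Table 6.1 can be tallied with a few lines of code. This is a minimal sketch: the country lists are transcribed from column 2 of Table 6.1 (a hyphen means no answer), and the variable names are hypothetical.

```python
# Answers to Question 5 (SDMX compliance), transcribed from column 2 of Table 6.1.
YES = ["Belgium", "Croatia", "Finland", "Germany", "Hungary", "Ireland",
       "Italy", "Latvia", "Portugal", "Switzerland"]
NO = ["Austria", "Bulgaria", "Cyprus", "Czech Republic", "Denmark", "Estonia",
      "France", "Greece", "Iceland", "Liechtenstein", "Luxembourg", "Malta",
      "Netherlands", "Norway", "Poland", "Romania", "Slovak Republic", "Spain",
      "Sweden", "United Kingdom"]
NO_ANSWER = ["Lithuania", "Slovenia", "Turkey"]  # marked "-" in the table

total = len(YES) + len(NO) + len(NO_ANSWER)
answered = len(YES) + len(NO)

print(f"NSIs listed: {total}")                      # NSIs listed: 33
print(f"SDMX-compliant: {len(YES)} of {answered}")  # SDMX-compliant: 10 of 30
```

By this count, one third of the NSIs that answered the question reported an SDMX-compliant metadata production system in 2009.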

It was also concluded that compliance of metadata with standards and templates, both nationally and internationally, is considered important by 80% of the NSIs, but many NSIs still have work to do to configure their dissemination systems to be fully compliant with relevant standards (Eurostat, 2010).

6.1.2 DwB WP8 Survey: Questionnaire on Metadata Data (at NSIs)

In 2011, the WP8 group in the DwB project conducted a survey to collect the technical information about microdata and metadata at the NSIs that it needed for its task145. The survey results are partly of interest for the work of WP7 as well. The response rate for the survey was 26 out of 34146. For the purposes of this report, we are mostly interested in the results regarding the usage of metadata standards at the NSIs.

Table 6.2. Metadata standards and other standards used by NSIs, and the number of NSIs using them. The contents of this table are adopted from a deliverable (D8.3) from WP8 in DwB.

SDMX (full or large extent compliant) 6

SDMX (partially or small extent compliant) 22

DQAF/SDDS 21

Metadata registry standard ISO/IEC 11179 7

Data Documentation Initiative (DDI) 2

Dublin core 10

Neuchatel classification model 13

Neuchatel variables model 10

Other standards 6

One remark made in the report presenting the survey results (D8.3 in DwB) is that all NSIs provide the following metadata when users ask for sets of microdata:

- definitions of statistical variables, concepts, indicators,

- sampling techniques, sample description, the questionnaire,

- methodologies, changes in the methodologies over time,

- calculations, validations, checking and imputation rules.

The survey’s results show that a vast majority of NSIs are using the SDMX structure and DQAF/SDDS (the SDDS format is used by the majority because of the IMF's imperative request for uniformity across all countries for a defined set of socio-economic-financial indicators). One of the main findings in the report of the WP8 survey is that, in general, all NSIs provide metadata, but the structure of the metadata varies. Some numbers for metadata usage:

- 18% of NSIs are using the SDMX structure fully, and

- 69% are using it partially, and

- almost all NSIs use the SDDS format requested by the IMF, but this is not extended to all their statistics.

- 25 NSIs (76%) are using at least 13 of the ESMS (SDMX modified for European statistics systems, see Section 2.2.3) concepts. Furthermore,

- 19 NSIs (58%) compile information on 17 or more ESMS concepts.

Additionally, the technical environment varies in its degree of modernisation: some of the NSIs are not using database management systems for structured data storage, while others have online databases for both data and metadata. Furthermore, disseminated metadata are mainly in HTML, PDF, Word and Excel format, and the interchange format for metadata between different statistical systems is mainly Excel, Word and PDF.

145 ”Improving Resource Discovery for OS data”. The information was needed from the technical point of view: the organisation of databases and type of storage, IT applications, and access methods for the existing metadata resources held by NSIs.

146 D8.3, DwB project. The questionnaire is presented in the file Annex_Micro_Metadata_NSI_Quest.doc. All answers of the NSIs are archived in Micro_Metadata_NSIs_Survey.zip and summarised in Summary_report_countries_Micro_Metadata.xls.
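The text does not state the base behind the percentages quoted for the ESMS concepts. They are consistent with a base of 33 NSIs, which is the number of NSIs listed in Table 6.1; this is an inference from the figures, not something stated in the survey reports. A short check under that assumption (the helper `pct` is hypothetical):

```python
BASE = 33  # assumption: number of NSIs listed in Table 6.1


def pct(count: int, base: int = BASE) -> int:
    """Share of NSIs as a whole-number percentage."""
    return round(100 * count / base)


print(pct(25))  # 76, matches "25 NSIs (76%)"
print(pct(19))  # 58, matches "19 NSIs (58%)"
print(pct(6))   # 18, matches the 6 fully SDMX-compliant NSIs in Table 6.2
```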

6.2 The Data Archive Community

WP7 in DwB distributed a survey to the CESSDA archives in 2012 (see Appendix 3) to gather information about their co-operation with their countries' NSIs, their DDI usage, and their usage of controlled vocabularies. A summary of the results on metadata standards usage at the DAs is presented in Section 6.2.1, on controlled vocabularies in Section 6.2.2, and on co-operation with NSIs in Section 6.2.3. The full responses are included in Appendix 3.

6.2.1 Metadata Standards Usage at the DAs

In the WP7 survey to the DAs, one block of questions asked whether DDI is being used at the DAs, and in which version: DDI-Codebook and/or DDI-Lifecycle (the questions are presented in Figure 6.2). Thus, the state of the art at the DAs refers only to the DDI standard. The use of the DDI schema specification at the DAs is of special interest for the work of WP7 in the DwB project, since the CESSDA portal is based on DDI (see Section 2.1). The survey results could reveal interesting circumstances regarding harmonisation with the DDI schema specification and the internal usage of DDI at the DAs. The extent of use of DDI-Lifecycle is also of interest, since the work on enabling co-operation between DDI and SDMX focuses on the DDI-Lifecycle version of the schema specification. Of the 21 CESSDA DAs that received the survey, 5 did not respond.

Figure 6.2. Survey question about DDI usage at the CESSDA archives (question Block 2 in the WP7 questionnaire, see Appendix 3). A summary of the responses for the questions is presented in Table 6.3.

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

o Are you using DDI-lifecycle (DDI3)? How?

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?


Table 6.3. Results of the questions in Block 2 (see Figure 6.2) in the survey sent to the CESSDA Data Archives. Column 1 contains the names of the DAs. The information in Column 2 was obtained from the CESSDA portal147 for comparison. An asterisk (*) indicates that more information was added in the response (see Appendix 3). A hyphen (-) means the DA did not complete the survey/question.

Data archive | Publisher CESSDA catalogue | DDI-Codebook | DDI-Lifecycle | Planning for DDI-Lifecycle
Austria: WISDOM | Yes | Yes | No | No*
Czech Republic: SDA | No | - | - | -
Denmark: DDA | Yes | No | Yes | na
Estonia: ESSDA | No | - | - | -
Finland: FSD | Yes | Yes | No | Yes*
France: RQ | Yes | Yes | No | Yes*
Germany: GESIS | Yes | Yes* | Yes* | Yes*
Greece: EKKE | Yes | Yes | No | Yes
Hungary: TARKI | No | Yes | No | Yes*
Ireland: ISSDA | Yes | No | No | No
Italy: ADPSS | Yes | No | No | Yes*
Lithuania: LiDA | No | - | - | -
Luxembourg: CEPS | No | - | - | -
Netherlands: DANS | Yes | Yes* | Yes* | -
Norway: NSD | Yes | Yes | Yes* | Yes*
Romania: RODA | No | Yes | No | Yes*
Slovenia: ADP | Yes | Yes* | No* | Yes*
Spain: CIS | No | No* | No | Yes
Sweden: SND | Yes | Yes | Yes* | -
Switzerland: FORS | Yes | Yes* | No | Yes*
United Kingdom: UKDA | Yes | - | - | -

The responses to the questions about DDI usage are presented in Table 6.3. The information in the table has been supplemented to indicate whether the DAs have published any studies in the CESSDA portal. 12 of the responding DAs had published studies in the CESSDA portal at the time when the survey was distributed.

Descriptive summary of results: Of the 16 responding DAs, 13 use one or more DDI metadata schema specifications to describe their resources:

12 DAs use DDI-Codebook, of which 8 use DDI-Codebook exclusively.

5 DAs use DDI-Lifecycle, fully or partially, of which 1 uses DDI-Lifecycle exclusively.

8 DAs are planning to start using DDI-Lifecycle (11 responded Yes to that question, but 3 of them already use DDI-Lifecycle to some extent).

2 DAs do not use any of the DDI schema specifications (ISSDA and ADPSS).
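The headline counts in this summary can be re-derived from Table 6.3 with a short tally. The following is only a sketch: the answers are transcribed from the table, the function and key names are hypothetical, and it assumes that an asterisk merely flags an additional remark and does not change the Yes/No answer. Counts that depend on the free-text remarks in Appendix 3 (for example, which planners already use DDI-Lifecycle to some extent) cannot be derived this way.

```python
# Answers transcribed from Table 6.3 for the 16 responding DAs:
# (DDI-Codebook, DDI-Lifecycle, Planning for DDI-Lifecycle)
TABLE_6_3 = {
    "WISDOM": ("Yes", "No", "No*"),
    "DDA": ("No", "Yes", "na"),
    "FSD": ("Yes", "No", "Yes*"),
    "RQ": ("Yes", "No", "Yes*"),
    "GESIS": ("Yes*", "Yes*", "Yes*"),
    "EKKE": ("Yes", "No", "Yes"),
    "TARKI": ("Yes", "No", "Yes*"),
    "ISSDA": ("No", "No", "No"),
    "ADPSS": ("No", "No", "Yes*"),
    "DANS": ("Yes*", "Yes*", "-"),
    "NSD": ("Yes", "Yes*", "Yes*"),
    "RODA": ("Yes", "No", "Yes*"),
    "ADP": ("Yes*", "No*", "Yes*"),
    "CIS": ("No*", "No", "Yes"),
    "SND": ("Yes", "Yes*", "-"),
    "FORS": ("Yes*", "No", "Yes*"),
}


def is_yes(answer: str) -> bool:
    # Assumption: the asterisk only marks a remark in Appendix 3.
    return answer.rstrip("*") == "Yes"


def tally(table):
    """Count DDI usage per category from the transcribed answers."""
    codebook = {da for da, (cb, _, _) in table.items() if is_yes(cb)}
    lifecycle = {da for da, (_, lc, _) in table.items() if is_yes(lc)}
    planning = {da for da, (_, _, pl) in table.items() if is_yes(pl)}
    return {
        "responding": len(table),
        "any_ddi": len(codebook | lifecycle),
        "codebook": len(codebook),
        "codebook_only": len(codebook - lifecycle),
        "lifecycle": len(lifecycle),
        "lifecycle_only": len(lifecycle - codebook),
        "planning_yes": len(planning),
    }


if __name__ == "__main__":
    for key, value in tally(TABLE_6_3).items():
        print(f"{key}: {value}")
```

Under the stated assumption, the tally gives 13 DAs using some DDI version, 12 using DDI-Codebook (8 exclusively), 5 using DDI-Lifecycle (1 exclusively) and 11 Yes answers on planning.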

[147] The CESSDA Data Publishers list at http://www.cessda.org/accessing/catalogue/ (accessed November 2012).


6.2.2 Controlled Vocabularies at Data Archives

To get an overview of the present use of controlled vocabularies (CVs) at the European data archives, one block of questions in the WP7 survey to the DAs was devoted to the issue (the questions are presented in Figure 6.3).

Figure 6.3. Survey question about usage of controlled vocabularies at the CESSDA archives (question Block 3 in the WP7 questionnaire, see Appendix 3). A summary of the responses for the questions is presented in Table 6.4.

Table 6.4. Results of the questions in Block 3 (see Figure 6.3) in the survey sent to the CESSDA Data Archives. Column 1 contains the names of the DAs. The information in Column 2 was obtained from the CESSDA portal [148] for comparison. An asterisk (*) indicates that more information was added in the response (see Appendix 3). A hyphen (-) means that the DA did not complete the survey or question.

| Publisher | CESSDA catalogue | CESSDA Topical Classification | DDI CV | ELSST | Own CV | Other CV |
|---|---|---|---|---|---|---|
| Austria: WISDOM | Yes | Yes | No | Yes | No | No |
| Czech Republic: SDA | No | - | - | - | - | - |
| Denmark: DDA | Yes | Yes | No | No | Yes* | No |
| Estonia: ESSDA | No | - | - | - | - | - |
| Finland: FSD | Yes | Yes | Some | Yes | Yes* | Yes* |
| France: RQ | Yes | No | No | No | No | No |
| Germany: GESIS | Yes | Yes | All | Planned | Yes* | Yes* |
| Greece: EKKE | Yes | Yes | Yes | Yes | Yes | No |
| Hungary: TARKI | No | No* | Some | No | Yes* | No |
| Ireland: ISSDA | Yes | No | No | No | No | No |
| Italy: ADPSS | Yes | Yes | No* | No | Yes* | No |
| Lithuania: LiDA | No | - | - | - | - | - |
| Luxembourg: CEPS | No | - | - | - | - | - |
| Netherlands: DANS | Yes | No | No | No | No | No |
| Norway: NSD | Yes | Yes | Some | Yes | Yes* | Yes* |
| Romania: RODA | No | No* | No* | No* | No | No |
| Slovenia: ADP | Yes | Yes | Some | No | Yes | Yes |
| Spain: CIS | No | No | No | No | Yes* | Yes |
| Sweden: SND | Yes | Yes | Some | Yes | - | - |
| Switzerland: FORS | Yes | No | No | Yes | Yes* | No |
| United Kingdom: UKDA | Yes | - | - | - | - | - |

The responses to the questions about CV usage are presented in Table 6.4. As in Table 6.3, a column has been added indicating whether the DAs have published any studies in the CESSDA portal. 12 of the responding DAs had published studies in the CESSDA portal at the time the survey was distributed.

[148] The CESSDA Data Publishers list at http://www.cessda.org/accessing/catalogue/ (accessed November 2012).

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA Topic Classification?

o Are you using DDI Controlled Vocabularies? (all/some)

o Are you using the ELSST Thesaurus?

o Are you using your own CV's? Please describe briefly:

o Are you using other CVs? Please specify:


Descriptive summary of results: Of the 16 responding DAs, 12 use one or more CVs:

12 DAs use at least one of the CVs asked about in the survey (CESSDA TC: 9; DDI CV: 7; ELSST: 7); 4 of these DAs use all three.

10 DAs use their own CVs in addition to the CVs explicitly asked about.

4 DAs do not use any CVs at all.

6.2.3 Co-operation between NSIs and Data Archives

As a part of the WP7 survey, the questions in Figure 6.4 about co-operation with NSIs were put to the DAs.

Figure 6.4. Survey question to the CESSDA DAs about co-operation with NSIs (question Block 1 in the WP7 questionnaire, see Appendix 3). A summary of the responses to the questions is presented in Table 6.5.

Table 6.5. Results of the questions in Block 1 (see Figure 6.4) in the survey sent to the CESSDA Data Archives. Column 1 contains the names of the DAs. An asterisk (*) indicates that there is more information added in the response (see Appendix 3). A hyphen (-) means they did not complete the survey/question.

| DA | A. Dissemination | B. Documentation | C. Other cooperation |
|---|---|---|---|
| Austria: WISDOM | Austrian Microcensus from 1970 to 2003 | German and English Nesstar | No |
| Czech Republic: SDA | - | - | - |
| Denmark: DDA | No* | No | Yes* |
| Estonia: ESSDA | - | - | - |
| Finland: FSD | No | No | Yes* |
| France: RQ | Yes* | Yes* | - |
| Germany: GESIS | Yes, through GML [149] | Yes, through the GML | GML |
| Greece: EKKE | No | No | Yes* |
| Hungary: TARKI | Yes | Yes | No |
| Ireland: ISSDA | Yes, selected microdata | Yes | Yes |
| Italy: ADPSS | Yes, inside the University | No* | No |
| Lithuania: LiDA | - | - | - |
| Luxembourg: CEPS | - | - | - |
| Netherlands: DANS | Yes, scientific use files (secured micro data) | 4 DC fields for discovery purposes | Yes* |
| Norway: NSD | Yes* | Yes* | Yes* |
| Romania: RODA | No | No | Yes* |
| Slovenia: ADP | Yes, some | Yes | Yes |
| Spain: CIS | No | No | Yes |
| Sweden: SND | No | No | No |
| Switzerland: FORS | Yes, through COMPASS [150] | Yes, through COMPASS | Yes |
| United Kingdom: UKDA | - | - | - |

[149] The German Microdata Lab (GML) at GESIS (see Section 6.3.4).

[150] COMPASS is a result of co-operation between the Federal Statistics Office and the Swiss Foundation for Research in Social Sciences (FORS) to make official statistics microdata available to social science researchers. The data available cover themes such as economics, education, mobility and health. It also generates some public use samples from PISA and the Federal Population Census, which are accessible without contract. http://compass.unil.ch/FORS_COMPASS/?lang=de

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

o Any documentation of data from the NSI?

o Any other cooperation?


6.3 NSIs and DAs in Co-operation

The level and type of co-operation between official statistics providers and data archives vary between countries, and several factors can affect the level of co-operation. The survey presented in Section 6.2.3 and the examples in Sections 6.3.1-6.3.4 give insight into different co-operation constellations.

At a workshop arranged by DwB [151], issues concerning co-operation were raised in the presentations [152, 153], e.g. different ways of co-operating, factors that affect the co-operation, and incentives for NSIs to co-operate.

Examples of DAs’ tasks in co-operation with NSIs on OS microdata:

Dissemination - Archives contribute to the dissemination of official microdata at various levels of anonymisation.

Metadata distribution - Descriptive metadata for OS microdata are included in the CESSDA portal to provide integrated search capabilities across the CESSDA archives.

Metadata preparation - The DA is responsible for documenting the OS microdata with harmonised metadata.

Support for researchers - For example, supporting researchers in their search for data.

Factors that affect the co-operation are for example:

Mutual trust - Trust between NSIs and the research community that has been built up over a long time (e.g. in Norway).

Legislation or interpretation of legislation - Some OS providers are allowed to distribute metadata for OS microdata and/or the OS microdata themselves to data archives for research purposes (e.g. in Germany).

Tradition and technology - For NSIs that were founded recently, it is much easier to adopt new standards and technology, e.g. the DDI schema specification, since their systems can be developed to produce more detailed descriptive metadata from the start, and there are no well-established traditions regarding which standards and technologies to use.

Global challenges - Research is becoming more data-intensive and data-driven (generally referred to as the fourth paradigm of science [154]). It is argued that large and integrated datasets can provide a deeper understanding of society and enable new cross-disciplinary research. In addition, the open data movement [155] has demanded open access to publicly funded data. In July 2012, the European Commission made explicit reference to open access to research data by publishing the communication "Towards better access to scientific information: Boosting the benefits of public investments in research" [156]. All these initiatives, as well as various national policies, put pressure on NSIs to provide access for researchers and to collaborate with the DAs.

Other incentives - Incentives for the NSIs to collaborate, other than legislation, are mostly related to a lower workload, since the NSIs can concentrate on their mission as official statistics providers instead of managing research-related systems and serving researchers in other research-related issues concerning the OS microdata.

Sections 6.3.1-6.3.4 give examples of different levels of co-operation, with a brief description of the background to each.

[151] http://dwbproject.org/events/regional_workshop1.html

[152] http://dwbproject.org/export/sites/default/events/doc/dwb_regional_ws/4A_dwb_regional_-workshop_archives-access-contribution_tubaro-silberman.pdf

[153] http://dwbproject.org/export/sites/default/events/doc/dwb_regional_ws/4C_dwb_regional_-workshop_sors-adp-cooperation_smrekar.pdf

[154] http://fourthparadigm.org

[155] http://en.wikipedia.org/wiki/Open_data

6.3.1 France

INSEE and Réseau Quetelet represent the first case of a long tradition of co-operation between a DA and NSIs, dating from the 1980s. The data archive provides access to data from national statistics offices. The scope of the data includes censuses, multiple surveys from INSEE (scientific use files) and the statistical offices of various ministries, as well as data from other data-producing agencies such as OVE, CEREQ and IRDES. Since 2008 the co-operation has included the CASD, which gives access to confidential microdata via remote access (compare http://www.reseau-quetelet.cnrs.fr/spip/rubrique.php3?id_rubrique=68&lang=en). The CMH-ADISP is the partner unit of the Réseau Quetelet in charge of distributing these data. The electoral data of the Interior Ministry are distributed by the CDSP. Access to the confidential data of the Statistical Confidentiality Committee is the object of co-operation between the Réseau Quetelet and INSEE.

6.3.2 United Kingdom

ONS (Office for National Statistics) and UKDA have co-operated for a long time. UKDA disseminates ONS SUFs and has an agreement to give remote access to confidential microdata. UK Data Service Census Support is a value-added service of the UK Data Service, which exists to provide access to, and support for users of, the 1971-2011 Censuses of Population. Three government census agencies collate statistics for the four home nations of the UK and provide additional information:

Office for National Statistics (ONS) for England and Wales

Northern Ireland Statistics and Research Agency (NISRA)

National Records of Scotland (NRS)

6.3.3 Norway

The co-operation between Statistics Norway (SN) and NSD has a long history. NSD acts as one of SN’s tools for disseminating research data, giving students and researchers better access to research data [157]. The co-operation has been built on trust, but is now moving further through a project funded by the Research Council of Norway, in which NSD is participating, that aims to develop a new national infrastructure for improving national and international access to microdata for the research community (the Remote Access Infrastructure for Register Data, RAIRD) [158].

[156] http://www.eesc.europa.eu/?i=portal.en.int-opinions.24976

[157] http://snd.gu.se/sites/snd.gu.se/files/CES10_NSD.pdf

6.3.4 Germany

In the case of Germany, a national effort has been made to improve the information infrastructure between science and statistics, based on recommendations compiled by the German Commission on Improving the Informational Infrastructure between Science and Statistics (KVI, 1999-2001) (Rolf-Engel, 2010). The central objective was to improve the co-operation between the scientific community and official statistical agencies. The commission developed 36 recommendations for improving the co-operation, and some of the recommendations have been implemented to varying degrees. A report has been written that describes the 36 recommendations and evaluates the degree to which the objectives of each recommendation have been implemented (Rolf-Engel, 2010).

One of the most important recommendations was to establish research data centres (RDCs) to improve access to microdata and to facilitate data analysis (Rolf-Engel, 2010; Bender et al., 2009). There are four publicly funded RDCs [159]:

Research Data Center of the Federal Statistical Office (FDZ–Bund)

Research Data Center of the State Statistical Offices (FDZ-Länder)

Research Data Center (FDZ) of the German Federal Employment Agency (BA) at the Institute for Employment Research (IAB)

Research Data Centre of the German Pension Insurance (FDZ-RV)

In accordance with the German data protection regulations, the RDCs mentioned provide their data services under different laws, namely the

Bundesdatenschutzgesetz (§16, Abs. 6, BStatG [160]), which is applied as long as there are no special regulations such as the

Social Code Book (Sozialgesetzbuch, SGB), relevant for the German social insurance services (Pension Insurance; Employment Agency), or the

Federal Statistical Law (Bundesstatistikgesetz, BStatG), which contains fundamental regulations regarding German official statistics.

Data access is provided by these services through anonymised microdata files, controlled remote data access and workplaces for guest researchers. Thus, the established access regulation ensures transparent and standardised access to OS microdata and guarantees equal treatment of every request from the data users’ community (Bender et al., 2009). In developing the German research data infrastructure at large, a set of criteria was published in 2010 by the German Data Forum (RatSWD) to inform on the objectives, functions and characteristics of the members of the RatSWD research data infrastructure, which consists of research data centres (RDC) and data service centres (DSZ) (RatSWD, 2010).

[158] The RAIRD Project website (under development): http://raird.no/

[159] http://www.ratswd.de/en/data-infrastructure/rdc

[160] http://www.gesis.org/en/services/data-analysis/official-microdata/microcensus/microcensus-grundfile/data-transfers/


The German Microdata Lab [161] (GML), part of the Leibniz Institute for the Social Sciences (GESIS), is one of the RDCs accredited by RatSWD. Its goal is to make microdata gathered by official statistical agencies accessible and available to the empirically oriented economic and social science research community. The data and extensive documentation (metadata) provided cover German data such as the Microcensus or the Income and Expenditure Surveys, as well as official microdata from Europe, such as the European Union Labour Force Survey (EU-LFS) or the European Union Statistics on Income and Living Conditions (EU-SILC). Particular services are provided as ‘microdata tools’, which give additional information on the practical use of microdata in various subject areas, complemented by job routines for different statistical software packages. These routines support, for instance, the implementation and operationalisation of various social science concepts, such as CASMIN (Comparative Analysis of Social Mobility in Industrial Nations) or ISEI (International Socio-Economic Index of Occupational Status).

The MISSY system, maintained by GML, is highly relevant to metadata access and data discovery. Currently the online system offers detailed information on the scientific use files of the German Microcensus since 1973. The microdata information system is being developed to extend the scope of descriptive and research-based metadata to the European surveys EU-LFS and EU-SILC, which are used intensively in the scientific community. These enhancements mainly concern multilingualism and documentation at country level in the case of the European data, and ensuring the re-use of these metadata by other providers (e.g. RDCs). See also Section 2.7.2 for an example of a data centre and data producers in co-operation.

6.4 Discussion

At an abstract level, DAs and NSIs are organisations with similar functionality: both manage datasets and disseminate them. The differences between DAs and NSIs stem from the fact that, historically, they are organisations with different missions that serve different clients. Whereas the DAs initially archived and disseminated research data for researchers and students, the NSIs produce official statistics and disseminate them at the request of national authorities. However, the focus of the NSIs is changing, since several of them have started to provide extended data services to support the data needs of research communities across Europe. Thus, differences and commonalities are reflected at many levels of the organisations, and in many details. In addition, it should be mentioned that the normal input data for both types of organisation are microdata. It is mainly international statistical organisations that need to produce further aggregations from aggregated input data.

The focus of this report is on the present use of metadata standards at European DAs and NSIs, and on examining whether there are metadata standards and controlled vocabularies that can span both the DA and NSI sectors or, as the question is stated in the DoW: “Which metadata standards meets the majority of needs and which related vocabularies and coding schemes may be beneficial across all sectors”. The analysis and discussion will therefore concentrate on those perspectives. More detailed assessment criteria for metadata standards will be presented in D7.3 of the DwB project.

The main basis for the discussion is a summary of the results of investigating the present usage of metadata standards at European NSIs and DAs. In addition, the aim of DwB to support European public data services for OS microdata, as well as the management of workflows and regulations on data production and documentation, are general activities at NSIs and DAs that have to be taken into account.

These activities frame the major metadata needs for the dissemination, production and storage of the data products and for the provision of correspondingly specified metadata sets.

The conclusions from the discussion are summarised in Section 6.5.

[161] http://www.gesis.org/en/institute/competence-centers/rdc-german-microdata-lab/

6.4.1 NSI Current Needs and used Metadata Standards, Classifications and Coding Schemes

Summarising the current usage of metadata standards in relation to the needs of public data services involves several dissemination aspects.

Retrieval, access and provision of research data require corresponding metadata, in particular:

Searching for research data covering topics of interest at national or European level. The data types range from a broad scope of aggregated data (statistics, indicators) to the underlying microdata from surveys, register data, process data and the like.

Provision and retrieval of concepts is a key issue in dissemination, in order to find and access relevant statistics at Eurostat or National Statistical Institutes for research purposes.

With the provision of official statistics, the metadata on the statistics provided have to cover, in particular, descriptive metadata and documentation of the statistical concepts and definitions used, as well as metadata on the quality of the data.

Furthermore, the provision of statistics and microdata must inform on confidentiality issues and access conditions under data privacy regulation with respect to the available dissemination and access channels (public use files, scientific use files, remote execution, remote access or safe centre).

In summary, the European NSIs already apply (or plan to apply) the Euro SDMX Metadata Structure (ESMS). This schema comprises 21 high-level concepts, which were strictly derived from the list of 66 cross-domain concepts in the SDMX Content-Oriented Guidelines.

The full adoption of all 21 concepts is under development among the 33 NSIs, as monitored by Eurostat (see Section 6.1.1). The SDMX-compliant structure of concepts has been in use at Eurostat since 2010, when it replaced the Special Data Dissemination Standard (SDDS). Since then, the structure of the ESMS has served as the “unique structure” for the dissemination of “reference metadata at European level” and, within the European Statistical System, guides the collection of national reference metadata files from NSIs (Eurostat, 2010).

Compared with the situation before 2010, when the compliance of NSIs with SDMX appeared very heterogeneous (10 applied SDMX in 2010, see Section 6.1.1), the vast majority of NSIs now plan to achieve greater compliance with the technical and statistical SDMX standards in metadata production. The survey conducted by WP8 gives the current level of SDMX compliance at the NSIs: 28 use SDMX, of which 6 are fully or to a large extent compliant and 22 are partially or to a small extent compliant (see Section 6.1.2 for more details).

In addition to the use of metadata standards for OS at European level, the development and implementation of widely usable standards such as classifications, controlled vocabularies and coding schemes go hand in hand with developing metadata standards that provide harmonised and comparable data products to the research community. As briefly reported in Section 4.2, commonly used standards range from classifications for the analysis of European regions (NUTS) or of economic activities in the EC to controlled vocabularies within the statistical domain.

Beyond the top-down view from the European level, the member states of the ESS apply additional metadata standards for different purposes (see 6.1.2). In summary, the following remain highly relevant:

the Neuchâtel model, which appears to be a better standard for handling classifications and documenting variables (23 of 26 cases);

the quality-driven standards for reporting data and statistics (DQAF/SDDS), as requested by the IMF (21 of 26 cases);

the Dublin Core (10 of 26 cases).

However, it should be noted that these figures present only a snapshot, at a certain point in time, of tendencies in the developing metadata agenda for OS data in Europe as a whole, as well as at the level of the individual ESS Member States. Of course, this remark also holds for DAs and their metadata agenda.

For example, the standard for quality reports (ESQR) in the ESS was released in 2009, in parallel with the implementation of the ESMS structure (while DQAF and SDDS were still in use for the IMF). The ESQR update at Eurostat answers the demand at European level for a more detailed standard structure for quality reports, to make reporting more homogeneous across the different statistical domains and hence to facilitate cross-comparisons of processes and outputs.

Thus, there is also a need to foster harmonised quality reporting across statistical processes and across the Member States of the European Statistical System (ESS). In terms of metadata standards and related reference models, the overarching GSBPM and the specialised GSIM are the major tools for discussing the management of data and metadata at NSIs at large (Eurostat, 2010).

6.4.2 DAs Current Needs and used Metadata standards, Classifications and Coding Schemes

From a general perspective, the current usage of metadata standards in relation to the needs of public data services does not differ from the dissemination aspects described above for NSIs.

However, the metadata required for retrieval, access and provision of research data differ according to the major type of research data and the underlying metadata standard, DDI. In particular:


Public data services and the related dissemination channels for DAs generally start from the data holdings catalogue of the national data archive, which has its integrated complement in the CESSDA data portal. The study description scheme provides the ‘container’ for retrieving archived studies, datasets and documentation materials related to the study.

Retrieval concerns different types of data, mainly survey data, along with some statistics from complex microdata (such as aggregate data from trend surveys, panel data or the like) and, in part, qualitative data.

To support the retrieval of these data via the study description at European and national level, thesaurus vocabulary is made available through the CESSDA Topic Classification and the ELSST thesaurus.

Metadata for versioning (version history etc.) and persistent identifiers support data citation and the re-use of, for example, survey data (this is also of interest for disseminated OS microdata, see Section 2.6).

The provision of and access to survey data in relation to data privacy regulation is generally based on factually anonymised datasets (PUF/SUF). A few CESSDA members (UKDA with SDS in the UK and RQ with CASD in France) provide access to OS data via safe-centre environments or support the documentation of OS data in co-operation with national NSIs (compare Section 6.3).

For all documentation from study to variable level, archives produce and provide the related metadata in compliance with the DDI standard.

However, the archives’ internal data management and workflow regulations differ according to whether the DDI-Codebook version or the DDI-Lifecycle version is used. As shown in Section 6.2.1, the majority use the DDI-Codebook version (12 of 21 DAs), while DDI-Lifecycle is currently used less (5) but is widely planned for future use (11). The re-use of metadata and the option of covering more phases of the data life cycle appear to be among the main drivers moving DAs towards the extended DDI version.

Beyond the provision of context information at study and dataset level, metadata at variable level, in particular for data from complex comparative survey series, are highly relevant for supporting the research community. They offer further options for providing detailed information in a human-readable and machine-actionable form.
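To illustrate what "human-readable and machine-actionable" can mean at variable level, the following minimal sketch parses a small variable description with Python's standard library. The XML fragment is hypothetical and heavily simplified: it is only modelled loosely on DDI-Codebook element names, omits namespaces and most elements, and the variable and its question text are invented for illustration. It is not a valid DDI instance.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment in the style of a DDI-Codebook
# variable description (no namespaces, invented content).
DDI_FRAGMENT = """
<codeBook>
  <dataDscr>
    <var ID="V1" name="sex">
      <labl>Sex of respondent</labl>
      <qstn><qstnLit>Are you male or female?</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def read_variables(xml_text):
    """Return one dictionary per <var> element found in the fragment."""
    root = ET.fromstring(xml_text.strip())
    variables = []
    for var in root.iter("var"):
        # Map each category code to its label.
        categories = {
            cat.findtext("catValu"): cat.findtext("labl")
            for cat in var.findall("catgry")
        }
        variables.append({
            "name": var.get("name"),
            "label": var.findtext("labl"),
            "question": var.findtext("qstn/qstnLit"),
            "categories": categories,
        })
    return variables


if __name__ == "__main__":
    for variable in read_variables(DDI_FRAGMENT):
        print(variable)
```

The same description that a researcher reads as a codebook entry (label, question text, categories) is available to software as structured fields, which is what allows catalogues and analysis tools to re-use it.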

In processing documentation material for the retrieval and distribution of metadata for a study and its data, the use of controlled vocabularies and coding schemes makes the work at DAs more economical. It also eases retrieval in data catalogues on the basis of agreed terms in the domain, e.g. by applying standardised wording for methodological aspects of the fieldwork in the metadata database. From the social science research perspective, several substantive classifications and vocabularies are applied (see Section 4.4) in performing substantive research and the related data analyses, and these need close attention when developing a metadata retrieval system for OS data and beyond.

Finally, metadata for long-term preservation need to be considered in managing the data, metadata and documents related to a study or a complex survey series. Related metadata standards include PREMIS and METS (see Section 2.4). These technical standards will attract growing attention in the daily work of archiving and curating data, given the need to become compliant with the requirements for trusted digital repositories.

Such requirements, formulated e.g. by TRAC (Trustworthy Repositories Audit & Certification: Criteria and Checklist), DRAMBORA (Digital Repository Audit Method Based on Risk Assessment) or DSA (Data Seal of Approval), are in line or even in explicit compliance with the OAIS reference model, which has a central guiding function in the development of long-term archives (see Section 5.6).

6.4.3 Metadata Standards in Co-operation

DDI and SDMX are used by, and accepted as the major metadata standard by, a majority of the DAs and NSIs, respectively (see 2.1, 6.4.1 and 6.4.2). This has been the driver for several projects that aim to promote co-operation between DDI-Lifecycle and SDMX: the SDMX/DDI Dialogue in the past (see Section 3.1), and the ongoing Frameworks and Standards for Statistical Modernisation (FSFSM, see Section 3.2).

The aim of the SDMX/DDI Dialogue (Section 3.1) was to make DDI and SDMX work together, with two major objectives:

first, to become technically complementary and interoperable metadata systems, and

second, to adjust both technical metadata standards to researchers’ needs, e.g. regarding metadata elements and functions according to the types of datasets and the substantive, scientifically driven classifications and vocabularies they want to work with.

As mentioned in Section 2.2, SDMX focuses strongly on reporting, collection and dissemination. Its use in internal statistical production systems is valuable but, from a design perspective, a secondary application. SDMX consists of technical and content-oriented components. The content guidelines foster the harmonisation of concepts, terminology and data structures for aggregated data and the respective metadata. By comparison, DDI Lifecycle is primarily a technical standard, developed as a generic concept oriented towards metadata-driven workflows, to support metadata production from survey design to the production, management and dissemination of microdata along an explicit data life-cycle model (see Section 2.3). The model includes several options to re-use metadata at the respective phases of the data and metadata workflow and allows metadata to be exchanged between research projects and data service institutions. One central overlap concerns the creation and handling of aggregate data or tabulations during the data production process. This overlap makes it possible to transform a dataset described using DDI into an SDMX dataset with related metadata. Beyond that, both standards document data structure and use common metadata components and elements such as codelists, dimensions and measures, as well as schemes for the internal identification and maintenance of metadata objects. On the other hand, classification management is not well covered in either standard. In this case, the Neuchâtel Model for classifications and variables provides a better standard (although it is not maintained for the time being), as discussed in Section 3.3. However, it appears feasible to expose existing classifications and concepts (as often produced by NSIs) in parallel in DDI and SDMX formats, and to (re-)use them as publication formats in the data collection and processing phases as well. As pointed out in the survey of NSIs (Section 6.1.2), metadata disseminated by the NSIs are most often in formats that are not “machine-actionable”, e.g. PDF and Word. Another issue which neither DDI nor SDMX covers is the modelling of business processes, where standards such as Business Process Modeling Notation (BPMN) and Business Process Execution Language (BPEL) could be applied more adequately. An additional aspect of the modelling work is that SDMX is grounded on an explicit information model. By comparison, there is still a need for a corresponding DDI information model, which has become a key objective in the present planning (see Sections 2.3.4 and 3.3).
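
The idea of exposing one classification in parallel in two formats can be illustrated with a minimal Python sketch. It serialises the same code list into two different XML layouts using only the standard library. The classification content, the function names and all element and attribute names are assumptions made for illustration; they are simplified placeholders and do not conform to the actual DDI or SDMX-ML schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical classification held once internally: code -> label.
CLASSIFICATION = {"1": "Employed", "2": "Unemployed"}


def as_layout_a(codes):
    """Serialise the code list with code and label as attributes (placeholder layout A)."""
    root = ET.Element("Codelist", id="CL_EXAMPLE")
    for code, label in codes.items():
        ET.SubElement(root, "Code", id=code, name=label)
    return ET.tostring(root, encoding="unicode")


def as_layout_b(codes):
    """Serialise the same code list with code and label as child elements (placeholder layout B)."""
    root = ET.Element("CodeScheme", id="CL_EXAMPLE")
    for code, label in codes.items():
        item = ET.SubElement(root, "Code")
        ET.SubElement(item, "Value").text = code
        ET.SubElement(item, "Label").text = label
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(as_layout_a(CLASSIFICATION))
    print(as_layout_b(CLASSIFICATION))
```

Keeping a single internal representation and generating each publication format from it means that a change to the classification only has to be made once.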

6.4.4 The Role of GSBPM and GSIM in the Work with SDMX-DDI Co-operation

As technical standards are implementations of conceptual models, the respective developments need to be observed within the work on DDI/SDMX co-operation as well. This concerns in particular the activities and action plans regarding the progress of the GSBPM and GSIM models, as both models provide tools to describe the data production needs at NSIs and DAs adequately. The FSFSM project (mentioned in 6.4.3, see also Section 3.2) works on implementing, reviewing and better integrating the standards, models and frameworks needed for the modernisation of statistical production and services. This includes, for example, supporting the practical implementation of key standards within national and international statistical organisations, promoting the use of and further developing GSIM and GSBPM, and mapping GSIM, DDI and SDMX by determining the coherence between the standards in order to identify gaps and areas for improvement.

From the DDI perspective, the DDI Lifecycle model and the GSBPM model are substantially similar at the top level, as described in Section 3.3, and both emphasise the reuse of metadata. Regarding the differences, GSBPM has a much greater focus on repeated data collection at fixed intervals (e.g. monthly, quarterly or yearly). Also, the terminology used in the different domains needs ‘translation’ for the communities to understand each other (e.g. study, data set, data file, data collection, survey). In this respect, it is relevant for the DwB metadata process that efforts are being made to produce a common vocabulary of terms for SDMX and DDI, describing similarities and differences (see Section 3.3). The DDI Lifecycle model also has many correspondences with GSIM. Intersections concern the GSIM Concepts and Structures areas, which include information about questions, concepts and variables. The machine-actionable capacity of DDI complements the GSIM Production area. Less correspondence can be found in the Business area. For future metadata developments for processing and accessing OS microdata, it is of interest that the GSIM process has taken the DDI and SDMX information models into account and will continue to ensure alignment between them and GSIM, identifying in more detail where there are gaps and overlaps. In a complementary way, the DDI Alliance supports the development of GSIM as a priority (see Section 3.2).

According to the experience of the Australian Bureau of Statistics (ABS) with DDI and SDMX, there is a clear need for SDMX to be part of supporting integrated, end-to-end statistical production processes (see Section 3.4). For NSIs, whose inputs are predominantly microdata, SDMX needs to work together with the DDI-L standard, as the latter applies to earlier phases of the production process.

The need and the options for flexible metadata system development are demonstrated by further experiences among DAs and NSIs. As mentioned, the CESSDA DAs can publish their metadata for harvesting, and thus participate in the CESSDA system, without having to configure their internal systems to be compliant with DDI Codebook. Examples of this are the Irish Social Science Data Archive (ISSDA) and the Italian Data Archive for the Social Sciences (ADPSS) (see Section 6.4). This is interesting, for example, in cases where systems are developed using other metadata standards, or some well-functioning internally developed metadata schema, as in the SCB example (see Section 2.7.1).

Keeping these facets in mind, it can be concluded that a future-proof metadata standard has to be machine-readable, interoperable with other standards, and able to support business process models. This requires a broad move, where necessary, to metadata management systems that allow the required metadata to be produced and disseminated on the basis of a technical XML-based exchange format, which also supports interoperability between different technical systems. As a consequence, research needs are supported by access to the different data resources at NSIs and DAs via seamless public data services.
For example, the decisions made when building a federated system may call for changes to work routines and for developments that allow heterogeneous metadata sources to be integrated, e.g. by:

● advanced DBMS technology for structured data and metadata storage;

● further support for the harmonisation of metadata from different sources and data domains;

● developing the necessary routines for mapping internal metadata to a commonly agreed metadata schema;

● machine-actionable metadata that support harvesting, to extend the search capabilities of a common portal;

● metadata for versioning (version history etc.) and persistent identifiers, which support data citation and re-use of e.g. research survey data and require complementary efforts in the domain of OS data.
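
The routine for mapping internal metadata to a commonly agreed schema, as named in the list above, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the source names (`archive_a`, `nsi_b`), the internal field names and the target field names are all hypothetical and are not taken from DDI, SDMX or any institution's actual system.

```python
from typing import Dict, List, Tuple

# Hypothetical mappings: internal field name -> commonly agreed field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "archive_a": {"study_title": "title", "study_no": "identifier", "producer": "creator"},
    "nsi_b": {"titel": "title", "id": "identifier", "agency": "creator"},
}


def to_common_schema(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename the fields of one internal record to the agreed common schema.

    Fields without a mapping are dropped; in practice they would need a manual decision.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def harmonise(records: List[Tuple[str, Dict[str, str]]]) -> List[Dict[str, str]]:
    """Map (source, record) pairs to common-schema records tagged with their source."""
    result = []
    for source, record in records:
        common = to_common_schema(source, record)
        common["source"] = source  # keep provenance for the federated catalogue
        result.append(common)
    return result


if __name__ == "__main__":
    sample = [
        ("archive_a", {"study_title": "Survey X", "study_no": "A1"}),
        ("nsi_b", {"titel": "Register Y", "id": "B2"}),
    ]
    for row in harmonise(sample):
        print(row)
```

Keeping the mappings in a plain table, separate from the code that applies them, lets each institution maintain its own mapping without changing its internal system.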

6.5 Concluding Remarks and Summaries

In this section, some concluding remarks and summaries highlight the work underway regarding the needs for a future-proof standard already reported and referenced elsewhere in this deliverable162. However, metadata practices and detailed assessments of metadata issues are not the focus of this concluding section. Section 6.6 briefly presents the additional reports from WP7 and beyond, in which further relevant aspects of metadata standards will be investigated and discussed.

Concluding remarks (see Section 6.4 for detailed discussion)

● There is a need for metadata standards that can support both the dissemination of and the search for OS microdata for use in research, and that combine these two functionalities.

● The metadata standards have to be machine-readable, interoperable with other standards, and able to support business process models.

● GSIM is a reference model that has promising features for capturing the whole lifecycle of OS microdata, including dissemination.

● DDI-Lifecycle and SDMX are well-established metadata schema specifications that can be extended to support relevant parts of the data management for OS microdata, aligned and extended to follow GSIM and GSBPM.

● Common Controlled Vocabularies and Classifications (or ones properly mapped onto each other) have to be developed for better and easier harmonisation of, and co-operation between, the SDMX and DDI standards.

● An implementation-independent representation of metadata standards and schema specifications is a prerequisite for efficient mappings between different standards and between different versions of the same standard.

The discussion and summarised information in Section 6.4 provide the background for the concluding remarks, e.g. regarding external projects of interest to the DwB project, which, in addition to the state of the art in metadata standards usage at NSIs and DAs, have guided the concluding remarks on the needs for metadata standards, classifications and controlled vocabularies. Additionally, ongoing and future work in other DwB work packages (WP8 and WP12), which concerns the implementation of an infrastructure for OS microdata search and dissemination, influenced the concluding remarks.

Summary for the status quo at European NSIs

The metadata focus is on structural metadata, as it concerns aggregated microdata. However, there is also a need to document the underlying microdata itself for further use in technical systems at national and European level. At present there are technical, infrastructural and legal constraints on disseminating national OS within the European Research Area at large. The planning of and present activities on developing frameworks and standards for statistical modernisation play a central role in overcoming this situation.

162 The question in the DoW is “Which metadata standards meets the majority of needs and which related vocabularies and coding schemes may be beneficial across all sectors”. There are also questions to consider, such as the convergence of the usage and needs of metadata standards for a European OS microdata portal that should be aligned with the CESSDA portal. To answer this, the particular conceptual and operational activities are to be referenced with WP8 and WP12, which are in charge of the respective implementation issues.


So far, SDMX and its adoption by ESMS can be considered the most used metadata standards for handling OS, and not only for providing metadata that describe aggregate data. Of major interest, however, are the respective efforts to harmonise specific concepts and terminology that are common to a large number of statistical domains. Taken together, these are valuable tools for advancing comparative analyses in general, as they enhance the analysis potential of both OS data and data from empirical social research.

European NSIs use classifications that are promoted, for example, by Eurostat and the IMF, or that are available through the United Nations classifications registry or the RAMON database, to present e.g. aggregated OS on relevant research matters. Related vocabularies and coding schemes appear beneficial for data coming from all sectors, for instance when analysing survey data and OS under regional aspects in Europe.

Summary for the status quo at CESSDA DAs

The metadata focus is on descriptive metadata, as it concerns the documentation of fielded microdata from empirical social research. Additionally, there are strong metadata developments that enhance the capacity to capture metadata at the time they occur along the data lifecycle and to re-use metadata at several stages of the value chain, in particular in planning and managing complex survey series for comparative research.

DDI-Codebook is the most used standard, in particular for exposing existing studies or study series in the DA data catalogues at national level and in the integrated service at the CESSDA portal. Additionally, there is a strong movement towards applying the DDI Lifecycle standard to advance data and metadata management along several technical and organisational strands. However, these activities also need additional efforts to be integrated at CESSDA level.

To advance metadata management and the seamless provision of the respective infrastructure, some widely used controlled vocabularies at the DAs serve as the basis for the search and browsing functions used to find and access data and metadata via the central CESSDA portal. With respect to the development of standards, the substantive adaptation of vocabularies in the different research domains and the technical integration of DA and NSI vocabularies are in general strongly recommended in different metadata dialogue initiatives.

6.6 Further Relevant Facets of the Metadata Agenda for NSIs & DAs

As mentioned, the task of this deliverable (D7.1) is to present the state of the art in using metadata standards, classifications and related controlled vocabularies at NSIs and DAs according to their needs. The relevant needs have been reported with respect to the manifold applications of the ‘metadata toolbox’, ranging from the coverage of specific metadata standards and vocabularies in use up to metadata frameworks and models, which provide the necessary tools to analyse and develop the business case in question. That said, the current state of the art as presented omits further relevant aspects, such as assessing standards regarding their future relevance for European social science data infrastructure needs and key areas. This aspect, and other aspects of metadata standards with future relevance, will be investigated and reported in deliverable D7.2. Rules and best practices regarding metadata standard selection and usage are the focus of deliverable D7.3. Specific issues in software development for specific widely used metadata standards will be discussed in D7.4, while D7.5 will report on the DDI-SDMX dialogue. Finally, D7.6 will shed light on metadata standards, classifications and practices in related disciplines to support the extension of existing social science metadata and the interdisciplinary use of research results. Additionally, sources of contextual metadata and regulative standards for linking data and publications are to be identified.

Beyond work package 7, further investigations as well as practical consequences in terms of implementation issues will be provided by WP8. This concerns in particular the provision of an OS Object Model (D8.1), a related integrated Metadata Model (D8.2) and a workflows and dataflows document for both possible models (D8.3), leading finally to a proposal for portal resource discovery functionality for a search/browse portal interface (D8.4). WP12 is in charge of Implementing Improved Resource Discovery for OS Data. These activities are framed by the WP4 work on concepts and models for Improving Access to OS Microdata, which examines how e-technology based Remote Access (RA) environments may help widen and enhance data access across the European Research Area. Complementary to the technical issues is the work of WP5, whose activities aim at improving access to OS data by servicing researchers in the use of European OS microdata. It will achieve this via two key objectives that will promote opportunities for new research in areas such as the demography of European societies, labour market participation and future labour needs in Europe.

Inspiring aspects regarding future metadata requirements and developments to improve the support for social science research needs are available in further projects such as the DASISH project163 and the KNOWeSCAPE project164.

163 http://dasish.eu/

164 http://knowescape.org/wp-content/uploads/2013/04/TD1210-e.pdf


APPENDIX 1: INTERNATIONAL COOPERATION

1. The National Statistical Institutes

1.1 Eurostat165

Eurostat is the statistical office of the European Union situated in Luxembourg. It was established in 1953 and when the European Community was founded in 1958 it became a Directorate-General (DG) of the European Commission. Eurostat’s key role is to supply statistics to other DGs and to supply the Commission and other European Institutions with data so they can define, implement and analyse Community policies. Its task is also to provide the European Union with statistics at European level that enable comparisons between countries and regions.

1.2 The European Statistical System (ESS)166

ESS is the partnership between Eurostat and the national statistical institutes (NSIs) and other national authorities responsible for the development, production and dissemination of European statistics in each Member State. This partnership also includes member countries of the European Economic Area (EEA) and the European Free Trade Association (EFTA).

The ESS functions as a network in which Eurostat’s role is to lead the way in the harmonization of statistics in close cooperation with the national statistical authorities. ESS work concentrates mainly on EU policy areas - but, with the extension of EU policies, harmonization has been extended to nearly all statistical fields. The ESS also coordinates its work with candidate countries, and at European level with other Commission services, agencies and the European Central Bank (ECB) and international organisations such as the Organisation for Economic Co-operation and Development (OECD), the United Nations (UN), the International Monetary Fund (IMF) and the World Bank.

1.3 European Statistical System Committee (ESSC)167

At the heart of the ESS is the European Statistical System Committee (ESSC), which was established by Regulation (EC) No 223/2009 of the European Parliament and Council of 11 March 2009 on European statistics. Article 7 of the Regulation lays down its task: the Committee "shall provide professional guidance to the ESS for developing, producing and disseminating European statistics".

In practice, this means that the Commission shall consult the ESS Committee in regard to:

a. the measures which the Commission intends to take for the development, production and dissemination of European statistics, their justification on a cost-effectiveness basis, the means and timetables for achieving them, the reporting burden on survey respondents;

b. proposed developments and priorities in the European Statistical Programme;

c. the annual work programme for the following year;

165 http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/

166 http://epp.eurostat.ec.europa.eu/portal/page/portal/pgp_ess/ess/ess_news

167 http://epp.eurostat.ec.europa.eu/portal/page/portal/pgp_ess/about_ess/statistical_committees/essc


d. initiatives to bring into practice the reprioritization and reduction of the response burden;

e. issues concerning statistical confidentiality;

f. the further development (revision or update) of the Code of Practice;

g. any other question, in particular issues of methodology, arising from the establishment or implementation of statistical programmes.

The ESSC is chaired by the Commission (Eurostat) and composed of the representatives of Member States' National Statistical Institutes. EEA and EFTA countries' National Statistical Institutes participate as observers. Observers from ECB, OECD, etc. may also participate in the meetings of the ESSC. It meets four times a year.

1.4 Directeurs Généraux des Instituts Nationaux de Statistique (DGINS)168

The DGINS Conference was created on 15 July 1953 in Luxembourg and was acting then as the predecessor of the Statistical Programme Committee (SPC). It is held once a year with the aim of discussing topics related to the statistical programme and methods and processes for the production of Community statistics. It is hosted each year by a different Member State and the Director General of the host country chairs the conference.

1.5 ESSnet169

During the Palermo meeting in September 2002, the Directors General of NSI (DGINS) expressed the need to find synergies, harmonization and dissemination of best practices in the European Statistical System (ESS). They proposed to create an adequate instrument: the Centres and Networks of Excellence (Cenex, now called ESSnet) projects, for putting together expertise distributed throughout the ESS organisations in order to develop specific actions which would benefit the whole system.

An ESSnet project is: "A network of several ESS organisations aimed at providing results that will be beneficial to the whole ESS". The ESSnet actions shall serve the interests of the whole ESS and should be in line with the 5-year statistical programme, as they are partly financed by Eurostat. Their objective is to take advantage of the synergies from cooperation between some Member States in order to share expertise and to save costs in solving common problems of European interest. The transfer of results and of knowledge to non-participating partners for the benefit of the entire ESS is an essential characteristic of ESSnet projects. Several ESSnets are of interest for the DwB project (for instance DARA).

1.6 Partnership group170

The Partnership Group is a group of Directors General of the National Statistical Institutes of the ESS whose mission is to further the development of the ESS at the highest level, notably through ensuring the effective functioning of the European Statistical System Committee. Its tasks are to:

● Identify and propose strategic issues for discussion by the ESSC,

168 http://epp.eurostat.ec.europa.eu/portal/page/portal/ess_eurostat/statistical_committees/DGINS

169 http://www.essnet-portal.eu/

170 http://epp.eurostat.ec.europa.eu/portal/page/portal/ess_eurostat/statistical_committees/partnership_group


● Assist in co-ordinating the co-operation between National Statistical Systems and Eurostat on strategic issues in order to participate in the formulation of the issues before discussion in the ESSC,

● Discuss contentious issues in order either to make proposals to the ESSC with the aim of achieving consensus or to refer them to other bodies (e.g. Sector Groups) for further work,

● Channel ideas from ESSC members on the state of co-operation and how it could be improved,

● Comment on agendas for future ESSC meetings and discuss the substance of upcoming ESSC agenda items,

● Monitor the functioning of the ESSC and its subsidiary bodies,

● Work in an inclusive way, keeping all Heads of National Statistical Institutes (NSIs) informed of discussion and actions through information exchange via the Network Group and ensuring that non-members' views can be taken into account.

The Partnership Group normally meets four times a year between ESSC meetings.

1.7 Regional Statistics Coordination Officer (RESCO)

The regional statistics coordination officer deals with:

● the assurance of a smooth data flow of all types of regional statistics from data suppliers to Eurostat;

● co-ordination with other data producing institutions in the Member State;

● work for a harmonised data format of the data flow from NSOs;

● assistance in exchanging meta-data (methodological texts);

● the distribution of information from Eurostat to the national partners.

1.8 METIS171

METIS is a collaboration group between the United Nations Economic Commission for Europe (UNECE), Eurostat and OECD to provide a forum for discussing metadata issues. The Conference of European Statisticians Steering Group on Statistical Metadata (METIS Steering Group) is responsible for developing and maintaining the Common Metadata Framework (see chapter 5), as well as organising METIS Work Sessions and Workshops. The METIS-wiki172 is a place where people working in official statistics share information and ideas about statistical metadata.

1.9 Organisation for Economic Co-operation and Development (OECD)173

The Committee on statistics and Expert group on international collaboration on Microdata access was set up by the OECD to discuss transnational access to confidential microdata for OECD needs, NSI needs and researchers' needs. It gathers representatives of the NSIs. Recommendations are expected at the end of 2013 and should include metadata issues.

171 http://www.unece.org/stats/archive/04.01d.e.html

172 http://www1.unece.org/stat/platform/display/metis/METIS-wiki

173 http://www.oecd.org/


1.10 Other collaborations

Some collaboration also takes place within international statistical associations:

● International Statistical Institute (ISI)174

● International Association for Official Statistics (IAOS)175

● International Input-Output Association (IIOA)176

● International Association of Survey Statisticians (IASS)177

● International Association for Research in Income and Wealth (IARIW)178

2. The Data Archives

2.1 The Council of European Social Science Data Archives (CESSDA)179

CESSDA is an umbrella organisation for social science data archives across Europe. Since the 1970s the members have worked together to improve access to data for researchers and students. CESSDA research and development projects and Expert Seminars enhance exchange of data and technologies among data organisations.

CESSDA promotes the acquisition, archiving and distribution of electronic data for the European social science and humanities research community. It encourages the exchange of data and technology and fosters the development of new organisations in sympathy with its aims. It associates and cooperates with other international organisations sharing similar objectives.

Membership in CESSDA is available to European data organisations that are actively engaged in archiving data and provide the social science community with computerised numeric information, data, and documentation with support for secondary analysis. CESSDA members include university departments, data archives and research institutions.

Collectively, the constituent CESSDA member organisations serve some 30,000+ social science and humanities researchers and students within the European Research Area each year, providing access to 25,000+ data collections, delivering 70,000+ data collections per annum and acquiring a further 1,000+ data collections each year. The CESSDA Catalogue enables users to locate datasets, as well as questions or variables within datasets, stored at CESSDA archives throughout Europe.

Preparations are underway to move CESSDA into a new organisation known as CESSDA European Research Infrastructure Consortium (CESSDA ERIC).

174 http://www.isi-web.org/

175 http://isi.cbs.nl/iaos/

176 http://www.iioa.org/

177 http://isi.cbs.nl/iass/allUK.htm

178 http://www.iariw.org/

179 http://www.cessda.org


2.2 CESSDA European Research Infrastructure Consortium (CESSDA ERIC)

Archiving social science data optimally and ensuring access for researchers across national borders is a major challenge. To meet this challenge, CESSDA is transforming into a new organisation, an ERIC (European Research Infrastructure Consortium). The CESSDA Research Infrastructure is one of the 35 projects that were listed on the ESFRI Roadmap in 2006.

Several European countries have signed the Memorandum of Understanding to commit their financial and political support for the setting up of CESSDA-RI. The infrastructure will be hosted by Norwegian Social Science Data Services (NSD). The Research Council of Norway granted funding for CESSDA-ERIC Preparation for the period 10.10.2011 - 31.12.2012. The funding enables the consortium to prepare the actual establishment of CESSDA-ERIC.

The members of an ERIC are countries, and each member country will appoint a service provider which fulfils the membership obligations. CESSDA-ERIC will play a key role in ensuring that the necessary metadata standards are continuously developed, maintained, enhanced and implemented. Much of this work will be centred on DDI and how it inter-relates with other metadata and information standards.

As the ERIC statutes currently do not allow an associated country (such as Norway) to host an ERIC, 13 countries have signed an MoU which allows, for an interim period until 2015, a Norwegian legal entity (CESSDA AS) to act legally for the new CESSDA consortium, of which these governments are members. The 1st General Assembly was held on June 18th, 2013 in Bergen.

2.3 The International Federation of Data Organizations for Social Science (IFDO)180

IFDO was established in the late 1970s in response to the advanced research needs of the international social science community. The founders felt it would be advantageous to coordinate worldwide data services and thus enhance social science research. IFDO’s main purpose is to facilitate and support research through cooperation between data organizations across countries, regions and continents.

180 http://www.ifdo.org


APPENDIX 2: THE NATIONAL STATISTICAL INSTITUTES AND THE DATA ARCHIVES

1. The NSIs

Country Name of Organisation Homepage

Austria Statistics Austria http://www.statistik.at

Belgium Statistics Belgium http://statbel.fgov.be/en/

Bulgaria National Statistical Institute (NSI) http://www.nsi.bg

Croatia Croatian Bureau of Statistics (DZS) http://www.dzs.hr/

Cyprus Statistical Service of Cyprus (MOF) http://www.mof.gov.cy/

Czech Republic Czech Statistical Office (CZSO) http://www.czso.cz

Denmark Statistics Denmark (DST) http://www.dst.dk/

Estonia Statistics Estonia (STAT) http://www.stat.ee/

Finland Statistics Finland (STAT) http://www.stat.fi/

France National Institute of Statistics and Economic Studies (INSEE) http://www.insee.fr/

Germany Federal Statistical Office (DESTATIS) http://www.destatis.de

Greece Hellenic statistical authority (EL.STAT) http://www.statistics.gr

Hungary Hungarian Central Statistical Office (KSH) http://portal.ksh.hu/

Iceland Statistics Iceland (STATICE) http://www.statice.is/

Ireland Central Statistics Office (CSO) http://www.cso.ie/

Italy Italian National Institute of Statistics (ISTAT) http://www.istat.it/

Latvia Central Statistical Bureau of Latvia (CSB) http://www.csb.gov.lv/

Liechtenstein Bureau of Statistics (AS) http://www.as.llv.li/

Lithuania Statistics Lithuania (LS) http://www.stat.gov.lt/en/

Luxembourg Service Central de la Statistique et des Etudes (STATEC) http://www.statistiques.public.lu

Malta National Statistics Office (NSO) http://www.nso.gov.mt/

Netherlands Statistics Netherlands (CBS) http://www.cbs.nl/

Norway Statistics Norway (SSB) http://www.ssb.no/

Poland Central Statistical Office (GUS) http://www.stat.gov.pl/

Portugal Statistics Portugal (INE) http://www.ine.pt/

Romania National Institute of Statistics (INSSE) http://www.insse.ro/

Slovak Republic Statistical Office of the Slovak Republic http://portal.statistics.sk/

Slovenia Statistical Office of the Republic of Slovenia http://www.stat.si/

Spain National Statistics Institute (INE) http://www.ine.es/

Sweden Statistics Sweden (SCB) http://www.scb.se/

Switzerland Federal Statistical Office/Swiss Statistics (FSO) http://www.bfs.admin.ch/

United Kingdom Office for National Statistics (ONS) http://www.statistics.gov.uk

Turkey Turkish Statistical Institute (TurkStat) http://www.turkstat.gov.tr

A complete list of national statistical offices' websites can be found on the United Nations Statistical Division’s (UNSD) website [181]. UNSD has also developed a central repository of country profiles of statistical systems [182].

2. The (CESSDA) Data Archives

Country Name of DA Homepage

Austria WISDOM http://www.wisdom.at/

Czech Republic Czech Social Science Data Archive (CSDA) http://archiv.soc.cas.cz/

Denmark The Danish Data Archive (DDA) http://www.sa.dk/dda/

Estonia Estonian Social Science Data Archive (ESTA) http://psych.ut.ee/esta/

Finland Finnish Social Science Data Archive (FSD) http://www.fsd.uta.fi/

France Réseau Quetelet http://www.reseau-quetelet.cnrs.fr/

Greece The Greek Social Data Bank (GSDB-EKKE) http://www.gsdb.gr/

Germany Leibniz-Institut für Sozialwissenschaften (GESIS) http://www.gesis.org/

Hungary TÁRKI Social Research Institute (TARKI) http://www.tarki.hu/en/

Ireland The Irish Social Science Data Archive (ISSDA) http://www.ucd.ie/issda/

Italy Data Archive for Social Sciences (ADPSS) http://www.sociologiadip.unimib.it/sociodata/

Lithuania Lithuanian Data Archive for Social Science and Humanities (LiDA) http://www.lidata.eu/

Luxembourg CEPS http://www.ceps.lu/

Netherlands Data Archiving and Networked Services (DANS) http://www.dans.knaw.nl/

Norway Norwegian Social Science Data Services (NSD) http://www.nsd.uib.no/

Romania The Romanian Social Data Archive (RODA) http://www.roda.ro/

Slovenia Social Science Data Archives (ADP) http://adp.fdv.uni-lj.si/

Spain ARCES/CIS http://www.cis.es

Sweden Swedish National Data Service (SND) http://snd.gu.se/

Switzerland Swiss Foundation for Research in Social Sciences (FORS) http://www2.unil.ch/fors/

[181] http://unstats.un.org/unsd/methods/inter-natlinks/sd_natstat.asp

[182] http://unstats.un.org/unsd/dnss/cp/searchcp.aspx

APPENDIX 3: WORK PACKAGE 7 - SURVEY OF (CESSDA) DATA ARCHIVES

The following questions were sent to the CESSDA Data Archives in November 2012. The replies from the DAs follow on the subsequent pages.

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

o Any documentation of data from the NSI?

o Any other cooperation?

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

o Are you using DDI-lifecycle (DDI3)? How?

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

o Are you using DDI Controlled Vocabularies? (all/some)

o Are you using the ELSST Thesaurus?

o Are you using your own CV's? Please describe briefly:

o Are you using other CVs? Please specify:

Country Archive Contact e-mail Status

Austria WISDOM Christian Bischof [email protected] Responded

Czech Republic SDA Yana Leontiyeva [email protected] No response

Denmark DDA Bodil Stenvig [email protected] Responded

Estonia ESSDA Andu Rämmer [email protected] No response

Finland FSD Mari Kleemola [email protected] Responded

France RQ Raphaelle Fleureux [email protected] Responded

Germany GESIS Uwe Jensen [email protected] Responded

Greece EKKE Chryssa Kappi [email protected] Responded

Hungary TARKI Peter Hegedus [email protected] Responded

Ireland ISSDA Julia Barrett [email protected] Responded

Italy ADPSS Domingo Scisci [email protected] Responded

Lithuania LIDA [email protected] No response

Luxembourg CEPS [email protected] No response

Netherlands DANS Marion Wittenberg [email protected] Responded

Norway NSD Atle Alvheim atle.alvhe@[email protected] Responded

Romania RODA Adrian Dusa [email protected] Responded

Slovenia ADP Irena Vipavc [email protected] Responded

Spain CIS Jesús Bouso Freijo [email protected] Responded

Sweden SND Iris Alfredsson [email protected] Responded

Switzerland FORS Andreas Perret [email protected] Responded

United Kingdom UKDA Melanie Wright [email protected] No response

Austria: WISDOM

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

At WISDOM we hold the Austrian Microcensus data from 1970 to 2003.

o Any documentation of data from the NSI?

We have German and English documentation in Nesstar http://www.wisdoc.at:8080/webview/index.jsp?catalog=http%3A%2F%2F80.75.252.24%3A8080%2Fobj%2FfCatalog%2FCatalog22&submode=catalog&mode=documentation&top=yes

o Any other cooperation?

No

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

With Nesstar Publisher and Server

o Are you using DDI-lifecycle (DDI3)? How?

No

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?

No, because the documentation requires a high effort and we do not have the resources at the moment.

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

Yes http://www.wisdoc.at:8080/webview/index.jsp?v=2&node=16&submode=ddi&study=http%3A%2F%2F80.75.252.24%3A8080%2Fobj%2FfStudy%2FMZ2002%212%21en&mode=documentation&top=yes

o Are you using DDI Controlled Vocabularies? (all/some)

No

o Are you using the ELSST Thesaurus?

Yes. For the Keywords.

o Are you using your own CV's? Please describe briefly: No

o Are you using other CVs? Please specify: No. We plan to use a controlled vocabulary if a multilingual (English-German) version becomes available.

Denmark: DDA

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

Since 2010 the cooperation between DDA and Statistics Denmark (SD) has developed in a positive direction, characterised by a contact person arrangement and regular meetings (at least once a year) with SD. DDA has taken initiatives to inform about the Data Documentation Initiative (DDI) and share knowledge of data documentation with SD. DDA does not disseminate data from SD, as SD has its own service for disseminating micro data from statistical registers to researchers. Read about the service here. SD’s micro data and statistical registers are archived in the State Archives (SA) in Denmark, in accordance with the legal requirements for official records in Denmark. In other words, statistical registers in the format of micro data are archived in SA. DDA is part of SA, so in time we expect to be able to disseminate the archived micro data from SD’s registers through DDA’s search catalogue.

o Any documentation of data from the NSI?

No

o Any other cooperation?

DDA cooperates with SD in two dimensions.

In order to give the best possible service to researchers who work with micro data, whether collecting or analysing it, DDA and SD have yearly meetings on how to improve and coordinate the service that we can give researchers in relation to survey and register data. The legislation in Denmark makes it possible for researchers to process micro data with personal identifiers and to archive the data in DDA. In cooperation with the Danish Data Protection Agency, DDA can give access to surveys with personal identifiers. If permitted by the Danish Data Protection Agency, researchers may combine the survey data with personal identifiers with micro data in SD.

Best practice in data documentation, DDI-L and DDI-L tools has within the last year developed into a field of cooperation between the two institutions. At the moment the DDA and SD are cooperating to start a national (Danish) DDI user group. DDA has been sharing the knowledge we have about DDI-L from implementing DDI-Lifecycle in DDA (see below).

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

No

o Are you using DDI-lifecycle (DDI3)? How?

Yes. The year 2012 marks the full implementation of DDI-L in DDA for both curation and dissemination production processes. DDA has developed the DdiEditor. The DDA DdiEditor is the key tool in a framework of data processing tools and processes composing data processing of survey datasets. The end product is data documentation in accordance with the international metadata standard for the collection of surveys, DDI-L.

The DdiEditor produces metadata documentation in DDI-Lifecycle. The DdiEditor uses English terms in accordance with the DDI-L standard.

A key objective for development of the DdiEditor is to provide users with a tool which is configurable, extendable and customisable allowing users to customise their personal working environment to their needs.

At the moment the DdiEditor is aimed at data processing for curation purposes (accommodating the requirements for a data archive). The user manual is designed for data processing in the DDA using the DDI Lifecycle version 3.1.

The DdiEditor processes DDI-L documents. The DdiEditor allows users to import, export, validate and print as well as to create, change and delete DDI-L elements. However, the development project plans to move forward in the direction of providing functionality for the whole data lifecycle. That is, future user groups will range from researchers carrying out a survey to re-users of data supplying additional metadata to the original study. The DdiEditor is an Open Source tool. We encourage interested parties to download and try out the product. Hopefully, this will provide us with feedback on the product as it stands as well as additional development on the product. http://code.google.com/p/ddieditor/.

Using the DdiEditor requires a basic familiarity with SPSS and the DDI Standard.

Installing and using the DdiEditor requires adaptation to the external partners’ systems which must be configured for the platform (Eclipse RCP).

The user manual contains instructions on how to process data using the DdiEditor. Metadata to be processed by the DdiEditor is imported as wiki syntax text consisting of the questionnaire (referred to as Wiki file) and SPSS-based metadata (variable names and labels).

The manual must be seen as a work in progress where information is continuously updated and changed. Contact information: [email protected] & [email protected]

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?

N.A

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

Yes. The DDA has implemented the CESSDA topic classification translated into Danish by the DDA (no external reviews). At the moment an update of these classifications is desired. Most strikingly, topic classifications for medical science data are lacking. When we update the CESSDA topic classification, we will consider changing to the English version, thereby avoiding the challenges and mistakes due to translation.

o Are you using DDI Controlled Vocabularies? (all/some)

No. In 2013 we plan to evaluate and possibly implement DDI CV’s.

o Are you using the ELSST Thesaurus?

No. But we see this as an interesting possibility

o Are you using your own CV's? Please describe briefly: Yes.

- DDA Keywords: The DDA has a unique list of keywords. Addition to the list is made on demand by staff processing the data materials for long term preservation and dissemination.

- CV for access restrictions: Categories coded 1 to 6 for different kinds of access restrictions.

- CV for study state classes: Eleven categories for naming a study’s state in the data archive production process. The categories are coded with abbreviations (KON, MOD, ARK, ARE, OPD, FOA, FOB, FOC, FOD, FOE, NED).

o Are you using other CVs? Please specify: No

Finland: FSD

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

No

o Any documentation of data from the NSI?

No

o Any other cooperation?

Yes:

- We did a documentation pilot several years ago but that did not lead to co-operation. Currently we are discussing a possible new project.

- Statistics Finland is our partner in ISSP

- Statistics Finland is represented in our National Advisory Board

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

Yes. We use DDI2.0 to document both quantitative and qualitative data. Quantitative data is documented at variable level. Our DDI template contains all CESSDA mandatory and recommended elements. All metadata records are available: http://www.fsd.uta.fi/en/data/background/ddi-records.html

o Are you using DDI-lifecycle (DDI3)? How?

No

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?

Yes. We are researching DDI-L during the FSD Upgrade project: http://www.fsd.uta.fi/en/news/FSDupgrade.html

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

Yes

o Are you using DDI Controlled Vocabularies? (all/some)

Some (at the moment only time methodology and analysis unit, but we are planning to implement others)

o Are you using the ELSST Thesaurus?

Yes

o Are you using your own CV's? Please describe briefly: We use the following own CVs (all available in Finnish and English)

- topic classification: our own (based on Finnish general thesaurus YSA)

- data kind: our own (quantitative / qualitative)

- sampling procedure (Random sampling / Simple random sampling / Systematic sampling / Systematic sampling / Stratified sampling / Stratified random sampling / Cluster sampling / Two-stage cluster sampling / Stratified quota sample / Multistage probability sampling / Multistage stratified quota samples / Deming zone selection / Probability sample / Oversample / Total study / Judgement sampling / Volunteer sample)

- collection mode (Telephone interview / Face-to-face interview / Face-to-face interview / Postal survey / Telepanel survey (Gallup Channel) / Internet survey / Guided questionnaire / Observation / Self-administered writings / Recording / Focus group)

- research instrument (structured / semi-structured)

- background variables (not strictly a CV but list of recommended terms or wordings)

- list of organisations

o Are you using other CVs? Please specify:

- keyword: the Finnish general thesaurus YSA

- country/nation: ISO 3166-1

France: RQ

Before going any further, here are some clarifications about RQ: Réseau Quetelet is a network of 3 partners, which give access to different data:

- ADISP (Data Archive of French National Statistics), official statistics data

- CDSP (Centre for Socio-Political Data), socio-political data

- INED (National Institute of Demographic Studies), socio-demographic surveys

We handle data and documentation in different ways, but I will try to answer the questions for RQ as a whole.

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

INSEE (National Institute of Statistics and Economic Studies), the French NSI, and ministerial statistical offices give, to ADISP only, microdata, tabulations and the documentation of the surveys they conducted. We can also do bespoke tabulations on some INSEE data.

o Any documentation of data from the NSI?

We regularly cooperate to improve the data and the documentation, in part by passing on researchers' remarks and questions about the surveys.

o Any other cooperation?

-

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

All partners of RQ (ADISP, CDSP, INED) have a Nesstar server and so document their data in DDI-C. We also use the XML files exported from Nesstar to feed our Question Bank.

o Are you using DDI-lifecycle (DDI3)? How?

No

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?

We plan to use DDI-L in two projects. The first is a new version of the Réseau Quetelet portal. We are working to extend the current portal to the documentation of all surveys disseminated by RQ partners. We hope eventually to import and export the documentation in DDI3. The second project, run by the CDSP, is an internet panel named ELIPSS. Data from this panel will be entered in Questasy, using DDI3.

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

No

o Are you using DDI Controlled Vocabularies? (all/some) No

o Are you using the ELSST Thesaurus?

No

o Are you using your own CV's? Please describe briefly: We don't use the CESSDA topic classification, DDI CVs or the ELSST thesaurus. We all use CVs to describe our surveys in Nesstar but we don't have a common CV. It is not a fixed list; it evolves as we get new surveys.

Germany: GESIS

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

Partly > German Microdata Lab (GML) http://www.gesis.org/en/institute/competence-centers/rdc-german-microdata-lab/

o Any documentation of data from the NSI? Yes > See “Missy” System (Microdata Information System) http://www.gesis.org/missy/

o Any other cooperation? Yes. With German Microdata Lab

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

- Data Holding Catalogue System http://www.gesis.org/en/services/research/dhc-advanced-search/

- ZACAT (Nesstar server System 4.0) http://www.gesis.org/en/services/research/zacat-online-study-catalogue/

- da|ra (DOI registration agency for social and economic data) http://www.da-ra.de/en/home/ The DataCite metadata standard is the basis for the da|ra metadata standard. Currently used: da|ra metadata schema version 2.2.1 (released August 8th, 2012), see http://www.da-ra.de/en/for-data-centers/register-data/doi-and-metadata/ The da|ra schema is DDI compatible.

- Missy System (see above)

o Are you using DDI-lifecycle (DDI3)? How? Internal export to DDI routines for long-term preservation (under development)

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? In current system developments or related updates:

- STARDAT (In-house Production System for Study and Variable Documentation)

- MISSY (included in updating of the System)

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification? Yes

o Are you using DDI Controlled Vocabularies? (all/some) Yes (all)

o Are you using the ELSST Thesaurus? Planned for use at the Data Holdings catalogue / ZACAT

o Are you using your own CV's? Please describe briefly: ZA classification system (“ZA Kategorien”): Content related keywords to classify a study in the DHC

o Are you using other CVs? Please specify Under Review: “Thesaurus for the Social Sciences” http://www.gesis.org/en/services/research/thesauri-und-klassifikationen/social-science-thesaurus/ The Thesaurus for the Social Sciences (Thesaurus Sozialwissenschaften) is a crucial instrument for the content-oriented search by keywords in SOFIS (Social Science Research Information System) and SOLIS (Social Science Literature Information System). The list of keywords contains about 12,000 entries, of which more than 8,000 are descriptors (authorised keywords) and about 4,000 non-descriptors. Topics in all of the social science disciplines are included.

The Thesaurus for the Social Sciences is also available in interactive form via sowiport. You can choose between alphabetic and systematic lists and between translations in English, German or Russian. It is now also available in SKOS format (German, English, French). The current release is version 0.92, which uses SKOS-XL and additionally defined extensions. Furthermore, cross-concordances to the STW Thesaurus for Economics of the ZBW as well as to the AGROVOC of the FAO are included (only via SPARQL and HTML representation). Additional technical enhancements and content updates are planned for the future. Detailed documentation of the conversion of the thesaurus to a first experimental SKOS version can be found in the GESIS technical report 2009/07.

Greece: EKKE

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

No

o Any documentation of data from the NSI? No

o Any other cooperation? Yes, we cooperate with our NSI through joint venture programs (a formal agreement between our organizations was signed very recently). Concerning the data, we can use some data from the NSI but we are not allowed to disseminate that data. The data is restricted to our use. However, a common database regarding census data processing will be available next year.

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

Yes, we are using DDI-codebook.

o Are you using DDI-lifecycle (DDI3)? How? No, we are not using DDI3.

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? We are currently not using DDI3 but we are planning to get involved with DDI3.

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

Yes

o Are you using DDI Controlled Vocabularies? (all/some) Yes, only the mandatory fields.

o Are you using the ELSST Thesaurus? Yes

o Are you using your own CV's? Please describe briefly: We have limited supplementary CVs for certain subjects

o Are you using other CVs? Please specify: No

Hungary: TARKI

Do you have any cooperation with your country’s NSI? At the current time we do not, but the Databank distributes datasets from the NSI

o Any dissemination of data from the NSI?

Yes, we have

o Any documentation of data from the NSI?

Yes, we have

o Any other cooperation?

No, we haven't.

Are you using DDI at the data archive? Yes

o Are you using DDI-codebook (DDI2)? How?

Yes, the descriptions of the datasets are

o Are you using DDI-lifecycle (DDI3)? How?

No, we are not using this

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?

We are thinking about it, but do not have enough resources (funding, etc.) to use it.

Which controlled vocabularies are you using at the moment? We are using own CV's

o Are you using the CESSDA topic classification?

Our own version of the CESSDA classification

o Are you using DDI Controlled Vocabularies? (all/some)

Some of these

o Are you using the ELSST Thesaurus?

No

o Are you using your own CV's? Please describe briefly:

The bases of these CVs are the DDI Controlled Vocabularies, but we use a simplified version of these.

o Are you using other CVs? Please specify:

No

Ireland: ISSDA

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

Yes. The Chair of the National Statistics Board has chaired an ad hoc group of stakeholders to advise on ISSDA’s future and its place in the Irish information landscape; ISSDA disseminates selected microdata from the Central Statistics Office, which is the operational NSI within Ireland.

o Any documentation of data from the NSI? Yes. Metadata for datasets is acquired, refined, and disseminated via NESSTAR

o Any other cooperation? Yes. Conversations around the scope of data to be disseminated from the Central Statistics Office are ongoing.

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

Not at this time

o Are you using DDI-lifecycle (DDI3)? How? Not at this time

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? Not at this time

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

No

o Are you using DDI Controlled Vocabularies? (all/some) No

o Are you using the ELSST Thesaurus? No

o Are you using your own CV's? Please describe briefly: To date descriptors have been applied on an ad hoc basis

o Are you using other CVs? Please specify: No

Italy: ADPSS

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

We get data from the NSI and distribute them ONLY inside our University's Department. People outside our Department who want access to NSI data have to contact the NSI directly (distributing data without the NSI's permission is not allowed, so when someone asks us for NSI data, we give them the NSI's contact details).

o Any documentation of data from the NSI? All the data from the NSI come with data documentation (questionnaire, codebook, methodological notes, etc...). We have it.

o Any other cooperation? So far, no.

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

No, so far.

o Are you using DDI-lifecycle (DDI3)? How? No, so far.

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? We are currently renovating our website (and our archive structure too), and we are thinking of checking all our data and deciding how to proceed. I hope we'll be able to use DDI2/DDI3 in the coming years.

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

In the new website, yes. TopCClas values are already in our database but are not shown on the current web pages.

o Are you using DDI Controlled Vocabularies? (all/some) No, we aren't. As for DDI2/3, we are thinking of adopting them during the archive/website update.

o Are you using the ELSST Thesaurus? No, we're not.

o Are you using your own CV's? Please describe briefly: We use a kind of CV for some metadata, for some DDI fields, like "timeMeth" ("cross-section", "panel", "longitudinal survey"), "SampProc" ("two-stage stratified random sample", "multi-stage stratified random sample", etc.) or "collMode" ("face-to-face interview", "self-completion questionnaire", "CATI", etc.)

o Are you using other CVs? Please specify: No, we're not.

Netherlands: DANS

Do you have any cooperation with your country’s NSI? We do cooperate with Statistics Netherlands (CBS).

o Any dissemination of data from the NSI? We disseminate their scientific use files (secured micro data files).

The secured micro files are available free of charge to researchers at institutions which are authorized by the CBS law. The legally authorized institutions referred to are:

- universities within the meaning of the Law on Higher Education and Scientific Research

- organizations or institutions for scientific research established by law

- Central Planning Office - CPB

- Social and Cultural Planning Office - SCP

- National Institute for Public Health and the Environment - RIVM

- National Physical Planning Department - RPD

- Statistical Office of the European Communities - Eurostat

Researchers working with other Dutch organizations and institutions can apply for an official authorization by the Central Commission for Statistics - CCS. This authorization has to be requested from CBS (Centre for Policy Related Statistics). Researchers who want to make use of the unsecured files should go to CBS itself.

o Any documentation of data from the NSI? We only document 4 Dublin Core fields in our catalogue, for discovery purposes. We don't document the data itself. We also cooperate in a project about digitizing historical censuses

Are you using DDI at the data archive? o Are you using DDI-codebook (DDI2)? How?

We are using DDI codebook for our NESSTAR server. However, we disseminate only a very small part of our holdings through NESSTAR. The bulk we disseminate through our archiving system EASY. For EASY we only document in Dublin Core (DC). The data of Statistics Netherlands are only documented in DC (in the Dutch language).

o Are you using DDI-lifecycle (DDI3)? How? For one project (Question bank Cultural Changes) we are starting to use DDI lifecycle. But this project has nothing to do with data of Statistics Netherlands.

Which controlled vocabularies are you using at the moment? We don't use any controlled vocabularies at the moment.

o Are you using the CESSDA topic classification? We don't use the CESSDA topic classification

o Are you using DDI Controlled Vocabularies? (all/some) We don't use DDI Controlled Vocabularies

o Are you using the ELSST Thesaurus? We don't use the ELSST Thesaurus

o Are you using your own CV's? Please describe briefly: We don't use our own CV

o Are you using other CVs? Please specify: We don't use any other CV

Norway: NSD

Do you have any cooperation with your country’s NSI? Yes, a formal agreement since 1975, see (pp. 4-5, English language columns): http://www.nsd.uib.no/nsd/doc/nsd_annualreport2011.pdf

o Any dissemination of data from the NSI? Yes, for microdata see http://www.fsd.uta.fi/en/CES2012/christophertonnesen.pdf SN puts surveys with documentation on an FTP server for NSD to pick up. For aggregate statistics, see appended note

o Any documentation of data from the NSI? Yes, see http://www.nsd.uib.no/nsddata/dataleverandorer.html

o Any other cooperation? Occasionally on projects and activities.

Are you using DDI at the data archive? Yes, DDI2.5

o Are you using DDI-codebook (DDI2)? How? Appended manual

o Are you using DDI-lifecycle (DDI3)? How? To some degree within ESS

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? To develop various longitudinal solutions, ESS, HBSC, etc

Which controlled vocabularies are you using at the moment? Appended manual

o Are you using the CESSDA topic classification? Yes

o Are you using DDI Controlled Vocabularies? (all/some) Some, but a little out of sync with the latest developments

o Are you using the ELSST Thesaurus? Yes, to some degree

o Are you using your own CV's? Please describe briefly: See appended manual.

o Are you using other CVs? Please specify: For our opinion poll data archive, we classify down to question level, see https://trygg.nsd.uib.no/meningsmalingsarkivet/search This requires a somewhat more ad hoc categorization system

Romania: RODA

Do you have any cooperation with your country’s NSI? We have good informal cooperation with our NSI, but no formal cooperation yet. The DwB project made a difference, and we are very optimistic that formal institutional cooperation will start in the very near future.

o Any dissemination of data from the NSI?

Not yet.

o Any documentation of data from the NSI?

Not yet.

o Any other cooperation?

The dissemination department of our NSI has been very helpful in identifying microdata for research purposes, and has even mediated the relationship with Eurostat in cases where the data should have been disseminated from there.

Are you using DDI at the data archive? Yes.

o Are you using DDI-codebook (DDI2)? How? Yes. All our previous codebooks have been edited and saved in XML format, using the Nesstar Publisher.

o Are you using DDI-lifecycle (DDI3)? How? Not for the moment.

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? Yes, we do plan to use DDI3 (lifecycle). We are currently upgrading our data archiving system to create dual-mode storage: one store in a relational database for information that is very systematic, and another in a DDI3-compliant XML file holding all the information in the relational database plus the information that is very specific to individual studies. The XML file will be able to cover the data lifecycle part.
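The dual-mode storage described in this answer can be illustrated with a minimal Python sketch. It is not RODA's implementation: the table layout, the field names and the XML element names (`Study`, `StudySpecific`, `fieldworkNote`) are hypothetical placeholders and do not follow the DDI 3 schema. The sketch only shows the idea that the systematic fields go into a relational table, while the XML record repeats them and adds the study-specific information.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_study(conn, study_id, systematic, study_specific):
    """Write the systematic fields to the relational store and return an
    XML element that repeats them and adds the study-specific fields."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS study "
        "(id TEXT PRIMARY KEY, title TEXT, year INTEGER)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO study (id, title, year) VALUES (?, ?, ?)",
        (study_id, systematic["title"], systematic["year"]),
    )
    conn.commit()

    # Placeholder element names; a real export would follow the DDI 3 schema.
    root = ET.Element("Study", attrib={"id": study_id})
    for name, value in systematic.items():
        ET.SubElement(root, name).text = str(value)
    extra = ET.SubElement(root, "StudySpecific")
    for name, value in study_specific.items():
        ET.SubElement(extra, name).text = str(value)
    return root


# Hypothetical example record.
conn = sqlite3.connect(":memory:")
element = store_study(
    conn,
    "S001",
    {"title": "Example survey", "year": 2012},
    {"fieldworkNote": "Pilot in two regions"},
)
xml_text = ET.tostring(element, encoding="unicode")
```

In this sketch the database row can be queried with ordinary SQL, while `xml_text` carries the full record, including the fields that have no column in the table.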

Which controlled vocabularies are you using at the moment? None for the moment.

o Are you using the CESSDA topic classification? No, but will do in the near future.

o Are you using DDI Controlled Vocabularies? (all/some) No, but will do so in the near future.

o Are you using the ELSST Thesaurus? We have just been licensed by the University of Essex to translate ELSST into Romanian, so we will certainly use ELSST in the near future.

o Are you using your own CV's? Please describe briefly: No.

o Are you using other CVs? Please specify: No

Slovenia: ADP

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

Some, such as the Labour Force Survey (1995-2000), Time Use Survey 2001, Crime Victim Survey 2001, Census 1991 and 2002, and Household Budget Survey 2000.

o Any documentation of data from the NSI?

For the data mentioned above. Other classifications they use are available on the NSI’s website, so we do not store or distribute them.

o Any other cooperation?

A more descriptive answer, which also adds some information to the fields above:

The cooperation with the Slovene NSI dates back to the nineties and the beginning of the 21st century. Since then we have distributed their LFS and Census PUF microdata etc. Both organizations are partners in the DwB project, so after almost a decade with almost no cooperation we decided to work more intensively at the national level. This started at the end of 2011. ADP, with the support of the NSI’s departments, has been preparing the NSI’s non-anonymised LFS microdata and metadata (from 2001 on) for future use by registered researchers. We have also been supporting the redesign of their website by collecting information on the availability of different datasets which could later be used by researchers in the safe room or by remote access. In the not too distant future, public use files will be prepared and distributed by the Archive. Other surveys will follow.

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

Yes. We have an XML template (for every part of DDI except the data description) that contains the fields we use the most (more than those offered by Nesstar templates). An export to this template is first made from our local DB. Then we manually fill in the XML in an XML editor (Oxygen), referring to the information we receive from depositors. If needed, additional fields from DDI are added. The data description part of the DDI is created in Nesstar Publisher and merged with the other parts of the XML.
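The final merge step in this workflow can be sketched in Python. This is an illustration under stated assumptions, not ADP's actual procedure: it assumes both files are DDI Codebook documents without an XML namespace, with `codeBook` as root and `dataDscr` as a direct child, and the function name `merge_data_description` and the sample content are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def merge_data_description(template_root, nesstar_root):
    """Return a copy of the template whose dataDscr section is taken
    from the Nesstar Publisher export."""
    merged = copy.deepcopy(template_root)
    # Drop any data description already present in the template copy.
    for old in merged.findall("dataDscr"):
        merged.remove(old)
    data_dscr = nesstar_root.find("dataDscr")
    if data_dscr is not None:
        merged.append(copy.deepcopy(data_dscr))
    return merged


# Hypothetical, heavily shortened inputs.
template = ET.fromstring(
    "<codeBook><stdyDscr><citation><titlStmt>"
    "<titl>Example study</titl>"
    "</titlStmt></citation></stdyDscr></codeBook>"
)
nesstar = ET.fromstring(
    '<codeBook><dataDscr><var name="V1"><labl>Age</labl></var></dataDscr></codeBook>'
)
merged = merge_data_description(template, nesstar)
```

Working on deep copies leaves both input trees unchanged, so the manually edited template can be reused if the data description is exported again.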

o Are you using DDI-lifecycle (DDI3)? How? No, we are not using DDI3 yet. We have tried, but we are still waiting for a useful tool for editing and visualisation.

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? Yes, we are. For now we are trying to stay up to date on all the activities and developments related to DDI3. Unfortunately, we do not have enough resources to develop our own solutions or tools to move on from DDI2 to DDI3.

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification? Yes, we are. In study description we often add additional (but not classified) topics, for easier recognition of the study.

o Are you using DDI Controlled Vocabularies? (all/some) Some of it

o Are you using the ELSST Thesaurus? Not explicitly. ELSST is used in the DwB project to some extent, in the suggested metadata scheme, so metadata prepared for that will include ELSST topics.

o Are you using your own CV's? Please describe briefly: Yes, we use our own CV's for almost every other field in DDI.

o Are you using other CVs? Please specify: We also use CERIF – Common European Research Classification Scheme (http://www.arrs.gov.si/en/gradivo/sifranti/inc/CERIF.pdf)

Spain: CIS

Do you have any cooperation with your country’s NSI? The major cooperation with the Spanish NSI is our participation in Commissions to coordinate statistical production in Spain (coordination between ministries and between regions). We also use NSI classifications (like the National Occupational Classification) and, in specific studies, their sample designs.

There is no other cooperation between CIS and the Spanish NSI, only a fluent relationship between statistical institutions.

o Any dissemination of data from the NSI?

o Any documentation of data from the NSI?

o Any other cooperation?

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How? CIS is not currently using DDI-codebook (DDI2) but we are planning to implement it in the new release of our database, in six or eight months.

o Are you using DDI-lifecycle (DDI3)? How? CIS is not currently using DDI-life-cycle.

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? We are planning to use it in the future but we have not yet decided how to implement it.

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification? No

o Are you using DDI Controlled Vocabularies? (all/some) No

o Are you using the ELSST Thesaurus? No

o Are you using your own CV's? Please describe briefly: Yes. It is a thesaurus only in Spanish, not closed. It has more than 3,600 terms and is updated regularly but without a fixed schedule. Terms are classified into 17 thematic blocks (DESCRIPTORS) (1,724 terms), a list of IDENTIFIERS (1,368 terms), another one of FUNCTIONAL TERMS (26 terms) and a list of PLACE NAMES AND DATES (463 terms). Descriptors are organized hierarchically. In some cases, they reach up to a level 8 of specificity.

It includes synonyms and scope notes. However, with the current database, it is not possible to look it up. It does not include related terms.

Currently, it is used only to index the questions. For thematic classification of studies we use a more general hierarchical index with a smaller breakdown. We use another index for time series classification.

o Are you using other CVs? Please specify: Yes. The library uses its own thesaurus to index the publications.

Sweden: SND

Do you have any cooperation with your country’s NSI?

o Any dissemination of data from the NSI?

A few (mainly election results)

o Any documentation of data from the NSI?

Just the ones disseminated by SND

o Any other cooperation?

No

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

Yes. SND's metadata management system (SIMS) is compliant with DDI-codebook, with import and export routines to the standard. Although we don't support every feature of the standard (e.g. nCubes are not supported), most parts are implemented in the system. DDI-codebook is also used as a metadata carrier between the management system and Nesstar.

o Are you using DDI-lifecycle (DDI3)? How? Yes. The above-mentioned metadata management system has export routines to version 3.1 of DDI-lifecycle. However, SND does not support DDI-lifecycle in full. A DDI Profile of supported elements is available. Metadata in DDI-lifecycle format is automatically created whenever changes are made in SIMS and stored in an XML database (eXist-db). Metadata from this database is used as a basis for different DDI-lifecycle based services, such as our question bank and our future online catalogue.

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How? -

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

Yes

o Are you using DDI Controlled Vocabularies? (all/some) Yes

o Are you using the ELSST Thesaurus? Yes

o Are you using your own CV's? Please describe briefly: -

o Are you using other CVs? Please specify: -

Switzerland: FORS

Do you have any cooperation with your country’s NSI? Yes, we have been collaborating directly since 2008 with the COMPASS program

o Any dissemination of data from the NSI?

Data is disseminated only by the NSI; COMPASS offers an ordering platform for researchers.

o Any documentation of data from the NSI?

Documentation available from the NSI and from COMPASS, though ours is standardized, based on the NSI information.

o Any other cooperation?

Yes, with creation of public use samples and open access policy.

Are you using DDI at the data archive?

o Are you using DDI-codebook (DDI2)? How?

We use DDI 1.2.2 as in NESSTAR 4 for public data. The DARIS archive documents academic studies with the NESSTAR 3.5 platform.

o Are you using DDI-lifecycle (DDI3)? How?

No, we have neither the tools nor adequate IT staff.

o If you are not currently using DDI-lifecycle, are you planning for DDI-lifecycle? How?

FORS is developing a proprietary platform that should have some DDI interface capacities.

Which controlled vocabularies are you using at the moment?

o Are you using the CESSDA topic classification?

No

o Are you using DDI Controlled Vocabularies? (all/some)

NESSTAR 4 doesn’t offer CV, as far as I know.

o Are you using the ELSST Thesaurus?

The DARIS archive has its abstracts tagged with the ELSST

o Are you using your own CV's? Please describe briefly: We use our own classification for topics (a crossover to CESSDA and ELSST is available).

We also use classifications in German, English and French for Survey Mode, Coverage, Periodicity, Geographical Level and attributes of Scientific Use Files and Public Use Files. Please see our website http://compass.unil.ch

o Are you using other CVs? Please specify: No

REFERENCE LIST

Alvheim, A., 2009. A CESSDA Common Data Portal; metadata harvesting, indexing and search technology (D5.3), Deliverable D5.3 in the CESSDA-PPP project, Available at: http://www.cessda.org/project/doc/WP5.3_Portal_harvestingx_indexing_and_search.pdf [Accessed 2013-03-22]

Askitas, N., Gregory, A., Hoogerwerf, M., 2009. DDI Working Paper Series - Best Practices, No. 10. DDI Alliance. DOI: http://dx.doi.org/10.3886/DDIBestPractices10

Australian Bureau of Statistics, 2009. Internal ABS Working Document on Data Metadata Strategy. Available at: http://www.abs.gov.au/websitedbs/d3310114.nsf/4a256353001af3ed4b2562bb00121564/7057eb9a73a186c4ca2576c00018b2ba!OpenDocument [Accessed 2012-12-13]

Bargmeyer, B. and Gillman, D., 2000. Metadata standards and metadata registries: an overview. Bureau of Labor Statistics, Washington DC. Available at: http://stats.bls.gov/ore/pdf/st000010.pdf [Accessed 2012-12-08 & 2013-01-14]

Batini, C. and Scannapieco, M., 2006. Data Quality: Concepts, Methodologies and Techniques (Data-Centric Systems and Applications), Springer-Verlag New York, Inc., ISBN:3540331727

Beedham, Hilary, et. al., Assessment of UKDA and TNA Compliance with OAIS and METS Standards (Essex: UK Data Archive, 2005). Available at: http://www.jisc.ac.uk/uploaded_documents/oaismets.pdf

Bender, S., Himmelreicher, R., Zühlke, S. and Zwick, M, 2009. Improvement of Access to Data Set from the Official Statistics. German Data Forum (RatSWD). Available at: http://www.ratswd.de/download/RatSWD_WP_2009/RatSWD_WP_118.pdf [Accessed 2013-06-24]

BIS, ECB, EUROSTAT, IMF, OECD and UN, 2002. ” Common Open standards for the Exchange and Sharing of Socio-economic Data and Metadata: the SDMX Initiative”, UNECE/Eurostat Work Session on Statistical Metadata, Working Paper No. 11, In conference proceedings of CONFERENCE OF EUROPEAN STATISTICIANS, 6 - 8 March 2002, Luxembourg. Available at: http://sdmx.org/docs/2002/wp11.pdf [Accessed 2013-03-27]

Blomqvist, K. n.d. Version 1.1 CASE STUDY 2012-11-26. Unpublished manuscript. Statistics Sweden, Sweden. Presentation expert seminar. Available at: http://www1.unece.org/stat/platform/display/metis/Statistics+Sweden [Accessed 2012-12-13]

CCSDS - Consultative Committee for Space Data Systems, 2012. Reference Model for an Open Archival Information System (OAIS). Magenta Book. Issue 2. June 2012. Recommendation published by CCSDS Secretariat, NASA, Washington, DC, USA. Available at: http://public.ccsds.org/publications/archive/650x0m2.pdf [Accessed 2013-06-19]

Page 110: DELIVERABLE D7.1 Metadata Standards usage and · PDF file · 2014-03-22Project N°: 262608 ACRONYM: Data without Boundaries DELIVERABLE D7.1 ... Data Documentation Initiative ...

110/119

Doan, A. and Halevy, A., 2005. Semantic-Integration Research in the Database Community A Brief Survey. Copyright Association for the Advancement of Artificial Intelligence. Available at: http://www.aaai.org/ojs/index.php/aimagazine/article/view/1801/1699 [Accessed 2013-07-01]

Doorn, P. and Tjalsma, H., 2007. Introduction: archiving research data. Archival Science 7(1), 1-20. DOI:10.1007/s10502-007-9054-6

Duşa, A., Krejčí, J., Štebe, J., Fábián, Z., Hegedus, P., Hausstein, B., 2010. WP6 Final report: Strengthening the CESSDA RI (D6.1). Deliverable in the CESSDA-PPP project.Available at: http://www.cessda.org/project/doc/WP6_Final_Report.pdf [Accessed 2013-06-12]

El-Haj, M., 2012. UKDA Keyword Indexing with a SKOS Version of HASSET Thesaurus. Available at: https://hassetukda.wordpress.com/2012/09/24/ukda-keyword-indexing-with-a-skos-version-of-hasset-thesaurus/ [Accessed 2012-11-29]

Elmasri, R. A., Navathe, S. B., 2007. Fundamentals of Database Systems (5th ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. ISBN 0-321-36957-2

European Commission, 2007. Improving the Socio-economic Knowledge Base. Final report for the Madiera project, Project reference HPSE-CT-2002-00139. ISBN 978-92-79-07754-8

Eurostat, 2010. Monitoring of National Metadata Systems. European Commission. Available at: http://www1.unece.org/stat/platform/download/attachments/57835554/Monitoring+of+National+Metadata+Systems+2008_2009+-+2009_2010.pdf [Accessed 2013-06-19]

Eurostat, European Commission, 2011. Monitoring of National Metadata Systems 2009/2010 – 2010/2011. Available at http://epp.eurostat.ec.europa.eu/portal/page/portal/pgp_ess/0_DOCS/estat/Monitoring.pdf

Gregory, A., 2011a. The Data Documentation Initiative (DDI): An Introduction for National Statistical Institutes. Open Data Foundation. Available at: http://odaf.org/papers/DDI_Intro_forNSIs.pdf [Accessed 2012-12-13]

Gregory, A., 2011b. SDMX and DDI: How do They Fit Together in Practical Terms? In: DDI – The Basis of Managing the Data Life Cycle. Gothenburg, Sweden 5-6 December 2011. Available at: http://www.iza.org/conference_files/EDDI2011/call_for_papers/EDDI11%20C2%20-%20Arofan%20Gregory%20-%20SDMX%20and%20DDI%20How%20Do%20They%20Fit%20Together%20in%20Practical%20Terms.pptx [Accessed 2013-01-14]

Gregory, A. and Heus, P., 2007. DDI and SDMX: Complementary, not Competing, Standards. Open Data Foundation. Available at: www.opendatafoundation.org/papers/DDI_and_SDMX.pdf [Accessed 2012-12-08].

Page 111: DELIVERABLE D7.1 Metadata Standards usage and · PDF file · 2014-03-22Project N°: 262608 ACRONYM: Data without Boundaries DELIVERABLE D7.1 ... Data Documentation Initiative ...

111/119

Gregory, A., Heus, P. and Ryssevik, J., 2009. Metadata (March 1, 2009). RatSWD (German Council for Social and Economic Data) Working Paper No. 57. Available at SSRN: http://ssrn.com/abstract=1447866 or http://dx.doi.org/10.2139/ssrn.1447866 [Accessed 2013-03-22]

Gregory, A., Heus, P. and Ryssevik, J., 2010. Metadata. In: Building on Progress. German Data Forum, ed. 2011. Available at: http://www.ratswd.de/eng/publ/Building_on_Progress.html [Accessed 2012-12-08].

Hammer, J. and Mcleod, D., 1993. An approach to resolving semantic heterogeneity in a federation of autonomous, heterogeneous database systems, International Journal of Intelligent and Cooperative Information Systems, Vol. 2, no. 1, p. 51-83

Heckman, J. H., 2004. Micro Data, Heterogeneity and the Evaluation of Public Policy. Part 1. The American Economist 48(2), 3-25.

Hoffman, E., and Chamie, M., 1999. Standard Statistical Classifications: Basic Principles. [online] UN Statistical Commission, Thirtieth session, New York, 1-5 March 1999. Availabel at: http://unstats.un.org/unsd/class/family/bestprac.pdf [Accessed 2012-10-29]

Jensen, U., 2010. Funding models for the future development of metadata standards and software tools (D8.2a). Deliverable in the CESSDA-PPP project. Available at: http://www.cessda.org/project/doc/WP6_Final_Report.pdf [Accessed 2013-06-12]

Jensen, U., 2012. Leitlinien zum Management von Forschungsdaten. Sozialwissenschaftliche Umfragedaten. Available at: http://www.gesis.org/fileadmin/upload/forschung/publikationen/gesis_reihen/gesis_methodenberichte/2012/TechnicalReport_2012-07.pdf [Accessed 30 June 2013]

Jääskeläinen, T., Moschner, M., and Wackerow, J., 2009. Controlled Vocabularies for DDI 3: Enhancing Machine-Actionability. [pdf] IASSIST Quarterly Spring - Summer 2009 Available at: http://www.iassistdata.org/iq/controlled-vocabularies-ddi-3-enhancing-machine-actionability [Accessed 2013-01-14]

Karge, R., (n.d.) Metanet - Reference Model Available at: http://www.epros.ed.ac.uk/metanet/working_groups/Reference_model/ReferenceModel.doc [Accessed 2012-11-22]

Karge, R., 2005, A terminology model approach for defining and managing statistical metadata. Available at http://en.wikipedia.org/wiki/Terminology_model [Accessed 2012-11-22]

Kleemola, M., 2012. Improving Operations Using Standards and Metrics: Self-Assessment of Long-Term Preservation Practices at FSD. Presentation at the 38th IASSIST Conference, June 4-8, 2012, Washington DC. Available at: http://www.iassist2012.org/indexfolder/program/files/14_IASSIST2012%20Kleemola.pps [Accessed 2013-06-18]

Page 112: DELIVERABLE D7.1 Metadata Standards usage and · PDF file · 2014-03-22Project N°: 262608 ACRONYM: Data without Boundaries DELIVERABLE D7.1 ... Data Documentation Initiative ...

112/119

Lalor, T.,(2011). National Implementation of the GSBPM – A Summary Based on METIS Case Studies. Available at: http://www1.unece.org/stat/platform/download/attachments/55476343/National+Implementations.doc?version=1&modificationDate=1320830772287 [Accessed 2013-06-12]

Lundell, L-G., 2009. Data warehouse for efficient statistics production. Available at: https://sites.google.com/site/bosundgren/my-library/Datawarehousingbasedproduction.docx?attredirects=0 [Accessed 2012-12-13]

Mochmann, E., 2002. International Social Science Data Service: Scope and Accessibility. Report for the International Social Science Council (ISSC). German Social Science Infrastructure Services (GESIS), Cologne.

NACE Rev. 2, 2008. Statistical classification of economic activities in the European Community. Available at http://epp.eurostat.ec.europa.eu/cache/ITY_OFFPUB/KS-RA-07-015/EN/KS-RA-07-015-EN.PDF [Accessed 2013-06-22]

National Information Standards Organization (NISO), 2004. Understanding Metadata, NISO Press, ISBN: 1-880124-62-9, Available at: http://www.niso.org/publications/press/UnderstandingMetadata.pdf [Accessed 2013-03-15] Research Data MANTRA [online course]. EDINA and Data Library, University of Edinburgh. Available at: http://datalib.edina.ac.uk/mantra [Accessed 2012-11-30]

Neiswender, C. 2009. "What is a Controlled Vocabulary?." In The MMI Guides: Navigating the World of Marine Metadata. Available at: http://marinemetadata.org/guides/vocabs/vocdef. [Accessed 2013-06-22]

Netterstrøm, S. et al., 2004, Neuchâtel Terminology Model Classification database object types and their attributes. Available at: http://www1.unece.org/stat/platform/download/attachments/14319930/Part+I+Neuchatel_version+2_1.pdf?version=1 [Accessed 2012-11-22]

OECD, 2007. Data and Metadata Reporting and Presentation Handbook. Available at: http://www.oecd.org/dataoecd/46/17/37671574.pdf [Accessed 2013-06-22]

OECD Glossary of Statistical Terms 2007. Available at: http://stats.oecd.org/glossary/glossaryWord.zip [Accessed 2013-06-22]

Pellegrino, M., 2011. The Ongoing Work for a Technical Vocabulary of DDI and SDMX Terms. EDDI 2011, Gothenburg, Sweden 5-6 December 2011. Available at: http://www.iza.org/conference_files/EDDI2011/call_for_papers/EDDI11%20C2%20-%20Marco%20Pellegrino%20-%20The%20Ongoing%20Work%20for%20a%20Technical%20Vocabulary%20of%20DDI%20and%20SDMX%20Terms.ppt [Accessed 2013-01-14]

Quandt, M., compiled from WP9 team contributions in CESSDA-PPP project, 2010. Recommendations for, and requirements of, a CESSDA Harmonisation Infrastructure (D9.4). Available at: http://www.cessda.org/project/doc/D9.4_Consolidated_report.pdf [Accessed 2013-03-15]

Page 113: DELIVERABLE D7.1 Metadata Standards usage and · PDF file · 2014-03-22Project N°: 262608 ACRONYM: Data without Boundaries DELIVERABLE D7.1 ... Data Documentation Initiative ...

113/119

RatSWD 2010. Kriterien des Rates für Sozial- und Wirtschaftsdaten (RatSWD) für die Einrichtung der Forschungsdaten-Infrastruktur. Available at: http://www.ratswd.de/download/publikationen_rat/RatSWD_FDZKriterien.PDF[Accessed 2013-07-04]

Regionale Standards 2013. Federal Statistical Office, the Arbeitsgemeinschaft Sozialwissenschaftlicher Institute e. V. (ASI) and the ADM Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute e. V. Available at: https://www.destatis.de/DE/Methoden/Methodenpapiere/Download/RegionaleStandards_Ausgabe2013.html [Accessed 2013-06-22]

Rolf-Engel, G., 2010. The German Commission on Improving the Information Infrastructure between Science and Statistics (KVI) and their Realization since 2001. German Data Forum (RatSWD). Available at: http://www.ratswd.de/download/RatSWD_WP_2010/RatSWD_WP_145.pdf [Accessed 2013-06-24] Ruas de Oliviera, L. B., Felizardo, K. R., Feitosa, D., Nakagawa, E. Y., 2010. Reference Models and Reference Architectures Based on Service-Oriented Architecture: A Systematic Review. ECSA'10 Proceedings of the 4th European conference on Software architecture. Pages 360-367. Springer-Verlag Berlin, Heidelberg. ISBN:3-642-15113-2 978-3-642-15113-2

Ryssevik, J. and Musgrave, S., (2001). The Social Science Dream Machine: Resource Discovery, Analysis, and Delivery on the Web. Social Science Computer Review 2001 19:163. DOI: 10.1177/089443930101900203 Available at http://ssc.sagepub.com/content/19/2/163.full.pdf+html [Accessed 2013-04-18]

Scannapieco, M., Vaccari, C., 2011. Standardizing European Statistical processes:CORA and CORE projects. Accepted for conference MeTTeG 2011 - 5th International Conference on Methodologies, Technologies and Tools enabling e-Government 30 June - 1 July 2011 Camerino, Italy. Available at: http://www.academia.edu/754392/Standardizing_European_Statistical_processes_CORA_and_CORE_projects [Accessed 2013-01-14]

SDMX Initiative, 2011a. SDMX Standards: Section 1 Framework for SDMX Technical Standards Version 2.1. SDMX Initiative. Available at: http://sdmx.org/wp-content/uploads/2011/04/SDMX_2-1_SECTION_1_Framework.pdf [Accessed 2013-05-28].

SDMX Initiative, 2011b. SDMX Standards: Section 2 Information Model UML Conceptual Design. SDMX Initiative. Available at: http://sdmx.org/wp-content/uploads/2011/08/SDMX_2-1-1_SECTION_2_InformationModel_201108.pdf [Accessed 2013-05-28].

Sheth, A. P. & Larson, J. A., 1990. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys (CSUR), 22(3), 183-236.

Thomas, W., 2005. Data Distribution and Cataloging. In Encyclopedia of Social Measurement, Available through: Science Direct http://dx.doi.org/10.1016/B0-12-369398-5/00369-8 [Accessed 2012-12-08].

Page 114: DELIVERABLE D7.1 Metadata Standards usage and · PDF file · 2014-03-22Project N°: 262608 ACRONYM: Data without Boundaries DELIVERABLE D7.1 ... Data Documentation Initiative ...

114/119

Thomas, W., Gregory, A., Gager, J., Kuo I-L., Wackerow, A., Nelson, C., 2009. Data Documentation Initiative (DDI) Technical Specification Part I: Overview Version 3.1. Available at: http://sourceforge.net/projects/ddi-alliance/files/Data%20Documentation%20Initiative/DDI%203.1%20%282009-10-18%29/DDI_3_1_2009-10-18_Documentation_XMLSchema.zip/download [Accessed 2013-05-28]

UNECE Secretariat, 2009. Generic Statistical Business Process Model. Version 4.0 - April 2009. Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS). Available at: http://www1.unece.org/stat/platform/download/attachments/8683538/GSBPM+Final.pdf [Accessed 2013-04-18] or http://www1.unece.org/stat/platform/display/metis/The+Generic+Statistical+Business+Process+Model [Accessed 2013-03-27]

UNECE, 2012a. Generic Statistical Information Model (GSIM): Communication Paper for a General Statistical Audience (Version 1.0, December 2012). Available at: http://www1.unece.org/stat/platform/display/metis/GSIM+Communication+Paper [Accessed 2013-07-16]

UNECE, 2012b. GSIM: Generic Statistical Information Model: Specification Version 1.0 December 2012. Available at: http://www1.unece.org/stat/platform/display/metis/GSIM+Specification [Accessed 2013-01-14]

Vale, S., 2010. Exploring the relationship between DDI, SDMX and the Generic Statistical Business Process Model. Second Annual European DDI User Group Meeting (EDDI). Utrecht, Netherlands 8-9 December 2010. Available at http://www1.unece.org/stat/platform/download/attachments/57835554/EDDI+paper.pdf?version=1 [Accessed 2012-02-13]

Vardigan, Mary (2013). Strategic priorities of the Data Documentation Initiative (DDI) Alliance, Spring 2013. Presentation at the Work Session on Statistical Metadata, 06 - 08 May 2013, Geneva, Switzerland. Available at: http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.40/2013/WP6.pdf [Accessed 2013-06-11]

Vardigan, M., Heus, P., and Thomas, W., 2008. Data Documentation Initiative: Toward a Standard for the Social Sciences. In: The International Journal of Digital Curation, 1(3), Available through: http://www.ijdc.net/index.php/ijdc/article/view/66/45 [Accessed 2012-12-08].

Vardigan, M. and Whiteman, C., 2007. ICPSR meets OAIS: applying the OAIS reference model to the social science archive context. Available at: http://deepblue.lib.umich.edu/bitstream/handle/2027.42/60440/Vardigan.Whiteman.Applying%20OAIS.pdf?sequence=1

Page 115: DELIVERABLE D7.1 Metadata Standards usage and · PDF file · 2014-03-22Project N°: 262608 ACRONYM: Data without Boundaries DELIVERABLE D7.1 ... Data Documentation Initiative ...

115/119

Willeboordse, A. et al., 2006, Neuchâtel Terminology Model Part II: Variables and related concepts object types and their attributes. Available at: http://www1.unece.org/stat/platform/download/attachments/14319930/Neuchatel+Model+V1.pdf?version=1 [Accessed 2012-11-22]

GLOSSARY

.BASE-system – Links between data and metadata exist in the .BASE-system

ABS - Australian Bureau of Statistics

ADS Metadata – Advertising Distribution Specification

AIP - Archival Information Package

BA – German Federal Employment Agency

BIS – Bank for International Settlements

BPEL – Web Services Business Process Execution Language

BPMN – A standard Business Process Modeling Notation

CCSDS - Consultative Committee for Space Data Systems

Cenex - the Centres and Networks of Excellence (now called ESSnet)

CESSDA - The Council of European Social Science Data Archives

CESSDA ERIC - CESSDA European Research Infrastructure Consortium

CIS - Commonwealth of Independent States

CMF – Common metadata framework

CMR - Corporate Metadata Repository

CODED - Eurostat's Concepts and Definitions Database

CORA - Common Reference Architecture

CORE - Common Reference Environment

CoSSI – Common System of Statistical Information

CRISTAL – Generic model for complex structuring and restructuring of hierarchical classification structures and classification code systems

CROMETA – The central metadata repository (CROMETA) is the essential part, the core of the Integrated Statistical Information system (ISIS) which is in the final stage of development. In other words, ISIS is developed upon CROMETA.

CSV – Comma-Separated Values (file format)

CV - Controlled Vocabularies

CVG - Controlled Vocabularies Working Group

CWM - Common Warehouse Metamodel

D - Deliverable

DA – Data Archive

DatML/ASK – metadata to set up electronic questionnaires

DatML/EDT – holds the metadata that defines the data editing rules

DatML/SDF - Survey Definition Format

DC(MI) - Dublin Core (Metadata Initiative)

DDI - Data Documentation Initiative

DDI-C - DDI Codebook

DDI-L - DDI Lifecycle

DG - Directorate-General of the European Commission

DGINS - Directors General of the NSIs

DIP - Dissemination Information Package

DOI - Digital Object Identifier

DQAF - Data Quality Assurance Framework

DSA - Data Seal of Approval

DSBB - Dissemination Standards Bulletin Board

DSD - Data Structure Definitions

DUR - Datawarehouse and Register Development Program

DwB - Data without Boundaries

e-CORA - extended CORA model

ECB – European Central Bank

ECOSOC - United Nations Economic and Social Council

EEA countries – European Economic Area

EFTA countries – The European Free Trade Association

EGP - Classification schema of Erikson, Goldthorpe, Portocarero

ELSST - European Language Social Science Thesaurus

ESeC - European Socio-economic Classification

ESMS - Euro SDMX Metadata Structure

ESQR - European Statistical System Standard for Quality Reports

ESS – European Statistical System

ESSnet – the Centres and Networks of Excellence (Cenex, now called ESSnet)

ESSnet CORE - continues the work of a previous ESSnet called CORA

EU - European Union

EU-SILC – European Union Statistics on Income and Living Conditions

EUROSTAT – The statistical office of the European Union

FDZ – the Research Data Centre

FGDC – Federal Geographic Data Committee

FP7 - Seventh Framework Programme

FSD – Finnish Social Science Data Archive

GBPMS – Generic Statistical Business Process Model (is this correct? GSBPM also stands for this; is GBPMS just a misspelling?)

GENESIS – used to send data and metadata to Eurostat using the SDMX standard

GSBPM - Generic Statistical Business Process Model

GSIM - Generic Statistical Information Model

HLG - High-level group for the Modernisation of Statistical Production and Services

HLG-BAS - High-level group for strategic development in business architecture in statistics

IAB – the Institute for Employment Research

ICPSR – Inter-university Consortium for Political and Social Research

IFDO - The International Federation of Data Organizations for Social Science

ILO - International Labour Organization

IMD SDMS – Integrated Metadata Driven Statistical Data Management System

IMF – International Monetary Fund

IMTP - Information Management Transformation Program

INSPIRE - Infrastructure for Spatial Information in the European Community

IS - International Standard

ISCED - International Standard Classification of Education

ISCO - International Standard Classification of Occupations

ISEI - International Socio-Economic Index of Occupational Status

ISO 11179 – Information Technology -- Metadata registries (MDR)

ISO/IEC 11179 – an international standard for representing metadata for an organization in a metadata registry

ISSC - International Social Science Council

JSP - JavaServer Pages

KDB – classification database

LFS - Labour Force Survey

MARC – MAchine-Readable Cataloguing

MCV - Metadata Common Vocabulary

MetaPlus – A component of Statistics Sweden’s metadata system

METIS – Metadata Information System

MEETS – Modernisation of European Enterprise and Trade Statistics

METS – Metadata Encoding & Transmission Standard

MIDRAS - The Micro Data Remote Access System

MOF - Meta-Object Facility

MR - Metadata Registry

MRI – Magnetic Resonance Imaging

MS – Member States

NSI - National Statistical Institute

NACE - “Nomenclature générale des Activités économiques dans les Communautés Européennes”

NUTS - Nomenclature of territorial units for statistics

OAIS - Open Archival Information System

ODM - Ontology Definition Metamodel

OECD – Organisation for Economic Co-operation and Development

OGC - Open Geospatial Consortium

OMG - Object Management Group

OS – Official Statistics

PDI – Preservation Description Information

PID - Persistent Identifier

PPP – Preparatory Phase Project

PREMIS – Preservation Metadata: Implementation Strategies (PREMIS Maintenance Activity)

RAMON - Eurostat's Metadata Server

RDF - Resource Description Framework

REEM - Remote Execution Environment for Microdata

SDDS - Special Data Dissemination Standards

SDMX – Statistical Data and Metadata eXchange

SDMX MCV – SDMX Metadata Common Vocabulary

SDMX/DDI dialogue - a dialogue that engages the two standards bodies

SIOPS - Standard International Occupational Prestige Scale

SIP - Submission Information Package

SIS – Software Installation Script

SKOS - Simple Knowledge Organization System

SMS – statistical metainformation system

SN – Statistics Netherlands

SNA - System of National Accounts

T - Task

TEI - Text Encoding Initiative

TIC - Technical Implementation Committee

UML - Unified Modeling Language

UN – United Nations

UNECE – United Nations Economic Commission for Europe

UNSD – United Nations Statistics Division

WB – World Bank

WP - Work Package

XBRL - eXtensible Business Reporting Language

XMI - XML Metadata Interchange

XML – Extensible Markup Language