Combining Metadata Standards: Approaches and Benefits
Arofan Gregory
Open Data Foundation
Overview
• Recent events of interest
• The Standards: Comparison and Explanation
• Emerging Implementation Approaches
  – DDI and SDMX
  – SDMX and the Semantic Web Technologies
  – Classifications & Multiple Standards
• Ideas about Future Work
Recent Events of Interest
Note: Some of these events/implementations have been or will be described in detail in other papers; they are only mentioned here.
• Schloss Dagstuhl, Germany, November 2009 (DDI 3 Workshop)
  – SDMX 2.0 – DDI 3 field-level mapping work started
  – Topic: DDI and the Semantic Web???
Recent Events of Interest (2)
• Semantic Web and SDMX
  – ONS hosted 2-day meeting in the UK, February 2009 (produced draft “SDMX-RDF”)
  – Banca d’Italia has a prototype project
  – New project launched at University of Tilburg in the Netherlands (RDF expression of OECD SDMX data)
• Australian Bureau of Statistics (ABS) starts looking at SDMX and DDI to support data production lifecycle
  – Prototype implementations
  – Some other NSIs also very interested
Recent Events of Interest (3)
• Classifications and ISO/IEC 11179
  – Australia: Government agencies looking to exchange classifications with ABS from existing ISO/IEC 11179 system, using SDMX, DDI
  – Statistics Canada: Evaluation of IMDB (ISO/IEC 11179-based metadata repository) for use in coordination with Canadian RDC Network (based on DDI 3)
What Does This Mean?
• Not a complete list of events/implementations, but…
• Indicates the interest we are seeing in the combined use of standards!
  – These are not just experiments!
  – Organizations are looking at implementation in a serious way now
Characterizing the Standards
• SDMX:
  – Data structures and formats
  – Reference metadata structures and formats
  – Web-services architecture based on registry services
  – Content-oriented guidelines
• ISO/IEC 11179:
  – Model for managing concepts and data elements
  – Metadata registries and lifecycle
• ISO 19115:
  – Standard metadata model for geographies
  – Used by DDI as geographical model
Characterizing the Standards (2)
• Dublin Core:
  – Citation metadata
  – Widely used in the Semantic Web
  – Used natively by DDI for citations
• Semantic Web / “Linked Data” / RDF
  – See “Open Issues on the Semantic Web”
• DDI 3:
  – Will give more detail, as it is not as familiar to the METIS community…
Characterizing the Standards (3)
• DDI 1.*/2.* was a standard used by archives and data libraries
  – Based on a “codebook” model
  – Used by some NSIs, especially in the developing world because of the IHSN Metadata Management Toolkit
  – Used by the European network of data archives, CESSDA
  – Used by many data archives in North America
• Documentation of a single “Study” (survey)
  – Designed to help researchers find and use microdata
• DDI 3 is more ambitious – capture and use of metadata throughout the entire data lifecycle
DDI 3 Lifecycle Model
Notice: This is very similar to a high-level view of the METIS model!
Characterizing the Standards (4)
• DDI 3 provides machine-actionable metadata to support “metadata-driven” systems throughout the lifecycle
  – Focus is on upstream metadata capture and reuse
• Describes tabulation/aggregation of microdata
• Provides support for comparison across surveys, detailed geography, data processing, register data
• Aggregate “NCube” model aligned with SDMX
• No architecture/web services support (yet)
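The tabulation of microdata into an aggregate cube mentioned above can be illustrated with a minimal Python sketch. It does not use DDI 3 or SDMX syntax; the records, the dimension names and the `tabulate` helper are all hypothetical, chosen only to show how each combination of dimension values becomes one cell of a cube.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (illustrative values only).
microdata = [
    {"region": "North", "sex": "F", "employed": True},
    {"region": "North", "sex": "M", "employed": False},
    {"region": "South", "sex": "F", "employed": True},
    {"region": "South", "sex": "F", "employed": True},
]


def tabulate(records, dimensions):
    """Count records per combination of dimension values (one cube cell per key)."""
    return Counter(tuple(record[d] for d in dimensions) for record in records)


# Each key is a tuple of dimension values; each value is the cell count.
cube = tabulate(microdata, ["region", "sex"])
for key, count in sorted(cube.items()):
    print(key, count)
```

The dimensions play the role that dimension concepts play in an aggregate data structure; a real system would carry that structure as metadata rather than as a Python list.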
An Observation…
• It is easy to say that two standards are “aligned”
  – Many of these standards were intentionally aligned as they were developed
• It is much more difficult to understand how to use them in combination effectively…
Approaches and Benefits
• SDMX and DDI
  – DDI microdata production/SDMX aggregate dissemination
  – Using SDMX data in DDI-based systems (combining aggregates and microdata)
  – Combined SDMX/DDI supporting the entire data lifecycle
  – DDI register data reported to SDMX collection system
• SDMX and the Semantic Web
• Classifications and the Standards
[Diagram: input data (surveys, registers) passes through cleaning, editing, estimation, aggregation, etc., described with DDI 3 metadata, to produce dissemination data published via a website/web service as SDMX-ML (data, metadata, structure)]
DDI – SDMX: Benefits
• The benefits of this approach are those found by using the standards generally
  – Supports “metadata-driven” system for data production throughout the lifecycle (DDI)
  – Metadata-rich dissemination format, preferred by data collectors (SDMX)
  – Shared tools; SDMX registry services, Web Services for discovery and use of aggregates
SDMX – DDI: Integrating Aggregates and Microdata
• Scenario is common in some research
  – Economic data is often only available as aggregates
  – Challenge is to combine aggregates and other microdata
[Diagram: SDMX Web Service → SDMX-to-DDI 3 Transform → data archive/repository, which also holds surveys and registers (DDI 3); processing produces integrated data and metadata (DDI 3)]
SDMX – DDI: Benefits
• Allows for easy use of official statistics by researchers
  – Solves problems of combining aggregates and microdata
• Note: This does not involve disaggregation of published data
  – Structural transformation only, to allow DDI 3 systems to process aggregates easily
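As a rough illustration of what “structural transformation only” can mean, the Python sketch below reshapes a series-oriented aggregate dataset into flat records without changing any published value. The dimension names, the series and the `to_records` helper are assumptions made for the example; this is not the SDMX-to-DDI 3 mapping itself.

```python
# Hypothetical aggregate dataset: each series is keyed by its dimension values
# and holds observations per time period (names and values are illustrative).
dimensions = ["FREQ", "REF_AREA", "INDICATOR"]
series = {
    ("A", "AU", "GDP"): {"2007": 100.0, "2008": 103.5},
    ("A", "CA", "GDP"): {"2007": 200.0, "2008": 201.0},
}


def to_records(dimensions, series):
    """Reshape series into one flat record per observation.

    Values are copied unchanged: nothing is disaggregated or re-estimated.
    """
    rows = []
    for key, observations in series.items():
        for period, value in sorted(observations.items()):
            row = dict(zip(dimensions, key))
            row["TIME_PERIOD"] = period
            row["OBS_VALUE"] = value
            rows.append(row)
    return rows


records = to_records(dimensions, series)
for row in records:
    print(row)
```

The record layout resembles a rectangular data file, which is the kind of structure a microdata-oriented system is built to describe and process.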
DDI + SDMX: The Data Lifecycle
• Uses a metadata model capable of expression as either SDMX or DDI, depending
• Provides support for process management
  – Uses many features of SDMX (process model, structure sets, reporting taxonomies, etc.)
• Uses SDMX architecture/services model
  – Designed to allow incorporation of other standards
[Diagram: a process-management system (BPML) and an SDMX Registry coordinate the data and metadata repositories/application databases. Surveys and registers (DDI 3) feed an input data store, which feeds a dissemination data store (SDMX) published via web site/print/web services (SDMX, DDI, etc.). All registry interactions use SDMX; interactions between systems are DDI or SDMX Web Services, as appropriate.]
SDMX + DDI: Benefits
• Leverages Web-Services technologies (registry, event triggers, etc.) for efficient automation, migration, flexibility
• Choice of tools is broad
  – Use the “best” format for any given task
• All the benefits of the DDI-SDMX case
• Good support for process management as well as data management
SDMX and the Semantic Web Technologies
• Potentially applies to other standards as well (DDI, ISO/IEC 11179, etc.)
• Note that Semantic Web technologies only apply to dissemination
  – Not designed to support data production
• Terms:
  – “Raw data” in an SW context does not mean “raw data”
  – “Data” in an SW context means “anything that can be described using RDF” – not numeric data
Assumptions
• Creation of a harmonized statistical model based on proven models/standards, but expressed as RDF (“ontology” or “vocabulary” in SW terms)
• Implementation of an “SDMX-RDF” in standard SDMX dissemination packages
[Diagram: internally (production environment), an SDMX-driven production system fills a dissemination data store (SDMX) exposed through an SDMX Web Service (SDMX-ML); externally (dissemination to Web), an “SDMX-RDF” transform loads a triplestore (SDMX-RDF) that answers SPARQL queries with RDF]
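The transform step in this architecture can be sketched in Python as follows. This is only an illustration of turning one observation into RDF triples in N-Triples syntax: the `http://example.org/stats/` namespace, the property names and the `observation_to_triples` helper are hypothetical and do not represent the draft “SDMX-RDF” vocabulary.

```python
# Hypothetical namespace for illustration; not a published vocabulary.
BASE = "http://example.org/stats/"


def observation_to_triples(obs_id, observation):
    """Emit one N-Triples line per attribute of a single observation.

    Literal escaping and datatypes are omitted to keep the sketch short.
    """
    subject = f"<{BASE}obs/{obs_id}>"
    triples = []
    for name, value in observation.items():
        predicate = f"<{BASE}prop/{name}>"
        triples.append(f'{subject} {predicate} "{value}" .')
    return triples


observation = {"REF_AREA": "AU", "TIME_PERIOD": "2008", "OBS_VALUE": 103.5}
triples = observation_to_triples("1", observation)
print("\n".join(triples))
```

Every attribute of every observation repeats the full subject and predicate identifiers, which gives a feel for how quickly a triple-based serialization grows.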
SDMX and the Semantic Web: Benefits
• Leverages the “Linked Data” phenomenon without requiring a deep understanding of RDF, etc.
• Uses existing standards/models and best practices to do “heavy lifting” (data production)
• Puts a lot of reliable, quality data into the “Linked Data Web”
  – Helps address issues of provenance
Warning
• RDF is verbose!
• 4.5 Megs of GESMES/TS = 45 Megs of “compact” SDMX-ML XML = 420 Megs of RDF triples
• This may encourage the on-demand production of RDF data from web services, rather than static files
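The size comparison above can be restated as growth factors with a few lines of Python. The three sizes are the figures quoted on this slide; the variable names are only for illustration.

```python
# Sizes in megabytes as quoted on the slide for the same data.
sizes_mb = {"GESMES/TS": 4.5, "compact SDMX-ML": 45, "RDF triples": 420}

# Express each format as a multiple of the most compact one.
baseline = sizes_mb["GESMES/TS"]
ratios = {fmt: size / baseline for fmt, size in sizes_mb.items()}

for fmt, ratio in ratios.items():
    print(f"{fmt}: {sizes_mb[fmt]} MB, {ratio:.1f}x GESMES/TS")
```

On these figures the RDF form is roughly 93 times the size of the GESMES/TS form and a little over 9 times the size of the compact SDMX-ML form.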
Standards and Classifications
• Some maintainers of standard classifications are looking at expressing them in useful formats (SDMX, DDI)
  – This is an easy thing to do
  – It is very useful: promotes re-use, comparability, etc.
  – Could apply to Semantic Web RDF expressions as well as XML-based standards
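To suggest why publishing a classification in a structured format is a small step, here is a hedged Python sketch that serializes a tiny hierarchical classification to XML. The classification content and the element names (`Classification`, `Code`, `Label`) are invented for the example and are not taken from the SDMX-ML or DDI schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level classification; codes and labels are made up.
classification = {
    "id": "EXAMPLE_ACTIVITY",
    "items": [
        {"code": "A", "label": "Agriculture", "parent": None},
        {"code": "A01", "label": "Crop production", "parent": "A"},
    ],
}


def to_xml(cls):
    """Serialize the classification to a simple, illustrative XML layout."""
    root = ET.Element("Classification", id=cls["id"])
    for item in cls["items"]:
        attrs = {"value": item["code"]}
        if item["parent"] is not None:
            attrs["parent"] = item["parent"]  # keeps the hierarchy explicit
        code = ET.SubElement(root, "Code", attrs)
        ET.SubElement(code, "Label").text = item["label"]
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml(classification)
print(xml_text)
```

The same in-memory structure could be written out in another XML standard or as RDF by swapping the serializer, which is the re-use the slide points to.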
Ideas for Future Work
• Endorse SDMX – DDI mappings now being produced
• Develop an “SDMX-RDF” (?) or…
• Develop a harmonized statistical model for expression in RDF (based on DDI, SDMX, ISO/IEC 11179) (?)
  – Encourage tools developers to implement it in standard dissemination packages
• Publish standard classifications in standard formats
Summary
• Combined use of standards is becoming a reality
• Proactive engagement with the Semantic Web world could provide benefits to all concerned parties, as well as users