Pascal Heus Open Data Foundation pheus@opendatafoundation opendatafoundation


Workshop on Metadata Standards and Best Practices
November 19-20, 2007

Session 1: Leveraging Metadata Standards in RDC

Pascal Heus

Open Data Foundation

[email protected]

http://www.opendatafoundation.org

http://www.opendatafoundation.org Open Data Foundation – IZA 2007/11

Outline

• PART 1: General issues & ODaF
  – Needs and challenges in statistical data and metadata management
  – Metadata and XML solutions
  – Selecting specifications
  – Need for tools
  – Open Data Foundation
• PART 2: RDC Specific issues
  – Metadata in RDCs
  – Solutions and benefits
  – Tools and ongoing initiatives
• Conclusions / Q&A


What is Metadata?

• Common definition: Data about Data

[Figure: unlabeled stuff vs. labeled stuff]

The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf


Managing data and metadata is challenging!

We are in charge of the data. We support our users but also need to protect our respondents!

We want easy access to high quality and well documented data!

We need to collect the information from the producers, preserve it, and provide access to our users!

Producers

Librarians

Users

General Public

Policy Makers

Sponsors

Media/Press

Academic

Business

Government

We have an information management problem


XML to the rescue!

• XML is driving today’s web service oriented architecture of the Internet and Intranets

• Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data

• XML is platform & language independent and can be used by everyone

• XML is both machine and human readable

• XML is non-proprietary, public domain and many open tools exist

• Domain specific standards are available!


XML Technical Overview

• Capture: XML
• Structure: DTD, XSchema
• Transform: XSL, XSLT, XSL-FO
• Discover: Registries, Databases
• Exchange: Web Services, SOAP, REST
• Search: XPath, XQuery
• Manage: Software, XForms
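As a small illustration of the "Search" role above, the following Python sketch runs an XPath-style query over a tiny metadata document. The XML fragment, its element names and the function `find_variables` are made up for this example and do not follow any of the standards discussed later; the standard library's ElementTree only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified survey metadata; element names are illustrative only.
SAMPLE = """
<survey id="HH2006">
  <title>Household Survey 2006</title>
  <variable name="age" type="numeric"><label>Age in years</label></variable>
  <variable name="sex" type="categorical"><label>Sex of respondent</label></variable>
  <variable name="income" type="numeric"><label>Monthly income</label></variable>
</survey>
"""


def find_variables(xml_text, var_type):
    """Return (name, label) pairs for all variables of the given type."""
    root = ET.fromstring(xml_text)
    # ElementTree implements a limited XPath subset, including attribute predicates.
    matches = root.findall(f".//variable[@type='{var_type}']")
    return [(v.get("name"), v.findtext("label")) for v in matches]


numeric_vars = find_variables(SAMPLE, "numeric")
print(numeric_vars)
```

The same query could be expressed in full XPath or XQuery against an XML database; the point is only that structured metadata can be searched by machines without custom parsing.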


XML Solutions

Producers

Librarians

Users

General Public

Policy Makers

Media/Press

Academic

Business

Government

Sponsors

XML Specs

Use our specifications and you will be happy! It will harmonize everything.

Great, I can provide public metadata!

Well documented data, here we come!

Now we can talk to each other!


Let’s use XML, but….

Producers

Librarians

Users

XML Specs

Which specifications should we adopt?

How do we do this? Where are the tools and guidelines?

? Open Data Foundation


Open Data Foundation (ODaF)

• US-based non-profit organization, established 2006
• Directors, advisors and managers from statistical and ICT communities
• Project oriented
• Mission
  – Focus on socio-economic data
  – Adoption of global metadata standards
  – Coordinated development of open-source tools
  – Capacity building
  – Improving data and metadata accessibility and overall quality
  – Operate at the global level


Selecting XML specifications

• A single specification is not enough!
  – XML specifications commonly focus on a specific area of knowledge and/or set of functionalities
  – Cannot answer the needs of all actors
• XML mappings between specifications are possible
  – Information can be converted from one domain to another and be carried across communities
• Which ones should we use?
  – Fit for purpose
  – Widely accepted and supported
  – Can be mapped to a cross-domain family
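To make the idea of XML mappings between specifications concrete, here is a minimal Python sketch that carries a few study-level fields into Dublin Core elements. The source fragment is only loosely modeled on a DDI codebook citation (no namespaces, most elements omitted), and the `CROSSWALK` table and `to_dublin_core` function are assumptions made for this illustration, not an official mapping.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Simplified study description, loosely modeled on a DDI codebook citation.
# Illustrative only: namespaces and most elements are left out.
DDI_LIKE = """
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt><titl>Household Survey 2006</titl><IDNo>HH2006</IDNo></titlStmt>
      <rspStmt><AuthEnty>National Statistics Office</AuthEnty></rspStmt>
    </citation>
  </stdyDscr>
</codeBook>
"""

# Assumed crosswalk: path in the source document -> Dublin Core element name.
CROSSWALK = {
    "stdyDscr/citation/titlStmt/titl": "title",
    "stdyDscr/citation/titlStmt/IDNo": "identifier",
    "stdyDscr/citation/rspStmt/AuthEnty": "creator",
}


def to_dublin_core(xml_text):
    """Build a flat record of Dublin Core elements from the source document."""
    source = ET.fromstring(xml_text)
    record = ET.Element("record")
    for path, dc_name in CROSSWALK.items():
        for node in source.findall(path):
            target = ET.SubElement(record, f"{{{DC_NS}}}{dc_name}")
            target.text = (node.text or "").strip()
    return record


ET.register_namespace("dc", DC_NS)
dc_record = to_dublin_core(DDI_LIKE)
print(ET.tostring(dc_record, encoding="unicode"))
```

In practice such mappings are often written as XSLT stylesheets rather than in a general-purpose language; the table-driven form is used here only because it keeps the crosswalk visible in one place.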


A suggested set for socio-economic data

• Statistical Data and Metadata Exchange (SDMX)
  – Macrodata, time series, indicators, registries
  – http://www.sdmx.org
• Data Documentation Initiative (DDI)
  – Microdata (surveys, studies)
  – http://www.ddialliance.org
• ISO 11179
  – Semantic modeling, concepts, registries
  – http://metadata-standards.org/11179/
• ISO 19115
  – Geography
  – http://www.isotc211.org/
• Dublin Core
  – Resources (documentation, images, multimedia)
  – http://www.dublincore.org


The need for Tools

We produce data not tools! We don’t have the expertise.

We use data and software but we don’t build tools! We don’t have the expertise

We preserve and disseminate data not software! We don’t have the expertise

Producers

Librarians

Users

XML Specs

We set specifications and standards. Tools are not our mandate

Open Data Foundation


Open Data Foundation

The need for Tools

• Mandated to develop tools
• Provide cross-domain expertise in ICT and statistics
• Provide umbrella for coordinated development
• Ensure inter-operability
• Outline harmonized architecture and environment
• Promote open source / maximize reusability
• Foster global registries
• Resources / Fund raising


ODaF Vision

• Promote and facilitate the production and use of “open data”
  – Public metadata, high quality, fully documented, respondent protected, easy to find, accessible in accordance with statistical principles and legislation
• Foster a global harmonized framework
  – Facilitate the flow of data and metadata
  – Promote dialog between all stakeholders

Unlock the Data!


Some Projects & Ideas

• Guidelines for a harmonized architecture and development environment
• Roadmap for tools development
• XML mappings
• Facility to host development of open source projects (GForge)
• Provide hosting services for agencies
• Implement registries / catalogs
• Produce training and reference material
• Technical support & capacity building
• Advocacy


ODaF partners / clients

• Statistical agencies / producers
• Data archives
• Academic & research communities
• Standard-setting agencies & consortiums
• Governmental organizations
• International organizations
• Open source community
• Software developers
• IT vendors


Growing solutions in a complex environment

[Diagram labels: METADATA; USE, PRODUCTION, PRESERVATION, ANALYSIS, DISCOVERY, DISSEMINATION; QUALITY, SECURITY, TECHNOLOGY, Accessibility; SPSS, Excel, SAS, Stata, Blaise, CSPro, Toolkit; SDDS, GDDS, DQAF; Access, Privacy, Disclosure, Legal; DDI, SDMX, ISO 11179, ISO 19115, DCMI, Registries; XML, Web, Databases, XML-DB, Infrastructure, XSLT, XPath, SOAP, Programming, GIS, Warehouse]

What are we concerned with?


Growing solutions in a complex environment


CHALLENGE
We need a set of tools that work together in a harmonized framework. This requires coordinated efforts and expertise from the various communities.

OPEN DATA FOUNDATION
• Provide cross-domain & IT expertise
• Coordinate and support development
• Knowledge sharing
• Capacity building
• Provide global vision and guidance


Challenges

• The technology is available today
• The right people are available today
• The need and the will are there
• The real challenges are:
  – Tools availability
  – Awareness / Understanding of technology
  – Change management
  – Coordination & Guidance
  – Focused resources and funding
  – Institutional commitment
• Learn from the past for a better future
• It’s not about data, it’s about people


Summary

• Managing data and metadata is challenging
  – Solutions exist to make it easier and provide better information to unlock the data
• Adopt a set of specifications that answer your requirements and can connect across domains
  – DDI, SDMX, ISO 11179, Dublin Core, ISO 19115
• Promote the use and development of open tools, do not work in isolation, get the appropriate expertise
  – Open Data Foundation

PART 2: Metadata & RDCs


PART 2

• RDC metadata perspective
• List of stakeholders / initiatives
• Benefits of adopting metadata
• Challenges
• Tools demo (IHSN Toolkit)


RDC Objectives

• Provide a secure environment for the researcher to perform in-depth analysis of sensitive/confidential data in a cost-effective way

• Facilitate the capture, sharing and dissemination of research knowledge

• Provide feedback to the producer on data usage and quality

• Exchange information with other RDCs / agencies / public

• Overall: benefit all stakeholders: producers, librarians, researchers, general public, etc.


RDC metadata

• Simple access to data file and codebook is insufficient. Researchers need high quality, comprehensive metadata and a collaborative environment to promote dynamic research

• Traditionally, survey metadata has focused on archiving/preservation (current DDI 1/2.x)

• This, however, is insufficient and should be extended into both the survey production process and the secondary use of the data

• New DDI 3.0 meets such requirements

• The RDC is an ideal environment for the capture of researcher metadata


DDI 3.0 and the Survey Life Cycle

• A survey is not a static process
• It evolves dynamically over time and involves many agencies/individuals
• DDI 2.x is about archiving, DDI 3.0 extends to the life cycle
• 3.0 is a modular framework available for multiple purposes (use cases)
• Metadata is key to comprehensive capture of knowledge


RDC issues

• Without producer metadata
  – Researchers can’t discover data or work efficiently
• Without researcher metadata
  – Producers don’t know about data usage and quality issues
  – Other researchers are not aware of what has been done
• Without standards
  – Information can’t be properly managed and exchanged between agencies or with the public
• Without tools
  – Can’t capture and preserve/share knowledge


When to capture metadata?

• Metadata must be captured at the time the event occurs!
• Documenting after the fact leads to considerable loss of information
• This is true for producers and researchers


RDC Metadata Framework

[Diagram labels: Producers, Researcher, External users; RDC; Data; Producer/Archive Metadata; Research Metadata; Research Output; Public Use metadata]

1. Producers provide data & basic docs
2. Need to enhance existing metadata
3. Start capturing researcher metadata
4. Knowledge grows and gets reused
5. Provides usage and quality feedback to producer / RDC
6. Repeat across surveys/topics
7. Metadata facilitates output
8. Public metadata facilitates data discovery / fosters global knowledge
9. Metadata exchange between agencies


Metadata Components

• Producer metadata
  – Codebook, questionnaires, reports, methodologies, processing, scripts, quality, admin, etc.
• Research metadata
  – Recodes, analysis, tables, scripts, papers, references, logs, quality, usage
  – Activities, discussions, knowledge base
• Outputs
  – Papers, presentations, tables
• Public metadata
  – Metadata stripped of sensitive information (summary statistics, sensitive variables, etc.)
• Metadata capture can be manual, semi-automated, automated
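The derivation of public metadata from restricted metadata can be sketched in a few lines of Python. Everything in this example is hypothetical: the XML layout, the `sensitive` attribute, the `sumStat` element and the values are invented for illustration, and a real RDC would apply its own disclosure rules rather than this simple filter.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical restricted metadata; names and values are invented for the example.
RESTRICTED = """
<dataset id="HH2006">
  <variable name="age"><label>Age in years</label><sumStat type="mean">34.2</sumStat></variable>
  <variable name="income" sensitive="true"><label>Monthly income</label></variable>
  <variable name="region"><label>Region of residence</label><sumStat type="mode">3</sumStat></variable>
</dataset>
"""


def make_public(root):
    """Return a copy with sensitive variables and all summary statistics removed."""
    public = copy.deepcopy(root)  # never modify the restricted original
    for var in list(public.findall("variable")):
        if var.get("sensitive") == "true":
            public.remove(var)
            continue
        for stat in var.findall("sumStat"):
            var.remove(stat)
    return public


restricted_root = ET.fromstring(RESTRICTED)
public_root = make_public(restricted_root)
public_names = [v.get("name") for v in public_root.findall("variable")]
print(public_names)
print(ET.tostring(public_root, encoding="unicode"))
```

Working on a deep copy keeps the full producer metadata intact inside the RDC while only the filtered version is released to external users.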


RDC Solutions

• Metadata management
  – Adopt standards and provide researchers with comprehensive metadata
  – Use related tools to capture the research process
  – Metadata mining and reporting utilities
• Collaborative environment
  – Use web technologies to foster a dynamic research environment
• Connected and remote enclaves
  – Connect RDCs through secure networks
  – Consider virtual data enclave or batch analysis
• Data disclosure
  – Protect respondents through sound data disclosure techniques (using metadata as well)
• Train producers/researchers (methods and data)


Solution Examples

• Simple solutions: use good practices
  – File and variable naming conventions, sound statistical methods
  – Comment source code
  – Document the work
• Metadata solutions
  – DDI tools, citation database, source code level metadata capture, variable recodes, table disclosure, data quality feedback, comparability
• Web based collaboration environment
  – Wiki, blogs, discussion groups, events/todo
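Source code level capture of variable recodes could look roughly like the following Python sketch, which records what was done at the moment the recode is applied. The `recode` helper, the log structure and the sample values are assumptions made for this illustration; they are not part of any DDI tool.

```python
import json
from datetime import datetime, timezone

# In a real setting this log would be persisted alongside the data and scripts.
recode_log = []


def recode(values, mapping, source_var, target_var, note=""):
    """Apply a recode and record it as research metadata at the time it happens."""
    recoded = [mapping.get(v, v) for v in values]  # unmapped values pass through
    recode_log.append({
        "source": source_var,
        "target": target_var,
        "mapping": {str(k): v for k, v in mapping.items()},
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return recoded


sex = [1, 2, 2, 1, 9]  # made-up values for the example
sex_label = recode(sex, {1: "male", 2: "female"}, "sex", "sex_label",
                   note="code 9 left unchanged")
print(sex_label)
print(json.dumps(recode_log, indent=2))
```

Because the log entry is written by the same call that performs the recode, the documentation cannot drift from the analysis, which is the point of capturing metadata when the event occurs.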


Benefits (1)

• Comprehensive data documentation
  – Through good metadata practices, comprehensive documentation is available to the researchers
• Preservation, integration and sharing of knowledge
  – Research process is captured and preserved in harmonized formats
  – Research knowledge becomes an integral part of the survey and available to all
  – Reduces duplication of effort and facilitates reuse
  – Producer gets feedback from the data users (usage, quality issues)


Benefits (2)

• Research outputs and dissemination
  – Facilitates production of research outputs
  – Facilitates dissemination and fosters broader visibility of research outputs
• Exchange of information
  – Metadata exchange between RDCs, producers, librarians
  – Importance of public metadata for sensitive datasets
  – Facilitates data discovery (inside and outside the RDC)
• Advanced metadata mining / comparability


Answering the tools challenge

• Metadata standards are available but there is a lack of tools for metadata management
• Several efforts are ongoing
  – DDI Alliance, International Household Survey Network, UK Data Archive, NORC Data Enclave, Canada RDC, Open Data Foundation
  – DDI Foundation Tools Program, UK DExT, Canada RDC, EU Framework 7
• Joint efforts will minimize costs, maximize reusability and foster tool harmonization / interoperability
• Open source model: availability & sustainability


RDC challenges

• Adopting a good metadata management framework takes effort
  – Survey metadata must first be compiled
  – ICT capacity building and tools development
  – Producers and researchers need to be trained
• Not only a technological challenge
  – Change management, training
• Leads to better research, shared knowledge, better user/producer dialog, improved data quality
• Meets the mandate of the RDC


IHSN Toolkit Quick Demo

1. Import data and compile metadata
2. Import metadata and prepare CD-ROM
3. Generate HTML based CD-ROM