Page 1: Toward a Global Infrastructure for Data and Metadata: The Open Data Foundation Arofan Gregory Executive Manager The Open Data Foundation.

Toward a Global Infrastructure for Data and Metadata:

The Open Data Foundation

Arofan Gregory

Executive Manager

The Open Data Foundation

Page 2:

Something Really Amazing

• Spaceships aren’t that amazing…

• Aliens aren’t that amazing…

• Mobile telephones aren’t that amazing…

• Tasers aren’t that amazing…

• These devices have access to the complete set of human (well, Federation) knowledge, via ship’s computer - That’s AMAZING!

An Epic Feat of Data Standardization!

Page 3:

A Big Idea

• It might seem too outrageous to imagine that every data source could be accessible and usable via a global network, but…

– Consider all the domain “grids” which are emerging

– Consider the number of modern technologies for leveraging data across networks

– Consider the tools we have for solving problems of semantic interoperability

• Maybe Star Trek was only a few decades ahead of its time!

Page 4:

Something Missing…

• Technology alone cannot solve this problem

• For centuries, scientists, librarians, and archivists have worked to perfect taxonomies and classifications for organizing and accessing human knowledge

– Technologists can’t replace the disciplines which have evolved from this work with technology alone

– They can only automate it

• Having an ontology doesn’t mean you have an agreed, tried, and workable standard classification system!

– A thousand little ontologies still produce chaos!

Page 5:

Why Now?

• The idea of a global data infrastructure is practical today because…

– We have good, standards-based, networked technology

– We have a highly sophisticated population of archivists and librarians who understand the challenges of large-scale classification, for all types of media

– We have an emerging culture of data producers and users who are beginning to understand the potential offered by modern technology

Page 6:

The Open Data Movement

From Wikipedia:

“Open Data is a philosophy and practice requiring that certain data are freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. It has a similar ethos to a number of other "Open" movements and communities such as Open Source and Open Access.”

Page 7:

The Open Data Foundation (ODaF)

• Although we respect this traditional goal of the Open Data movement, we feel that the technology issues, as opposed to the legal ones, have a different focus:

– Much public data is inaccessible or unusable

– Confidential data is less accessible than it could be

– The collection and publication of some critical data is lacking, notably in the Developing World

• It is not enough to put the rights to data into the public domain – it must also be practically accessible to all potential users

Page 8:

What Do We Mean by “Data”?

• Official statistics collected by government agencies and international organizations

– Usually aggregates and time-series data

– Covers a huge range of social, scientific, and economic topics

• Numeric research data supporting social sciences and hard sciences

– Often lower-level “microdata”

– May be gathered by survey or sourced from registers

• Qualitative data used in social sciences research

– Not research papers, but source data (e.g., interviews)

Page 9:

ODaF’s Mission

• To bring together individuals from the statistics community, the research community, and the technology standards community

• To promote the creation of a global infrastructure for data and metadata by providing open-source tools and supporting the adoption of a coordinated set of open technology standards

• To promote the creation and use of knowledge, and fact-based decision-making, through improved access to data and metadata

Page 10:

ODaF - Timeline

• The idea started at IASSIST 2006 in Edinburgh

• Incorporated in 2006 as a US scientific non-profit

• First face-to-face meeting in Washington DC in December 2006 at the National Opinion Research Center (NORC)

• September 2007: next face-to-face meeting in St. Helena, California

• Next face-to-face meeting: NORC in DC, December 2007, followed by a European meeting (UK, Netherlands, or Germany) in early 2008

• NOTE: We are a virtual organization – we don’t rely on face-to-face meetings for conducting work (Thanks, Skype!)

Page 11:

ODaF - Directors

– Bob Glushko – head of the UC Berkeley Center for Document Engineering and member of OASIS Board of Directors

– Julia Lane – Vice President, NORC and world-class expert in data confidentiality issues

– Ernie Boyko – former President of IASSIST

– Rune Gloersen – head of IT at Statistics Norway

Page 12:

ODaF - Executive Managers

• Arofan Gregory – background in SGML/XML, technology standards (notably ebXML, UBL, UN/CEFACT, ISO TC154, DDI, and SDMX)

• Pascal Heus – lead developer for World Bank and International Household Survey Network, much experience with field-work in Africa, DDI implementor

• Chris Nelson – veteran OMGer (CWM), worked with many technology standards (UN/EDIFACT, GESMES, ebXML, SDMX, DDI), consummate UML modeler

• Jostein Ryssevik – former CEO of Nesstar North America, now with Ideas2Evidence, associated with Gallup Europe; longtime DDI implementor

Page 13:

ODaF - Advisors

• Sandra Cannon - Board of Governors of the Federal Reserve System

• Gilles Collette - Visual Communications, Pan-American Health Organization (WHO)

• Daniel Gillman - US Bureau of Labor Statistics

• Eduardo Gutentag - Chair, OASIS Board of Directors

• Paul Johanis - Statistics Canada

• Graeme Oakley - Australian Bureau of Statistics

• Dr. Andrew Nelson - Joint Research Centre of the European Commission

Page 14:

ODaF – Advisors (cont.)

• Ken Miller - UK Data Archive / Economic and Social Data Service

• Duane Nickull - Chair, OASIS SOA Reference Architecture TC

• Juraj Riecan - United Nations Economic Commission for Europe (UNECE)

• Gerard Salou - European Central Bank

• Professor Bo Sundgren, Ph.D - Statistics Sweden

• Wendy Thomas - Minnesota Population Center, University of Minnesota

• Wendy Watkins - Data Centre Coordinator, Maps, Data and Government Information Centre, Carleton University Library

Page 15:

ODaF - Organization

• We are project-oriented:

– Any member can participate in projects

  • May be paid consultants for specific work, or volunteers

– Project proposal is put before Directors by Management team in consultation with Board of Advisors for approval

– Work is conducted by specified project team, using specified resources

– All Directors, Managers, and Advisors are volunteers

• Work is focused on coordination of projects, with resources coming from other participating organizations

Page 16:

The Problem Space

• The flows of data can be seen as forming a type of “supply chain”

– Collected data are aggregated and reported/disseminated to other organizations

– The points where data are exchanged can be problematic:

  • Loss of metadata

  • No automated integration into receiving systems

  • Time- and resource-intensive

• This exchange of data and metadata must be managed in an efficient, standard fashion if we are to build a global infrastructure

Page 17:

[Figure: the data supply chain. Labels: Individual Households; Banks, Corporates; National Statistical Organisations (180+ Countries); Regional Organisations; International Organisations. Flows labelled: transactions, accounts, statistics. Access layer labelled: Internet, Search, Navigation, with example sites www.x.org, www.y.org, www.z.org, and www.hub.org.]

Page 18:

Data Lifecycle Model

• Within each level of the information chain, we see a process:

– Data sourcing or collection

– Data processing (re-coding, harmonization, aggregation)

– Data dissemination and archiving

– Data reporting and re-purposing

• Throughout this cycle, each step generates important metadata which can be captured to provide better downstream processing and understanding of the data

• Today, this metadata is often lost

– Between steps of the lifecycle

– When the final data product is exchanged in the information chain
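The point about capturing metadata at each lifecycle step can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Dataset` class, the `run_step` helper, the step names, and the sample values are invented for illustration and do not come from any of the standards discussed in this talk. The idea is only that each step appends a record of what it did, so the history travels with the data rather than being lost between steps.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A data product plus the metadata accumulated along its lifecycle (hypothetical)."""
    name: str
    values: list
    history: list = field(default_factory=list)  # one entry per lifecycle step


def run_step(dataset, step, transform, note):
    """Apply one lifecycle step and record what was done, instead of losing it."""
    return Dataset(
        name=dataset.name,
        values=transform(dataset.values),
        history=dataset.history + [{"step": step, "note": note}],
    )


# Invented sample values: four survey responses.
raw = Dataset("household_income", [1200, 800, 1500, 950])
collected = run_step(raw, "collection", lambda v: v, "survey, 4 respondents")
processed = run_step(collected, "processing",
                     lambda v: [sum(v) / len(v)], "aggregated to mean")
published = run_step(processed, "dissemination", lambda v: v, "released as aggregate")

# The final product still carries the record of every earlier step.
for entry in published.history:
    print(entry["step"], "-", entry["note"])
```

A receiving organization that gets `published` also gets its `history`, which is the kind of downstream understanding the slide describes.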

Page 19:

Data Lifecycle Model

Page 20:

An Observation on Organizations

• Governmental, supra-governmental, and research organizations which produce data have as a primary mission the collection of data

– To support policy making

– To support research

– To support regulatory activities

• They do not have a primary mission to focus on the exchange of data with other organizations

– This is often perceived as a burden rather than a part of the primary mission of the organization

• They are often not well-skilled in the latest technology for data exchange and interoperability

• Standards organizations tend to be too busy promoting their own standards to be worried about how users might combine them with other standards in implementations

Page 21:

Issues

• Issues with public data:

– Public data which is not released: "Users won't understand it" - Too little metadata!

– Public data which is unusable: formats are bad, too little metadata about formats, terminology, methodology, coding, and concepts

– Public data which cannot be accessed because its location/existence is not known

– Public data which loses value because it cannot be published and accessed in a timely manner

Page 22:

Issues (cont.)

• Issues with confidential data:

– Public data sets derived from confidential data have been damaged by anonymization

– Confidential data which are not seen because access produces unacceptable disclosure risk

• There are secure “Research Data Centers” for allowing access to confidential data to qualified researchers

– These are not as accessible or as open as they could be, due to their physical nature and the fact that they generally are not in communication with each other

– Better metadata management and shared metadata lead to a better understanding of disclosure risk, and thus improved access for researchers

Page 23:

Note on Data Confidentiality

• You might think proponents of Open Data would disapprove of confidential data

– Response rates are falling for all types of survey data collection due to fears of disclosure

– There are many new ways of collecting data about individuals (RFID chips, security cameras, cell phones, etc.)

– The standards for data confidentiality are there for a good reason – to protect individuals!

• We believe that confidential data should be as open as possible and not more!

Page 24:

Issues (cont.)

• Issues with data in the Developing World:

– Absent data due to inefficient or nonexistent data collection/publication

– Unsustainable data collection/publication produces insufficient continuity of data

  • Once educated, IT workers get jobs in Europe and America

  • Funding is typically not on-going, but only for a limited period

• The vast majority of the world’s population is in the Developing World, and the trend is increasing

– To understand our world and make good policy, we must support sustainable data collection and publication about this huge segment of the population!

Page 25:

How Can We Solve These Problems?

• Many of these issues can be solved with modern technology

– Better documentation using standard metadata formats

– Better mechanisms for data discovery and access between organizations of all types

– Better mechanisms for managing semantic interoperability

– Free or inexpensive tools for metadata capture and data/metadata exchange

– Improved mechanisms for sustainable collection and publication of data in the Developing World

Page 26:

ODaF’s Vision

• A network of standard, federated registries provides the ability to discover data and metadata globally

• Standard data and metadata formats and models provide the basis for automated use and integration between applications

• Standard semantic registries and mappings to standard classifications/ontologies allow for semantic interoperability

• All of these standards would be coordinated to work together predictably in an open architecture

• Domains are self-governing – each has its own registries, classifications, etc. There must be minimum governance at the center for operation of the entire network.

– Interoperability through mapping to the standards-based open architecture
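As a rough illustration of this vision, the following Python sketch shows self-governing domain registries that keep their own local codes but map them to a shared classification, so that a single federated query can discover data across domains. It is a toy model under stated assumptions: the `Registry` class, the `federated_search` function, the `STD:` codes, and the example.org URLs are all hypothetical and are not part of SDMX, ebXML, or any other standard named in this talk.

```python
class Registry:
    """Hypothetical domain registry: self-governing, with its own local codes."""

    def __init__(self, domain, to_standard):
        self.domain = domain
        self.to_standard = to_standard  # local code -> shared classification code
        self.entries = []

    def register(self, title, local_code, url):
        """Record where a data set lives and which local code describes it."""
        self.entries.append({"title": title, "code": local_code, "url": url})

    def find(self, standard_code):
        """Return entries whose local code maps to the requested shared code."""
        return [
            {**entry, "domain": self.domain}
            for entry in self.entries
            if self.to_standard.get(entry["code"]) == standard_code
        ]


def federated_search(registries, standard_code):
    """Query every registry in the federation and merge the results."""
    hits = []
    for registry in registries:
        hits.extend(registry.find(standard_code))
    return hits


# Two domains use different local codes for the same shared concept.
labour = Registry("labour", {"UNEMP": "STD:unemployment"})
health = Registry("health", {"JOBLESS": "STD:unemployment", "MORT": "STD:mortality"})
labour.register("Unemployment rate", "UNEMP", "http://example.org/labour/1")
health.register("Joblessness and health", "JOBLESS", "http://example.org/health/7")
health.register("Mortality tables", "MORT", "http://example.org/health/9")

results = federated_search([labour, health], "STD:unemployment")
for hit in results:
    print(hit["domain"], "-", hit["title"], "-", hit["url"])
```

The only thing the domains share here is the mapping target, which mirrors the idea of minimum governance at the center with interoperability achieved through mapping.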

Page 27:

Which Standards?

• ISO 17369 Statistical Data and Metadata Exchange (SDMX)

• Data Documentation Initiative (DDI)

• ISO/IEC 11179 Metadata Registries

• ISO 19115 Digital Geographic Data

• Metadata Encoding and Transmission Standard (METS)

• Extensible Business Reporting Language (XBRL)

• Many others (SOA, ebXML, Web Services, Semantic Web, Dublin Core)

Page 28:

ISO 17369 SDMX

• Produced by official statistics organizations (BIS, ECB, Eurostat, IMF, OECD, World Bank, UN/SD)

• Now available as a 2.0 version

– Supports all aggregate data & time-series

– Supports all types of metadata (structural & “reference” metadata)

– Provides standard registry interfaces for data sourcing and exchange (not specific to SDMX formats)

• Based on a formal meta-model (similar to OMG’s Common Warehouse Metamodel, but more focused)

• Data and metadata formats and classifications are completely configurable

• Also provides recommendations for concepts, codes, and classifications for official statistics
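To make the distinction between structural metadata and the data it describes more concrete, here is a small Python sketch. It is not the SDMX format or its information model: the `STRUCTURE` dictionary, the helper functions, the dimension and code names, and the observation values are all assumptions made up for illustration. It shows only the general idea that a configurable structure definition lists the dimensions and allowed codes, and that each time series is identified by one code per dimension.

```python
# Hypothetical structure definition: each dimension and its allowed codes.
STRUCTURE = {
    "FREQ": {"A": "Annual", "M": "Monthly"},
    "REF_AREA": {"NO": "Norway", "CA": "Canada"},
    "INDICATOR": {"CPI": "Consumer price index"},
}


def validate_key(key, structure=STRUCTURE):
    """Check that a series key uses every dimension with an allowed code."""
    if set(key) != set(structure):
        return False
    return all(key[dim] in codes for dim, codes in structure.items())


def describe(key, structure=STRUCTURE):
    """Resolve the codes in a series key to their human-readable labels."""
    return {dim: structure[dim][code] for dim, code in key.items()}


# An aggregate time series: a key plus observations (values are invented).
series = {
    "key": {"FREQ": "A", "REF_AREA": "NO", "INDICATOR": "CPI"},
    "observations": {"2005": 100.0, "2006": 102.3},
}

if validate_key(series["key"]):
    print(describe(series["key"]))
```

Because the structure is data rather than code, a different domain could swap in its own dimensions and code lists without changing the functions, which is the sense in which formats and classifications are configurable.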

Page 29:

Data Documentation Initiative (DDI)

• Produced by a consortium of members (data archives and libraries, national statistical organizations, universities, etc.)

• Now in 3.0 candidate version which supports full data lifecycle (release Q1 2008)

• Fine-grained metadata for describing:

– Data collection (surveys, registers, etc.)

– Data processing (for recodes, harmonization, data comparison)

– Data archiving and dissemination

– Data can be stored inline or in native file formats

– Supports microdata and n-dimensional cubes

• Aligned with SDMX, ISO/IEC 11179, METS, ISO 19115, and Dublin Core

Page 30:

ISO/IEC 11179 Metadata Registries

• Model for managing semantics of a data dictionary and the lifecycle of concepts/terms

• There is a separate ISO specification under development for providing bindings in XML, C, and other languages

• In widespread use in many other standards, as well as for terminology management within large organizations

Page 31:

ISO 19115 Digital Geographic Data

• Provides the standard metadata model for describing geographies

• Implemented in several XML standards, including DDI (there is also a standard ISO XML)

• Well-accepted within the technology community and among communities of use (geographers, etc.)

Page 32:

METS

• A packaging standard for digital libraries/archives

– Pulls together associated sets of files and establishes their relationship to one another

• Can carry metadata payloads in their native XML namespaces as “metadata sections”

• Cooperatively developed with DDI

– METS left the description of data to DDI

– DDI supports METS for archival packaging

Page 33:

XBRL

• XML standard from the accounting world for describing business reports

• Widely used by banking supervisory organizations

– Major source of financial statistics

• Well marketed and widely supported

• Ongoing alignment project with SDMX

Page 34:

ODaF Vision - Standards

[Figure: Federated Registries (based on SDMX, ebXML, web services). Registered items: Aggregated Data/Metadata (SDMX); Business Reports (XBRL); Microdata Sets (DDI); Geographies (ISO 19115); Citations (Dublin Core). Other labels: References to source data; Used in; Standard classifications; Organized using; Semantic definitions (ISO 11179); METS Packaging.]

Page 35:

ODaF Activities

• We are early in our efforts to create such an infrastructure

– To establish a sufficient set of well-aligned standards

– To build open-source tools to support the use of these standards

– To otherwise support the adoption and use of standard models, formats, and registries

Page 36:

ODaF Projects

• Standards Alignment Project: on-going effort to establish an agreed mapping between the mentioned standards

• SDMX Registry Hosting: Host SDMX registries on our own servers for those wishing to do prototype implementations

• DDI Development Support: provide hosting and infrastructure to support the use and development of DDI 3.0

• DDI Foundation Tools Program: providing technical coordination and infrastructure for a multi-institution effort to build an Eclipse-based open-source toolkit for working with DDI 3.0, including transforms to/from SAS, SPSS, and STATA

• SDMX Browser: Developing an open-source tool (using Adobe Flex) for collecting, updating, and viewing statistical data in SDMX format – working in informal collaboration with ECB and OECD

Page 37:

ODaF Projects (cont.)

• DeXtris Browser: beta end-user tool for viewing and searching DDI 1/2.* and 3.0 metadata files – supports version transformations

• UKDA QuDEX Draft Standard: Working as technical support for UKDA in their development of a standard for qualitative metadata (may become part of DDI)

• Canadian RDC Network: Providing technical advice to the Canadian RDC network on metadata management and implementation in support of DDI 3.0.

• NORC Virtual Data Enclave: Working to help develop and deploy the first “virtual” RDC in the US with data from NIST, others

• Also involved in proposals to build a European “virtual” RDC

Page 38:

ODaF Projects (cont.)

• Have contributed to the creation of training materials and online support for DDI 3.0, for general use

• White papers: DDI & SDMX (a comparison), guidelines for open-source tools development, others

• Member, DDI Alliance

• Sponsored IASSIST 2007 in Montreal (planned also for IASSIST 2008 in Palo Alto, CA)

Page 39:

ODaF - Where We Are Today

• New organization, lots of interest and support thus far

• Interesting projects are emerging, some early deliverables have been finished

• Looking for participation from interested, serious individuals

• Still at the stage of supporting and promoting a coordinated set of standards

Page 40:

To Learn More…

• ODaF: www.opendatafoundation.org

• SDMX: www.sdmx.org

• DDI: www.ddialliance.org

• ISO/IEC 11179: http://metadata-stds.org/11179/

• METS: http://www.loc.gov/standards/mets/

• ISO 19115: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=26020

• XBRL: http://www.xbrl.org/Home/

Page 41:

Tools and Training

• For some free SDMX tools, implementation support site, and SDMX and DDI training courses: www.metadatatechnology.com

Page 42:

Questions?