3rd International Digital Curation Conference, Washington, DC, Dec 2007
Paper Presentations: Interoperability, Metadata & Standards
Data Documentation Initiative: Toward a Standard for the Social Sciences
Mary Vardigan, Pascal Heus, Wendy Thomas
ICPSR/University of Michigan / Open Data Foundation / Minnesota Population Center
DDI Alliance – http://www.ddialliance.org
What is Metadata?
• Common definition: Data about Data
[Figure: unlabeled stuff vs. labeled stuff]
The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
Managing data and metadata is challenging!
We are in charge of the data. We support our users but also need to protect our respondents!
We want easy access to high quality and well documented data!
We need to collect the information from the producers, preserve it, and provide access to our users!
Producers
Librarians
Users
General Public
Policy Makers
Sponsors
Media/Press
Academic
Business
Government
We have an information management problem
Metadata issues
• Without producer / archive metadata
  – Researchers can’t discover data or perform efficient analysis
• Without researcher metadata
  – The research process is not documented and cannot be reproduced (Gary King replication standard!)
  – Other researchers are not aware of what has been done (duplication / lack of visibility)
  – Producers don’t know about data usage and quality issues
• Without standards
  – Such information can’t be properly managed and exchanged between actors or with the public
• Without tools
  – We can’t capture, preserve or share knowledge
XML to the rescue!
• XML stands for eXtensible Markup Language
• Technology that is driving today’s web-service-oriented architecture of the Internet and intranets
• Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data
• XML is platform & language independent and can be used by everyone
• XML is both machine and human readable
• XML is non-proprietary, public domain and many open tools exist
• Domain specific standards are available!
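To make the "capture, structure, exchange, query" points concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names (study, title, producer, variable) and the sample values are hypothetical and chosen only for illustration; they do not come from DDI or any other specification.

```python
import xml.etree.ElementTree as ET


def build_record(title, producer, variables):
    """Capture and structure: build a small metadata record.

    Element names here are illustrative assumptions, not a real schema.
    """
    root = ET.Element("study")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "producer").text = producer
    vars_el = ET.SubElement(root, "variables")
    for name, label in variables:
        var = ET.SubElement(vars_el, "variable", name=name)
        var.text = label
    return root


# Hypothetical sample values.
record = build_record(
    "Household Survey",
    "Example Statistics Office",
    [("age", "Age in years"), ("sex", "Sex of respondent")],
)

# Exchange: serialize to plain text that any platform or language can read.
xml_text = ET.tostring(record, encoding="unicode")

# Query: parse the text back and look up variable labels by name.
parsed = ET.fromstring(xml_text)
labels = {v.get("name"): v.text for v in parsed.iter("variable")}

print(xml_text)
print(labels)
```

The same text could be read by tools written in any other language, which is the platform-independence point made above.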
Suggested XML metadata specifications for socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
  – Macrodata, time series, indicators, registries
  – http://www.sdmx.org
• Data Documentation Initiative (DDI)
  – Microdata (surveys, studies)
  – http://www.ddialliance.org
• ISO 11179
  – Semantic modeling, concepts, registries
  – http://metadata-standards.org/11179/
• ISO 19115
  – Geography
  – http://www.isotc211.org/
• Dublin Core
  – Resources (documentation, images, multimedia)
  – http://www.dublincore.org
The Data Documentation Initiative (DDI)
• International XML based specification for the documentation of social and behavioral data
  – Started in 1995, now driven by DDI Alliance (30+ members)
  – Became XML specification in 2000 (v1.0)
  – Current version is 2.1 with focus on archiving (survey/codebook)
• New Version 3.0 (2008)
  – Focus on entire survey “Life Cycle”
  – Provide comprehensive metadata on the entire survey process and usage
  – Aligned with other metadata standards (DC, MARC, ISO 11179, SDMX, …)
  – Include machine actionable elements to facilitate processing, discovery and analysis
• DDI is being adopted by producers/archives but needs to extend to the researchers (who are using the data!)
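As an illustration of what "machine actionable" codebook metadata can look like, the sketch below parses a heavily simplified fragment modeled on DDI Codebook (2.x) element names as we understand them (codeBook, dataDscr, var, labl, catgry, catValu). The fragment omits namespaces and most required content, has not been validated against the DDI schema, and uses made-up variables; consult the specification at http://www.ddialliance.org for the authoritative structure.

```python
import xml.etree.ElementTree as ET

# Simplified, unvalidated fragment; variable names and labels are made up.
FRAGMENT = """
<codeBook>
  <dataDscr>
    <var ID="V1" name="SEX">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
    <var ID="V2" name="AGE">
      <labl>Age in years</labl>
    </var>
  </dataDscr>
</codeBook>
"""


def read_variables(xml_text):
    """Return {variable name: {"label": ..., "categories": {value: label}}}."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for var in root.iter("var"):
        # Category codes and their labels, if the variable is categorical.
        categories = {
            cat.findtext("catValu"): cat.findtext("labl")
            for cat in var.findall("catgry")
        }
        result[var.get("name")] = {
            # findtext("labl") only matches a direct child of <var>,
            # so category labels are not picked up here.
            "label": var.findtext("labl"),
            "categories": categories,
        }
    return result


codebook = read_variables(FRAGMENT)
for name, info in codebook.items():
    print(name, info["label"], info["categories"])
```

Once variable and category labels are available in this form, a tool could, for example, generate labeled tables or search variables by label without a human reading the codebook.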
DDI 3.0 and the Survey Life Cycle
• A survey is not a static process: it evolves dynamically over time and involves many agencies/individuals
• DDI 2.x is about archiving; DDI 3.0 covers the entire “life cycle”
• 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison)
• Also supports multilingual content, grouping, geography, and others
• 3.0 is extensible
Metadata Components
• Producer metadata
  – Codebook, questionnaires, reports, methodologies, processing, scripts, quality, admin, etc.
• Research metadata
  – Recodes, analysis, table, scripts, papers, logs, data quality, usage
  – Citations, references
  – Activities, discussions, knowledge base
• Outputs
  – Papers, presentations, tables, reports
When to capture metadata?
• Metadata must be captured at the time the event occurs! (not after the fact)
• Documenting after the fact leads to considerable loss of information
• This is true for producers and researchers
Solutions?
• Simple solutions: use good practices
  – File and variable naming conventions, sound statistical methods (metadata in names!)
  – Comment source code
  – Document your work
• Adopt DDI & other standard based metadata solutions
  – DDI tools, citation database, source code level metadata capture, variable recodes, table disclosure, data quality feedback, comparability
• Take advantage of web based collaborative tools
  – Wiki, blogs, discussion groups, lists
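One way to picture "source code level metadata capture" for variable recodes is sketched below: a small helper that applies a recode and, at the same moment, logs what was done. This is an assumption-laden illustration, not a DDI tool; the function name, the log fields, and the variable names (AGECAT, AGE_GROUP) are all hypothetical.

```python
import json
from datetime import datetime, timezone

# In-memory log of recodes; a real workflow might write this to a file.
recode_log = []


def recode(values, mapping, source, target, note):
    """Apply a recode and record its metadata at the time it happens.

    Values missing from the mapping become None.
    """
    recoded = [mapping.get(v) for v in values]
    recode_log.append({
        "source": source,
        "target": target,
        # JSON object keys must be strings, so codes are stringified.
        "mapping": {str(k): v for k, v in mapping.items()},
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return recoded


# Hypothetical example: collapse an age-category code into named groups.
age_group = recode(
    [1, 2, 3, 2],
    {1: "young", 2: "adult", 3: "senior"},
    source="AGECAT",
    target="AGE_GROUP",
    note="Collapsed age categories for cross-tabulation",
)

print(age_group)
print(json.dumps(recode_log, indent=2))
```

Because the log entry is written by the same call that performs the recode, the documentation cannot drift from what the code actually did.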
Benefits
• Comprehensive data documentation
  – Through good metadata practices, comprehensive documentation captured by producers, librarians and users is available to ALL researchers
• Preservation, integration and sharing of knowledge
  – Research process is captured and preserved in standard formats
  – Research knowledge becomes an integral part of the survey and available to all
  – Reduces duplication of effort and facilitates reuse
  – Producer gets feedback from the data users (usage, quality issues), which leads to better and more relevant data
• Research outputs and dissemination
  – Facilitates production of research outputs
  – Facilitates dissemination and fosters broader visibility of research results
Conclusions
• Metadata is a crucial component of social and behavioral science
• The Data Documentation Initiative (DDI) is a globally accepted specification for capturing microdata documentation and knowledge
• Latest version 3.0 extends into the entire survey Life Cycle
• Producers and data archives are rapidly adopting metadata standards
• This adoption process should extend into the research community
• Best practices in data and metadata management benefit all users and have the potential to change the way we conduct research
• http://www.ddialliance.org or [email protected]