Introduction to DDI 3.0
Sanda Ionescu ICPSR
CESSDA Expert Seminar, September 2007
DDI Version 3.0
• Radically different.
• More complex…
(…but certainly doable!)
• Brings important benefits.
Workshop Schedule
14:30 – 15:10 Overview (40)
15:10 – 15:35 Structure and Technical Mechanisms (25)
15:35 – 15:45 Break (10)
15:45 – 16:10 Study Unit – Modules Content (25)
16:10 – 16:30 Variable Markup Example (20)
16:30 – 16:40 Break (10)
16:40 – 17:10 Grouping – Modules Content and Examples (30)
17:10 – 17:30 Getting Started (20)
DDI 3.0
Overview
DDI Background: Development History
• 1995 – A grant-funded project initiated and organized by ICPSR proposes to create a new standard for documenting social science data, to replace OSIRIS tagged codebooks.
• First drafts used SGML, then converted to Web-friendly XML.
• 2000 – DDI Version 1.0 published as a mainly document- and codebook-centric standard.
DDI Background: Development History
• 2003 – DDI Version 2.0 published with extended scope:
  – Aggregate data coverage (based on matrix structure)
  – Additional geographic representation to assist geographic search systems and GIS users
• Versions 1.0 through 2.1 (latest published) are backwards compatible, and based on the same structure.
DDI Background: Development History
• February 2003 – Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification.
http://www.ddialliance.org/
DDI Background: Development History
Version 3.0:
• 2004-2006: Planning and Development
• November 2006: Internal Review
• February 2007: Public Review
• July 2007: Candidate Draft Release
http://www.ddialliance.org/ddi3/index.html
Benefits of using DDI as an XML-based standard
• Interoperability:
  – Enables seamless exchange and reuse by other systems.
• Repurposing:
  – Provides a core document from which different types of outputs can be generated.
• Value-added documentation:
  – Tagging carries “intelligence” in the document by describing content.
• Enhanced Data Discovery:
  – Increases precision and granularity of searches.
• Support for Data Analysis:
  – Variable descriptions are accepted as input by online analysis systems.
• Multiple presentation formats:
  – ASCII text; PDF; HTML; RTF.
• Preservation-friendly:
  – Non-proprietary format.
Why DDI 3.0?
DDI 3.0 presents new features in response to:
• Perceived needs of:
  - Data users
  - Data producers
  - Data archivists/librarians
• Developments in documenting and archiving data
• Advances in XML technology
DDI 3.0 and the Data Life Cycle Model
DDI Versions 1/2 were codebook-centric:
• Closely followed the structure of traditional print codebooks.
• Captured data documentation at a single, “frozen” point in time – archiving.
DDI 3.0 and the Data Life Cycle Model
Version 3.0 is Life Cycle oriented:
- Designed to cover all stages in the life cycle of a data collection: pre-production, production, post-production, secondary use
Life Cycle Coverage in DDI 3.0
Planning for the Study: Proposal / Design
Study Purpose / Outline
Concepts
Study Population
Author(s)
Funding Sources
Version 3.1:
Survey / Sample Design
Pre-testing
Life Cycle Coverage in DDI 3.0
Proposal becomes reality…
Data Collection methodology: sampling, time, etc.
Instrument characteristics Questionnaire
Data cleaning, weighting, coding, etc.
Life Cycle Coverage in DDI 3.0
Publishing the data…
Intellectual content: Variables, Categories, Codes.
Physical representation: Data format, Record structure, Statistics.
Life Cycle Coverage in DDI 3.0
Archiving / (Re)Distributing the data collection…
Processing checks
Holdings, availability and access conditions
Life Cycle Coverage in DDI 3.0
DDI becomes “visible” to the outside world…
DDI Instance:
Pulls together all life cycle stages
Acquires its own identity as an object
Becomes a tool for data discovery and analysis
Life Cycle Coverage in DDI 3.0
Secondary use of data – new conceptual framework…
New DDI Instance:
New Purpose
New Logical Product
New Physical Description of Data
DDI 3.0 and the Data Life Cycle Model
Advantages of Life Cycle orientation:
• Allows capture and preservation of metadata generated by different agents at different points in time.
• Facilitates tracking changes and updates in both data and documentation.
DDI 3.0 and the Data Life Cycle Model
Advantages of Life Cycle orientation:
• Enables investigators, data collectors and producers to document their work directly in DDI, thus increasing the metadata’s visibility and usability.
• Benefits data users, who need information from the full data life cycle for optimal discovery, evaluation, interpretation, and re-use of data resources.
New / Extended Functionalities in DDI 3.0: Questionnaire
Versions 1/2:
- No instrument coverage.
- Question text only as part of variable description.
- No documentation for question flow / conditions.
Version 3.0:
- Full description of instrument as a separate entity.
- Documents specific use of questions: flow, conditions, loops.
- Compatible with Computer Assisted Interviewing software.
New / Extended Functionalities in DDI 3.0: Complex Data
Versions 1/2:
- Inadequate representation of complex / hierarchical data
Version 3.0:
- Detailed documentation for complex / hierarchical data
Logical structure of records:
  Record Types and Relationships
  Relevant variables: key-link, case identification, record type locator
Physical layout of records:
  Single “hierarchical” file for all records, multiple rectangular files, relational database, etc.
New / Extended Functionalities in DDI 3.0: Aggregate Data
Versions 1/2:
- Initially designed for microdata only
- Aggregate data section added in V 2.1 to support limited representation (Census-type data, delimited files)
Version 3.0:
- Adds support for tabular, spreadsheet-type representation of aggregate data
- Aggregate data transport option: cell content may be included inline with the data item description
New / Extended Functionalities in DDI 3.0: Data Transport
Versions 1/2:
- None
Version 3.0:
- In-line inclusion enabled for both aggregate data and microdata
New / Extended Functionalities in DDI 3.0: Longitudinal / Time Series / Cross-national Data Comparability
Versions 1/2:
- None
Version 3.0:
- Grouping structure documents studies related on one or several dimensions (time, geography, language, etc.) as well as their comparability
New / Extended Functionalities in DDI 3.0: Increased Multilingual Support
Versions 1/2:
- Limited <anytag xml:lang="">
Version 3.0:
- Support for multiple language use and translations
  <InternationalStringType xml:lang="" translated="" translatable="">
  <Variable>
    <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label>
    <Label xml:lang="eng" translated="true">Year of Birth</Label>
  </Variable>
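The multilingual labels in the example above can be read programmatically. What follows is a minimal Python sketch, assuming the simplified, un-namespaced snippet from the slide rather than a schema-valid DDI 3.0 instance; the helper names `labels_by_language` and `original_language` are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified snippet taken from the slide (no DDI namespaces; an assumption).
SNIPPET = """
<Variable>
  <Label xml:lang="ger" translated="false" translatable="true">Geburtsjahr</Label>
  <Label xml:lang="eng" translated="true">Year of Birth</Label>
</Variable>
"""


def labels_by_language(xml_text):
    """Map each xml:lang code to the text of its Label element."""
    root = ET.fromstring(xml_text)
    return {label.get(XML_LANG): label.text.strip()
            for label in root.findall("Label")}


def original_language(xml_text):
    """Return the language of the first label marked translated="false"."""
    root = ET.fromstring(xml_text)
    for label in root.findall("Label"):
        if label.get("translated") == "false":
            return label.get(XML_LANG)
    return None


labels = labels_by_language(SNIPPET)
source_lang = original_language(SNIPPET)
```

Here `labels` holds one entry per language, and `source_lang` identifies which label is the untranslated original.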
DDI 3.0 Specification: Schema-based
Versions 1/2:
- DTD-based
Version 3.0:
- Schema-based:
  Data typing supports machine actionability
  Use of namespaces supports
  - Modularity
  - Extensibility and reuse
  - Alignment with / use of other standards
DDI 3.0 Specification: Machine-actionable
Versions 1/2:
- Machine-readable
Version 3.0:
- Machine-actionable:
  1. Data typing: increased use of controlled vocabularies and standard codes
  2. Larger set of required elements
Predictable content = a more consistent base for programming
DDI 3.0: Modular Structure
Versions 1/2:
- Single file, hierarchical design
Version 3.0:
- Modular design:
  - Facilitates reuse
  - Facilitates versioning and maintenance
  - Supports life cycle model
  - Allows flexibility in organizing the DDI Instance
  - Supports grouping and comparing studies
  - Supports creation of metadata registries
DDI 3.0: Alignment with other metadata standards
Versions 1/2:
- MARC, Dublin Core (bibliographic standards)
Version 3.0:
- MARC, DC, but also…
- SDMX (Statistical Data and Metadata Exchange)
- ISO 11179 (Metadata Registries)
- FGDC (Digital Geospatial Metadata)
- ISO 19115 (Geographic Information Metadata)
DDI 1/2 or DDI 3.0?
• DDI 3.0 will not supersede DDI 2.1.
• Both versions will
  – coexist
  – continue to be maintained
  – be used according to specific needs.
• Not all DDI 1/2 markup will have to be migrated to Version 3.0.
DDI 3.0
Structure and Mechanisms
DDI 3.0 – Modular Structure
Building blocks of DDI 3.0:
» Modules
» Schemes
DDI 3.0 – Modular Structure
Modules:
• Document different aspects of a study, or group of studies, following the data through their life cycle (Conceptual Components, Data Collection, Logical Product, Physical Instance, etc.)
Schemes:
• Include collections of sibling “objects” that are traditionally components of a variable description: Concepts, Universes, Questions, Variable Labels and Names, Categories, Codes.
DDI 3.0 – Modular Structure
Modules:
• Can live independently (have their own schemas) or connected to one another within a hierarchical structure.
Schemes:
• Can live semi-independently (need a higher-level wrapper as they do not have their own schemas) or in-line within a Study Unit or Group module.
DDI 3.0 – Modular Structure
DDI 3.0 model = a multi-branched hierarchy
Module level:
[Diagram: hierarchy of modules. Boxes shown: DDI Instance; Resource Package, Group, Study Unit; Subgroup, Study Unit; Conceptual Components, Data Collection, Archive, Organizations; (Sub)group, Study Unit.]
DDI 3.0 – Modular Structure
DDI 3.0 model = a multi-branched hierarchy
Within modules:
[Diagram: Data Collection branches into Question Scheme (Question Item, Question Item), Processing (Weighting, Coding), and Methodology (Sampling, Time Method).]
DDI 3.0 – Modular Structure
Relationships are established through:
• In-line inclusion (Relational order is explicit)
• Referencing: Internal, External (Relational order is implicit)
DDI 3.0 – Structural mechanisms
Enable modular design and help actualize its benefits.
• Inheritance
• Referencing
• Identification
DDI 3.0: Inheritance
• Inheritance is based on the hierarchical structure of the model.
• In DDI 3.0 a number of elements are reused at different levels of the hierarchy.
• When the same element is present at multiple levels, lower levels inherit content from the upper levels, and only need to specify differences (=local overrides).
DDI 3.0: Inheritance Example
• Instance: Coverage: Spatial: 50 US states
-Study Unit A – no Spatial Coverage defined
= will be inherited from Instance
-Study Unit B – Coverage: Spatial: 48 coterminous states
= supersedes definition in Instance
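The lookup rule in this example can be sketched in a few lines of Python. This is only an illustration of the "nearest definition wins" idea, assuming each level is a plain dictionary; the function `resolve` and the key `spatial_coverage` are hypothetical names, not part of DDI.

```python
def resolve(levels, key):
    """Return the value of `key` from the most specific level that defines it.

    `levels` is ordered from the top of the hierarchy (the Instance)
    down to the lowest level (e.g. a Study Unit).
    """
    for level in reversed(levels):  # check the lowest level first
        if key in level:
            return level[key]
    return None  # not defined anywhere in the hierarchy


instance = {"spatial_coverage": "50 US states"}
study_unit_a = {}  # no Spatial Coverage defined
study_unit_b = {"spatial_coverage": "48 coterminous states"}  # local override

coverage_a = resolve([instance, study_unit_a], "spatial_coverage")
coverage_b = resolve([instance, study_unit_b], "spatial_coverage")
```

Study Unit A ends up with the Instance-level coverage, while Study Unit B keeps its own definition.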
DDI 3.0: Referencing
• DDI 3.0 modular structure is dependent upon creating relationships by reference.
• Referencing implies bringing up the content of a DDI object within, or in association with, another object, by specifying its Unique Identifier.
• Identifiers are the key links between DDI objects.
DDI 3.0: Referencing Example
Data Collection Module: Question Scheme:
  Question: ID: “Q1”
  Text: “How many days in the past week did you watch the national network news on TV?”
Conceptual Components Module: Concept Scheme:
  Concept: ID: “C1”
  Description: “Exposure to national TV news”
Logical Product Module: Variable Scheme:
  Variable: ID: “V1”
  Name: V043014
  Label: Days past week watch natl news on TV
  Question Reference: ID: “Q1”
  Concept Reference: ID: “C1”
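To make the mechanism concrete, the following Python sketch resolves the two references held by the variable in this example. It is an assumption-laden illustration: objects are modeled as simple dataclasses, and `build_registry` and `describe` are hypothetical helpers rather than anything defined by DDI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    id: str
    text: str


@dataclass(frozen=True)
class Concept:
    id: str
    description: str


@dataclass(frozen=True)
class Variable:
    id: str
    name: str
    label: str
    question_ref: str  # ID of the referenced Question
    concept_ref: str   # ID of the referenced Concept


def build_registry(*objects):
    """Index objects by ID, rejecting duplicates so references stay unambiguous."""
    registry = {}
    for obj in objects:
        if obj.id in registry:
            raise ValueError(f"duplicate ID: {obj.id}")
        registry[obj.id] = obj
    return registry


def describe(variable, registry):
    """Pull the referenced question text and concept into one variable view."""
    return {
        "name": variable.name,
        "label": variable.label,
        "question": registry[variable.question_ref].text,
        "concept": registry[variable.concept_ref].description,
    }


q1 = Question("Q1", "How many days in the past week did you watch "
                    "the national network news on TV?")
c1 = Concept("C1", "Exposure to national TV news")
v1 = Variable("V1", "V043014", "Days past week watch natl news on TV", "Q1", "C1")

registry = build_registry(q1, c1, v1)
description = describe(v1, registry)
```

The question text and concept description are stored once and reached through their identifiers, which is what allows several variables to share them.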
DDI 3.0: Identification
Consistency in building and using identifiers is needed for:
– Proper functioning of reference systems, enabling a smooth exchange and reuse of existing metadata.
– Machine-actionability of DDI instances, allowing them to serve as a basis for running programs and processes.
DDI 3.0: Identification
Element types used in the Identification system:
• All elements
• Identifiable
• Versionable
• Maintainable
DDI 3.0: Identification: Element Types
Non-identified elements:
– Require context, which is provided by containing parents.
  Example: codes within code schemes
– Are not reusable.
  Example: variable and category statistics
DDI 3.0: Identification: Element Types
Identifiables
– Carry their own ID
– May be referenced / reused
– Cannot be versioned or maintained, except as part of a complex parent element
(Example: Variable – a change implies a new version of the entire scheme).
DDI 3.0: Identification: Element Types
Versionables
– Carry their own ID
– Carry their own Version: content changes are important to note
(Example: Concept – may be independently versioned within a scheme).
DDI 3.0: Identification: Element Types
Maintainables
– Are higher level DDI objects
– Are both identifiable and versionable
– Can also be published and maintained as separate entities
(Example: all modules, schemes, comparison maps)
DDI 3.0: Identification Structure
• Maintainable elements:
  – URN and / or ID + Identifying Agency + Versioning Information:
    Version
    Version Date
    Version Responsibility
    Version Rationale
• Versionable elements:
  – URN and / or ID + Versioning Information
• Identifiable elements:
  – URN and / or ID
DDI 3.0: Identification Structure
Non-specified Identification information is inherited from the levels above.
Example 1: Inheritance is assumed…
Maintainable: Variable Scheme:
  ID: VarScheme_A
  Identifying Agency: ICPSR
  Version: 1.0
Identifiable: Variable:
  ID: V1
  [Identifying Agency]
  [Version]
Example 2: Inheritance is applied by default
Maintainable: Logical Product
  ID: LogicalProd_Y
  Identifying Agency: ICPSR
  Version: 1.0
Maintainable: Variable Scheme:
  ID: VarScheme_A
  Identifying Agency: [ ]
  Version: [ ]
DDI 3.0: Identification Structure: IDs
Uniqueness of Identifiers is necessary for both internal and external referencing:
1) All IDs MUST be unique within a maintainable
2) All maintainables MUST have unique IDs across an Agency
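The two uniqueness rules lend themselves to an automated check. Below is a minimal Python sketch, assuming an agency's holdings are given as a list of (maintainable ID, contained IDs) pairs; the function names and the message wording are hypothetical.

```python
from collections import Counter


def duplicate_ids(ids):
    """Return the IDs that occur more than once, in sorted order."""
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n > 1)


def validate(agency_maintainables):
    """Check both rules for one agency.

    `agency_maintainables` is a list of (maintainable_id, [contained IDs]).
    Returns a list of human-readable problems; an empty list means no violations.
    """
    problems = []
    # Rule 2: maintainable IDs must be unique across the agency.
    for dup in duplicate_ids(m_id for m_id, _ in agency_maintainables):
        problems.append(f"maintainable ID {dup!r} is not unique within the agency")
    # Rule 1: IDs must be unique within each maintainable.
    for m_id, child_ids in agency_maintainables:
        for dup in duplicate_ids(child_ids):
            problems.append(f"ID {dup!r} is repeated inside maintainable {m_id!r}")
    return problems


ok = validate([("VarScheme_A", ["V1", "V2"]), ("VarScheme_B", ["V1"])])
bad = validate([("VarScheme_A", ["V1", "V1"]), ("VarScheme_A", [])])
```

Note that the same ID, such as V1, may appear in two different maintainables without conflict; only repetition inside one maintainable, or a repeated maintainable ID, is flagged.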
DDI 3.0: Identification Structure: Creating unique Identifiers
A DDI Instance may include multiple maintainables at different hierarchical levels:
Instance (maintainable) – unique ID within Identifying Agency
  Study Unit (maintainable) – unique ID within Identifying Agency
    Logical Product (maintainable) – unique ID within Identifying Agency
      Variable Scheme (maintainable) – unique ID within Identifying Agency
DDI 3.0: Identification Structure: Creating Unique Identifiers
Markup:
Instance_A (unique at ICPSR)
  StudyUnit_1
    Logical Product_1
      VariableScheme_1
        Variable_1
Instance_B (unique at ICPSR)
  StudyUnit_1
    Logical Product_1
      VariableScheme_1
        Variable_1
Post-markup: Variable ID:
  Instance_A StudyUnit_1 LogicalProduct_1 VariableScheme_1 Variable_1
  Instance_B StudyUnit_1 LogicalProduct_1 VariableScheme_1 Variable_1
DDI 3.0: Identification Structure: URNs
• Have a fixed structure and MUST include object ID, Identifying Agency, and Version.
• For versionable and identifiable elements, the containing maintainable is specified.
• Take precedence when both a URN and the Identification sequence are used for the same object.
• May be constructed post-markup from the Identification sequence.
DDI 3.0: Identification: URN Structure
Examples:
• Maintainables:
  urn:ddi:3.0:StudyUnit:ddialliance.org:StudyUnit_ID:1.0
• Versionables:
  urn:ddi:3.0:ConceptScheme:ddialliance.org:ConceptScheme_ID:1.0:Concept:Concept_ID:2.1
• Identifiables:
  urn:ddi:3.0:VariableScheme:ddialliance.org:VariableScheme_ID:1.0:Variable:Variable_ID
URN components (diagram callouts): Object name, Identifying Agency, Object ID, Object Version
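Since URNs may be constructed post-markup from the Identification sequence, a small builder can illustrate the pattern. The Python sketch below reproduces only the layout visible in the three examples on this slide; it is not taken from the normative specification, and the function names are hypothetical.

```python
def maintainable_urn(object_name, agency, object_id, version, ddi_version="3.0"):
    """Build a URN for a maintainable, following the slide's first example."""
    return f"urn:ddi:{ddi_version}:{object_name}:{agency}:{object_id}:{version}"


def contained_urn(parent_urn, object_name, object_id, version=None):
    """Append a contained element to its maintainable's URN.

    Pass `version` for a versionable element; omit it for an identifiable,
    which carries no version of its own in the slide's example.
    """
    urn = f"{parent_urn}:{object_name}:{object_id}"
    return urn if version is None else f"{urn}:{version}"


study_unit = maintainable_urn("StudyUnit", "ddialliance.org", "StudyUnit_ID", "1.0")

concept_scheme = maintainable_urn(
    "ConceptScheme", "ddialliance.org", "ConceptScheme_ID", "1.0")
concept = contained_urn(concept_scheme, "Concept", "Concept_ID", "2.1")

variable_scheme = maintainable_urn(
    "VariableScheme", "ddialliance.org", "VariableScheme_ID", "1.0")
variable = contained_urn(variable_scheme, "Variable", "Variable_ID")
```

Each contained element's URN begins with the URN of its maintainable, which is how the containing maintainable is specified for versionable and identifiable elements.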
DDI 3.0: Referencing
Reference structure:
• URN, and/or:
• [Referenced object’s] ID + Identifying Agency + Version + [Containing] Module ID + [Containing] Scheme ID
DDI 3.0: Reuse of Information
Mechanisms for REUSE: Referencing, Inheritance
Reuse of Information:
1. Facilitates development of documentation throughout the study life cycle
2. Promotes interoperability and standardization across organizations
3. Saves markup time and effort
4. Reduces the risk of human entry error
5. Provides a basic level of implicit comparability
DDI 3.0 Modules
Content, Markup Examples
DDI Version 3.0 Modules -- Structural Overview --
[Diagram: DDI Instance at the top. Boxes shown beneath it: Study Unit, Group, Resource Package; Study Unit, Subgroup, Study Unit, Sub(Group); Concepts, Data Coll., Logical Pr., etc.]
Other “specialized” DDI 3.0 modules
• Aggregate Data:
  – NCube Logical Product
  – Inline NCube Record Layout
  – NCube Record Layout
  – Tabular NCube Record Layout
• Inline Microdata:
  – Dataset
• User-specific Markup Templates:
  – DDI Profile
DDI Version 3.0 Modules -- Structural Overview --
[Diagram: DDI Instance contains Study Unit and Group. Modules listed under Study Unit: Conceptual Component, Data Collection, Logical Product, Physical Data Product, Physical Instance, Archive, Organizations. Modules listed under Group: Conceptual Component, Data Collection, Logical Product, Archive, Study Unit, Group, Comparative.]
DDI 3.0
Modules used to mark up a simple study
DDI 3.0 modules for documenting a single, survey-type study
• [Reusable]
• [XHTML]
• Instance
  – Study Unit
    • Conceptual Component
    • Data Collection
    • Logical Product
    • Physical Data Product
    • Physical Instance
    • Archive
      – Organizations
DDI Instance -- wrapper for all modules --
• Identification
  – URN
  – Identification Sequence
  – Name
• Citation … (+ optional DC Elements)
• Coverage
  – Topical
  – Spatial
  – Temporal
• Group (module) – repeatable
• Resource Package (module) – repeatable
• Study Unit (module) – repeatable
• Other Material(s)
• Note(s)
• Translation Information
Coverage in DDI 3.0
Study: American National Election Study (ANES), 2004
• Topical Coverage:
  – Subject:
    • Historical and Contemporary Electoral Processes
  – Keyword:
    • Electoral campaigns
    • Political attitudes
    • Political participation
• Spatial Coverage:
  – Description: United States
  – Top level: nation
  – Lowest level: congressional district
• Temporal Coverage:
  – Date: 2004
Study Unit -- documents a single “study” --
• Identification, Other Material(s), Note(s)
• Citation
• Abstract
• Universe Reference
• Funding Information
• Purpose
• Coverage
• Analysis Unit
• Embargo
• Conceptual Component (module)
• Data Collection (module)
• Logical Product (module)
• Physical Data Product (module)
• Physical Instance (module)
• Archive (module)
  – Organizations (module)
Conceptual Component -- lists concepts and universes --
• Identification, Other Material(s), Notes
• Coverage
• Concept Scheme … or Reference to External Scheme
  – Vocabulary – describes vocabulary used
  – Concept
    • Label
    • Description
    • Similar Concept
      – Difference
  – Concept Group
    • Concept Reference (nestable)
• Universe Scheme … or Reference to External Scheme
  – Universe
    • Human Readable
    • Machine Readable
    • Subuniverse
      – Subuniverse
Data Collection
• Identification, Other Material(s), Note(s)
• Coverage
• Methodology
  – Time Method
  – Sampling
• Collection Event
  – Data Collector
  – Data Source
  – Collection Date(s)
  – Mode of data collection
• Question Scheme – lists actual questions
• Instrument – documents question flow, conditions
• Processing Event
  – Control and cleaning operations
  – Weighting
  – Data Appraisal Information
  – Coding
Logical Product -- documents intellectual content of data --
• Identification, Other Material(s), Note(s)
• Coverage
• Category Scheme … or Reference to external category scheme
  – Category
    • Label
    • Derivation (if applicable)
    • Definition
• Code Scheme … or Reference to external code scheme
  – Category Scheme Reference
  – Hierarchy Type
  – Level (in the hierarchy)
  – Code
    • Category Reference
    • Value
    • Code (nestable)
• Variable Scheme … or Reference to external variable scheme
Logical Product: Variable Scheme: Variable
• Variable … or Reference to an externally documented variable
  – Identification
    • Name
  – Label
  – Definition
  – Universe Reference
  – Concept Reference
  – Question Reference
  – Embargo Reference
  – Response Unit
  – Analysis Unit
  – Representation
    • Imputation
    • Derivation
    • Coding Instructions
    • Value Representation:
      » Text
      » Date / Time
      » Numeric
      » Code
Logical Product: Variable Scheme: Variable Group
• Variable Group:
  – Type
  – Label
  – Definition
  – Universe Reference
  – Concept Reference
  – Variable Reference (lists variables in the group)
  – Variable Group Reference (allows nesting of groups)
• Variable Group Reference (use for externally documented Variable Group)
Physical Data Product -- Describes Physical Layout of Data --
• Identification, Other Material(s), Note(s)
• Logical Product Reference
• Gross Record Structure:
  – Records Per Case
  – Variable Quantity
  – Logical Record Reference
  – Physical Record Reference
• Related Logical Records
• Record Layout:
  – Data Item
    – Variable Reference
    – Physical Location
      – Value Location
        » StartPosition
        » Width
• Dataset (module)
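As an illustration of how a Record Layout's StartPosition and Width locate a value in a fixed-width record, here is a minimal Python sketch. It assumes StartPosition is 1-based (the slides do not say), and the layout, variable names, and record are invented for the example.

```python
def extract_value(record, start_position, width):
    """Slice one data item out of a fixed-width record.

    ASSUMPTION: start_position counts from 1, so it is shifted by one
    to match Python's 0-based string indexing.
    """
    begin = start_position - 1
    return record[begin:begin + width]


def read_record(record, layout):
    """Apply every (start_position, width) pair in `layout` to one record."""
    return {name: extract_value(record, start, width)
            for name, (start, width) in layout.items()}


# Hypothetical layout: a 4-character case ID followed by a 1-character variable.
layout = {"CASEID": (1, 4), "V1": (5, 1)}
values = read_record("00173", layout)
```

Keeping the layout separate from the records mirrors the split between the Physical Data Product, which describes positions, and the data file itself.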
Physical Instance -- Documents a specific data file --
• Identification, Other Material(s), Note(s)
• Citation
• Coverage
• Physical Data Product Reference
• Data File Identification
  – Location
  – URI
• Gross File Structure
  – Creation Software
  – Case Quantity
  – Overall Record Count
• Statistics
  – Logical Product Reference
  – Variable Statistics
    • Variable Reference
    • Total Responses
    • Summary Statistics
    • Category Statistics
      » Value
      » Statistic
Archive
• Identification, Other Material(s), Note(s)
• Archive Specific
  – Item
    • Location
    • Call Number
    • URI
    • Format
    • Media
    • Availability Status
  – Access
    • Confidentiality Statement
    • Access Permission
    • Restrictions
    • Citation Requirement
    • Deposit Requirement
    • Access Conditions
    • Disclaimer
    • Contact
  – Funding Information
• Life Cycle Information
  – Event
    • Type
    • Date
    • Agency
    • Description
• Organizations (module)
Organizations
• Identification
• Organization
  – URL
  – Individual
• Individual
  – Organization
  – Title
  – Language
• Role
  – Entity Reference
  – Organization Reference
  – Individual Reference
  – Description
  – Period
• Relation
  – Organization Reference
  – Individual Reference
  – Description
  – Period
• Name
• Description
• Location
• Telephone
• E-mail
• Relation
DDI 3.0 Markup Example
A Survey Variable
Version 2.1 vs. Version 3.0 Example: A survey variable
[Figure: ASCII codebook entry]
Version 2.1 vs. Version 3.0 Example: A survey variable in Version 2.1
[Figure: Data Description: Variable markup, with callout name="V043015"]
Version 2.1 vs. Version 3.0 Example: A survey variable in Version 3.0
Logical Product: Variable Scheme: ID
  Variable: ID
Data Collection: Question Scheme: ID
  Question: ID
Logical Product: Code Scheme: ID
  Code
Logical Product: Category Scheme: ID
  Category: ID
Physical Instance: Statistics:
  Variable Statistic
  Category Statistics
Conceptual Component:
  Concept Scheme: Concept: ID
  Universe Scheme: (Sub)Universe: ID
DDI 3.0 Markup: A Survey Variable: Concept
Conceptual Component: Concept Scheme: Concept
Concept: Attention to Presidential Campaign on National TV
DDI 3.0 Markup: A Survey Variable: Universe
Conceptual Component: Universe Scheme: (Sub)Universe
(A7: How many days in the PAST WEEK did you watch the NATIONAL network news on TV? 0-7; 8=DK; 9=RF)
DDI 3.0 Markup: A Survey Variable: Question ID, Question Text
Data Collection: Question Scheme: Question Item
[Figure: other Response Domains]
DDI 3.0 Markup: A Survey Variable: Variable name, label, type of physical representation
Logical Product: Variable Scheme: Variable
[Figure: other types of Representation]
DDI 3.0 Markup: A Survey Variable: Category labels, missing data information
Logical Product: Category Scheme: Category
[Figure callout: missing="true"]
DDI 3.0 Markup: A Survey Variable: Category Values
Logical Product: Code Scheme: Code
DDI 3.0 Markup: A Survey Variable: Statistics
Physical Instance: Statistics: Variable Statistics: Category Statistic
DDI 3.0 Markup: A Survey Variable: Logical Product Module
DDI 3.0 Markup: Modules used in a full variable description
Concept, Universe
Question
Values, Value Labels, Variable name, Variable label
Statistics
Location: Physical Data Product
DDI 3.0 Modular Approach: Advantages
• Modules and schemes can be independently maintained.
• Pieces of information can be reused without being repeated.
DDI 3.0 Modular Approach: Reusing information
Variable Markup in Version 2 -- carries redundant information --
Variable Markup in Version 3.0 Modular Approach: Reusing Information
DDI 3.0
Grouping
DDI 3.0: Groups
• Entirely new feature in DDI 3.0.
• Designed to document and compare related studies.
Group -- documents “families” of studies --
• Identification, Other Material(s), Note(s)
• Citation
• Abstract
• Universe
• Funding Information
• Purpose
• Coverage
• Universe Reference
• Conceptual Component (module)
• Data Collection (module)
• Logical Product (module)
• Archive (module)
  – Organizations (module)
• Study Unit (module)
• Group (module)
• Comparative (module)
DDI 3.0 Grouping Attributes
• A set of mandatory attributes indicates the nature of the relationships among group members.
• Group parameters:
  – Time
  – Instrument
  – Panel (population of respondents)
  – Geography
  – Datasets
  – Language
DDI 3.0 Grouping Attributes Example
DDI 3.0: Types of Groups
• Groups of studies may be:
  – Formal (“by design”):
    • Designed to be compared (longitudinal, time-series, or cross-national studies)
    • Documented and compared through use of Inheritance
  – Informal (“ad-hoc”):
    • Decision to group and compare is taken post-production, or “after the fact”.
    • Comparability documented in the Comparative module
Formal Groups: Inheritance
Example 1: Time-series: Same questions repeated over time, same resulting variables.
Group (Studies A-C)
• Temporal Coverage_G1: 1991-1993
• Data Collection: Question Scheme
• Logical Product: Variable Scheme
Study A
• Temporal Coverage: 1991 (Replace Ref:G_1)
• Physical Data Product
• Physical Instance: Statistics
Study B
• Temporal Coverage: 1992 (Replace Ref:G_1)
• Physical Data Product
• Physical Instance: Statistics
Study C
• Temporal Coverage: 1993 (Replace Ref:G_1)
• …
• Physical Data Product
• Physical Instance
Formal Groups: Inheritance
Attributes “Add”, “Replace”, “Delete”
• In a complex grouping structure, inheritance paths may become quite intricate.
• ID attributes ADD, REPLACE and DELETE are introduced to resolve potential inheritance ambiguities:
– ADD = [empty] -> flags the element as a new addition.
– REPLACE = “ReferenceType” -> the referenced element is being replaced at the lower level (“local override”).
– DELETE = “ReferenceType” -> the referenced element is being deleted at the lower level.
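The effect of the three attributes can be sketched in a few lines of Python. This is only an illustration of the idea, not DDI markup or any DDI tool's API: the `LocalItem` class, the `resolve` function, and the ID `QS_1` are hypothetical names introduced here, and metadata is reduced to a dictionary keyed by ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalItem:
    """One item declared at the lower level (hypothetical, simplified model)."""
    action: str                 # "ADD", "REPLACE", or "DELETE"
    id: str = ""                # ID of the new item (unused for DELETE)
    content: str = ""           # metadata carried by the new item
    ref: Optional[str] = None   # ID of the inherited item being replaced or deleted


def resolve(inherited: dict[str, str], local: list[LocalItem]) -> dict[str, str]:
    """Return the metadata in force at the lower level."""
    result = dict(inherited)    # copy, so the group-level metadata is untouched
    for item in local:
        if item.action == "ADD":
            # New addition: nothing inherited is affected.
            result[item.id] = item.content
        elif item.action in ("REPLACE", "DELETE"):
            if item.ref not in result:
                raise KeyError(f"{item.action} refers to unknown ID {item.ref!r}")
            del result[item.ref]            # inherited item no longer applies here
            if item.action == "REPLACE":
                result[item.id] = item.content   # local override
        else:
            raise ValueError(f"unknown action {item.action!r}")
    return result


# Group-level metadata, loosely following Example 1 ("QS_1" is an assumed ID).
group = {
    "G_1": "Temporal Coverage: 1991-1993",
    "QS_1": "Question Scheme",
}

# Study A overrides the group's temporal coverage and inherits the rest.
study_a = resolve(group, [
    LocalItem("REPLACE", id="A_1", content="Temporal Coverage: 1991", ref="G_1"),
])
print(study_a)
```

Requiring REPLACE and DELETE to name an existing ID is a design choice of this sketch: it makes a dangling reference fail loudly instead of silently leaving the inherited item in place.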
Formal Groups: Inheritance
Example 2: Time-series: Same core questions repeated over time, different topical modules added to each iteration.
Group (Studies A-C)
• Data Collection: Core Questions (Q1-Q50)
• Logical Product: Core Variables (V1-V50)
Study A: Topical Module “Health Status”
• Data Collection: ADD: Questions (Q51A-Q80A)
• Logical Product: ADD: Variables (V51A-V80A)
Study B: Topical Module “Gun Control”
• Data Collection: ADD: Questions (Q51B-Q80B)
• Logical Product: ADD: Variables (V51B-V80B)
etc…
Formal Groups: Inheritance
Example 3: Any group by design: some questions are not asked in some iterations.
Group (Studies A-E)
• Data Collection: All Questions (Q1-Q100)
• Logical Product: All Variables (V1-V100)
Study A
Study B
• Data Collection: DELETE: Question Q55
• Logical Product: DELETE: Variable V55
Group (Studies C-E)
• Data Collection: DELETE: Questions Q60-Q69
• Logical Product: DELETE: Variables V60-V69
Study C, Study D, Study E
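Example 3 can be checked with a small Python sketch in which each study or subgroup records only its own deletions and looks up everything else from its parent. The `Node` class and its `effective` method are hypothetical names used for illustration; only the question IDs and the group layout come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A group or study in the hierarchy (simplified, hypothetical model)."""
    name: str
    parent: Optional["Node"] = None
    added: set[str] = field(default_factory=set)     # items introduced at this level
    deleted: set[str] = field(default_factory=set)   # inherited items dropped here

    def effective(self) -> set[str]:
        """Items in force at this level: inherited, minus deleted, plus added."""
        inherited = self.parent.effective() if self.parent else set()
        return (inherited - self.deleted) | self.added


# Top-level group documents all questions once.
top = Node("Group (Studies A-E)", added={f"Q{i}" for i in range(1, 101)})

# Study A inherits everything; Study B drops one question.
study_a = Node("Study A", parent=top)
study_b = Node("Study B", parent=top, deleted={"Q55"})

# The subgroup drops Q60-Q69 for all of its member studies.
subgroup = Node("Group (Studies C-E)", parent=top,
                deleted={f"Q{i}" for i in range(60, 70)})
study_c = Node("Study C", parent=subgroup)

for node in (study_a, study_b, study_c):
    print(node.name, len(node.effective()))
```

Study C declares nothing itself, yet it ends up without Q60-Q69 because the deletion sits on its parent group. The same lookup would apply to the variables V1-V100.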
Formal Groups: Inheritance
Example 4 (SOEP, Germany): Longitudinal: Same variables, with a different name each year.
(No name)
ADD: Name only
Formal Groups: Inheritance
Example 5 (SOEP, Germany): Longitudinal: In 2002 variable “Income” changes currency from DM to Euro: change in question wording.
(No question)
ADD: question only
Formal Groups: Inheritance
Example 5 (SOEP, Germany) continued: These variables also change names every year…
Formal Groups: Inheritance
Example 5 (SOEP, Germany) – the final picture: information is inherited down the hierarchy.
Inheritance in Formal Groups
• Simplification of DDI Instances: common metadata is only entered once.
• More efficient means of documentation: for new additions, only differences need to be specified.
• Relational information embedded in the inheritance structure: comparison becomes machine-actionable.
Comparative -- documents comparability in ad-hoc groups --
• Identification, Note(s)
• Comparison Description (human-readable)
• Concept Map
– Source Scheme Reference
– Target Scheme Reference
– Item Map
  • Source Item
  • Target Item
  • Map Type
  • Difference
• Variable Map
• Question Map
• Category Map
• Code Map
• Universe Map
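The map structure listed above can be pictured as a small data model. The Python sketch below is an illustration only: it assumes that the other maps (Variable, Question, and so on) follow the same pattern as Concept Map, and the class names, scheme IDs, variable names, and map-type labels are all hypothetical rather than taken from the DDI 3.0 schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemMap:
    """Pairs one source item with one target item."""
    source_item: str
    target_item: str
    map_type: str          # hypothetical label, e.g. "identical" or "close"
    difference: str = ""   # human-readable note on how the items differ


@dataclass
class SchemeMap:
    """Maps items of a source scheme onto items of a target scheme."""
    source_scheme_ref: str
    target_scheme_ref: str
    item_maps: list[ItemMap] = field(default_factory=list)

    def targets_for(self, source_item: str) -> list[ItemMap]:
        """All recorded mappings that start from the given source item."""
        return [m for m in self.item_maps if m.source_item == source_item]


# Two studies grouped after the fact; all IDs below are invented.
variable_map = SchemeMap(
    source_scheme_ref="StudyX_VariableScheme",
    target_scheme_ref="StudyY_VariableScheme",
    item_maps=[
        ItemMap("V12", "INCOME", "close", "Source in DM, target in Euro"),
        ItemMap("V13", "AGE", "identical"),
    ],
)

for m in variable_map.targets_for("V12"):
    print(m.source_item, "->", m.target_item, f"({m.map_type})", m.difference)
```

Keeping the difference note next to each pair is what lets a comparison be read by a person and processed by software from the same record.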
DDI 3.0 Using the Comparative Module
Instructions on how to use the Comparative Module and build comparison maps:
“DDI 3.0 User Guide”, pp. 45-49. http://www.ddialliance.org/DDI/ddi3
Producing DDI 3.0 markup
Getting started
DDI 3.0: Tools projects
DDI Toolkit:
• Core library for developing open source tools
• Version 1/2 <-> Version 3.0 converters
• DDI 3.0 URN resolution tool
• DDI 3.0 validation tool
• Version 3.0 stylesheets with display and editing layers
• Grouping tool
• Concept management tool
• Registry applications
Producing DDI 3.0 markup -- Getting started --
Software to assist in document creation:
• DeXtris:
– XML browser
– Converts DDI 1/2 to DDI 3.0
http://www.opendatafoundation.org/tools/dextris
DDI 3.0 Tools: Using Dextris
Producing DDI 3.0 markup -- Getting started --
Software to assist in document creation:
• SPSS system to DDI 3.0 converter (see description and link on the DDI 3.0 Proof of Concept page):
http://www.ddialliance.org/DDI/ddi3/proof.html
Producing DDI 3.0 markup -- Getting started --
XML editors
oXygen:
• Create new DDI instance
• Edit/update DDI instance
• Validate DDI instance
• View schemas
DDI 3.0: Viewing Schemas in oXygen
Producing DDI 3.0 markup -- Getting started --
Other tools to assist in producing DDI 3.0 markup:
• DDI “core” template
• Version 3.0 documentation:
– Module descriptions
– Field level documentation
– DDI Help Center
http://www.ddialliance.org/ddi3/index.html
Producing DDI 3.0 markup -- Using multiple modules --
Resource:
“Getting Started with DDI 3.0”
http://www.ddialliance.org/DDI/ddi3/getting-started.html
DDI Version 3.0: Displaying Markup
Stylesheets:
• Basic:
Web presentation in XHTML
• Enhanced:
Adds graphics for presenting frequencies
Automated calculation of valid percentages
http://www.ddialliance.org/DDI/ddi3/proof.html
DDI Version 3.0: Questions? Comments?
• Sanda Ionescu: [email protected]
• DDI Users Listserv:
http://www.ddialliance.org/codebook/listserv.html
The End