Evolution of Data Documentation Providing Social Science Data Services Jim Jacobs, 2008.

In the beginning…

…was the codebook.

…early digital codebooks…

• Codebook listed to tape
• OSIRIS dictionaries
• SPSS (and SAS) code
• PDFs

What do early digital codebooks have in common?

1. Tied to a particular physical layout of a data file.
2. Each uses its own special syntax.

VARIABLE 6 OPINION OF COUNTRY OVERALL DECK 1/35

D HUFAMINC 2 39

CITY $ 77-94

3. Some included information intended for human consumption.

Q1. THINKING ABOUT THE COUNTRY OVERALL, DO YOU THINK THINGS IN THE U.S. ARE GENERALLY GOING IN THE RIGHT DIRECTION, OR DO YOU FEEL THINGS ARE SERIOUSLY OFF ON THE WRONG TRACK?

VALUE LABEL        VALUE   N OF CASES
-----------        -----   ----------
RIGHT DIRECTION        1          223
WRONG TRACK            2          237
NO OPINION             8           48
NOT APPLICABLE*        9          500
                           ----------
TOTAL                            1008

*NOT FORM A

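The contrast between a listing written for people and information a program can act on can be sketched by holding the Q1 entry as structured data. This minimal Python example is illustrative only: the `Category` record and the helper names are hypothetical, and treating codes 8 and 9 as missing when computing valid percentages is an assumption, not something the codebook states. From one structured entry it regenerates a listing in the printed style and checks the total of 1008.

```python
from dataclasses import dataclass


@dataclass
class Category:
    label: str
    value: int
    count: int


# Q1 categories and counts copied from the codebook listing.
Q1 = [
    Category("RIGHT DIRECTION", 1, 223),
    Category("WRONG TRACK", 2, 237),
    Category("NO OPINION", 8, 48),
    Category("NOT APPLICABLE*", 9, 500),
]


def total(categories):
    """Sum of cases across all categories."""
    return sum(c.count for c in categories)


def render_listing(categories):
    """Rebuild a plain-text frequency listing in the style of the printed codebook."""
    lines = [f"{'VALUE LABEL':<16}{'VALUE':>6}{'N OF CASES':>12}"]
    for c in categories:
        lines.append(f"{c.label:<16}{c.value:>6}{c.count:>12}")
    lines.append(f"{'TOTAL':<16}{'':>6}{total(categories):>12}")
    return "\n".join(lines)


def valid_percent(categories, missing_codes=frozenset({8, 9})):
    """Percent for each category, excluding missing codes.

    Assumption: 8 (no opinion) and 9 (not applicable) are treated as missing.
    """
    valid = [c for c in categories if c.value not in missing_codes]
    n = sum(c.count for c in valid)
    return {c.label: round(100 * c.count / n, 1) for c in valid}


print(render_listing(Q1))
print(valid_percent(Q1))
```

Because the labels, codes, and counts live in one structure, the human-readable listing and any derived numbers come from the same source instead of being re-typed.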
Problems of early digital codebooks (part 1)

[Diagram labels: Book, CBLT, OSIRIS, OSIRIS dictionary, SPSS, SPSS cards, PDF. The user has to re-create information in order to re-use it.]

Machine “readable” but not machine “actionable”

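The re-creation problem can be made concrete with two of the layout specifications shown earlier, `D HUFAMINC 2 39` and `CITY $ 77-94`. This Python sketch is illustrative only and rests on stated assumptions: it reads the `D` line as variable name, width, and starting column, and reads `CITY $ 77-94` as SAS-style column input in which `$` marks a character variable. The neutral `Field` record and all function names are hypothetical. Moving a variable from one dialect to another means parsing one syntax into a neutral form and writing the other back out, work a user otherwise does by hand.

```python
import re
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    start: int       # first column, 1-based
    end: int         # last column, 1-based, inclusive
    character: bool  # True for character (string) variables


def parse_dictionary_line(line: str) -> Field:
    """Parse 'D NAME WIDTH START' (assumed meaning of the D-line layout)."""
    tag, name, width, start = line.split()
    if tag != "D":
        raise ValueError(f"not a D line: {line!r}")
    first = int(start)
    return Field(name, first, first + int(width) - 1, character=False)


def parse_column_input(line: str) -> Field:
    """Parse a SAS-style column-input item: 'NAME [$] START-END'."""
    m = re.fullmatch(r"\s*(\w+)\s*(\$)?\s*(\d+)-(\d+)\s*", line)
    if m is None:
        raise ValueError(f"unrecognized column spec: {line!r}")
    name, dollar, start, end = m.groups()
    return Field(name, int(start), int(end), character=dollar is not None)


def to_column_input(f: Field) -> str:
    """Write a Field back out in SAS-style column-input form."""
    dollar = " $" if f.character else ""
    return f"{f.name}{dollar} {f.start}-{f.end}"


def extract(record: str, f: Field) -> str:
    """Pull a field's text out of a fixed-width record (columns are 1-based)."""
    return record[f.start - 1 : f.end]


print(to_column_input(parse_dictionary_line("D HUFAMINC 2 39")))
print(to_column_input(parse_column_input("CITY $ 77-94")))
```

For example, `to_column_input(parse_dictionary_line("D HUFAMINC 2 39"))` produces `HUFAMINC 39-40`. Every new dialect needs its own parser, which is the cost a shared, software-independent format avoids.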
XML helps solve the problem

• XML is not tied to any single piece of software.

• XML is designed to be easily parsed by computer.

• XML is (to some extent) self-documenting or self-descriptive.

• XML can include information intended both for humans and machines.

• XML is non-proprietary, open, and flexible.

• Many tools exist to read and convert XML (Java, JavaScript, Perl, PHP, etc.).

• XSL and XSLT were created explicitly for converting XML. With them, XML can be converted to HTML, PDF, other XML, etc.

• XML is highly structured so it can be predictably converted.

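To illustrate the XML points above, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The variable description is hypothetical and only loosely modeled on DDI Codebook element names (`var`, `labl`, `qstnLit`, `catgry`, `catValu`); it is not a schema-valid DDI document, and XSLT would be the more usual tool for such conversions. The point it tries to show is that one XML source can produce both an HTML fragment for people and SPSS syntax for software.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified variable description (not schema-valid DDI).
XML_SOURCE = """
<var name="Q1">
  <labl>Opinion of country overall</labl>
  <qstnLit>Thinking about the country overall, do you think things in the U.S.
  are generally going in the right direction, or do you feel things are
  seriously off on the wrong track?</qstnLit>
  <catgry><catValu>1</catValu><labl>Right direction</labl></catgry>
  <catgry><catValu>2</catValu><labl>Wrong track</labl></catgry>
  <catgry missing="Y"><catValu>8</catValu><labl>No opinion</labl></catgry>
  <catgry missing="Y"><catValu>9</catValu><labl>Not applicable</labl></catgry>
</var>
"""


def load(xml_text: str) -> ET.Element:
    return ET.fromstring(xml_text.strip())


def categories(var: ET.Element):
    """Yield (value, label, is_missing) for each category of the variable."""
    for c in var.findall("catgry"):
        yield c.findtext("catValu"), c.findtext("labl"), c.get("missing") == "Y"


def to_html(var: ET.Element) -> str:
    """Human-facing output: heading, question text, and category list."""
    question = " ".join(var.findtext("qstnLit").split())
    items = "".join(f"<li>{escape(v)} = {escape(l)}</li>" for v, l, _ in categories(var))
    return (f"<h2>{escape(var.get('name'))}: {escape(var.findtext('labl'))}</h2>"
            f"<p>{escape(question)}</p><ul>{items}</ul>")


def to_spss(var: ET.Element) -> str:
    """Machine-facing output: SPSS label and missing-value commands.

    Labels containing single quotes would need escaping; not handled here.
    """
    name = var.get("name")
    cats = list(categories(var))
    value_labels = " ".join(f"{v} '{l}'" for v, l, _ in cats)
    missing = ", ".join(v for v, _, m in cats if m)
    lines = [
        f"VARIABLE LABELS {name} '{var.findtext('labl')}'.",
        f"VALUE LABELS {name} {value_labels}.",
    ]
    if missing:
        lines.append(f"MISSING VALUES {name} ({missing}).")
    return "\n".join(lines)


var = load(XML_SOURCE)
print(to_html(var))
print(to_spss(var))
```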
DDI 1 and 2

1.0 DOCUMENT DESCRIPTION
2.0 STUDY DESCRIPTION
3.0 DATA FILES DESCRIPTION
4.0 VARIABLE DESCRIPTION
5.0 OTHER STUDY-RELATED MATERIALS

Built to emulate early code BOOKS and digital codebooks…

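The five numbered sections correspond to the top-level children of a DDI Codebook document: `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, and `otherMat`. The Python sketch below builds an empty skeleton in that order with `xml.etree.ElementTree` and adds one minimal variable. It is a sketch only: the helper names are hypothetical and the result is not validated against the DDI schema.

```python
import xml.etree.ElementTree as ET

# The five numbered DDI 1/2 sections paired with their codebook element names.
SECTIONS = [
    ("1.0", "Document Description", "docDscr"),
    ("2.0", "Study Description", "stdyDscr"),
    ("3.0", "Data Files Description", "fileDscr"),
    ("4.0", "Variable Description", "dataDscr"),
    ("5.0", "Other Study-Related Materials", "otherMat"),
]


def codebook_skeleton() -> ET.Element:
    """Build an empty codeBook element with one child per section, in order."""
    root = ET.Element("codeBook")
    for number, title, tag in SECTIONS:
        section = ET.SubElement(root, tag)
        section.append(ET.Comment(f" {number} {title} "))
    return root


def add_variable(root: ET.Element, name: str, label: str) -> ET.Element:
    """Add a minimal <var> (name attribute plus <labl>) to the Variable Description."""
    var = ET.SubElement(root.find("dataDscr"), "var", name=name)
    ET.SubElement(var, "labl").text = label
    return var


root = codebook_skeleton()
add_variable(root, "Q1", "Opinion of country overall")
print(ET.tostring(root, encoding="unicode"))
```

The fixed, book-like ordering of sections is visible in the skeleton itself: everything about a study is packed into one document shaped like a printed codebook.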
Problems of early digital codebooks (part 2)

• Static, inflexible.

• Meant to document the end point of research; views research as linear.

• Hard to re-use the information for new research.

Problems of DDI 1 and 2

• Emulated the codebook

• Not flexible enough

• We could do so much more…

Three Stages of Technological Change

• Modernization: doing what we’ve always done, but using technology to do more and to increase efficiency.
  Early digital codebooks: making codebooks machine readable.

• Innovation: doing things we’ve wanted to do, but could not do without the technology.
  DDI 1 and 2: making codebooks re-usable, even machine actionable…

• Transformation: doing things that we didn’t imagine until technology made it possible.
  DDI 3: re-thinking “documentation”; re-thinking the research process.

DDI 1 and 2

• Document Description
• Study Description
• Data Files Description
• Variable Description
• Other Study-Related Materials

DDI 3

• Study Concept
• Data Collection
• Data Processing
• Data Distribution
• Data Archiving
• Data Discovery
• Data Analysis
• Repurposing

Life Cycle of Research, Data, Documentation

A modular approach

• Study Unit
  - Research question
  - Funding
  - Concepts
  - Background research
• Data Collection
  - Instrument
  - Data collection process
  - Questionnaire
• Logical Product
  - Intellectual content of data
  - Relationship to questions and concepts
  - Relationship to processing (recodes, weighting, derivations, imputations)
• Physical Data Product
  - Describes the structure (microdata, tabular, aggregate, NCube…) (e.g., STF 1A)
• Physical instance
  - Each describes a single data file (e.g., STF 1A by state; each state is an instance)
• “Instance”
  - An instance module “wraps” the other modules. Like a table of contents to a group of studies, files, and modules, it brings everything together.
• Archive
  - Each archive can add its own local information with an archive module.

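One way to picture how the modules listed above refer to one another is a small object model. The Python sketch below is a hypothetical, heavily simplified illustration, not the DDI 3 schema: every class, field, file name, and column position is invented for the example. It tries to capture reuse: a layout is described once in a Physical Data Product and each Physical Instance (one file, e.g. one per state) only points to it; a Logical Product variable points back to its Data Collection question instead of repeating it; and the wrapping "Instance" gathers everything, with room for archive-specific notes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-ins for DDI 3 modules; names are illustrative only.


@dataclass
class Question:
    """A question as described in a Data Collection module."""
    id: str
    text: str


@dataclass
class Variable:
    """A variable in a Logical Product; it refers to its question by id."""
    name: str
    question_id: str


@dataclass
class PhysicalDataProduct:
    """A record layout described once and shared by many files."""
    id: str
    layout: dict[str, tuple[int, int]]  # variable name -> (first, last) column


@dataclass
class PhysicalInstance:
    """One concrete data file that follows a Physical Data Product."""
    file_name: str
    product_id: str


@dataclass
class Instance:
    """Wrapper that gathers the other modules, like a table of contents."""
    questions: list[Question] = field(default_factory=list)
    variables: list[Variable] = field(default_factory=list)
    products: list[PhysicalDataProduct] = field(default_factory=list)
    files: list[PhysicalInstance] = field(default_factory=list)
    archive_notes: dict[str, str] = field(default_factory=dict)  # local archive info

    def question_for(self, var_name: str) -> Question:
        """Follow the reference from a variable back to its question."""
        var = next(v for v in self.variables if v.name == var_name)
        return next(q for q in self.questions if q.id == var.question_id)

    def files_using(self, product_id: str) -> list[str]:
        """List every data file that shares the given layout."""
        return [f.file_name for f in self.files if f.product_id == product_id]


# Hypothetical example: one layout, one data file per state.
doc = Instance(
    questions=[Question("Q1", "Is the country going in the right direction?")],
    variables=[Variable("Q1", question_id="Q1")],
    products=[PhysicalDataProduct("layout-1", {"Q1": (10, 10)})],
    files=[PhysicalInstance("state_01.dat", "layout-1"),
           PhysicalInstance("state_02.dat", "layout-1")],
    archive_notes={"access": "local note added by the archive"},
)
print(doc.question_for("Q1").text)
print(doc.files_using("layout-1"))
```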
A modular approach (but wait… there’s more!)

• Group module
  - Describe concepts, questions, and variables that occur in several studies.
  - Describe a series (e.g., CBP, CPS, Eurobarometer).
  - Describe a collection of studies (not a series) and identify the common comparable concepts, questions, and variables.
• Comparative module
  - The Comparative module contains information for comparing concepts, questions, and variables between or among Study Units that have been housed in a Group.
• Conceptual components module
  - Describe concepts and their relationships as concept groups.
  - Use known vocabularies, and indicate the level of similarity between two concepts by describing the extent of difference.

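The Group and Comparative ideas can be illustrated with a small Python sketch under simple, hypothetical assumptions: each `StudyUnit` maps its own variable names to concept labels from a shared vocabulary, and a `Group` finds the concepts every study measures and lists which variable carries a given concept in each study. The series, waves, variable names, and concept labels are invented, and the sketch does not model the degrees of similarity the Conceptual components module can express.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stand-ins for a Group and its Study Units; not the DDI 3 schema.


@dataclass
class StudyUnit:
    name: str
    concepts: dict[str, str]  # this study's variable name -> shared concept label


@dataclass
class Group:
    name: str
    studies: list[StudyUnit]

    def common_concepts(self) -> set[str]:
        """Concepts that every study in the group measures."""
        per_study = [set(s.concepts.values()) for s in self.studies]
        return set.intersection(*per_study) if per_study else set()

    def compare(self, concept: str) -> dict[str, list[str]]:
        """For one concept, the variable(s) that carry it in each study."""
        return {
            s.name: sorted(var for var, c in s.concepts.items() if c == concept)
            for s in self.studies
        }


# Hypothetical two-wave series whose variable names differ from wave to wave.
series = Group(
    "example series",
    [
        StudyUnit("wave 1", {"Q1": "direction of country", "AGE": "age"}),
        StudyUnit("wave 2", {"V12": "direction of country", "V3": "age",
                             "V20": "income"}),
    ],
)
print(sorted(series.common_concepts()))
print(series.compare("direction of country"))
```

Because comparability is recorded against shared concepts rather than variable names, the differing names `Q1` and `V12` can still be matched.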
A modular approach

• Study Unit
• Data Collection
• Logical Product
• Physical Data Product
• Physical instance
• “Instance”
• Archive

• Group module
• Comparative module
• Conceptual components module
