February 1, 2011 Workshop: Persistent Identifiers for the Social Sciences 1 SOEP and DOI Requirements and Challenges Jan Goebel

February 1, 2011 Workshop: Persistent Identifiers for the Social Sciences 1

SOEP and DOI: Requirements and Challenges

Jan Goebel

Content

1. SOEP Overview

2. Problems

3. Conclusions

SOEP Overview

• The Socio-Economic Panel Study (SOEP) is a representative longitudinal study of private households in Germany

• Annual survey since 1984 of about 10,000 households (around 20,000 persons)

• Some of the many topics include household composition, occupational biographies, employment, earnings, health and indicators of subjective well-being

SOEP is an ongoing survey

• Common to all panel surveys

• Each year we distribute an enhanced version with new and changed data

• Questions change, new topics are added, ...

→ We do a lot, but not just replication!

• Even "archived data" can change, e.g. through a change in the ISCO coding scheme

SOEP is not one dataset but a complex data structure

• The SOEP currently (User DVD) consists of:
– More than 320 data files
– About 40,000 variables

• Which granularity to choose for citation?
– Complete SOEP distribution of one year?
– "Connected" SOEP parts, e.g. individual questionnaires, household questionnaires, generated datasets
– Each data file
– Each variable (for each year or only once, longitudinal concept?)

"The SOEP" is available in different versions

• European user: 100% version (English, German, different formats for SAS/SPSS/Stata/ASCII)

• Non-EU user: 95% version (of cases)

• International comparative research: part of the CNEF (Cross National Equivalent File)

• SOEP Geocodes (supplementary CD): Regional Planning Regions, community types, etc.

• Country codes, community codes, zip codes, microm: only by remote execution or at the Research Data Center (RDC SOEP)

• SOEP Pretests

• SOEP Related Studies

SOEP can change during the period, because of updates

• Updates of weighting schemes or even bug fixes (also possible for older waves)

• Sometimes more than one update between distributions (cumulative updates?)

• How can a user know what version she is using?
– Message-Digest Algorithm (MD5)
– Secure Hash Algorithm (SHA-2)
– Universal Numeric Fingerprint (UNF)

• Does rounding matter?
– German/English labels, different formats (SPSS, Stata, …)
– Only update of a label bug?
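
The questions above can be illustrated with a minimal Python sketch. It uses SHA-256 (one member of the SHA-2 family) from the standard library to fingerprint a file byte by byte, and contrasts it with a deliberately simplified content fingerprint that rounds numeric values before hashing. The function names, the sample values and the default of 7 significant digits are assumptions made for illustration only; the content fingerprint is not the UNF algorithm, just a stand-in for the idea behind it.

```python
import hashlib


def sha256_of_bytes(data: bytes) -> str:
    """SHA-256 hex digest of raw bytes (SHA-256 belongs to the SHA-2 family)."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rounded_content_fingerprint(values, digits=7):
    """Hash of the numeric content only, after rounding to `digits` significant digits.

    Hypothetical, simplified illustration: this is NOT the UNF algorithm.
    Labels and file format do not enter the hash at all.
    """
    normalized = ",".join(format(v, f".{digits - 1}e") for v in values)
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


# Two made-up exports of the same two values, differing only in the label language.
german_export = b"Einkommen\n1234.5\n2000.0\n"
english_export = b"income\n1234.5\n2000.0\n"

# Byte-level hashes differ as soon as a single label differs.
print(sha256_of_bytes(german_export) == sha256_of_bytes(english_export))

# The content fingerprint ignores labels and tiny numeric noise below the rounding level.
print(
    rounded_content_fingerprint([1234.5, 2000.0])
    == rounded_content_fingerprint([1234.5000000001, 2000.0])
)
```

A byte-level hash identifies one specific file in one specific format, so a label fix or a different export format yields a new hash. A content-based fingerprint stays stable across such changes, which is the trade-off the questions above point to.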

Conclusions

• Nesting of DOI should be possible:

Print                     DOI        SOEP example                          SOEP DOI
Edited book               Survey     SOEP DVD                              10.1000/soep.26
Article in book           Data file  SOEP dataset $PGEN                    10.1000/soep.26.hgen
Table in article in book  Variable   SOEP dataset $pgen, variable ihinc$$  10.1000/soep.26.hgen.ihinc

• It should be possible for a user to identify the data, including version

– The metadata of a DOI should include a hash (e.g. SHA-2) for each data file and format, and this hash must also be persistent

• Commitment about the persistence of the data provider

• Identifying the data source is not enough to make a scientific empirical analysis reproducible; you normally also need the syntax
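
As a rough sketch of how the first two conclusions could fit together, the following Python code builds nested DOI names in the pattern of the example table and attaches one SHA-256 hash per file format to a metadata record. The prefix `10.1000/soep` is the placeholder from the slide, not a registered DOI; the function names, the record layout and the sample bytes are hypothetical and do not describe an existing DOI metadata schema.

```python
import hashlib

# Placeholder base taken from the slide example; not a registered SOEP DOI.
DOI_BASE = "10.1000/soep"


def nested_doi(wave, data_file=None, variable=None):
    """Build a DOI name for a distribution, a data file, or a variable within a file."""
    if variable is not None and data_file is None:
        raise ValueError("a variable DOI needs the data file it belongs to")
    parts = [DOI_BASE, str(wave)]
    if data_file is not None:
        parts.append(data_file)
    if variable is not None:
        parts.append(variable)
    return ".".join(parts)


def parent_doi(doi):
    """Return the DOI one nesting level up, or None for a whole distribution."""
    prefix, suffix = doi.split("/", 1)
    parts = suffix.split(".")
    if len(parts) <= 2:  # e.g. "soep.26" is already the top level
        return None
    return prefix + "/" + ".".join(parts[:-1])


def doi_metadata(doi, files_by_format):
    """Hypothetical metadata record with one SHA-256 hash per file format."""
    return {
        "doi": doi,
        "parent": parent_doi(doi),
        "checksums": {
            fmt: {"algorithm": "SHA-256", "value": hashlib.sha256(data).hexdigest()}
            for fmt, data in files_by_format.items()
        },
    }


# Made-up file contents standing in for two export formats of the same data file.
record = doi_metadata(
    nested_doi(26, "hgen"),
    {"stata": b"stata bytes", "spss": b"spss bytes"},
)
print(record["doi"], record["parent"])
print(nested_doi(26, "hgen", "ihinc"))
```

A user holding a file could recompute its hash and compare it with the value stored under the matching format, which would tie a citation to one specific version of the data.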

Thank you for your attention!