INFO 7470/ECON 7400/ILRLE 7400
Citing Literature, Citing Data
John M. Abowd and Lars Vilhuber
March 11, 2013
© John M. Abowd and Lars Vilhuber 2013, all rights reserved
CITING LITERATURE
Citing Literature
• Why? To prevent plagiarism, to establish provenance of ideas
• How? Why do we cite as we do – publishing cycles, uniqueness of sources
• Plagiarism: appropriating other people’s ideas
• Examples (Bruno Frey)
• Citing literature today: does it still work?
  – Issues of versioning of articles
  – Revisions/retractions/corrections
Why Do We Cite Literature?
• To give credit to the original authors of ideas
  – To not give credit is plagiarism
• To allow readers to find the information cited
  – Trace the evolution of ideas
  – Document cited results
Plagiarism
• More easily detected nowadays
  – http://plagiarism.repec.org/offenders.html
  – http://ideas.repec.org/a/che/chepap/v20y2008i1p20-25.html
• Software
  – http://plagiarism.bloomfieldmedia.com/z-wordpress/software/wcopyfind/
  – Turnitin
  – AEA uses http://www.aeaweb.org/crosscheck.php
Source: http://www.elsevier.com/authors/author-rights-and-responsibilities#responsibilities via RePEc
Prominent Recent Examples of Plagiarism
• Bruno Frey
  – AEA PP, others (see FreyPlag_Wiki but also responses by Frey)
• German ministers
  – Defense: Karl-Theodor Maria Nikolaus Johann Jacob Philipp Franz Joseph Sylvester Freiherr von und zu Guttenberg [German source]
  – Education… Annette Schavan [German source]
• Russian presidents? [2006]
How Do We Cite?
• Multiple typographical standards
• Generally enough unique keys to correctly identify the source
• Current conventions driven to a large extent by the publishing model in effect through the end of the 20th century (see also Margo Anderson’s Session 1 on data publishing)
Examples
Based on and using images from http://bcs.bedfordstmartins.com/resdoc5e/RES5e_ch09_s1-0002.html (2013-03-08)
Examples
Based on and using images from http://bcs.bedfordstmartins.com/resdoc5e/RES5e_ch09_s1-0002.html (2013-03-08)
Declining uniqueness:
Online documents:
Permanent Links
• The URL (Uniform Resource Locator or Web address) may be temporary, may not function in the near or far future
• Links designated as “permanent”, “persistent” or “stable” are designed specifically to remain active and useable over time.
• Permanent links
  – Digital Object Identifiers (DOIs), implemented on top of the Handle System
    • actionable, interoperable, persistent link
  – Other types of permanent links
    • JSTOR (old)
    • EBSCO
Adapted from http://library.concordia.ca/services/users/faculty/permanentlinks.php
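
To make "actionable" concrete, here is a minimal Python sketch that turns a bare DOI name into a clickable link. It assumes the public DOI proxy at https://doi.org/ as the resolver; the function name `doi_to_url` is hypothetical, and the example DOI is one cited later in these slides.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy is used as the resolver.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI name into an actionable link via the DOI proxy."""
    doi = doi.strip()
    # Accept the "doi:" prefix commonly seen in reference lists.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Every DOI name starts with the directory code "10." and has a prefix/suffix split.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1038/482308d"))  # https://doi.org/10.1038/482308d
```

The link stays the same even if the publisher moves the article, because only the record behind the resolver has to be updated.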
DOI
DOI
DOI in References
Up to Here …
• … nothing new, or mostly
• Starting in 5th grade, we’ve been thoroughly trained in citing our “sources”
• Or have we?
CITING DATA
Neal (1999)
• http://www.jstor.org/stable/10.1086/209919
References
References
No Data …
The Problem
• I want to replicate Neal’s analysis
• Process:
  – Download NLSY data (latest!)
  – Read article, replicate his described analysis in software of my choice
  – Get results, compare
• What happens if the results are not the same
  – Qualitatively
  – Quantitatively
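
One way to make the comparison step explicit is sketched below in Python. The definitions are assumptions made for illustration only: "qualitative" agreement is taken to mean every estimate keeps its sign, and "quantitative" agreement means every estimate lies within a relative tolerance. The function name, coefficient names, and numbers are all made up.

```python
import math


def compare_estimates(original, replication, rel_tol=0.05):
    """Compare two dicts of named estimates.

    Assumed definitions for this sketch:
    - qualitative: every estimate has the same sign in both runs
    - quantitative: every estimate agrees within rel_tol (relative)
    """
    if original.keys() != replication.keys():
        raise ValueError("estimate names differ between the two runs")
    same_sign = all(
        math.copysign(1.0, original[k]) == math.copysign(1.0, replication[k])
        for k in original
    )
    close = all(
        math.isclose(original[k], replication[k], rel_tol=rel_tol)
        for k in original
    )
    return {"qualitative": same_sign, "quantitative": close}


published = {"coef_a": 0.12, "coef_b": -0.30}   # made-up numbers
replicated = {"coef_a": 0.10, "coef_b": -0.28}  # made-up numbers
print(compare_estimates(published, replicated))
# {'qualitative': True, 'quantitative': False}
```

A real replication would pick the tolerance and the notion of "qualitatively the same" to suit the paper's claims; the point is only that both checks can be stated precisely.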
Attempts to Falsify
“5. Every genuine test of a theory is an attempt to falsify it, or to refute it [...]
6. Confirming evidence should not count except when it is the result of a genuine test of the theory; and this means that it can be presented as a serious but unsuccessful attempt to falsify the theory. (I now speak in such cases of ‘corroborating evidence.’)”
Karl Popper, Science: Conjectures and Refutations, pg. 47
Replication Study
• Different results may be driven by
  – Differences in data
  – Differences in software
  – Differences in implementation
  – Errors by the original author…
• Start by keeping the setup as close to the original as possible
  – Same data
  – Same software
  – Same implementation (programs)
Data for Replication
• What does “same data” imply?
  – Ability to find the data
  – Assurance that the data are, in fact, the same
• Data curation and citation are critical to the replication exercise
• Increasing impetus from funding agencies
  – NSF
  – NIH
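
One common way to obtain "assurance that the data are, in fact, the same" is a cryptographic checksum published alongside the citation. The Python sketch below is illustrative and not something the slides prescribe; the function names and the file name in the usage comment are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_data(path, published_digest):
    """True if the local file matches the digest published with the citation."""
    return file_sha256(path) == published_digest.strip().lower()


# Usage (hypothetical file name and digest):
# same_data("nlsy_extract.csv", "<digest published with the article>")
```

A matching digest shows the bytes are identical; it says nothing about where to find the file, which is the job of the identifier.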
Not Futile
• Neal JOLE article is much cited (60 citations on RePEc, undercount)
• Only instance of a substantive correction of a JOLE article (as of 2013-03-08, search term: erratum)
• Notable because the author publishing the erratum
  – was referring to a (seminal) article from 5 years earlier
  – was the editor-in-chief at the time
Example: JOLE
• “In the April 1999 issue of this Journal, I published an article entitled “The Complexity of Job Mobility among Young Men” (Journal of Labor Economics 17, no. 2 [1999]: 237–61). Recently, I began a dialogue with another researcher who was attempting to replicate the empirical results in that article. Through this dialogue, I learned that, for some workers, I erred in constructing my original counts of the number of employer changes within specific careers.1 I have corrected this error and have found that, given correct variable constructions, several empirical results differ quantitatively, although not qualitatively, from the results reported in the original article.”
Other Items to Note
• Not available if not a subscriber …
• The original author’s publication count increased by 1
• The discrepancy’s reporter (Ronni Pavan) was not an author on the erratum (Pavan did publish in the same journal in 2011)
• Neither the original data nor the corrected data (and the associated programs) are available from the journal (they are probably available from the author).
  – The original data are public-use NLSY data, referenced as “1979-92”
• The online version of the original article contains a link to the erratum (and is found when searching for “Erratum”)
Why Do We Cite Data This Way?
• Used to be sufficient
  – Data were the same as a book (see Margo’s Session 1)
  – If not, then they were rarely modified (punch cards, tapes)
  – Example: “NLSY 1979-1992” was a well-defined CDROM
• No longer sufficient
  – Where is the NLSY CDROM?
  – Which version does your library have?
Publications by the Census Bureau
• Decennial Census: SF1, SF2, SF3 … once every ten years
• Economic Census: Limited number of tables every 5 years
• LEHD: 4860 tables every three months
Improvements
• https://usa.ipums.org/usa/cite.shtml
Improvements
These Are the Easy Cases
• NLSY, IPUMS-USA, ICPSR data
  – Public-use datasets or
  – Data distributor is also data custodian – guarantees availability of the data
• Many other public-use datasets
  – QCEW – no (can be defined by latest date on file, but not officially defined)
  – QWI – version.txt, but hidden
  – BDS – “yearly” releases (two listed, in fact three)
Data Availability Not a New Issue
• “In its first issue, the editor of Econometrica (1933), Ragnar Frisch, noted the importance of publishing data such that readers could fully explore empirical results. Publication of data, however, was discontinued early in the journal’s history. [...] The journal arrived full-circle in late 2004 when Econometrica adopted one of the more stringent policies on availability of data and programs.”
http://www.econometricsociety.org/submissions.asp#4 as cited in Anderson et al (2005)
Citing Restricted-use Data
• Abowd, Kramarz, Margolis (1999): “The data used in this paper are confidential but the authors’ access is not exclusive.”
• But
  – No current statistical agency has in place a way to uniquely cite data
  – Black box of restricted-access data enclaves
  – Worries about “leakage” of confidential information
Declining Role of Public-use Data
(Chetty, 2012)
Increasing Use of Administrative Data
(Chetty, 2012)
Not Just in Social Sciences
• Nature, 2012 “Many of the emerging ‘big data’ applications come from private sources that are inaccessible to other researchers. The data source may be hidden, compounding problems of verification, as well as concerns about the generality of the results.”
Huberman, Nature 482, 308 (16 February 2012), doi:10.1038/482308d
Verification Is Important
• Falsifying data
  – Andrew Wakefield (autism and vaccines)
  – Yoshitaka Fujii (fabricated data in 172 out of 249 papers)
• “Believe it or not: how much can we rely on published data on potential drug targets?” doi:10.1038/nrd3439-c1
  – Drug maker cannot replicate more than 20-25% of findings
• “Why Most Published Research Findings Are False” Ioannidis JPA (2005) doi:10.1371/journal.pmed.0020124
But …
• Even studies that worry about replication… do not provide their own data in a replicable way: “The questionnaire can be obtained from the authors.” (doi:10.1038/nrd3439-c1)
Other Approaches: Replication for a Fee
• “The Reproducibility Initiative takes advantage of Science Exchange’s existing network of more than 1,000 core facilities and commercial research organizations. Researchers submit their studies (…) [which] will attempt to replicate the studies for a fee.
• Submitting researchers will have to pay for the replication studies (…) one-tenth that of the original study (…) 5 percent transaction fee to Science Exchange.
• Participants will remain anonymous unless they choose to publish the replication results in a PLoS ONE Special Collection.” (source)
CORE ISSUES
Core Issues
a. Insufficient curation (starting with archiving)
b. No consistent way to learn about the data (metadata)
c. No way to reference data (unique identifiers)
Core Requirements for Data Access
• Royal Society (2012)
  – Accessible (a researcher can easily find it);
  – Intelligible (to various audiences);
  – Assessable (are researchers able to make judgments about or assess the quality of the data);
  – Usable (at minimum, by other scientists).
Identifying Data
“DOI names are assigned to any entity for use on digital networks. They are used to provide current information, including where they (or information about them) can be found on the Internet. Information about a digital object may change over time, including where to find it, but its DOI name will not change.”
http://datacite.org/whatisdoi, accessed on Sept 26, 2012.
Data Curation
• First step: make (some of) the data accessible
• Repositories/data custodians can address the issue for some types of data
• Generally provide a way to identify data (DOI)
Repositories
• DataOne (bio sciences)
• Dryad (ecological data)
• DataVerse (data extracts and programs accompanying papers)
• University Libraries (Dspace)
• UK Data Archive
• ICPSR (researcher-initiated surveys)
• FRED (St. Louis Fed, time-series)
Journals and Data Curation
• PLOS ONE
  – Policy
  – Limitations: data limited to 10MB…
• AEA
  – Policy
  – Example
• Econometrica
  – Policy
PLoS ONE
• http://www.plosone.org/static/policies#sharing
• “PLOS is committed to ensuring the availability of data and materials that underpin any articles published in PLOS journals.”
• “PLOS reserves the right to post corrections on articles, to contact authors' institutions and funders, and in extreme cases to withdraw publication, if restrictions on access to data or materials come to light after publication of a PLOS journal article.”
PLoS ONE (cont.)
• “(…)appropriate accession numbers or digital object identifiers (DOIs) published with the paper”
• Also guidelines for software (in particular when it is critical to the paper)
AEA Policy
• http://www.aeaweb.org/aer/data.php
• “Authors of accepted papers that contain empirical work, simulations, or experimental work must provide to the Review, prior to publication, the data, programs, and other details of the computations sufficient to permit replication.”
AEA Policy (cont.)
• http://www.aeaweb.org/aer/data.php
• For econometric and simulation papers, the minimum requirement should
  – include the data set(s) and programs used to run the final models,
  – plus a description of how previous intermediate data sets and programs were employed to create the final data set(s).
  – Authors are invited to submit these intermediate data files and programs as an option
  – […] as well as instructing a user on how replication can be conducted.
AEA Example: Abowd and Vilhuber (2012)
• Article: http://www.aeaweb.org/articles.php?doi=10.1257/aer.102.3.589
• Appendix
  – Description at http://www.aeaweb.org/aer/data/may2012/2012_2790_app.pdf (note: no DOI!)
  – Tried to be careful about referencing data, but no DOIs available on any of the data
    • Even our own data (National QWI, 38MB compressed)
  – Only generic programs
  – Final dataset was too large – not accepted.
Econometrica Policy
• http://www.econometricsociety.org/submissionprocedures.asp#replication
• “Econometrica has the policy that all empirical, experimental and simulation results must be replicable.
• Therefore, authors of accepted papers must submit data sets, programs, and information on empirical analysis, experiments and simulations that are needed for replication and some limited sensitivity analysis”
• Limited-access/proprietary datasets: “detailed data description and the programs used to generate the estimation data sets must be provided, as well as information of the source of the data so that researchers who do obtain access may be able to replicate the results”
Limitation of Current Repositories
• Do not (yet) provide full provenance
  – For lack of citation tools
  – For lack of guidance
• Limitations when using “big data”
  – Repository not the solution (suggested size: <10MB, although Econometrica has some in the 400MB range)
  – Unique references to data publication, onus on publisher?
• Do not work (well) for restricted-access data
Metadata Access
• Information about the data
• Can be
  – Variable names
  – Formats
  – Values
  – Distribution of values
  – Description
  – Provenance
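
Some of these items (variable names, values, distribution of values) can be derived mechanically from the data file itself. The Python sketch below is a rough illustration under that assumption; the function name and the sample data are made up, and description and provenance would still have to be written by a person.

```python
import csv
import io
from collections import Counter


def describe_columns(csv_text):
    """Build simple per-variable metadata from CSV text.

    Reports, for each column: count of missing (empty) cells, number of
    distinct non-missing values, and the three most frequent values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames or []:
        values = [(row.get(name) or "") for row in rows]
        present = [v for v in values if v != ""]
        summary[name] = {
            "n_missing": len(values) - len(present),
            "n_distinct": len(set(present)),
            "most_common": Counter(present).most_common(3),
        }
    return summary


sample = "id,sex\n1,F\n2,M\n3,\n4,F\n"  # made-up data
for variable, info in describe_columns(sample).items():
    print(variable, info)
```

Structured output of this kind is what lets metadata be browsed and linked to the actual data rather than kept in a separate PDF.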
Metadata on Public-use Data
• IPUMS: Structured/browsable metadata
• Most other sites:
  – PDF or ASCII files
  – Generally not linked to actual data
• Restricted-access data in Census RDC
  – Generic information outside
  – PDF once access granted
IPUMS Metadata
IPUMS Metadata (Details)
ICPSR Metadata on ATUS
BLS Metadata on ATUS
Current Metadata on Confidential Data
• Mostly by inference
• Census Bureau (CES):
– Links to public-use tabulations, documents (some by yours truly), codebooks (Snapshot S2004)
– PDFs of detailed data in RDC
– Codebooks for a few data sets at ICPSR
• 1960 (ICPSR 21980); 1970 (21981); 1980 (21982); 1990 (21983); 2000 (21820)
• NCHS:
– What is in questionnaire (PDF) but not in public-use codebook (PDF) might be accessible
Approaches and Solutions
• NCRN-Cornell node: Comprehensive Extensible Data Documentation and Access Repository (CED²AR)
– Based on existing metadata standards (DDI) with possible extensions
– Provide structured mechanism to synchronize confidential and public-use metadata
– Assign DOI where needed
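One way such synchronization might work is to maintain a single confidential codebook and derive the public-use version by pruning restricted entries. The Python sketch below illustrates that idea only; it is not CED²AR's implementation. The element names are loosely modeled on DDI Codebook, while the `access` attribute and the `prune` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical codebook fragment. The "access" attribute is an assumption
# made for this sketch, not part of the DDI standard.
CONFIDENTIAL_CODEBOOK = """
<codeBook>
  <var name="state" access="public"><labl>State of residence</labl></var>
  <var name="ein" access="restricted"><labl>Employer identifier</labl></var>
  <var name="earnings" access="public">
    <labl>Annual earnings</labl>
    <sumStat type="max" access="restricted">(confidential value)</sumStat>
  </var>
</codeBook>
"""

def prune(element):
    """Remove, in place, every descendant marked access="restricted"."""
    for child in list(element):  # copy the list: we mutate while iterating
        if child.get("access") == "restricted":
            element.remove(child)
        else:
            prune(child)
    return element

public_root = prune(ET.fromstring(CONFIDENTIAL_CODEBOOK.strip()))
public_names = [var.get("name") for var in public_root.findall("var")]
```

Because the public-use codebook is generated from the confidential one, the two cannot drift apart: a variable description edited once appears in both, and only the restricted elements differ.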
NCRN-Cornell
Pruning Confidential Metadata
End Result (mid-2013)
End Result (mid-2013)
EASE OF ACCESS/REPLICATION
FRED: Federal Reserve Economic Data
• http://research.stlouisfed.org/fred2/
– Excellent job in providing easy access to a large number of data series
– Also provide archival versions (data series ‘as-of’)
– Online graphs
Issues with FRED
• No link back to original data provider’s unique ID (in large part because there is nothing to link back to)
• Archival versions identified by “publication” date (may be imprecise at times)
• Incomplete …
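To make the "as-of" idea concrete, the Python sketch below picks the vintage of a series that a researcher would have seen on a given day, assuming each archival version is identified only by its publication date. The vintage list and the `as_of` helper are hypothetical and do not reflect how FRED stores or serves its archive.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical vintages of one series, sorted by publication date.
# The labels are placeholders, not real FRED observations.
vintages = [
    (date(2012, 1, 27), "advance estimate"),
    (date(2012, 2, 29), "second estimate"),
    (date(2012, 3, 29), "third estimate"),
]

def as_of(vintages, when):
    """Return the latest vintage published on or before `when`, or None.

    `vintages` must be sorted by publication date.
    """
    dates = [published for published, _ in vintages]
    index = bisect_right(dates, when)  # number of vintages published by `when`
    return vintages[index - 1][1] if index else None

seen_by_author = as_of(vintages, date(2012, 3, 1))
```

The lookup is only as precise as the publication dates: if a date is off by a day, a replicator asking for the series "as of" that day may silently receive a different vintage than the original author used.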
Accessing FRED
• Demo using Stata’s “freduse”
• Program used in this demo:
– stata-recession-fred.do
Stata
Stata Results
Accessing FRED
• Demo using R’s “quantmod”
• Program used in this demo:
– r-recession-fred.R
Using quantmod
Results with R
FRED Issues
• Positives: it’s available!
• Trains people to use keys to look up online references
• Issues:
– Not able to link to archival versions (always the latest version),
– But does store local copies (-> repository, onus back on ad hoc data archiving)
– How to cite the data?
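Since these tools always fetch the latest version, one ad hoc workaround is to record, alongside the local copy, enough information to cite it and to verify it later. The Python sketch below shows one possible record; the function names and the citation layout are assumptions for illustration, not an established citation standard, and the payload is a placeholder rather than a real download.

```python
import hashlib
from datetime import date

def describe_local_copy(payload: bytes, series_id: str, source_url: str,
                        retrieved: date) -> dict:
    """Record what is needed to cite and later verify a downloaded series."""
    return {
        "series_id": series_id,
        "source_url": source_url,
        "retrieved": retrieved.isoformat(),
        # The checksum lets a replicator confirm they hold the same bytes.
        "sha256": hashlib.sha256(payload).hexdigest(),
    }

def format_citation(record: dict) -> str:
    """Render one possible citation string (the layout is an assumption)."""
    return (f"{record['series_id']}, retrieved {record['retrieved']} from "
            f"{record['source_url']} (SHA-256 {record['sha256'][:12]})")

# Placeholder bytes standing in for a downloaded file.
record = describe_local_copy(b"DATE,VALUE\n", "UNRATE",
                             "http://research.stlouisfed.org/fred2/",
                             date(2013, 3, 11))
citation = format_citation(record)
```

This puts the archiving burden on the researcher, which is exactly the issue raised above: the retrieval date and checksum substitute for a persistent identifier that the provider does not supply.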
Tools and Replicability
• Tools help to do replicability analysis
– Ability to reference URL of data (handle, DOI, etc.)
– Ability to access data through URL
• Even if/when run in restricted-access environments
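A minimal Python sketch of these two abilities follows: a program refers to its data by DOI, and the same reference resolves either to the public resolver URL or to a local mirror when run where outside access is blocked. The `REPLICATION_DATA_DIR` variable, the file-naming rule, and the DOI used in the usage note are hypothetical; only the `https://doi.org/` resolver prefix is standard.

```python
import os

DOI_RESOLVER = "https://doi.org/"

def normalize_doi(doi: str) -> str:
    """Strip whitespace and an optional leading 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    return doi

def doi_to_url(doi: str) -> str:
    """Turn a DOI into a resolver URL."""
    return DOI_RESOLVER + normalize_doi(doi)

def data_location(doi: str, env_var: str = "REPLICATION_DATA_DIR") -> str:
    """Prefer a local mirror named by `env_var` (hypothetical), else the URL."""
    local_dir = os.environ.get(env_var)
    if local_dir:
        # Inside a restricted environment: map the DOI to a mirrored file.
        return os.path.join(local_dir, normalize_doi(doi).replace("/", "_"))
    return doi_to_url(doi)
```

For example, `data_location("doi:10.1234/example")` (a made-up DOI) yields `https://doi.org/10.1234/example` when the environment variable is unset, so the program text stays identical inside and outside the restricted environment.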