Data Publication at CDL for IDCC14



Talk for IDCC14 workshop on Data Publication, 24 February 2014.

Transcript of Data Publication at CDL for IDCC14

Page 1: Data Publication at CDL for IDCC14

#IDCC14 February 2014

Data Publication Etcetera at the CDL

Carly Strasser & John Kratz California Digital Library

@carlystrasser

Page 2: Data Publication at CDL for IDCC14

Zooming out

(Map image; scale bar in miles and km)

Page 3: Data Publication at CDL for IDCC14

From Wikimedia Commons

Back in the day…

From ahswhg.wikispaces.com

Page 4: Data Publication at CDL for IDCC14

Back in the day…

Da Vinci, Curie, Newton, Darwin

classicalschool.blogspot.com

Page 5: Data Publication at CDL for IDCC14

Research has changed

Better

Page 6: Data Publication at CDL for IDCC14

From Wikimedia

Such Internet!

So many tools!

From Flickr by John Jobby

So much data!

Page 7: Data Publication at CDL for IDCC14

Research has changed

Worse

Page 8: Data Publication at CDL for IDCC14

Digital data

From Flickr by Flickmor

From Flickr by US Army Environmental Command

From Flickr by DW0825

C. Strasser

Courtesy of WHOI

From Flickr by deltaMike

Page 9: Data Publication at CDL for IDCC14

Digital data +

Complex workflows

Page 10: Data Publication at CDL for IDCC14

“Reproducibility Crisis”

“Digital Dark Age”

“Erosion of Trust”

Page 11: Data Publication at CDL for IDCC14

Can we fix the way we communicate our science?

• All of the science
• Early & often
• Transparently & openly

Page 12: Data Publication at CDL for IDCC14

Zooming out

Zooming in

(Map image; scale bar in miles and km)

Page 13: Data Publication at CDL for IDCC14

(Map image; scale bar in feet and meters)

Page 14: Data Publication at CDL for IDCC14
Page 15: Data Publication at CDL for IDCC14

Data Publication @ CDL

Page 16: Data Publication at CDL for IDCC14

John Kratz, CLIR Postdoc

From Flickr by lindyjb

Page 17: Data Publication at CDL for IDCC14

“Data Publication”

Page 18: Data Publication at CDL for IDCC14

What does “data publication” mean?

Props to Sarah Callaghan & colleagues!

Page 19: Data Publication at CDL for IDCC14

What does “data publication” mean?

Data are:
1. Available
2. Citable
3. Trustworthy*

Page 20: Data Publication at CDL for IDCC14

What does “data publication” mean?

Data are:
1. Available
2. Citable
3. Trustworthy*

*peer reviewed? certified?

Page 21: Data Publication at CDL for IDCC14

Available | Citable | Trustworthy

“Publish” means to “make public”. You should not have to email the author to get the data. The data do not have to be open access.

“Email me!” CC-0 on web

Page 22: Data Publication at CDL for IDCC14

Simple case…

Data citations should go in the reference list. Five-element citation: author, year, title, publisher, identifier.

Available | Citable | Trustworthy

Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
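The five-element citation on this slide maps naturally onto a small record type. The following is a minimal Python sketch, not a CDL or Dryad tool: the `DataCitation` class, the `missing_elements` helper, and the exact punctuation of the formatted string are hypothetical assumptions chosen for illustration. It simply rebuilds the Boettiger et al. example from its five parts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record holding the five elements of a data citation."""
    authors: Tuple[str, ...]
    year: int
    title: str
    publisher: str   # the repository that makes the data available, e.g. Dryad
    identifier: str  # a persistent identifier such as a DOI

    def format(self) -> str:
        # Assumed layout (illustrative only): Authors (Year). Title. Publisher. Identifier
        names = ", ".join(self.authors)
        return f"{names} ({self.year}). {self.title}. {self.publisher}. {self.identifier}"


def missing_elements(c: DataCitation) -> List[str]:
    """Hypothetical check: list any of the five elements that are empty."""
    checks = {
        "author": len(c.authors) > 0,
        "year": c.year > 0,
        "title": bool(c.title.strip()),
        "publisher": bool(c.publisher.strip()),
        "identifier": bool(c.identifier.strip()),
    }
    return [name for name, present in checks.items() if not present]


if __name__ == "__main__":
    example = DataCitation(
        authors=("Boettiger C", "Dushoff J", "Weitz JS"),
        year=2009,
        title="Data from: Fluctuation domains in adaptive evolution",
        publisher="Dryad",
        identifier="doi:10.5061/dryad.j8n0p7vc",
    )
    print(example.format())
    print(missing_elements(example))  # prints [] because all five elements are present
```

Keeping each element in its own field means a reference manager or repository could flag a citation that lacks, say, its identifier before it reaches a reference list.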

Page 23: Data Publication at CDL for IDCC14

More complicated…

Deep data citation: what if you want to cite a subset?

Dynamic data: how do you create a reliable citation when a dataset is changing?

Available | Citable | Trustworthy

Page 24: Data Publication at CDL for IDCC14

Technical vs. scientific review

Sometimes considers impact and/or novelty

Guidelines provided

Available | Citable | Trustworthy

From Flickr by Percival Lowell

Page 25: Data Publication at CDL for IDCC14

1.  Data as supplemental material

Data published alongside a traditional journal article. Available + citable. Review varies. Potential issues with long-term availability.

What does a data publication look like?

From Flickr by subsetsum

Page 26: Data Publication at CDL for IDCC14

2.  Data paper: Data + descriptive “data paper”

Most require that the data be deposited in a trusted repository. All include a component of peer review. Examples:
•  Standalone data journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives
•  Journals that publish data papers: GigaScience, F1000 Research, Internet Archaeology

What does a data publication look like?

From Flickr by subsetsum

Page 27: Data Publication at CDL for IDCC14

3.  Standalone data

Data published without a related journal article, with rich metadata (structured or unstructured). Examples:
•  Open Context
•  NASA PDS (peer-reviewed data)
•  figshare (but no validation)

What does a data publication look like?

From Flickr by subsetsum

Page 28: Data Publication at CDL for IDCC14

“Publish” | “Paper” | “Peer review” | “Sharing” | “Available” | “Article” | “Publication”

Page 29: Data Publication at CDL for IDCC14

From Flickr by Sandia Labs

C. Strasser

C. Strasser

World Bank Photo Collection From Flickr

What do researchers think of data publication?

Page 30: Data Publication at CDL for IDCC14

•  Publishing
•  Sharing
•  Citation
•  Peer review
•  Trustworthiness

Share with researchers! tinyurl.com/datapubsurvey

Page 31: Data Publication at CDL for IDCC14

Survey Demographics (N=274)

79% US | 21% not US

Academic | Govt | Other

Type of researcher: PI | Postdoc | Grad student | Other

Discipline: Bio | Archaeology | Envi/earth sci | Math, physics | Info sci | Other

Page 32: Data Publication at CDL for IDCC14

In the meantime…

Page 33: Data Publication at CDL for IDCC14

UCSF

Page 34: Data Publication at CDL for IDCC14
Page 35: Data Publication at CDL for IDCC14

For all UCs

• Use institutional credentials to log in
• Enter metadata & deposit data
• Get identifier
• Optional PDF download
• Landing page is the publication

Page 36: Data Publication at CDL for IDCC14

data publishing data sharing

Focus on solving the simple bits first: easy sharing → citable datasets

Page 37: Data Publication at CDL for IDCC14
Page 38: Data Publication at CDL for IDCC14

Website: carlystrasser.net
Email: [email protected]
Tweet: @carlystrasser
Slides: slideshare.net/carlystrasser
Survey: tinyurl.com/datapubsurvey

Big thanks to John Kratz, CLIR Postdoc