Data Publication at CDL for IDCC14

Post on 26-Jan-2015

description

Talk for IDCC14 workshop on Data Publication, 24 February 2014.

Transcript of Data Publication at CDL for IDCC14

#IDCC14 February 2014

Data Publication Etcetera at the CDL

Carly Strasser & John Kratz California Digital Library

@carlystrasser

Zooming out

From Wikimedia Commons

Back in the day…

From ahswhg.wikispaces.com

Da Vinci

Curie

Newton

classicalschool.blogspot.com

Darwin

Research has changed

Better

From wikimedia

Such Internet!

So many tools!

From Flickr by John Jobby

So much data!

Research has changed

Worse

Digital data

From Flickr by Flickmor

From Flickr by US Army Environmental Command

From Flickr by DW0825

C. Strasser

Courtesy of WHOI

From Flickr by deltaMike

Digital data +

Complex workflows

“Reproducibility Crisis”

“Digital Dark Age”

“Erosion of Trust”

Can we fix the way we communicate our science?

All of the science

Early & often

Transparently & openly

Zooming out

Zooming in

Data Publication @ CDL

John Kratz, CLIR Postdoc

From Flickr by lindyjb

“Data Publication”

What does “data publication” mean?

Props to Sarah Callaghan & colleagues!

Data are:

1. Available
2. Citable
3. Trustworthy*

*peer reviewed? certified?

Available | Citable | Trustworthy

Publish means to “make public”. You should not have to email the author. The data doesn’t have to be open access.

“Email me!” CC-0 on web

Simple case…

Data citations should be in the reference list. Five-element citation: author, year, title, publisher, identifier.

Boettiger C, Dushoff J, Weitz JS (2009). Data from: Fluctuation domains in adaptive evolution. Theoretical Population Biology. Published in Dryad. doi:10.5061/dryad.j8n0p7vc
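The five elements can be assembled mechanically. Below is a minimal sketch, assuming a simple "Authors (Year). Title. Publisher. Identifier" layout; the class name `DataCitation`, its `format` method, and the exact punctuation are illustrative assumptions, not a standard endorsed in the talk. The field values are taken from the example citation above, with the journal name left out because it is not one of the five elements.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the five citation elements named on the slide."""
    authors: Tuple[str, ...]
    year: int
    title: str
    publisher: str
    identifier: str

    def format(self) -> str:
        # Join authors with commas, then append the remaining four elements.
        author_list = ", ".join(self.authors)
        return (
            f"{author_list} ({self.year}). {self.title}. "
            f"{self.publisher}. {self.identifier}"
        )


citation = DataCitation(
    authors=("Boettiger C", "Dushoff J", "Weitz JS"),
    year=2009,
    title="Data from: Fluctuation domains in adaptive evolution",
    publisher="Dryad",
    identifier="doi:10.5061/dryad.j8n0p7vc",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than one free-text string, is what would let a repository or reference manager render the same dataset in different citation styles.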

More complicated…

Deep data citation: what if you want to cite a subset?

Dynamic data: how to create a reliable citation when a dataset is changing?
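To make the dynamic-data question concrete, here is one possible approach, sketched under assumptions: record the access date and a content fingerprint of the exact bytes used, so a later reader can tell whether the dataset has changed since it was cited. This is an illustration only, not a method proposed in the talk; the function names `snapshot_fingerprint` and `cite_snapshot` and the suffix layout are hypothetical.

```python
import hashlib
from datetime import date


def snapshot_fingerprint(data: bytes) -> str:
    """Return a shortened SHA-256 hex digest of the exact bytes that were used."""
    return hashlib.sha256(data).hexdigest()[:12]


def cite_snapshot(base_citation: str, data: bytes, accessed: date) -> str:
    """Append an access date and content fingerprint to an existing citation string."""
    return (
        f"{base_citation} Accessed {accessed.isoformat()}. "
        f"sha256:{snapshot_fingerprint(data)}"
    )


# Two hypothetical states of the same dataset (made-up sample rows).
version_1 = b"site,temp\nA,12.1\n"
version_2 = b"site,temp\nA,12.1\nB,13.4\n"

base = "Example Author (2014). Example dataset. Example Repository. example-id"
print(cite_snapshot(base, version_1, date(2014, 2, 24)))
print(cite_snapshot(base, version_2, date(2014, 2, 24)))
```

The two printed citations differ only in the fingerprint, which is the point: the identifier names the dataset, while the fingerprint pins down which state of it was used. The same idea could be applied to a subset by fingerprinting only the rows that were cited.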

Technical vs. Scientific

Sometimes consider impact and/or novelty

Guidelines provided

From Flickr by Percival Lowell

What does a data publication look like?

From Flickr by subsetsum

1.  Data as supplemental material

Data published alongside a traditional journal article. Available + citable. Review varies. Potential issues with long-term availability.

2.  Data paper: Data + descriptive “data paper”

Most require data be in a trusted repository. All have a component of peer review. Examples:
•  Standalone journals: Nature Scientific Data, Geoscience Data Journal, Ecological Archives
•  Journals that publish data papers: GigaScience, F1000 Research, Internet Archaeology

3.  Standalone data

Data published without a related journal article. Rich metadata (structured or unstructured). Examples:
•  Open Context
•  NASA PDS Peer Review Data
•  figshare (but no validation)

“Publish” “Paper” “Peer review” “Sharing” “Available” “Article” “Publication”

From Flickr by Sandia Labs

C. Strasser

From Flickr by World Bank Photo Collection

What do researchers think of data publication?

•  Publishing
•  Sharing
•  Citation
•  Peer review
•  Trustworthiness

Share with researchers! tinyurl.com/datapubsurvey

Survey Demographics (N=274)

[Charts: Academic / Govt / Other; 79% US | 21% Not]

Type of researcher: PI; Postdoc; Grad student; Other

Discipline: Bio; Archaeology; Envi/earth Sci; Math, physics; Info Sci; Other

In the meantime…

UCSF

For all UCs

• Use institutional credentials to log in
• Enter metadata & deposit data
• Get identifier
• Optional PDF download
• Landing page is the publication

data publishing data sharing

Focus on solving simple bits first:
• Easy sharing
• Citable datasets

Website: carlystrasser.net

Email: carlystrasser@gmail.com

Tweet: @carlystrasser

Slides: slideshare.net/carlystrasser

Survey: tinyurl.com/datapubsurvey

Big thanks to John Kratz, CLIR Postdoc