Smith RDAP11 NSF Data Management Plan Case Studies

MacKenzie Smith, MIT; NSF Data Management Plan Case Studies; RDAP11 Summit. The 2nd Research Data Access and Preservation (RDAP) Summit, an ASIS&T Summit, March 31-April 1, 2011, Denver, CO. In cooperation with the Coalition for Networked Information. http://asist.org/Conferences/RDAP11/index.html

Transcript of Smith RDAP11 NSF Data Management Plan Case Studies

Page 1: Smith RDAP11 NSF Data Management Plan Case Studies

POLICIES FOR DATA SHARING, ACCESS AND REUSE

MacKenzie Smith

MIT, ARL, CC

Page 2: Smith RDAP11 NSF Data Management Plan Case Studies

NSF DMP GUIDELINES WANT

Policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements

Policies and provisions for re-use, re-distribution, and the production of derivatives

RDAP Summit ©2011, MacKenzie Smith

Page 3: Smith RDAP11 NSF Data Management Plan Case Studies

WHAT IS DRIVING THIS?

Scientific progress requires international, interdisciplinary interoperability, including frictionless data integration at large scale (e.g. the Web)

Data interoperability includes:

1. technical issues (data integration, protocols)

2. social issues (scientific norms, credit mechanisms or lack thereof)

3. legal issues (incompatible laws and policies for data and databases)

Page 4: Smith RDAP11 NSF Data Management Plan Case Studies

DATA USE/REUSE/REDISTRIBUTION

Data use: Using research data for the current research purpose/activity to infer new knowledge about the research subject.

Data re-use: Using research data for a research purpose/activity other than that for which it was intended.

Howard, T., Darlington, M., Ball, A., Culley, S., McMahon, C., 2010. Understanding and Characterizing Engineering Research Data for its Better Management. Project Report. Bath, UK: University of Bath, ERIM Project Document. erim2rep100420mjd10

Page 5: Smith RDAP11 NSF Data Management Plan Case Studies

DATA USE/REUSE/REDISTRIBUTION

Data purposing: Making research data available and fit for the current research activity.

Data re-purposing: Making research data available and fit for a future known research activity.

Data re-use: Managing research data such that it will be available for a future unknown research activity.

Page 6: Smith RDAP11 NSF Data Management Plan Case Studies

SUPPORTING DATA REUSE

Future users unknown, potentially interdisciplinary

You don’t know them and they don’t know you (or what you/your discipline expects)

Data documentation and policies need to be clear and must not require contact or ad hoc negotiations (what if you’ve moved or you’re dead?)

Page 7: Smith RDAP11 NSF Data Management Plan Case Studies

INTERNATIONAL COLLABORATIONS

If I participate in a collaborative international research project, do I need to be concerned with data management policies established by institutions outside the United States?

Yes. There may be cases where data management plans are affected by formal data protocols established by large international research consortia or set forth in formal science and technology agreements signed by the United States Government and foreign counterparts. Be sure to discuss this issue with your sponsored projects office (or equivalent) and your international research partner when first planning your collaboration.

Page 8: Smith RDAP11 NSF Data Management Plan Case Studies

DATA LICENSING IN US

US Gov data in the Public Domain

• explicit rights statement rare

Factual data not copyrightable in the US

• creativity matters, ‘sweat of the brow’ does not

• not much legal precedent in science

• generally not known by users

EULAs in place for many data archives

• all different, varying practicality, hard to enforce

Page 9: Smith RDAP11 NSF Data Management Plan Case Studies

CREATIVE COMMONS

Tools for data sharing towards Web-scale interoperability (e.g. Linked Open Data)

• CC0 or CC-By

• Public Domain mark

• Best practice for URI-based attribution (e.g. to avoid attribution stacking)

Page 10: Smith RDAP11 NSF Data Management Plan Case Studies

CREATIVE COMMONS

CC0 waives copyright and associated rights (e.g. data rights) where applicable

Important for interoperability with legal jurisdictions that have sui generis data rights (e.g. Europe)

CC-By-SA bad for interoperability

CC-By with attribution via URI (Aus and NZ examples)

• Attribution stacking

Page 11: Smith RDAP11 NSF Data Management Plan Case Studies

ISSUES

Licenses Attribution Persistent IDs Provenance Metadata Registries

Page 12: Smith RDAP11 NSF Data Management Plan Case Studies

WHAT DO RESEARCHERS WANT? SUPPLY SIDE

CREDIT

CONTROL

CONFIDENCE (in appropriate use of their data)

and sometimes…

IP

but always… FUNDING

Page 13: Smith RDAP11 NSF Data Management Plan Case Studies

WHAT DO RESEARCHERS WANT? DEMAND SIDE

Easy reuse of their own data

Easy discovery of and access to outside data

Easier integration/interoperability of their own and others’ data (i.e. “re-purposing”)

Page 14: Smith RDAP11 NSF Data Management Plan Case Studies

HOW CAN RESEARCHERS ACHIEVE THAT?

• Standard copyright licenses or waivers

• Standard terms & conditions (EULA)

… via their institutional repository!

Researchers want good advice, have zero interest in complex legal issues

IRs can establish practices that help researchers achieve their goals with low effort

Page 15: Smith RDAP11 NSF Data Management Plan Case Studies

DMP BOILERPLATE

Sharing.

Project data will be made publicly accessible/downloadable from the university’s data archive website (via a standard Web UI) as … Once located on the archive website, image sets will be downloadable via standard Web protocols (i.e. http). Included in the associated metadata for each image set will be rights information such as copyright and licensing terms for use and reuse of the data. Each image set will be assigned a unique, persistent URI (web identifier, resolvable as a URL) for use in citations. The university’s data archive uses Handles for persistent URIs.
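The Sharing paragraph above can be illustrated with a minimal Python sketch of a catalog record that carries rights information and a Handle-based persistent URI for citation. The class name, field names, and the Handle value are assumptions made for illustration only; they are not taken from any particular archive's schema. The one external fact relied on is that a Handle can be resolved by appending it to the public proxy address https://hdl.handle.net/.

```python
from dataclasses import dataclass

# Public Handle System proxy; a Handle appended to it is a resolvable URL.
HANDLE_RESOLVER = "https://hdl.handle.net/"


@dataclass(frozen=True)
class ImageSetRecord:
    """Catalog metadata for one image set (field names are illustrative)."""
    title: str
    creator: str
    year: int
    handle: str   # "prefix/suffix"; the value used below is a placeholder
    rights: str   # rights statement carried in the metadata, e.g. "CC-BY"

    @property
    def persistent_uri(self) -> str:
        # Build the citable, resolvable form of the Handle.
        return HANDLE_RESOLVER + self.handle

    def citation(self) -> str:
        # A simple citation string that ends with the persistent URI.
        return f"{self.creator} ({self.year}). {self.title}. {self.persistent_uri}"


record = ImageSetRecord(
    title="Example image set",
    creator="Principal Investigator",
    year=2011,
    handle="00000.0/example-1",  # hypothetical, not a registered Handle
    rights="CC-BY",
)
print(record.citation())
```

Keeping the rights statement and the identifier in the same record means a future user can find both the terms of reuse and the citation target without contacting the data producer.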

Page 16: Smith RDAP11 NSF Data Management Plan Case Studies

DMP BOILERPLATE

Licensing. Images, even scientific research images generated by scanners, may be subject to copyright in the U.S., so images produced by the project will be collected and shared using a Creative Commons license, specifically CC-BY (i.e. with attribution to the copyright owner, who is the Principal Investigator for this project, with the approval of the university’s IP counsel). By using the CC-BY license, we are authorizing all interested researchers to use the image data produced by this project in whatever manner they choose, as long as they cite the Principal Investigator as the source of the data.

Page 17: Smith RDAP11 NSF Data Management Plan Case Studies

DMP BOILERPLATE

Licensing, cont. Metadata associated with the image sets will be released under a CC0 license (public domain dedication) since it is normally not copyrightable and we want it to be reusable in new contexts (e.g. Google indexes). With these licensing terms, future researchers will be able to combine the image data and associated metadata produced by this project with data produced from their own or other projects, to create super- or sub-sets of images needed for their own research (i.e. “derivative” datasets).
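As a rough sketch of what these terms mean for someone building a derivative dataset, the Python below collects the attributions a combined dataset would owe: CC-BY sources are credited by URI, while CC0 sources impose no attribution requirement. The `Dataset` type, the function name, and the example.org URIs are hypothetical, and the sketch deliberately handles only the two licenses named in this boilerplate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    uri: str     # persistent identifier (placeholder values below)
    rights: str  # "CC0" or "CC-BY" in this sketch


def attribution_uris(sources: list[Dataset]) -> list[str]:
    """Return the URIs that a derivative dataset must credit.

    CC-BY sources require attribution. CC0 sources do not, although
    citing them may still be good scholarly practice.
    """
    unknown = [d.uri for d in sources if d.rights not in {"CC0", "CC-BY"}]
    if unknown:
        raise ValueError(f"rights not covered by this sketch: {unknown}")
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(d.uri for d in sources if d.rights == "CC-BY"))


sources = [
    Dataset("https://example.org/data/images-1", "CC-BY"),
    Dataset("https://example.org/data/images-1-metadata", "CC0"),
    Dataset("https://example.org/data/images-2", "CC-BY"),
    Dataset("https://example.org/data/images-1", "CC-BY"),  # repeated source
]
print(attribution_uris(sources))
```

Crediting each source once by URI, rather than accumulating lists of names from every upstream contributor, is one way to keep attribution manageable as datasets are recombined.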

 

Page 18: Smith RDAP11 NSF Data Management Plan Case Studies

DMP BOILERPLATE

In the university’s central data archive, researchers will be able to determine the rights assigned to the project’s data via the metadata displayed in the UI for the dataset (i.e. in the rights fields of the relevant catalog record for the dataset). The archive’s search interface supports filtering searches by rights category (e.g. Public Domain, CC-BY, embargoed) so that researchers can search for only data that they may reuse in their own research.
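The rights filter described above can be sketched in a few lines of Python. The record layout, the `rights` key, and the function name are assumptions for illustration; a real archive would run this filter inside its own search index rather than over an in-memory list.

```python
# Toy catalog; titles and rights values are made up for this example.
records = [
    {"title": "Image set A", "rights": "CC-BY"},
    {"title": "Image set A metadata", "rights": "Public Domain"},
    {"title": "Image set B", "rights": "embargoed"},
]


def filter_by_rights(catalog, allowed):
    """Keep only records whose rights category is in `allowed`.

    Matching ignores letter case; records without a rights field are
    excluded, since their reuse terms are unknown.
    """
    allowed_norm = {category.casefold() for category in allowed}
    return [r for r in catalog if r.get("rights", "").casefold() in allowed_norm]


reusable = filter_by_rights(records, {"Public Domain", "CC-BY"})
print([r["title"] for r in reusable])
```

Excluding records with no rights statement is a conservative choice in this sketch: a researcher searching for reusable data should not be shown items whose terms cannot be determined.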

Page 19: Smith RDAP11 NSF Data Management Plan Case Studies

CONCLUSION

IRs serving as data archives can

• Standardize institutional data policies

• Encourage OA

• Lower barriers to researchers to comply with NSF intent

DMPs encourage use of IR over time, reassure NSF of consistent practice
