SPARC 2013 Data Management Presentation


Page 1: SPARC 2013 Data Management Presentation

Data management.

Nicole Vasilevsky, NCNM, OHSU

Jackie Wirz, OHSU

Melissa Haendel, OHSU

Page 3: SPARC 2013 Data Management Presentation

Outline

• Introduction

• Why do we need good data management?

• Good data management

• Databases and tools

• Sharing your data

Page 4: SPARC 2013 Data Management Presentation

Who are we?

• Nicole Vasilevsky, PhD

– Assistant Professor, Helfgott Research Institute, NCNM

– Project Manager, Ontology Development Group, OHSU

• Jackie Wirz, PhD

– Assistant Professor, Bioinformation Specialist, OHSU Library

• Melissa Haendel, PhD

– Assistant Professor, Department Head, Ontology Development Group, OHSU

Page 5: SPARC 2013 Data Management Presentation

What does data mean to you?

Page 6: SPARC 2013 Data Management Presentation

Do you have any training in data management?

Page 7: SPARC 2013 Data Management Presentation

Do you know what metadata is?

a. Philosophy

b. describes data

c. dating site

d. data

Page 8: SPARC 2013 Data Management Presentation

What is data?

• Clinical data

• Experimental data

• School-related data

• Personal data

• Social data

Page 9: SPARC 2013 Data Management Presentation

So much data

Page 10: SPARC 2013 Data Management Presentation

Why?

Personal organization

Credit where credit is due

Reproducibility of science and medicine

Accelerates scientific and clinical discovery

Efficiency

Page 11: SPARC 2013 Data Management Presentation

Do you get frustrated with any of the following in your personal or professional life?

a. Storing data

b. Backing up data

c. Analyzing/manipulating data

d. Finding data produced by other researchers/clinicians

e. Ensuring data are secure

f. Making data accessible to other researchers

g. Controlling access to data

h. Tracking updates to data (i.e., versioning)

i. Creating metadata (i.e., describing the data to be more useful at a later time or by others)

j. Protecting intellectual property rights

k. Ensuring appropriate professional credit/citation is given to data sets generated

Page 12: SPARC 2013 Data Management Presentation

http://davidmichaelangelosilva.wordpress.com/2012/01/29/organize-your-messy-desktop-with-fences/

Messy Desktop?

Page 13: SPARC 2013 Data Management Presentation

Which of the following do you do?

a. Save copies of data on a disk, USB drive, tape, or computer hard drive

b. Save copies of data on a local server

c. Save copies of data on a central campus server

d. Save copies of data on a web-based or cloud server

e. Store data in a repository or archives

f. Automatically back up files

g. Manually generate backups

h. Restrict access to files

Page 14: SPARC 2013 Data Management Presentation

Credit where credit is due

Data collection & Analysis

Authoring

Storage, Archiving, & Preservation

Publication & Dissemination

The scholarly communication cycle

Page 15: SPARC 2013 Data Management Presentation

Reproducibility of science

• Lack of information makes it difficult to reproduce experiments

• Retraction rates are on the rise

• Difficulty identifying resources in the published literature

Cokol et al. EMBO reports (2008) 9, 2

Page 16: SPARC 2013 Data Management Presentation

Sharing can be advantageous

http://www.flickr.com/photos/eltonl/107582334/sizes/l/in/photostream/

Page 17: SPARC 2013 Data Management Presentation

Why share your data?

• Data sharing mandates

– NIH public access policy

– NIH/NSF data sharing plan for new applications

• Further science and medicine

• Build collaborations

• Enable new discoveries with your data

• Can be required at time of publication

Page 18: SPARC 2013 Data Management Presentation

Efficiency

http://hbr.org/2012/10/big-data-the-management-revolution

https://upload.wikimedia.org/wikipedia/commons/b/ba/HMS_Surprise_at_sunset_with_airplane.jpg

Page 19: SPARC 2013 Data Management Presentation

How?

• File naming and data storage

• Metadata

• Controlled vocabularies and Ontologies

• Databases and Tools

• Data accessibility

Page 20: SPARC 2013 Data Management Presentation

File naming

Page 21: SPARC 2013 Data Management Presentation

Informative file names

Will I remember what this file is a month from now?

Page 22: SPARC 2013 Data Management Presentation

Naming conventions

Project_instrument_location_YYYYMMDDhhmmss_extra.ext

Index/grant conditions Leading zero!

s/n, variable Retain order
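A naming pattern like the one above can be generated by a small helper instead of typed by hand. The following is a minimal Python sketch, assuming the fields appear in the order shown on the slide; the function name `build_file_name` and the example values are hypothetical.

```python
from datetime import datetime


def build_file_name(project, instrument, location, when, extra, ext):
    """Join the fields in a fixed order, separated by underscores."""
    # strftime zero-pads every date and time field, so names sort chronologically.
    stamp = when.strftime("%Y%m%d%H%M%S")
    parts = [project, instrument, location, stamp, extra]
    # The underscore is the field separator, so spaces and underscores
    # inside a field are replaced with hyphens.
    cleaned = [p.strip().replace(" ", "-").replace("_", "-") for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")


# Hypothetical example values, not taken from the presentation.
name = build_file_name("SPARC", "scope2", "OHSU",
                       datetime(2013, 5, 3, 9, 7, 0), "raw", "csv")
print(name)  # SPARC_scope2_OHSU_20130503090700_raw.csv
```

Keeping the fields in the same order and zero-padding the timestamp means a plain alphabetical sort of the folder is also a chronological sort within each project.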

Page 23: SPARC 2013 Data Management Presentation

Directory Structure

Sticking with a directory structure can be hard

Files:

• SPARC presentation

• CTSAconnect presentation

• Monarch presentation

Presentations

– SPARC

– CTSAconnect

– Monarch

Page 24: SPARC 2013 Data Management Presentation

Versioning

DataManagement_SPARC_050313_final_NV

• Save a copy of every version of a data file

• Follow a file naming convention

• Version control software

– Dropbox

– Google Docs

– Git

– SmartSVN
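Where none of these tools is available, "save a copy of every version" can be done with a few lines of script. The following is a minimal Python sketch, assuming versions are kept in a separate folder and numbered with leading zeros; the function name `save_version` and the folder name `versions` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def save_version(path, archive_dir):
    """Copy `path` into `archive_dir` under the next free version number."""
    path = Path(path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Count the versions already archived for this file.
    existing = list(archive_dir.glob(f"{path.stem}_v*{path.suffix}"))
    # Leading zeros keep v002 sorted before v010.
    target = archive_dir / f"{path.stem}_v{len(existing) + 1:03d}{path.suffix}"
    shutil.copy2(path, target)
    return target


# Demonstration in a temporary folder so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "DataManagement_SPARC.txt"
    data.write_text("first draft")
    print(save_version(data, Path(tmp) / "versions").name)
    data.write_text("second draft")
    print(save_version(data, Path(tmp) / "versions").name)
```

The working file keeps its name while every saved state gets its own numbered copy, which avoids names such as "final" that stop being true after the next edit.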

Page 25: SPARC 2013 Data Management Presentation

Dropbox

www.dropbox.com

Page 26: SPARC 2013 Data Management Presentation

Google Docs

Page 27: SPARC 2013 Data Management Presentation

Remember to back up your data!

• Recommended: keep three copies!

– 1 on your local workstation

– 1 local/removable, such as external hard drive

– 1 remote, such as on a cloud server*

*Depending on the type of data, as cloud servers are not always secure

http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf
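The three-copies rule can be scripted so that each copy is also checked after it is written. The following is a minimal Python sketch, assuming the external drive and the remote location are both reachable as ordinary folders; the function names and folder names are hypothetical, and the whole file is read into memory, so it suits small files only.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        # A mismatched checksum means the copy is corrupt or incomplete.
        if sha256_of(copy) != expected:
            raise IOError(f"backup to {folder} failed verification")
        copies.append(copy)
    return copies


# Demonstration: temporary folders stand in for the external drive and remote copy.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "workstation" / "results.csv"
    original.parent.mkdir()
    original.write_text("sample,value\nA,1\n")
    made = back_up(original, [Path(tmp) / "external_drive", Path(tmp) / "remote"])
    print(len(made) + 1, "copies in total")  # 3 copies in total
```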

Page 28: SPARC 2013 Data Management Presentation

Organizing your IRB application

Created by Heather Schiffke

See: http://libguides.ohsu.edu/data

Page 29: SPARC 2013 Data Management Presentation

File renaming applications

• Bulk Rename Utility (Windows)

• Renamer (Mac)

• PSRenamer
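The same kind of bulk renaming can be scripted when none of these applications is at hand. The following is a minimal Python sketch, not a description of how any of the listed tools work; the function name `bulk_rename` and the numbering scheme are hypothetical. It defaults to a dry run that only reports the planned changes.

```python
import tempfile
from pathlib import Path


def bulk_rename(folder, prefix, dry_run=True):
    """Rename files to <prefix>_<NNN><ext>, numbered in sorted-name order."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    plan = []
    for index, old in enumerate(files, start=1):
        new = old.with_name(f"{prefix}_{index:03d}{old.suffix}")
        plan.append((old.name, new.name))
        if not dry_run:
            # Refuse to overwrite a different file that already has the new name.
            if new.exists() and new != old:
                raise FileExistsError(new)
            old.rename(new)
    return plan


# Demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b scan.tif", "a scan.tif"]:
        (Path(tmp) / name).write_text("")
    for old_name, new_name in bulk_rename(tmp, "SPARC"):
        print(old_name, "->", new_name)
```

Reviewing the dry-run plan before calling `bulk_rename(folder, prefix, dry_run=False)` is the safer habit, since renames are hard to undo.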

Page 30: SPARC 2013 Data Management Presentation

Metadata

Page 31: SPARC 2013 Data Management Presentation

What is Metadata?

Title

Author

Call number

Publisher

ISBN

Page 34: SPARC 2013 Data Management Presentation

File name

File type

Who created the data

Title

Date created
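These fields can be recorded alongside a data file as a small structured record. The following is a minimal Python sketch, assuming JSON as the storage format; the function name `describe`, the key names, and the example values are hypothetical.

```python
import json
from datetime import date


def describe(file_name, title, creator, created):
    """Build a metadata record with the fields listed on the slide."""
    return {
        "file_name": file_name,
        # Take the file type from the extension; empty if there is none.
        "file_type": file_name.rsplit(".", 1)[-1] if "." in file_name else "",
        "title": title,
        "creator": creator,
        # ISO 8601 dates (YYYY-MM-DD) are unambiguous and sort correctly.
        "date_created": created.isoformat(),
    }


# Hypothetical example values, not taken from the presentation.
record = describe("SPARC_scope2_OHSU_20130503090700_raw.csv",
                  "Raw readings, run 1", "A. Researcher", date(2013, 5, 3))
print(json.dumps(record, indent=2))
```

Saving the printed JSON next to the data file (for example as a `.json` file with the same base name) keeps the description with the data it describes.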

Page 37: SPARC 2013 Data Management Presentation

Using structured phenotype data to identify genetic basis of disease

Page 38: SPARC 2013 Data Management Presentation

Metadata standards: Controlled vocabularies and ontologies

Page 39: SPARC 2013 Data Management Presentation

Controlled vocabularies

Page 40: SPARC 2013 Data Management Presentation

MeSH

Page 41: SPARC 2013 Data Management Presentation

MeSH

acetaminophen

Page 42: SPARC 2013 Data Management Presentation

What is an Ontology?

1. Hierarchical terms are defined textually and logically

2. Relationships between the terms are defined

3. Expressed in a language that can be reasoned across by computers

4. Data can be reused and can be easily linked together
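The four points above can be illustrated with a toy example. The following is a minimal Python sketch, assuming a single "is_a" relationship; the three terms and their definitions are illustrative only and are not copied from any published ontology, and real ontologies are written in dedicated languages rather than Python dictionaries.

```python
# Toy ontology: each term has a text definition and an "is_a" parent.
TERMS = {
    "drug": {"definition": "A substance used to treat or prevent disease.",
             "is_a": None},
    "analgesic": {"definition": "A drug that relieves pain.",
                  "is_a": "drug"},
    "acetaminophen": {"definition": "An analgesic that also reduces fever.",
                      "is_a": "analgesic"},
}


def ancestors(term):
    """Follow is_a links upward and return every broader term."""
    found = []
    parent = TERMS[term]["is_a"]
    while parent is not None:
        found.append(parent)
        parent = TERMS[parent]["is_a"]
    return found


def is_a(term, broader):
    """True if `term` is `broader` itself or a descendant of it."""
    return term == broader or broader in ancestors(term)


print(ancestors("acetaminophen"))     # ['analgesic', 'drug']
print(is_a("acetaminophen", "drug"))  # True
```

Nothing states directly that acetaminophen is a drug; the program infers it by following the defined relationships, which is the sense in which a computer can "reason across" an ontology.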

Page 43: SPARC 2013 Data Management Presentation

Commonly Used Ontologies

• Gene Ontology

• Linnaean Taxonomy

• SNOMED

Page 44: SPARC 2013 Data Management Presentation

Why are CVs and Ontologies useful?

• Can be used to structure your metadata

• Are often used to structure information in databases

Page 45: SPARC 2013 Data Management Presentation

Structured data helps with searching

Craigslist search: Chaise

Craigslist matches on strings only

Craigslist search: Fainting couch
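The difference between matching on strings and matching on concepts can be sketched in a few lines. The following is a minimal Python illustration, assuming a toy controlled vocabulary in which "chaise" and "fainting couch" are synonyms of one preferred term; the listings and the vocabulary are invented for the example and do not reflect how Craigslist or any real search engine is implemented.

```python
LISTINGS = [
    "Antique chaise, good condition",
    "Velvet fainting couch",
    "Oak dining table",
]

# Toy controlled vocabulary: one preferred term and its synonyms.
VOCABULARY = {"chaise longue": ["chaise", "fainting couch"]}


def string_search(query, listings):
    """Match on the literal query string only."""
    return [item for item in listings if query.lower() in item.lower()]


def concept_search(query, listings):
    """Expand the query to every synonym of its concept, then match."""
    query = query.lower()
    terms = {query}
    for preferred, synonyms in VOCABULARY.items():
        group = {preferred, *synonyms}
        if query in group:
            terms |= group
    return [item for item in listings
            if any(term in item.lower() for term in terms)]


print(string_search("chaise", LISTINGS))   # finds only the chaise listing
print(concept_search("chaise", LISTINGS))  # finds the chaise and the fainting couch
```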

Page 46: SPARC 2013 Data Management Presentation

Structured data helps with searching

PubMed indexes articles with MeSH Terms

Page 47: SPARC 2013 Data Management Presentation

In Summary: Structured Metadata = good

How can I create structured metadata?

http://www.flickr.com/photos/san_drino/1454922072/

Page 48: SPARC 2013 Data Management Presentation

Database(s) and Tools… (to make your life easier)

http://farm4.static.flickr.com/3560/3332644561_c9d5041d02.jpg

Page 49: SPARC 2013 Data Management Presentation

Data Management tools and repositories

• Purpose: Software where you can organize, store and/or share data

• Often contain metadata to assist with data entry and create structured data

Page 50: SPARC 2013 Data Management Presentation

Tools for data management

Page 51: SPARC 2013 Data Management Presentation

Data Sharing Repositories

http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html

Page 52: SPARC 2013 Data Management Presentation

Repositories use Unique IDs

• Digital Object Identifier (DOI)

• Example: DOIs for publications

– doi: 10.1371/journal.pbio.1001339

• Uniform Resource Identifier (URI)

• A URI will resolve to a single location on the web

• URIs for people
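A DOI written in citation style, like the one above, can be turned into a web address by prefixing it with the doi.org resolver. The following is a minimal Python sketch; the function name `doi_to_url` is hypothetical, and the check only confirms that the identifier starts with "10." as DOIs do.

```python
def doi_to_url(text):
    """Turn a citation-style DOI such as 'doi: 10.1371/...' into a resolver URL."""
    doi = text.strip()
    # Drop a leading "doi:" label if present.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {text!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
# https://doi.org/10.1371/journal.pbio.1001339
```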

Page 53: SPARC 2013 Data Management Presentation

People Disambiguation

Page 54: SPARC 2013 Data Management Presentation

• Example:

– John L Campbell, Research Ecologist, Oregon State University, Corvallis

OR

– John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
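The example above shows why a name alone is not enough to identify a person. The following is a minimal Python sketch of the idea, assuming each person record carries its own URI; the `example.org` URIs are invented placeholders, not real identifiers for either researcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    uri: str          # hypothetical identifier, invented for this example
    name: str
    affiliation: str


PEOPLE = [
    Person("https://example.org/person/1", "John L Campbell",
           "Oregon State University, Corvallis"),
    Person("https://example.org/person/2", "John L Campbell",
           "Center for Research on Ecosystem Change, Durham, NC"),
]


def find_by_name(name):
    """A name can match more than one person."""
    return [p for p in PEOPLE if p.name == name]


def find_by_uri(uri):
    """A URI matches at most one person."""
    matches = [p for p in PEOPLE if p.uri == uri]
    return matches[0] if matches else None


print(len(find_by_name("John L Campbell")))                     # 2
print(find_by_uri("https://example.org/person/2").affiliation)
```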

Page 56: SPARC 2013 Data Management Presentation

Tools for personal data management

• Google Drive

• Dropbox

• Evernote

• TaskPaper

• Diigo: bookmarking websites

• Mendeley, EndNote, Zotero: citation managers

• SoundGecko

http://blogs.scientificamerican.com/information-culture/2012/12/10/managing-personal-knowledge-data-and-information/

Page 57: SPARC 2013 Data Management Presentation

URLs to resources

Go to:

http://libguides.ohsu.edu/data