SPARC 2013 Data Management Presentation

Post on 25-May-2015



Data management.

Nicole Vasilevsky, NCNM, OHSU

Jackie Wirz, OHSU

Melissa Haendel, OHSU

Outline

• Introduction

• Why do we need good data management?

• Good data management

• Databases and tools

• Sharing your data

Who are we?

• Nicole Vasilevsky, PhD

– Assistant Professor, Helfgott Research Institute, NCNM

– Project Manager, Ontology Development Group, OHSU

• Jackie Wirz, PhD

– Assistant Professor, Bioinformation Specialist, OHSU Library

• Melissa Haendel, PhD

– Assistant Professor, Department Head, Ontology Development Group, OHSU

What does data mean to you?

Do you have any training in data management?

Do you know what metadata is?

a. Philosophy

b. describes data

c. dating site

d. data

What is data?

• Clinical data

• Experimental data

• School related data

• Personal data

• Social data

So much data

Why?

Personal organization

Credit where credit is due

Reproducibility of science and medicine

Accelerates scientific and clinical discovery

Efficiency

Do you get frustrated with any of the following in your personal or professional life?

a. Storing data

b. Backing up data

c. Analyzing/manipulating data

d. Finding data produced by other researchers/clinicians

e. Ensuring data are secure

f. Making data accessible to other researchers

g. Controlling access to data

h. Tracking updates to data (i.e., versioning)

i. Creating metadata (i.e., describing the data so they are more useful at a later time or to others)

j. Protecting intellectual property rights

k. Ensuring appropriate professional credit/citation is given to data sets generated

http://davidmichaelangelosilva.wordpress.com/2012/01/29/organize-your-messy-desktop-with-fences/

Messy Desktop?

Which of the following do you do?

a. Save copies of data on a disk, USB drive, tape, or computer hard drive

b. Save copies of data on a local server

c. Save copies of data on a central campus server

d. Save copies of data on a web based or cloud server

e. Store data in a repository or archives

f. Automatically backup files

g. Manually generate backup

h. Restrict access to files

Credit where credit is due

Data collection & Analysis

Authoring

Storage, Archiving, & Preservation

Publication & Dissemination

The scholarly communication cycle

Reproducibility of science

• Lack of information makes it difficult to reproduce experiments

• Retraction rates are on the rise

• Difficulty identifying resources in the published literature

Cokol et al. EMBO reports (2008) 9, 2


Sharing can be advantageous

http://www.flickr.com/photos/eltonl/107582334/sizes/l/in/photostream/

Why share your data?

• Data sharing mandates

– NIH public access policy

– NIH/NSF data sharing plan for new applications

• Further science and medicine

• Build collaborations

• Enable new discoveries with your data

• Can be required at time of publication

Efficiency

http://hbr.org/2012/10/big-data-the-management-revolution

https://upload.wikimedia.org/wikipedia/commons/b/ba/HMS_Surprise_at_sunset_with_airplane.jpg

How?

• File naming and data storage

• Metadata

• Controlled vocabularies and ontologies

• Databases and Tools

• Data accessibility

File naming

Informative file names

Will I remember what this file is a month from now?

Naming conventions

Project_instrument_location_YYYYMMDDhhmmss_extra.ext

Index/grant conditions Leading zero!

s/n, variable Retain order
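The naming pattern above could be generated with a short script. This is only a sketch: the function name build_filename and the example values are hypothetical, and only the field order and the zero-padded timestamp come from the slide.

```python
from datetime import datetime


def build_filename(project, instrument, location, when, extra, ext):
    """Build Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    # strftime zero-pads month, day, hour, minute, and second, so the
    # names sort chronologically when sorted alphabetically.
    stamp = when.strftime("%Y%m%d%H%M%S")
    parts = [project, instrument, location, stamp, extra]
    # Underscores separate the fields, so spaces and underscores inside
    # a field are replaced with hyphens to keep the field order readable.
    cleaned = [p.strip().replace(" ", "-").replace("_", "-") for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")


if __name__ == "__main__":
    # Hypothetical example values, not taken from the presentation.
    when = datetime(2013, 5, 3, 9, 7, 5)
    print(build_filename("SPARC", "scope2", "OHSU", when, "control", "csv"))
    # prints SPARC_scope2_OHSU_20130503090705_control.csv
```

Generating names from code rather than typing them is one way to make sure the leading zeros and field order never drift.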

Directory Structure

Sticking with a directory structure can be hard

Files:

• SPARC presentation

• CTSAconnect presentation

• Monarch presentation

Presentations

– SPARC

– CTSAconnect

– Monarch

Versioning

DataManagement_SPARC_050313_final_NV

• Save a copy of every version of a data file

• Follow a file naming convention

• Version control software

– Dropbox

– Google docs

– Git

– SmartSVN
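For a single data file, the first two points above (keep every version, follow a naming convention) can be sketched in a few lines of Python. The function name save_version and the "versions" folder are assumptions made for illustration; a tool such as Git does this far more robustly.

```python
import shutil
from datetime import datetime
from pathlib import Path


def save_version(data_file, now=None):
    """Copy data_file into a sibling 'versions' folder under a timestamped name."""
    src = Path(data_file)
    now = now or datetime.now()
    version_dir = src.parent / "versions"  # hypothetical folder name
    version_dir.mkdir(exist_ok=True)
    # Example result: results_20130503T141500.csv
    target = version_dir / f"{src.stem}_{now:%Y%m%dT%H%M%S}{src.suffix}"
    if target.exists():
        # Never overwrite a version that has already been saved.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(src, target)  # copy2 also keeps the modification time
    return target
```

Calling save_version("results.csv") before each round of edits leaves a dated trail of copies instead of a single file named "final".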

Dropbox

www.dropbox.com

Google docs

Remember to back up your data!

• Keeping three copies is recommended!

– 1 on your local workstation

– 1 local/removable, such as an external hard drive

– 1 remote, such as on a cloud server*

*Depending on the type of data, as cloud servers are not always secure
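The three-copy rule can be sketched as a small Python script that copies a file to each backup location and checks that every copy matches the original. The function names are hypothetical, and the backup folders you pass in (for example an external drive or a synced cloud folder) are assumptions about your own setup. As noted above, check whether the data may be stored on a cloud service at all.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(data_file, backup_dirs):
    """Copy data_file into each backup folder and verify every copy."""
    src = Path(data_file)
    expected = sha256_of(src)
    copies = []
    for folder in backup_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(src, folder / src.name))
        # A backup is only useful if it is identical to the original.
        if sha256_of(copy) != expected:
            raise IOError(f"Backup at {copy} does not match the original")
        copies.append(copy)
    return copies
```

With the working file on the workstation, two backup folders give the three copies recommended above.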

http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf

Organizing your IRB application

Created by Heather Schiffke

See: http://libguides.ohsu.edu/data

File renaming applications

• Bulk Rename Utility (Windows)

• Renamer (Mac)

• PSRenamer

Metadata

What is Metadata?

Title

Author

Call number

Publisher

ISBN

File name

File type

Who created the data

Title

Date created
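The file-level fields listed above can be captured in a small record saved next to the data file. This is a sketch only: the key names, the ".metadata.json" suffix, and the function names are assumptions, not a metadata standard.

```python
import json
from datetime import date
from pathlib import Path


def describe_file(data_file, title, creator, created):
    """Return a metadata record with the fields listed above."""
    path = Path(data_file)
    return {
        "file_name": path.name,
        "file_type": path.suffix.lstrip(".").lower(),
        "creator": creator,
        "title": title,
        "date_created": created.isoformat(),  # ISO 8601, e.g. 2013-05-03
    }


def write_sidecar(data_file, record):
    """Save the record beside the data file as <name>.metadata.json."""
    sidecar = Path(str(data_file) + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

A plain-text record like this stays readable even if the software that produced the data file is no longer available.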

Using structured phenotype data to identify genetic basis of disease

Metadata standards: Controlled vocabularies and ontologies

Controlled vocabularies

MeSH


acetominophen

What is an Ontology?

1. Hierarchical terms are defined textually and logically

2. Relationships between the terms are defined

3. Expressed in a language that can be reasoned across by computers

4. Data can be reused and can be easily linked together
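The four points above can be illustrated with a toy example in Python. The terms and definitions are invented for illustration and do not come from any published ontology; real ontologies are usually written in dedicated languages such as OWL or OBO rather than in Python dictionaries.

```python
# Toy ontology: every term has a text definition (point 1) and
# "is_a" relationships to broader terms (point 2).
ONTOLOGY = {
    "furniture": {"definition": "A movable object that supports human activity.", "is_a": []},
    "seat": {"definition": "Furniture designed for sitting.", "is_a": ["furniture"]},
    "couch": {"definition": "A seat for more than one person.", "is_a": ["seat"]},
    "fainting couch": {"definition": "A couch with a raised back at one end.", "is_a": ["couch"]},
}


def ancestors(term, ontology=ONTOLOGY):
    """Follow is_a links upward and return every broader term."""
    found = set()
    stack = list(ontology[term]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(ontology[parent]["is_a"])
    return found


def is_a(term, broader, ontology=ONTOLOGY):
    """True if term is the same as, or a kind of, broader (point 3)."""
    return term == broader or broader in ancestors(term, ontology)
```

Nothing states directly that a fainting couch is furniture, yet is_a("fainting couch", "furniture") returns True because the program follows the chain of relationships. That is the kind of reasoning that lets data annotated with specific terms be found and linked under broader ones.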

Commonly Used Ontologies

• Gene Ontology

• Linnaean Taxonomy

• SNOMED

Why are CVs and Ontologies useful?

• Can be used to structure your metadata

• Are often used to structure information in databases

Structured data helps with searching

Craigslist search: Chaise

Craigslist matches on strings only

Craigslist search: Fainting couch

Structured data helps with searching

PubMed indexes articles with MeSH Terms
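The difference between matching on strings and matching on index terms can be sketched in Python. The listings and the one-entry vocabulary are made up for illustration; indexing with a real vocabulary such as MeSH is far richer than this.

```python
LISTINGS = [
    "Antique chaise, good condition",
    "Victorian fainting couch for sale",
    "Oak dining table",
]

# Hypothetical controlled vocabulary: preferred term -> synonyms.
VOCABULARY = {
    "chaise longue": ["chaise longue", "chaise", "fainting couch"],
}


def string_search(query, listings):
    """Match the literal query string only."""
    return [text for text in listings if query.lower() in text.lower()]


def tag(text, vocabulary=VOCABULARY):
    """Index a text with every preferred term whose synonym it mentions."""
    lowered = text.lower()
    return {term for term, synonyms in vocabulary.items()
            if any(s in lowered for s in synonyms)}


def structured_search(query, listings, vocabulary=VOCABULARY):
    """Map the query to preferred terms, then match on index terms."""
    wanted = tag(query, vocabulary)
    return [text for text in listings if wanted & tag(text, vocabulary)]
```

Here string_search("chaise", LISTINGS) finds only the first listing, while structured_search("chaise", LISTINGS) finds the first two, because both listings are indexed under the same preferred term.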

In Summary: Structured Metadata = good

How can I create structured metadata?

http://www.flickr.com/photos/san_drino/1454922072/

Database(s) and Tools… (to make your life easier)

http://farm4.static.flickr.com/3560/3332644561_c9d5041d02.jpg

Data Management tools and repositories

• Purpose: Software where you can organize, store and/or share data

• Often contain metadata to assist with data entry and create structured data

Tools for data management

Data Sharing Repositories

http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html

Repositories use Unique IDs

• Digital Object Identifier (DOI)

• Example: DOIs for publications

– doi: 10.1371/journal.pbio.1001339

• Uniform Resource Identifier (URI)

• A URI will resolve to a single location on the web

• URIs for people
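A DOI such as the one above becomes a web address when it is appended to the https://doi.org/ resolver. The sketch below does that in Python; the function name doi_to_url is hypothetical, and the pattern is a simplified check, not the full DOI syntax.

```python
import re

# Simplified check (an assumption, not the full DOI syntax):
# "10.", a registrant code of 4 to 9 digits, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Turn 'doi: 10.1371/journal.pbio.1001339' into a resolver URL."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a recognizable DOI: {doi!r}")
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
    # prints https://doi.org/10.1371/journal.pbio.1001339
```

The identifier stays the same even if the publisher moves the article, which is why repositories cite DOIs rather than page addresses.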

People Disambiguation

• Example:

• John L Campbell, Research Ecologist, Oregon State University, Corvallis

OR

• John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC

Tools for personal data management

• Google drive

• Dropbox

• Evernote

• Task Paper

• Diigo: bookmarking websites

• Mendeley, EndNote, Zotero: citation managers

• Sound Gecko

http://blogs.scientificamerican.com/information-culture/2012/12/10/managing-personal-knowledge-data-and-information/

URLs to resources

Go to: http://libguides.ohsu.edu/data