Research Data Management: Part 2, Practices


Managing Research Data Part 2

WHY – WHAT – WHO – WHEN & HOW

Planning – Working – Finalizing – Sharing Data

This work is licensed under a Creative Commons Attribution 4.0 International License.

• WHY manage data
• WHAT research data are
• WHO manages research data
• WHEN & HOW data management is done: Planning – Working – Finalizing – Sharing Data

This two-part course is a collaboration between CU Libraries/Information Services and the Office of Research Compliance & Training. The purpose of this course is to familiarize you with the various aspects of research data management (RDM) by taking


This course will guide you through these areas, offering in-depth details on each of them. Please refer to the top navigation to keep track of which area you are currently exploring.

Part 1:
• Why RDM is both recommended and required
• What research data are
• Who is responsible for RDM

Part 2:
• When RDM activities occur
• How you can carry out RDM activities

Learning objectives: At the end of this training you will be able to:
• Identify at which research stages data management activities occur
• Understand practical details of research data management, such as:
  – File naming
  – File formats
  – Spreadsheet structure
  – Data preservation


Links to many of the references and policies referred to in this course can be found on the final slides. Have Fun!


When does Research Data Management happen? How is it done?


Planning


Plan for the entire data life-cycle.

When planning to manage data or writing a data management plan, consider:
• What data will be shared?
• Who will have access to the data?
• Where will the data to be shared be located?
• When will the data be shared?
• How will researchers locate and access the data?

Also consider:
• File format
• File sizes
• Changing rates of data production
• Anticipated size of project data
• Storage & back-up
• Privacy / security requirements
• Data description
• Retention period
• Sharing requirements
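The "anticipated size of project data" and "changing rates of data production" items can be turned into a rough number. The Python sketch below is a hypothetical back-of-the-envelope estimate. It assumes monthly output grows by a fixed fraction and that you keep three copies (here, near, far). The function name and parameters are illustrative, not part of any tool.

```python
def estimate_storage_gb(first_month_gb: float, months: int,
                        monthly_growth: float = 0.0, copies: int = 3) -> float:
    """Rough total storage (GB) across all copies for a project.

    Hypothetical model: the project produces `first_month_gb` in its first
    month, and monthly output grows by `monthly_growth` (0.10 = 10% more
    each month). The total is multiplied by `copies` (original + back-ups).
    """
    if months < 0 or copies < 1:
        raise ValueError("months must be >= 0 and copies >= 1")
    total = 0.0
    monthly = first_month_gb
    for _ in range(months):
        total += monthly                # data produced this month
        monthly *= 1 + monthly_growth   # next month's (possibly larger) output
    return total * copies


# 10 GB/month for a 12-month project, 3 copies, no growth.
print(estimate_storage_gb(10, 12))                                # 360.0
# Same project, but output grows 10% per month.
print(round(estimate_storage_gb(10, 12, monthly_growth=0.10), 1))  # 641.5
```

Even a crude estimate like this shows why "changing rates" matter: modest monthly growth nearly doubles the storage you need to budget for.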

Storage file formats should be:
• Non-proprietary
• Open, documented standard
• Standard representation (e.g., ASCII, Unicode)
• Common, or commonly used by the research community (e.g., FITS, CIF)
• Unencrypted
• Uncompressed

Some commonly recognized formats to avoid for storage include: Word [.doc(x)], SPSS [.sav], Excel [.xls(x)], STATA [.dta], Access [.mdb, .accdb], JPEG [.jpg], .gif, QuickTime [.mov], SAS [.sas]

Some commonly recognized formats meeting these criteria: ASCII [e.g., .csv, .txt], PDF [.pdf], FLAC, TIFF, JPEG2000 [.jp2], MPEG-4 [.mp4], XML [.xml, .odf, .rdf], R [.r]

Not sure about the extension? Check https://www.nationalarchives.gov.uk/PRONOM/default.htm

Sources: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf; http://www.digitalpreservation.gov/formats/index.shtml
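PRONOM identifies formats by their internal byte "signatures" rather than by file extension. The Python sketch below illustrates that idea on a small, hand-picked set of well-known magic numbers. It is not PRONOM or its DROID tool, and a real check should rely on those registries. Plain-text formats such as .csv and .txt have no signature, so they come back as unknown.

```python
# A few well-known leading byte sequences ("magic numbers").
# Illustrative subset only; PRONOM's signature registry is far more complete.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 image"),
    (b"II*\x00", "TIFF image"),
    (b"MM\x00*", "TIFF image"),
    (b"fLaC", "FLAC audio"),
    (b"PK\x03\x04", "ZIP container (e.g., .docx, .xlsx, .odt)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g., legacy .doc, .xls)"),
]


def identify_format(path) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    with open(path, "rb") as f:
        header = f.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown (no known signature; may be plain text)"
```

For example, `identify_format("scan_001.dat")` (a hypothetical file) would report "TIFF image" if the file really is a TIFF, whatever its extension says.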

Storage / Back-ups

No storage medium lasts forever. Consider the following media life-spans:

Lifespan of Storage Media: http://www.crashplan.com/medialifespan/

When choosing storage and back-up options, you should:
• Reduce the risk of damage or loss
• Use multiple locations (here, near, far)
• Create a back-up schedule
• Use reliable back-up media
• Test your back-up system (e.g., test file recovery, verify checksums)

Remember to:
• Back up data frequently
• Make 3 copies
  – Original (here)
  – External/local (near)
  – External/remote, in a different geographic area (far)
• Verify recovery is possible
  – Confirm that the file has not been corrupted, e.g., by checksum validation
  – Make sure you can reload the file, i.e., test file restore after initial set-up
  – Check file recovery periodically & systematically thereafter
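A checksum is a short fingerprint of a file's contents. If a back-up copy's checksum matches the original's, the copy has not been corrupted. Below is a minimal Python sketch using the standard `hashlib` module. The function names and file locations are hypothetical, and many back-up tools record and compare checksums for you.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_copies(original, copies) -> list:
    """List the back-up copies that are missing or differ from the original."""
    expected = sha256_of(original)
    failures = []
    for copy in copies:
        copy = Path(copy)
        if not copy.is_file() or sha256_of(copy) != expected:
            failures.append(str(copy))
    return failures
```

A matching checksum confirms a copy is intact, but it does not prove you can restore it into your working environment, so the periodic test restore is still needed.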

Privacy & Security

Consider physical, network, computer system and file security for:
• Intellectual property – trade secrets, commercial information, confidential materials, restricted data
• Personally identifiable information (PII)
• Protected health information (PHI)
• High-security data

CU Information Security Charter: “Users are persons who use Information Resources. Users are responsible for ensuring that such Resources are used properly in compliance with the Columbia University Acceptable Usage of Information Resources Policy (http://policylibrary.columbia.edu/acceptable-usage-information-resources-policy), information is not made available to unauthorized persons, and appropriate security controls are in place.”

Source: http://policylibrary.columbia.edu/information-security-charter

(Some) best practices for handling sensitive data:
• Restrict physical access to computers, offices and storage media
• Encrypt any device (mobile, laptop, desktop, tablet, removable media [e.g., USB flash drives, CDs, hard drives]) containing sensitive data
• Store lab notebooks and research records in locked cabinets
• Keep confidential and sensitive data on computers not connected to the Internet
• Don't send confidential data via e-mail or FTP (use encryption, if you must)
• Use strong passwords on files and computers
• Sanitize all systems before reusing, disposing of, or donating them
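For "Use strong passwords," a randomly generated password beats a memorable pattern. This Python sketch follows the recipe in the documentation for the standard `secrets` module. The default length of 20 and the rule that every character class must appear are assumptions. Follow CU's own password requirements where they differ, and keep the result in a password manager.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def strong_password(length: int = 20) -> str:
    """Random password containing at least one lower, upper, digit and symbol."""
    if length < 4:
        raise ValueError("length must be at least 4 to include every class")
    while True:
        # secrets.choice draws from the OS's cryptographically secure source.
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(c in string.punctuation for c in candidate)):
            return candidate
```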

Data Description / Documentation

• Lab notebooks
• Data descriptions / code book
• File naming
  – Consistency: pick a system, write it down, & stick with it
  – Identify necessary elements
  – Create brief, understandable names
  – Date: YYYY-MM-DD
  – Version: v01, v02, … FINAL

In general, try to stay away from spaces in filenames as well as the following characters: . \ / : * ? " < > | [ ] & $

• File / directory structure
• Sometimes there is a community standard for data formatting & description for sharing/integration (aka metadata schema) – find yours!
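One way to "pick a system ... & stick with it" is to let a small function build every filename. The Python sketch below assumes a hypothetical convention of Project_Description_YYYY-MM-DD_vNN.ext. It rejects the characters listed above. The element order and the "_" separator are illustrative choices, not a standard.

```python
from datetime import date

# Characters the file-naming guidance suggests avoiding, plus the space.
AVOID = set(' .\\/:*?"<>|[]&$')


def make_filename(project: str, description: str, when: date,
                  version: int, extension: str, final: bool = False) -> str:
    """Build a name like AmpStudy_SoundLevels_2013-12-22_v01.csv.

    The element order and "_" separator are a hypothetical convention;
    write your own convention down and apply it consistently.
    """
    tag = "FINAL" if final else f"v{version:02d}"      # v01, v02, ... or FINAL
    elements = [project, description, when.isoformat(), tag]  # ISO date: YYYY-MM-DD
    for element in elements:
        bad = AVOID.intersection(element)
        if bad:
            raise ValueError(f"{element!r} contains {''.join(sorted(bad))!r}")
    return "_".join(elements) + "." + extension.lstrip(".")


print(make_filename("AmpStudy", "SoundLevels", date(2013, 12, 22), 1, "csv"))
# AmpStudy_SoundLevels_2013-12-22_v01.csv
```

Putting the ISO date before the version keeps files in chronological order when sorted by name.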

Retention Period

Plan to keep your data according to:
• CU Data Retention Policy: at least 3 years
• Funder requirements: these vary, so check them!
• Regulations
• Contract terms, for industry-sponsored research
• The importance of the data, regardless of external requirements
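When several retention rules apply, the longest one determines how long you keep the data. The Python sketch below computes that date. It assumes (hypothetically) that every retention clock starts on the same date, such as the project end date. Check each policy, funder agreement, or contract for its actual starting point.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def keep_until(start: date, requirements: dict) -> date:
    """Earliest date on which *every* retention requirement is satisfied.

    `requirements` maps a source (policy, funder, contract, ...) to a
    minimum number of years; the longest requirement wins.
    """
    return add_years(start, max(requirements.values()))


# Hypothetical example: a 5-year funder rule outlasts the 3-year CU minimum.
rules = {"CU Data Retention Policy": 3, "Funder (hypothetical)": 5}
print(keep_until(date(2015, 6, 30), rules))  # 2020-06-30
```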

Planning to share

Do you plan to share your data? Prepare to follow the requirements of your:
• Funder
• Journal
• Discipline
• Data repository

Working


Data Collection

Review the data management plan:
• Are you following it?
• Did it survive first contact with the research? If not,
  – Does it need to be revised?
  – Take the opportunity to change it as necessary, documenting the changes

Revisit your:
• File naming conventions
  – Are they written down?
  – Does everyone on the project know & follow them?
• File structure / organization / tagging
  – Is it easy to understand / logical?
  – Is everyone on the project familiar with the organizational practices so they can store and find files efficiently?
• Back-up processes
  – Are they working?
  – Are they being followed?
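A shared, written-down folder layout helps everyone "store and find files efficiently." The Python sketch below creates one such skeleton and records it in a ReadMe.txt. The folder names are a hypothetical layout, not a CU or community standard, so adapt them to your project's written conventions.

```python
from pathlib import Path

# Hypothetical layout; adapt the folder names to your project's conventions.
LAYOUT = [
    "data/raw",        # untouched original files (keep read-only)
    "data/processed",  # cleaned / derived files
    "code",            # scripts used to process and analyze data
    "docs",            # ReadMe, code book / data dictionary, DMP
    "results",         # figures, tables, other outputs
]


def create_project_skeleton(root) -> list:
    """Create the folders (if missing), document them, and list all directories."""
    root = Path(root)
    for folder in LAYOUT:
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "docs" / "ReadMe.txt"
    if not readme.exists():  # never overwrite existing documentation
        lines = ["Folder layout:"] + [f"  {folder}" for folder in LAYOUT]
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())
```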

Are you using someone else's data as part of your research? You should probably cite it…

Consider using citation management software to keep track of it: http://library.columbia.edu/research/citation-management.html

Using a spreadsheet for your data? Structure your data so that it's easily sortable & usable by other software/machines. Be consistent with your:
• Labels
• Types
• Formats
• Layout

(Alternatively, consider using a database for easier data management.)

Spreadsheet labels: Adopt a consistent style that indicates a cell contains a label rather than a value

Consistent ☺
Date | Instrument | SoundLevel_R | SoundLevel_L | Amp_Setting
2013-12-22 | BK-732A | 84.6 | 86.0 | 3
2013-12-23 | BK-732A | 115.2 | 116.4 | 9
2013-12-24 | BK-732A | 128.7 | 130.0 | 11

Inconsistent ☹
Date: 12/22/2013 | Instrument | BK732A
Sound lev | Right | <85 | Amplifier | 3 (27%)
 | Left | 86.0
Date: Dec 23, 2013 | Instrument | Amp_Setting
SoundLevel-R | 115 | BK_732-A | 9
SoundLevel_L | 116.4

Spreadsheet types: Don’t mix text & number types in the same column

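Mixed types are easy to miss by eye in a large sheet. The Python sketch below flags CSV columns that contain both numbers and text, such as the "<85" and "3 (27%)" cells in the inconsistent example. It treats anything `float()` accepts as a number, which is a simplifying assumption. The function names are illustrative.

```python
import csv
import io


def classify(value: str) -> str:
    """Label a cell as 'empty', 'number', or 'text'."""
    value = value.strip()
    if not value:
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def mixed_type_columns(csv_text: str) -> dict:
    """Return {column: kinds} for every column that mixes numbers and text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in kinds:
            kinds[name].add(classify(row[name] or ""))  # short rows give None
    return {name: k for name, k in kinds.items() if {"number", "text"} <= k}


messy = "SoundLevel_R,Amp_Setting\n<85,3 (27%)\n115,9\n"
print(mixed_type_columns(messy))  # both columns are flagged
```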

Spreadsheet formats: Do all of your dates or other variable values look the same?

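Dates entered in several styles ("12/22/2013", "Dec 23, 2013") can be normalized to one format, YYYY-MM-DD. The Python sketch below tries a hypothetical list of the layouts seen in the inconsistent example. It assumes US month/day order and English month names. A value such as "03/04/2013" is ambiguous, so the list of accepted layouts should match what your project actually recorded.

```python
from datetime import datetime

# Layouts observed in the inconsistent example (assumption: US month/day order).
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%b %d, %Y"]


def to_iso_date(text: str) -> str:
    """Convert a date string in any known layout to YYYY-MM-DD."""
    text = text.strip()
    if text.lower().startswith("date:"):  # drop a label typed into the cell
        text = text[5:].strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"Unrecognized date format: {text!r}")


for raw in ["Date: 12/22/2013", "Dec 23, 2013", "2013-12-24"]:
    print(to_iso_date(raw))
```

Raising an error on unknown layouts, rather than guessing, makes the unexpected values visible so they can be checked by hand.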

Spreadsheet layout: Tables of similar data should be structured similarly

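If similar data are split across several files (one per day, site, or instrument), their header rows should match exactly. The Python sketch below compares every CSV file's header with the first file's header. It assumes UTF-8 CSV files and treats column order as significant. The function name is illustrative.

```python
import csv


def header_mismatches(paths) -> dict:
    """Return {path: header} for each file whose header differs from the first file's."""
    headers = {}
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            headers[str(path)] = next(csv.reader(f), [])  # [] for an empty file
    reference = headers[str(paths[0])]
    return {p: h for p, h in headers.items() if h != reference}
```

An empty result means every table shares the same structure and can be stacked into one dataset without manual fixes.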

When it's not a spreadsheet (it may be a database), be consistent!
• Consistent process
• Consistent organization
• Consistent descriptions, AND
• Consistently documenting everything that's done

Quality Assurance / Control

(Some) best practices for assuring quality data entry:
• System-limited value entry, i.e., hard-code controlled lists of values
• Check 5-10% of data records manually
• Check for out-of-range values
• Check for empty values / blank fields
• Consider using a data entry program or double-entry keying
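Several of these checks are easy to script. The Python sketch below checks a record against a controlled list of instruments, the 0 to 11 range for Amp_Setting from the code book example, and blank required fields. It also compares two independently keyed copies (double-entry keying) and draws a random sample for manual review. The controlled list, the required fields, and the function names are hypothetical.

```python
import random

# Hypothetical controlled list and valid range, taken from the example table
# and code book (Amp_Setting: whole numbers from 0 to 11).
INSTRUMENTS = {"BK-732A"}
AMP_MIN, AMP_MAX = 0, 11
REQUIRED = ("Date", "Instrument", "Amp_Setting")


def record_problems(record: dict) -> list:
    """Return human-readable problems found in one data record."""
    problems = []
    for field in REQUIRED:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"{field} is blank")
    instrument = record.get("Instrument")
    if instrument and instrument not in INSTRUMENTS:
        problems.append(f"Instrument {instrument!r} not in controlled list")
    amp = record.get("Amp_Setting")
    if amp is not None and str(amp).strip():
        try:
            amp_value = int(amp)
        except ValueError:
            problems.append(f"Amp_Setting {amp!r} is not a whole number")
        else:
            if not AMP_MIN <= amp_value <= AMP_MAX:
                problems.append(f"Amp_Setting {amp_value} is out of range")
    return problems


def double_entry_differences(first: list, second: list) -> list:
    """Compare two independently keyed copies; return (row, field, a, b) mismatches."""
    if len(first) != len(second):
        raise ValueError("the two entries have different numbers of records")
    diffs = []
    for row, (a, b) in enumerate(zip(first, second)):
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                diffs.append((row, field, a.get(field), b.get(field)))
    return diffs


def manual_check_sample(records: list, fraction: float = 0.05, seed=None) -> list:
    """Randomly pick roughly `fraction` of the records (at least one) to check by hand."""
    k = min(len(records), max(1, round(len(records) * fraction)))
    return random.Random(seed).sample(records, k)
```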

Analysis

• Keep an untouched, “raw” copy of the data file – make it read-only
• Save cleaned or analyzed data as new files (with good file names, as previously described)
  – Take extensive notes of the actions taken or scripts used to “clean” the data
• Use a scripting language (e.g., R, SAS, SPSS) to process data consistently and create a record of data processing & analysis
• Document scripts / code with comments!
• Write a ReadMe.txt file as you go, rather than trying to remember what you did later
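These practices can be combined in one short script. The Python sketch below removes write permission from the raw file, writes the cleaned data to a new file, and appends a time-stamped note to ReadMe.txt. The cleaning rule (drop rows with a blank SoundLevel_R) and the function names are hypothetical stand-ins for your own processing steps. On Windows, `os.chmod` can only toggle the read-only flag.

```python
import csv
import os
import stat
from datetime import datetime, timezone


def make_read_only(path) -> None:
    """Remove write permission for owner, group and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def log_step(readme, message: str) -> None:
    """Append a time-stamped line to the project's ReadMe.txt."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with open(readme, "a", encoding="utf-8") as f:
        f.write(f"{stamp}  {message}\n")


def clean_copy(raw_path, cleaned_path, readme) -> int:
    """Write a cleaned copy to a new file; the raw file is only read, never modified."""
    with open(raw_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        # Hypothetical cleaning rule: drop rows with a blank SoundLevel_R.
        rows = [row for row in reader if (row.get("SoundLevel_R") or "").strip()]
    with open(cleaned_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    log_step(readme, f"clean_copy: {raw_path} -> {cleaned_path}; kept {len(rows)} rows "
                     "(dropped rows with blank SoundLevel_R)")
    return len(rows)
```

Because the script writes its own log entry, the ReadMe records each step as it happens instead of being reconstructed from memory later.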

Analysis - documenting

Create a file to document your project. Include:
• What data are being collected & why
• Names of project files (data & analysis)
• Project file naming and file organization conventions
• Data definitions (aka Code Book or Data Dictionary) – more on the next slide
• Project standards
• Calibration, precision, accuracy & units of instruments or measurements

Code books / data dictionaries should include:
• Data codes or coding keys
• Missing value codes
• Field name / Column header / Data label
  – Definition, e.g., Amp_Setting | Dial setting of guitar amplifier
  – Values: possible values, e.g., from 0 to 11, whole numbers
  – Units: may be included in either Definitions or Values
  – Type, e.g., string, float, char, date [YYYYMMDD]
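A code book can itself be kept as a small machine-readable file that travels with the data. The Python sketch below stores one entry per column of the example table and writes them out as CSV. The entry layout, the "dB" units, and the "-999" missing-value code are hypothetical and only show where such information goes.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Variable:
    """One code-book entry (this field layout is a hypothetical choice)."""
    name: str
    definition: str
    data_type: str
    values: str
    units: str = ""
    missing_code: str = ""


# Entries for the example table; units and missing codes are hypothetical.
CODEBOOK = [
    Variable("Date", "Date of measurement", "date [YYYY-MM-DD]", "valid calendar date"),
    Variable("Instrument", "Measuring instrument (model)", "string", "BK-732A"),
    Variable("SoundLevel_R", "Sound level, right side", "float", ">= 0", "dB", "-999"),
    Variable("SoundLevel_L", "Sound level, left side", "float", ">= 0", "dB", "-999"),
    Variable("Amp_Setting", "Dial setting of guitar amplifier", "integer",
             "0 to 11, whole numbers", "", "-999"),
]


def codebook_csv(variables) -> str:
    """Render the code book as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Variable)])
    writer.writeheader()
    for variable in variables:
        writer.writerow(asdict(variable))
    return buffer.getvalue()


print(codebook_csv(CODEBOOK))
```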

Finalizing


Preparing data for publication, sharing, storage, preservation:
• Check requirements
• Are data useful/usable?
• Select data for preservation
• Choose publication path
• Consider publishing negative data – others may find it useful
• Repositories

Check requirements

Have you fulfilled the expectations of your:
• Funder
• Journal
• Discipline
• Repository
• Institution?

Usability of data

Are data consistently:
• Formatted
• Named
• Organized
• Described / Documented

Are they in a file format that may be easily accessed and reused?

Data preservation: Selection

You might consider long-term preservation if the answer to any of these questions is “Yes”:
• Do the data support published research?
• Are the data difficult or expensive to regenerate?
• Are the data required for your research but from another source (i.e., not your original research data)?
  – If so, is the future availability of that data from the original source uncertain?
• Do you plan to share your data, or are you required to per a funder agreement?
• Are the data historically significant?
• Are the data vulnerable to loss, corruption, endangerment, etc.?

Source: https://lib.stanford.edu/data-services/preserve

Sharing


Honestly, the party most interested in the data you are producing today is probably your future self. But there are others to consider, too, so…

Who will you share with?

What to share

“By ‘final research data,’ we mean recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
– NIH FAQ Data Sharing (3/03)

Guidelines will “be determined by the community of interest” and “may include…data, publications, samples, physical collections, software and models.”
– Data Management and Sharing FAQ (11/10)

When to share

“‘Timely release and sharing’ is defined as no later than the acceptance for publication of the main findings from the final data set.”
– Data Sharing Policy, Section II.8.2.3.1, NIH Grants Policy Statement (10/12)

“The expectation is that all data will be made available after a reasonable length of time….[which] will be determined by the community of interest…”
– Data Management and Sharing FAQ (11/10)

Publishing data

There are many paths to publishing data:
• Data paper / data journal
• Supplementary material
• Data repositories

Wherever you publish, make sure people can find it, use it, and give you credit for it! (This usually requires a persistent identifier, e.g., a DOI.)

Do you have negative data? Others may find it useful – consider making it available!

Repositories

Institutional repository: Columbia’s repository accepts materials from faculty, students, and staff. It offers:
• Long-term preservation strategy
• Multiple back-ups (including off-site)
• Quality content descriptions for increased discoverability
• Monthly usage reports
• Permanent URL & DOI

Disciplinary repositories, e.g.:
• GenBank
• RCSB Protein Data Bank
• ICPSR

Public access repositories, e.g.:
• Figshare.com
• DataDryad.org
• ResearchCompendia
• Academic Commons

When you share your data, consider:
• How would you like your data cited?
• Licensing
• Privacy/confidentiality/anonymization – revisit IRB commitments
• What to share
• When to share

Data citation

• Publish your data, and make sure to cite it in your journal publication
• When publishing your data, provide a preferred citation
• Did you use someone else's data?
  – Check the license for restrictions
  – Provide at least the following in your work's citations: title, author/creator name, publisher, publication year, and a unique identifier (e.g., DOI)
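The minimum citation elements listed above map naturally onto a small record. The Python sketch below refuses to format a citation when an element is missing. Its ordering is loosely modeled on the DataCite recommended format, and your journal's style guide takes precedence. The example DOI "10.0000/example" is a made-up placeholder, not a real dataset.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Minimum citation elements: creator(s), year, title, publisher, identifier."""
    creators: list
    year: int
    title: str
    publisher: str
    identifier: str  # e.g., a DOI such as "10.xxxx/..."

    def format(self) -> str:
        for label, value in [("creator", self.creators), ("year", self.year),
                             ("title", self.title), ("publisher", self.publisher),
                             ("identifier", self.identifier)]:
            if not value:
                raise ValueError(f"missing {label}")
        ident = self.identifier
        if ident.startswith("10."):  # bare DOI -> resolvable URL
            ident = "https://doi.org/" + ident
        return f"{'; '.join(self.creators)} ({self.year}): {self.title}. {self.publisher}. {ident}"


# Hypothetical dataset and placeholder DOI.
print(DataCitation(["Doe, J."], 2015, "Amplifier sound levels",
                   "Academic Commons", "10.0000/example").format())
```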

Licensing

In the USA, facts (and therefore most datasets) are outside the scope of copyright protection. Partly because of this, some researchers have adopted the practice of data licensing. There are many different license types, with varied provisions for reuse and attribution. When thinking about licenses, keep in mind:
• Funder requirements
• Institutional requirements
• The scientific and scholarly ethos of extending knowledge

Privacy / Anonymization

Revisit:
• IRB commitments
• Privacy Board requirements (HIPAA)
• Institutional requirements
• Ethical considerations

Consider:
• Have direct identifiers been removed?
• Have indirect identifiers that could reveal identity when combined been managed?
• Could relational or spatial data identify participants?

Maintain the maximum amount of detail possible without compromising participants' confidentiality.
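The two "Consider" questions correspond to two common techniques: dropping direct identifiers, and generalizing indirect ones so that no combination singles out a person. The Python sketch below does both and reports the size of the smallest group sharing the same indirect identifiers (the "k" in k-anonymity). The field names and generalization rules are hypothetical. Which fields count as identifiers, and how far to coarsen them, must come from your IRB or Privacy Board, not from this sketch.

```python
from collections import Counter

# Hypothetical field names; your IRB protocol defines the real list.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "record_id"}


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen two example indirect identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = age_band(int(out["age"]))
    if "zip" in out:
        out["zip"] = str(out["zip"])[:3] + "XX"  # keep only the first 3 digits
    return out


def smallest_group(records: list, quasi_identifiers: tuple) -> int:
    """Size of the rarest combination of indirect identifiers (k in k-anonymity)."""
    counts = Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)
    return min(counts.values()) if counts else 0
```

A smallest group of 1 means someone may still be identifiable from those fields combined. Widening the bands raises k but discards detail, which is the trade-off the slide describes.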

Take-aways

• Managing research data takes place at every stage of the research and scholarly process:
  – Planning
  – Working, where you follow the plan, collecting and analyzing data
  – Finalizing, where you make sure you followed the plan
  – Sharing, where you sigh with relief; it's so simple, because you followed your plan!
• Research data management can be complex, but there are resources available


Resources for Research Data Management:

• Scholarly Communications Program, Data Management: http://scholcomm.columbia.edu/data-management/
• Research and Data Integrity Program (ReaDI): http://www.columbia.edu/cu/compliance/docs/ReaDI_Program/index.html
• Data Management Plan Templates: http://scholcomm.columbia.edu/data-management/data-management-plan-templates/
• CUIT Research Computing Services: http://rcs.columbia.edu
• Academic Commons Archival Storage: http://academiccommons.columbia.edu/about
• Citation Management: http://library.columbia.edu/research/citation-management.html
• Managing Secure Information - Training: http://columbia.sighttraining.com
• Data Security Policies: http://policylibrary.columbia.edu/category/computingtechnology

RESOURCES
• CU Data Policies & Procedures:
  – Faculty Handbook
  – Sponsored Projects Handbook
  – Clinical Research Handbook
  – Administrative Policy Library, Security Policies, e.g., Electronic Information Resources Security, Data Classification Policy, Policy on Electronic Data Security Breach Reporting and Response
• Scholarly Communications Program
• Office of Research Compliance and Training

• Data Management Plans
• CUIT Active Storage options
• Academic Commons archival storage
• Citation management
• Executive Vice President's Office of Research (EVPR)
• Training on managing Personal Health Information (PHI)
• Research and Data Integrity Program (ReaDI)

REFERENCES
• Scott, Mark, Boardman, Richard P., Reed, Philippa A.S. and Cox, Simon J. (2012) Introducing research data. Southampton, GB, University of Southampton, 29pp. http://eprints.soton.ac.uk/338816/
• Responsible research data management and the prevention of scientific misconduct. www.knaw.nl/Content/Internet_KNAW/publicaties/pdf/2013569.pdf
• http://dmconsult.library.virginia.edu/

Created by: Amy Nurnberger, 2015-05-12