Best practices data collection

Best Practices: Creating Research Data. Sherry Lake, July 31, 2012, University of Florida Data Management Workshop


Transcript of Best practices data collection

Page 1: Best practices data collection

Best Practices: Creating Research Data

Sherry Lake

July 31, 2012, University of Florida Data Management Workshop

Page 2: Best practices data collection

WHY?

Following these Best Practices…

• Will improve the usability of the data by you or by others

• Your data will be “computer ready”

• Your data will be ready to share with others

Page 3: Best practices data collection

Spreadsheet Examples

Page 4: Best practices data collection

Spreadsheet Problems?

Page 5: Best practices data collection

Problems

• Dates are not stored consistently

• Values are labeled inconsistently

• Data coding is inconsistent

• Order of values differs

Page 6: Best practices data collection

Problems

• Confusion between numbers and text

• Different types of data are stored in the same columns

• The spreadsheet loses interpretability if it is sorted

Page 7: Best practices data collection

Best Practices Data Organization

• Lines or rows of data should be complete

– Designed to be machine readable, not human readable (sort)

Page 8: Best practices data collection

Best Practices Data Organization

• Include a Header Line as the 1st line (or record)

• Label each Column with a short but descriptive name

– Names should be unique

– Use letters, numbers, or “_” (underscore)

– Do not include blank spaces or symbols (+ - & ^ *)
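The column-naming rules above can be checked automatically before a file is shared. The following is a minimal sketch, assuming the header has already been read into a list of strings; the function name `check_header` and the sample column names are hypothetical.

```python
import re

# Rule from the slide: letters, numbers, or underscore only (no spaces or symbols).
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")

def check_header(names):
    """Return a list of problems found in a list of column names."""
    problems = []
    seen = set()
    for name in names:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"blank space or symbol in name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems

# Hypothetical header with three problems: a space, a repeat, and a symbol.
header = ["site_id", "air temp", "precip_mm", "site_id", "pH+"]
for problem in check_header(header):
    print(problem)
```

An empty result means every name is unique and uses only the allowed characters.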

Page 9: Best practices data collection

Best Practices Data Organization

• Columns of data should be consistent

– Use the same naming convention for text data

• Columns should include only a single kind of data

– Text or “string” data

– Integer numbers

– Floating point or real numbers
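One way to find columns that mix kinds of data is to classify every cell and report any column with more than one kind. This is a sketch only, assuming rows were read as dictionaries of strings (as `csv.DictReader` would give); the helper names and sample rows are hypothetical.

```python
def kind(value):
    """Classify one cell as 'integer', 'float', or 'text'."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"

def column_kinds(rows, column):
    """Return the set of kinds found in one column of a list of dict rows."""
    return {kind(row[column]) for row in rows}

# Hypothetical data: the 'count' column mixes a number with a text code.
rows = [
    {"plot": "A1", "count": "12", "height_m": "1.5"},
    {"plot": "A2", "count": "n/a", "height_m": "2.0"},
]
for column in rows[0]:
    kinds = column_kinds(rows, column)
    if len(kinds) > 1:
        print(f"{column}: mixed kinds {sorted(kinds)}")
```

Note that this check is strict: a column holding both "2" and "2.5" is also reported, since it mixes integer and floating point values.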

Page 10: Best practices data collection

Use Standardized Formats

• ISO 8601 Standard for Date and Time

– YYYY-MM-DDThh:mm:ss.sTZD

2009-10-13T09:12:34.9Z

2009-10-13T09:12:34.9+05:00

• Spatial Coordinates for Latitude/Longitude

– +/- DD.DDDDD

-78.476 (longitude)

+38.029 (latitude)
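Most scripting languages can write these formats directly, which avoids hand-typed dates. The sketch below uses Python's standard `datetime` module, which writes the ISO 8601 extended form (with "-" and ":" separators and a numeric time zone offset); the helper `format_coordinate` is a hypothetical name for signed decimal degrees.

```python
from datetime import datetime, timedelta, timezone

# The same clock time recorded in UTC and in a zone five hours ahead of UTC.
stamp_utc = datetime(2009, 10, 13, 9, 12, 34, 900000, tzinfo=timezone.utc)
stamp_plus5 = datetime(2009, 10, 13, 9, 12, 34, 900000,
                       tzinfo=timezone(timedelta(hours=5)))

print(stamp_utc.isoformat(timespec="milliseconds"))    # 2009-10-13T09:12:34.900+00:00
print(stamp_plus5.isoformat(timespec="milliseconds"))  # 2009-10-13T09:12:34.900+05:00

# Reading the text back gives the same moment, time zone included.
parsed = datetime.fromisoformat("2009-10-13T09:12:34.900+05:00")
print(parsed == stamp_plus5)  # True

def format_coordinate(degrees):
    """Write decimal degrees with an explicit sign and five decimal places."""
    return f"{degrees:+.5f}"

print(format_coordinate(-78.476))  # -78.47600 (longitude)
print(format_coordinate(38.029))   # +38.02900 (latitude)
```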

Page 11: Best practices data collection

File Names

Page 12: Best practices data collection

File Names

• Use descriptive names

• Not too long

• Don’t use spaces

• Try to include time, place & theme

• May use “-” or “_”

Page 13: Best practices data collection

File Names

• String words together with Caps (VegBiodiv_2007)

• Think about using version numbers

• Don’t change default extensions (txt, jpg, csv,…)
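File names that follow these rules can be generated rather than typed. This is a minimal sketch, assuming a theme, a place, a date and a version number are known; the function `build_file_name`, the part order and the "v02" version style are illustrative choices, not a standard.

```python
import re
from datetime import date

def build_file_name(theme, place, when, version, extension):
    """Join theme, place, date and version into one name without spaces."""
    parts = [theme, place, when.strftime("%Y%m%d"), f"v{version:02d}"]
    # Drop anything that is not a letter, digit or hyphen (spaces included).
    cleaned = [re.sub(r"[^A-Za-z0-9-]", "", part) for part in parts]
    # Keep the default extension for the file type unchanged.
    return "_".join(cleaned) + "." + extension

print(build_file_name("VegBiodiv", "Plot 7", date(2007, 6, 15), 2, "csv"))
# VegBiodiv_Plot7_20070615_v02.csv
```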

Page 14: Best practices data collection

Quality Assurance/Quality Control

Dataset Creation & Integrity Errors

• Use a data entry program

– Program to catch typing errors

– Program pull-down menu options

• Perform double entry of the data

• Manually check 5 – 10% of data records

• Check for out-of-range values (plotting)

• Check for missing or impossible values

• Perform statistical summaries (random samples)
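Several of these checks are easy to script. The sketch below flags missing and out-of-range values in one field and then summarizes the remaining values; the field name `temp_c`, the missing value code "-9999" and the range limits are assumptions made for the example.

```python
import statistics

def find_problems(records, field, low, high, missing_code="-9999"):
    """Return (row_number, reason) pairs for suspicious values in one field."""
    problems = []
    for number, record in enumerate(records, start=1):
        raw = record.get(field, "")
        if raw == "" or raw == missing_code:
            problems.append((number, "missing"))
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append((number, "not a number"))
            continue
        if not low <= value <= high:
            problems.append((number, "out of range"))
    return problems

# Hypothetical records: row 2 is missing, row 3 is probably a typing error.
records = [
    {"site": "A", "temp_c": "21.5"},
    {"site": "B", "temp_c": "-9999"},
    {"site": "C", "temp_c": "215"},
    {"site": "D", "temp_c": "19.0"},
]
problems = find_problems(records, "temp_c", low=-50.0, high=60.0)
print(problems)  # [(2, 'missing'), (3, 'out of range')]

# Summary statistics on the rows that passed the checks.
flagged = {number for number, _ in problems}
good = [float(r["temp_c"]) for n, r in enumerate(records, start=1) if n not in flagged]
print(min(good), statistics.mean(good), max(good))  # 19.0 20.25 21.5
```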

Page 15: Best practices data collection

Analyzing Data - Notes

• Keep Original File

– Uncorrected copy

– Make “read-only”

• Make notes on transformations

• Any changes, save as a new file

• Use scripted code to transform and correct data
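These notes can be combined in one short script: protect the original, apply each correction in code, and write the result to a new file. This is a sketch under stated assumptions; the file names, the `site` column and the upper-casing correction are hypothetical, and a temporary folder stands in for a real project folder.

```python
import csv
import os
import stat
import tempfile

def protect(path):
    """Remove write permission so the original is not edited by accident."""
    os.chmod(path, stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)

def correct_to_new_file(original, corrected):
    """Read the original, apply one documented correction, write a new file."""
    with open(original, newline="") as src, open(corrected, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Correction 1 (hypothetical): site codes were typed in mixed case.
            row["site"] = row["site"].upper()
            writer.writerow(row)

# Set up a small example file in a temporary folder.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "survey_raw.csv")
corrected = os.path.join(workdir, "survey_v02.csv")
with open(original, "w", newline="") as handle:
    handle.write("site,count\r\na1,12\r\nB2,7\r\n")

protect(original)                         # the uncorrected copy stays untouched
correct_to_new_file(original, corrected)  # changes go to a new file
```

Because the correction lives in the script, the comment beside it doubles as the note on what was transformed and why.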

Page 16: Best practices data collection

Analyzing Data

• Use a scripted program (R, SAS, SPSS, Matlab)

– Steps are recorded in textual format

– Can be easily revised and re-executed

– Helps sharing and repetition

– Easy to document

• GUI-based analysis may be easier, but is harder to reproduce

Page 17: Best practices data collection

Document EVERYTHING!

• Create a Project Document File

– More than a Lab Notebook

– Data Management Plan

• Start at the beginning of the project and continue throughout data collection & analysis

– Why you are collecting data

– Exact details of methods of collecting & analyzing

Page 18: Best practices data collection

Document EVERYTHING!

• Details such as:

– Names of data & analysis files associated with study

– Definitions for data and codes (include missing value codes, names)

– Units of measure (accuracy and precision)

– Standards or instrument calibrations
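Definitions, units and missing value codes can be kept in a small data dictionary that a script compares against each data file. The sketch below is one possible layout, not a standard; the column names, definitions and the `undocumented` helper are hypothetical.

```python
# Hypothetical data dictionary: one entry per column in the data file.
data_dictionary = {
    "site": {"definition": "Sampling site code", "units": "none", "missing_code": ""},
    "temp_c": {"definition": "Air temperature at 2 m", "units": "degrees Celsius",
               "missing_code": "-9999"},
}

def undocumented(header, dictionary):
    """Return the column names that have no entry in the data dictionary."""
    return [name for name in header if name not in dictionary]

header = ["site", "temp_c", "ph"]
print(undocumented(header, data_dictionary))  # ['ph']
```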

Page 19: Best practices data collection

Choosing File Formats

• Accessible Data (in the future)

– Non-proprietary (software formats)

– Open, documented standard

– Common, used by the research community

– Standard representation (ASCII, Unicode)

– Unencrypted & Uncompressed

– Media formats (hardware formats)

Page 20: Best practices data collection

Preferred Format Choices

• PDF, not Word

• ASCII, not Excel

• MPEG-4, not Quicktime

• TIFF or JPEG2000, not GIF or JPG

• XML or RDF, not RDBMS

Good if not software specific

Page 21: Best practices data collection

Best Practices

1. Use Consistent Data Organization

2. Use Standardized Formats

3. Assign Descriptive File Names

4. Perform Basic Quality Assurance/Quality Control

5. Use Scripted Program for Analysis and Keep Notes

6. Document EVERYTHING! (Define Contents of Data Files)

7. Use Consistent, Stable and Open File Formats

Page 22: Best practices data collection

Best Practices Bibliography

Borer, E. T., Seabloom, E. W., Jones, M. B., & Schildhauer, M. (2009). Some simple guidelines for effective data management. Bulletin of the Ecological Society of America, 90(2), 205-214.

Hook, L. A., Santhana Vannan, S. K., Beaty, T. W., Cook, R. B., & Wilson, B. E. (2010). Best practices for preparing environmental data sets to share and archive. Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. Available online from http://daac.ornl.gov/PI/BestPractices-2010.pdf. doi:10.3334/ORNLDAAC/BestPractices-2010.

Inter-university Consortium for Political and Social Research (ICPSR). (2012). Guide to social science data preparation and archiving: Best practices throughout the data cycle (5th ed.). Ann Arbor, MI. Retrieved 05/31/2012, from http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf.

Data Observation Network for Earth (DataONE). (2012). DataONE Best Practices database. Retrieved 07/21/2012, from http://www.dataone.org/best-practices.

Page 23: Best practices data collection


Questions? Discussion?

• Sherry Lake, Senior Scientific Data Consultant, UVA Library

[email protected]

• Twitter: shlakeuva

• Slideshare: http://www.slideshare.net/shlake

• Web: http://www.lib.virginia.edu/brown/data