QA/QC for ecological data: tips & cheat codes
cjlortie

dead data tell no tales
@cjlortie

you will have to reuse your data
planning promotes reproducibility

https://dmptool.org to begin your game/journey
try a data management planning tool
Michener & Jones 2012

there is no perfect experiment
Ruxton & Colegrave 2016

there are no perfect data
data vary in class and structure

QA/QC: quality assurance & quality control
Cai & Zhu 2015

no one set of criteria need fit all ecological data
but practical principles can be used as a guide
Pipino et al. 2002

a practical guide to QA/QC for ecological data
for those increasingly adopting #rstats & #tidyverse workflows

Tip #1. Pilot data & meta-data
build tidy data & do data by design
simulate pilot data, e.g. rnorm(n = 10, mean = 39.74, sd = 25.09)
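A minimal base-R sketch of a pilot dataset built tidy from the start (one row per observation, one column per variable) alongside a small metadata table. The column names (plot_id, height_cm), the units and the reuse of the slide's mean and sd as plant heights are hypothetical, chosen only for illustration.

```r
# Hypothetical pilot data: 10 simulated plant heights (cm). The mean and sd
# are the values from the slide, reused here purely for illustration.
set.seed(42)  # make the simulated pilot reproducible

pilot <- data.frame(
  plot_id   = sprintf("P%02d", 1:10),                 # one row per plot
  height_cm = rnorm(n = 10, mean = 39.74, sd = 25.09),
  stringsAsFactors = FALSE
)

# Metadata: one row per column of the data, describing class, units, meaning.
meta <- data.frame(
  column      = names(pilot),
  class       = unname(vapply(pilot, function(x) class(x)[1], character(1))),
  units       = c(NA, "cm"),
  description = c("plot identifier", "plant height (simulated)"),
  stringsAsFactors = FALSE
)

# Every column in the data should be documented in the metadata.
stopifnot(identical(meta$column, names(pilot)))

meta
# Count physically impossible values produced by the simulation.
sum(pilot$height_cm < 0)
```

Even a toy pilot like this can expose design problems early: a normal distribution with this mean and sd will sometimes produce negative heights, which no real measurement can.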

Tip #2. Use social coding for QA/QC
(at least) two-player mode
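One way to play QA/QC in two-player mode is double data entry: two people transcribe the same datasheet independently and a script lists every disagreement. This is a swapped-in illustration (the slide may mean shared code review rather than data entry), and the quadrat labels and cover values are hypothetical.

```r
# Hypothetical datasheet transcribed independently by two people.
entry_a <- data.frame(
  quadrat = c("Q1", "Q2", "Q3", "Q4"),
  cover   = c(12, 40, 7, 55),
  stringsAsFactors = FALSE
)
entry_b <- data.frame(
  quadrat = c("Q1", "Q2", "Q3", "Q4"),
  cover   = c(12, 4, 7, 55),   # second transcriber typed 4 instead of 40
  stringsAsFactors = FALSE
)

# Align the two versions by quadrat, then keep rows where they disagree.
both <- merge(entry_a, entry_b, by = "quadrat", suffixes = c("_a", "_b"))
mismatch <- both[both$cover_a != both$cover_b, ]
mismatch   # one row: Q2, 40 vs 4
```

Each mismatch is then settled against the original paper sheet. If cover can be NA, `!=` returns NA for those rows, so handle missing values explicitly before subsetting.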

Tip #3. Check #rstats for data tools
there is a package for that (at least two), i.e. ‘cheat codes’ to get you there sooner

pavo for spectral colour data
Maia et al. 2013

biogeo for occurrence data
Robertson et al. 2016

codyn for community dynamics metrics, with taxize to check names
codyn::check_multispp(), check_names(), check_sppvar(); taxize::gnr_resolve()
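Before sending names to an online resolver such as taxize::gnr_resolve(), a quick local pass can catch typing inconsistencies. This base-R sketch is not part of codyn or taxize; the helper clean_name() and the records are hypothetical, and it only normalises case and whitespace, so it cannot tell you whether a name is taxonomically valid.

```r
# Hypothetical field records with inconsistent species-name entry.
records <- data.frame(
  species = c("Larrea tridentata", "larrea tridentata ", "Ambrosia dumosa",
              "Ambrosia  dumosa", "Larrea tridentata"),
  count   = c(3, 1, 5, 2, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim and collapse whitespace, then format as
# "Genus species" (capitalised genus, lower-case remainder).
clean_name <- function(x) {
  x <- gsub("\\s+", " ", trimws(x))
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
}
records$species_clean <- clean_name(records$species)

unique(records$species)        # 4 raw spellings
unique(records$species_clean)  # 2 cleaned names
table(records$species_clean)   # records per cleaned name
```

The lower-casing step suits plain binomials; names with authorities or capitalised infraspecific parts would need a gentler rule.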

Tip #4. Version & annotate your data cleaning
use R Markdown + GitHub for versioned reviews & data cleaning
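A minimal sketch of annotated cleaning in a script (or an R Markdown chunk) that can be committed to GitHub: the raw data are never overwritten, each decision is commented, and a plain-text log records what changed. The column names, the -999 missing-value code and the plausibility threshold of 60 are hypothetical.

```r
# Hypothetical raw field data; `raw` is never modified after it is created.
raw <- data.frame(
  site = c("A", "B", "C", "D"),
  temp = c(21.5, 215, 19.8, -999),  # 215 looks like a slipped decimal; -999 is a missing-value code
  stringsAsFactors = FALSE
)

clean <- raw                   # all edits happen on a copy
cleaning_log <- character(0)   # human-readable record of each decision

# Step 1: recode the -999 sentinel as NA.
sentinel <- which(clean$temp == -999)
clean$temp[sentinel] <- NA
cleaning_log <- c(cleaning_log,
  sprintf("recoded %d value(s) of -999 to NA in temp", length(sentinel)))

# Step 2: flag, rather than silently 'fix', implausible temperatures.
clean$temp_flag <- !is.na(clean$temp) & clean$temp > 60
cleaning_log <- c(cleaning_log,
  sprintf("flagged %d temp value(s) above 60", sum(clean$temp_flag)))

writeLines(cleaning_log)
```

Flagging instead of overwriting keeps the judgement call visible to a reviewer, and the log can be written next to the cleaned file so each commit carries its own explanation.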

Tip #5. Check classes of vectors/variables
str(), unique(), nrow(), tibble()
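A small sketch of how one bad entry silently changes a column's class, and how str(), unique() and nrow() expose it. The data are hypothetical; base R is used so the example runs without the tibble package, though printing a tibble would show the same column types under the column names.

```r
# Hypothetical spreadsheet import: a comma decimal and a text note
# turn what should be a numeric column into character.
dat <- data.frame(
  plot    = c(1, 2, 3, 4),
  biomass = c("12.1", "8,4", "15.0", "n/a"),
  stringsAsFactors = FALSE
)

str(dat)                                             # biomass is chr, not num
vapply(dat, function(x) class(x)[1], character(1))   # class of every column
unique(dat$biomass)                                  # the raw values behind it
nrow(dat)                                            # expected number of rows?

# Coerce, then list the entries that could not be converted.
biomass_num <- suppressWarnings(as.numeric(dat$biomass))
dat$biomass[is.na(biomass_num) & !is.na(dat$biomass)]  # "8,4" "n/a"
```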

Tip #6. Decide what is a true zero
Martin et al. 2005
is.na(), data[!is.na(data$x), ]
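A minimal sketch of why the zero/NA decision matters, assuming a hypothetical survey where 0 means ‘surveyed, species absent’ and NA means ‘not surveyed’. Recoding the NAs as zeros changes the mean abundance.

```r
# Hypothetical survey: 0 = quadrat surveyed and species absent (a true zero);
# NA = quadrat not surveyed (no information at all).
surv <- data.frame(
  quadrat  = c("Q1", "Q2", "Q3", "Q4", "Q5"),
  surveyed = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  count    = c(4, 0, NA, 0, NA),
  stringsAsFactors = FALSE
)

# Consistency check: NAs should occur exactly where no survey took place.
stopifnot(identical(is.na(surv$count), !surv$surveyed))

# Drop rows with no information, as on the slide: data[!is.na(data$x), ]
observed <- surv[!is.na(surv$count), ]
mean(observed$count)   # (4 + 0 + 0) / 3, true zeros kept as absences

# The mistake to avoid: treating "not surveyed" as "absent".
count_wrong <- surv$count
count_wrong[is.na(count_wrong)] <- 0
mean(count_wrong)      # (4 + 0 + 0 + 0 + 0) / 5, biased low
```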

Tip #7. Pre-print your data
publish sooner

the reproducibility crisis in science needs to end. today.
avoid a ‘game-over’ effect before the reuse even begins.

better data. better reproducibility.
nom nom