Reproducible research (and literate programming) in R
Lenhard Group Retreat - October 2015
Reproducible research in R
Liz Ing-Simmons
The worst kind of collaborator
(This is good motivation for reproducibility, too)
What is reproducibility?
• Replicable:
  – Results can be reproduced from an independent analysis (different lab, model system, software…)
• Reproducible:
  – Results can be reproduced using your code and data
• Both are important!
  – Making analysis reproducible means being explicit about what you’ve done, which makes it easier to replicate
  – It also has other benefits (more on this later)
• Partial reproducibility is better than none
(Or maybe the two definitions are the other way round, depending on who you ask…)
Reproducibility tools for R
• packrat
  – Manage and track dependencies for projects
• switchr
  – Switch between different package libraries
• knitr
  – Report generation from combined text and code
• R Markdown (rmarkdown package)
  – Simple formatting syntax for text and code blocks
You can use knitr with other languages too!
Literate programming
• Documents that combine code, results, and documentation that tells you what the code is doing
• Encourages you to be explicit about what you’re trying to do
  – can make it easier to spot mistakes
  – better code: more readable, more understandable, more reusable
• Bonus: make pretty reports for your collaborators
• Some journals now encourage you to submit code as supplementary material
Anatomy of an Rmarkdown document
YAML header: Title, author, document options
Code block: Enclosed in ```, language and options specified
Text: Including section headers and links
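
The three parts can be seen together in a very small document. The sketch below is only an illustration: the title, author, chunk name and temporary file names are made up, and it assumes the knitr package is installed. It writes a minimal .Rmd from R and knits it to Markdown.

```r
library(knitr)

# A tiny R Markdown document: YAML header, text, and one named code chunk.
# Title, author and chunk label are placeholders, not from the original slides.
rmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "author: \"A. N. Author\"",
  "output: html_document",
  "---",
  "",
  "## A section header",
  "",
  "Some text with a [link](https://yihui.name/knitr).",
  "",
  "```{r add-numbers, echo=TRUE}",
  "x <- 1 + 1",
  "x",
  "```"
)

rmd_file <- tempfile(fileext = ".Rmd")
md_file <- tempfile(fileext = ".md")
writeLines(rmd_lines, rmd_file)

# knit() runs the chunk and writes a Markdown file containing code and results.
# A fresh environment keeps the chunk's variables out of the workspace.
knit(rmd_file, output = md_file, quiet = TRUE, envir = new.env())
md_text <- readLines(md_file)
cat(md_text, sep = "\n")
```

To go all the way to HTML, rmarkdown::render(rmd_file) should work on the same file, assuming the rmarkdown package and pandoc are available.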
A sample .Rmd
• In RStudio, you can use the ‘Knit HTML’ button (or PDF)
• In an R session, use knitr::knit2html()
Anatomy of an Rmd
Table of contents
‘short’ or ‘long’ version: with code included or without
Controls printing of warnings/messages
Custom figure / cache paths
Stop on error!
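
Most of these settings are global chunk options, usually set once in a setup chunk near the top of the document. The following is a sketch of what such a chunk might contain, not the exact code from the slide: `show_code` is a hypothetical switch and the directory names are placeholders. The table of contents is not a chunk option; it is requested in the YAML header (for example `toc: true` under `html_document`).

```r
library(knitr)

# Hypothetical switch: TRUE gives the 'long' version (code shown),
# FALSE gives the 'short' version (results only).
show_code <- TRUE

opts_chunk$set(
  echo = show_code,        # include or hide the code
  warning = FALSE,         # do not print warnings in the report
  message = FALSE,         # do not print messages in the report
  fig.path = "figures/",   # custom figure directory (placeholder name)
  cache.path = "cache/",   # custom cache directory (placeholder name)
  error = FALSE            # stop knitting as soon as a chunk throws an error
)
```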
Anatomy of an Rmd
Second-level header
Links (you can use similar syntax to insert image files)
Load all packages (do not cache!)
Keep functions in one place
Anatomy of an Rmd
Cache data loading / processing
Code formatting within text using backticks: `function()`
Control figure size for a specific chunk
It’s a good idea to name chunks: the names are used to name figures
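
A small sketch of these chunk options in action, assuming knitr is installed and a PNG graphics device is available. The chunk names `load-data` and `scatter-plot`, the toy data frame and the temporary directory are all invented for illustration. After knitting, the figure file carries the chunk name and the cached chunk leaves files in the cache directory.

```r
library(knitr)

rmd_lines <- c(
  "```{r load-data, cache=TRUE}",
  "# stand-in for a slow data loading / processing step",
  "dat <- data.frame(x = 1:10, y = (1:10)^2)",
  "```",
  "",
  "```{r scatter-plot, fig.width=4, fig.height=3}",
  "plot(dat$x, dat$y)",
  "```"
)

# Knit inside a scratch directory so figure and cache folders land there.
work_dir <- tempfile("knit-demo")
dir.create(work_dir)
writeLines(rmd_lines, file.path(work_dir, "demo.Rmd"))

old_wd <- setwd(work_dir)
knit("demo.Rmd", quiet = TRUE, envir = new.env())
setwd(old_wd)

# The figure is named after its chunk; the cache holds the load-data results.
output_files <- list.files(work_dir, recursive = TRUE)
print(output_files)
```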
Anatomy of an Rmd
Code can be included for demonstration but not evaluated
Here the data is loaded from the package instead
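
The chunk option for this is `eval=FALSE`. As a rough illustration (the file path and chunk names are hypothetical, and `mtcars` stands in for data shipped with a package), the first chunk below is printed but never run, while the second one is evaluated:

```r
library(knitr)

rmd_lines <- c(
  "```{r read-raw-data, eval=FALSE}",
  "# shown for demonstration only; this file does not exist",
  "raw <- read.csv(\"path/to/large_file.csv\")",
  "```",
  "",
  "```{r load-packaged-data}",
  "nrow(datasets::mtcars)",
  "```"
)

# knit(text = ...) returns the knitted Markdown as a single string.
md_text <- knit(text = rmd_lines, quiet = TRUE, envir = new.env())
cat(md_text)
```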
Anatomy of an Rmd
Format tables with knitr::kable()
You can include citations from (e.g.) a BibTeX file in an Rmd!
(but it’s not worth it for two)
Include session info to track package versions used!
I also add the time the document was created
You can include evaluated R code in the text by using `r `
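
A short sketch of the R calls behind these annotations; the choice of `mtcars` columns is arbitrary. In an Rmd these would sit inside chunks, and the timestamp could equally go in the text as inline code.

```r
library(knitr)

# Format a small table as Markdown; kable() returns the table lines.
top_cars <- head(mtcars[, c("mpg", "cyl", "wt")], 3)
table_lines <- kable(top_cars, format = "markdown", digits = 1)
print(table_lines)

# Record when the document was built and which package versions were used.
built_at <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
cat("Document created:", built_at, "\n")
session <- sessionInfo()
print(session)

# In the text of an Rmd, the same timestamp could be written inline as
#   Document created: `r Sys.time()`
```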
Other tips and tricks
• You can set multiple figure devices, e.g. dev=c('pdf', 'png')
• Disable lazy loading for very large caches (cache.lazy = FALSE)
• ‘dependson’ can be used to set dependencies between chunks
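
As a sketch, the first two tips can be set globally, while `dependson` goes on the individual chunk that depends on another. The chunk names in the comment are hypothetical.

```r
library(knitr)

opts_chunk$set(
  dev = c("pdf", "png"),  # write every figure in both formats
  cache.lazy = FALSE      # load cached objects eagerly (for very large caches)
)

# 'dependson' is set per chunk. In the Rmd, a chunk header might look like:
#   ```{r fit-model, cache=TRUE, dependson="load-data"}
# so that the fit-model cache is invalidated whenever load-data changes.
current_devices <- opts_chunk$get("dev")
print(current_devices)
```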
Other tips and tricks
• File paths:
  – Either relative to the Rmd location or set as a variable
• Consider directory structure
  – (e.g. nicercode.github.io/blog/2013-04-05-projects/)
• Use set.seed() if using any random numbers
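
A minimal sketch of both tips. The directory and file names are placeholders for whatever layout the project uses, and the seed value is arbitrary.

```r
# Set the data directory once as a variable and build paths from it,
# rather than scattering absolute paths through the document.
data_dir <- file.path("data", "raw")           # hypothetical project layout
counts_file <- file.path(data_dir, "counts.csv")
print(counts_file)

# Fixing the seed makes random draws repeatable between runs.
set.seed(42)
first_draw <- rnorm(5)
set.seed(42)
second_draw <- rnorm(5)
print(identical(first_draw, second_draw))
```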
Resources
• yihui.name/knitr
  – Official site with examples and documentation (there’s also a knitr book)
• kbroman.org/knitr_knutshell/
  – Really good knitr tutorial
• kbroman.org/steps2rr/
  – Other reproducibility tips
• rmarkdown.rstudio.com/
  – Rmarkdown info including cheatsheets
Other reproducibility tools
• Jupyter (formerly IPython Notebook):
  – Similar in concept to knitr but for interactive use (jupyter.org/)
• Make (and similar tools):
  – Automated building of project outputs
• Docker (Rocker):
  – Containers for code, like a lightweight virtual machine