The R package as the unit of reproducible research · Issue #31 · ropensci/unconf15
This repository was archived by the owner on May 19, 2021. It is now read-only.
Description
In my opinion a piece of reproducible research needs at least the following ingredients:
- Documents, to write down goals, reasons for decisions, subproject summaries, why some directions were abandoned, etc. Research papers belong here, too.
- Code, to do the computation.
- Potentially, documentation for the code.
- Tests for your code. (Well, ideally.)
- Data.
- Code or data by other people (or code/data from your previous project), and a way to specify what this code/data are and which versions you need.
- Scripts that use your own or external code to generate data or documents.
- Other misc files, maybe.
- A way of documenting the environment external to the package, to make the whole thing reproducible. (I.e. OS version, system libraries, etc.)
- A way to keep track of all the things above, how and when they change over time.
- A way of sharing all the things above with collaborators.
If you think about it, this is more or less the description of an R package in a git repository:
- Use vignettes for documents; put them in `/vignettes`.
- Just put the R/C/C++ code in the package. Other code is trickier, but you can put it in `/inst`.
- Write roxygen or plain Rd docs; these go into `/man`.
- Use testthat (or another testing package, or the built-in R package testing methods); tests go into `/tests`.
- Just put it in `/data`. (OK, not that simple, because data can be big, so maybe into another data package, or a database.)
- If these other things are also R packages, then just declare your `Imports` on them in `DESCRIPTION`.
- Again, `/vignettes`.
- They go in `/inst`.
- Use Docker if you can. If you can't (e.g. I can't use Docker to run my stuff on the university cluster), then declare your R version and `SystemRequirements` in `DESCRIPTION`.
- Put the package into a git repository.
- Put the git repository on GitHub, Bitbucket, etc.
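To make the mapping above concrete, here is a minimal sketch in base R that scaffolds such a package skeleton. The function name `make_compendium`, the package name `myproject`, and every field value (including the `igraph` version constraint) are placeholders chosen for illustration, not part of the original proposal. `SystemRequirements` is the `DESCRIPTION` field R uses for system-level dependencies. The generated `DESCRIPTION` is deliberately incomplete: a real package also needs fields such as `License` and author/maintainer information.

```r
# Scaffold a research project as an R package, using base R only.
# All names and field values below are illustrative placeholders.
make_compendium <- function(path) {
  # One directory per ingredient: code, docs, tests, data,
  # documents/scripts (vignettes), and misc files (inst).
  dirs <- c("R", "man", "tests", "data", "vignettes", "inst")
  for (d in dirs) {
    dir.create(file.path(path, d), recursive = TRUE, showWarnings = FALSE)
  }

  # A partial DESCRIPTION: R version, package dependencies with
  # version constraints, and system-level requirements.
  description <- c(
    paste("Package:", basename(path)),
    "Title: Code, Data and Documents for One Research Project",
    "Version: 0.0.1",
    "Description: A research project organised as an R package.",
    "Depends: R (>= 3.1.0)",
    "Imports: igraph (>= 0.7.1)",
    "Suggests: testthat, knitr",
    "VignetteBuilder: knitr",
    "SystemRequirements: GNU make"
  )
  writeLines(description, file.path(path, "DESCRIPTION"))
  invisible(path)
}

pkg <- make_compendium(file.path(tempdir(), "myproject"))
print(list.files(pkg))
```

Running `git init` in the resulting directory and pushing it to a hosting service would cover the last two points of the list.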
So do we have everything to use R packages as research units? Probably not, but we are really close, I think. In my opinion we would need:
- A better way of handling (versioned) dependencies, for R packages not on CRAN as well (see #7, "Beyond CRAN: modern dependency management; including older/archived versions & alternative repositories").
- A better way of describing the platform, so that it is possible to recreate it without much effort. Docker is great for this, but sometimes you just cannot use Docker.
- Tools that facilitate the whole process. `devtools` is great for developing packages in general. Some more specialized tools that go further towards this particular use of packages would be nice.
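As a rough illustration of what a lightweight "describe the platform" tool could look like when Docker is not an option, the sketch below writes the R version, platform, OS release and selected package versions to a text file that can be committed alongside the package. The function name `snapshot_environment` and the output file name are hypothetical, and this only records the environment; it does not recreate it.

```r
# Record the R version, platform, OS and package versions in a text file.
# `snapshot_environment` is a hypothetical helper, not an existing tool.
snapshot_environment <- function(packages, file) {
  # Look up the installed version of each named package.
  versions <- vapply(
    packages,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  lines <- c(
    paste("R:", R.version.string),
    paste("Platform:", R.version$platform),
    paste("OS:", Sys.info()[["sysname"]], Sys.info()[["release"]]),
    paste0("Package: ", names(versions), " ", versions)
  )
  writeLines(lines, file)
  invisible(lines)
}

out <- file.path(tempdir(), "environment.txt")
snapshot_environment(c("stats", "utils"), out)
cat(readLines(out), sep = "\n")
```

In a real project the `packages` argument would be the packages listed under `Imports` in `DESCRIPTION`, so the snapshot pins the versions the analysis was actually run with.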
If this does make sense to you (or the opposite :), I'll be happy to chat about it more.