CRAN Task View: Reproducible Research

Maintainer: John Blischak, Alison Hill, Ben Marwick, Daniel Sjoberg, Will Landau
Contact: jdblischak at gmail.com
Version: 2024-12-23
URL: https://CRAN.R-project.org/view=ReproducibleResearch
Source: https://github.com/cran-task-views/ReproducibleResearch/
Contributions: Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.
Citation: John Blischak, Alison Hill, Ben Marwick, Daniel Sjoberg, Will Landau (2024). CRAN Task View: Reproducible Research. Version 2024-12-23. URL https://CRAN.R-project.org/view=ReproducibleResearch.
Installation: The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("ReproducibleResearch", coreOnly = TRUE) installs all the core packages, and ctv::update.views("ReproducibleResearch") installs every package that is missing or out of date. See the CRAN Task View Initiative for more details.

The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, understood, and verified. Packages in R for this purpose can be roughly split into groups for: literate programming, pipeline toolkits, package reproducibility, project workflows, code/data formatting tools, format convertors, and object caching.

The current maintainers gratefully acknowledge Max Kuhn for originally creating and maintaining this task view.

Literate Programming

The primary way that R facilitates reproducible research is using a document that is a combination of content and data analysis code. The Sweave function (in the base R utils package) and the knitr package can be used to blend the subject matter and R code so that a single document defines the content and the analysis. The brew and R.rsp packages contain alternative approaches to embedding R code into various markups.
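As a minimal sketch of the Sweave approach, the following base R example writes a small .Rnw file, weaves it, and reads back the resulting .tex file. The file name report.Rnw, the chunk label, and the sample values are illustrative assumptions. Compiling the .tex output to PDF would additionally require a LaTeX installation, which is not shown here.

```r
# Write a minimal Sweave (.Rnw) document into a temporary directory.
work_dir <- tempfile("sweave-demo-")
dir.create(work_dir)
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the sample is \\Sexpr{mean(c(2, 4, 6))}.",
  "<<summary-chunk, echo=TRUE>>=",
  "x <- c(2, 4, 6)",
  "summary(x)",
  "@",
  "\\end{document}"
), file.path(work_dir, "report.Rnw"))

# Sweave writes its output into the current working directory,
# so switch there temporarily and restore the old directory afterwards.
old_wd <- setwd(work_dir)
utils::Sweave("report.Rnw", quiet = TRUE)
setwd(old_wd)

# The woven file contains the prose, the code, and the evaluated results.
tex_file <- file.path(work_dir, "report.tex")
tex_lines <- readLines(tex_file)
```

The same source file therefore defines both the narrative and the analysis: editing the code chunk and re-weaving updates the results in the document.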

The resources for literate programming are best organized by the document type/markup language:

LaTeX

Both Sweave and knitr can process LaTeX files. lazyWeave can create LaTeX documents from scratch. RweaveExtra provides Sweave drivers with additional options to control processing and output.

The knitr and rmarkdown packages (along with pandoc) can be used to create slides using the LaTeX beamer class.

Object Conversion Functions:

Miscellaneous Tools

HTML

The knitr package can process HTML files directly. Sweave can also work with HTML by way of the R2HTML package. lazyWeave can create HTML format documents from scratch.

For HTML slides, a combination of the knitr and rmarkdown packages (along with pandoc) can be used to create slides using ioslides, reveal.js, Slidy, or remark.js (from the xaringan package).

The packages blogdown, bookdown, and distill can create entire websites.

Object Conversion Functions:

Miscellaneous Tools: htmltools has various tools for working with HTML. tufterhandout can create Tufte-style handouts.

Markdown

The knitr package can process markdown files without assistance. The packages markdown and rmarkdown have general tools for working with documents in this format. lazyWeave can create markdown format documents from scratch. Also, the ascii package can write R objects to the AsciiDoc format.

Object Conversion Functions:

Miscellaneous Tools: tufterhandout can create Tufte-style handouts. kfigr allows for figure indexing in markdown documents.

Microsoft/LibreOffice Formats

The officer (formerly ReporteRs and before that R2DOCX) package can create docx and pptx files. R2wd (Windows only) can also create Word documents from scratch and R2PPT (also Windows only) can create PowerPoint slides. The rtf package does the same for Rich Text Format documents. The openxlsx package creates xlsx files. The readODS package can read and write Open Document Spreadsheets.

Object Conversion Functions:

Pipeline Toolkits

Pipeline toolkits help maintain and verify reproducibility. They synchronize computational output with the underlying code and data, and they tell the user when everything is up to date. In other words, they provide concrete evidence that results are re-creatable from the starting materials, and the data analysis project does not need to rerun from scratch. The targets package is such a pipeline toolkit. It is similar to GNU Make, but it is R-focused.
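The idea of skipping work that is already up to date can be sketched in base R. This is not the targets API. The helper run_if_outdated(), the file names, and the use of an MD5 hash of a single input file are hypothetical simplifications chosen for illustration; a real pipeline toolkit also tracks code and dependencies between steps.

```r
# Hypothetical helper: rerun `build()` only when the input file's hash changed.
run_if_outdated <- function(input, hash_file, build) {
  current <- unname(tools::md5sum(input))
  previous <- if (file.exists(hash_file)) readLines(hash_file) else NA_character_
  if (identical(current, previous)) {
    return("up to date")
  }
  build(input)
  writeLines(current, hash_file)  # remember the hash the output was built from
  "rebuilt"
}

demo_dir <- tempfile("pipeline-demo-")
dir.create(demo_dir)
input <- file.path(demo_dir, "data.csv")
output <- file.path(demo_dir, "summary.txt")
hash_file <- file.path(demo_dir, "data.md5")

# The "analysis" step: summarize the data and write the result to disk.
build_summary <- function(path) {
  dat <- read.csv(path)
  writeLines(sprintf("mean = %s", mean(dat$x)), output)
}

write.csv(data.frame(x = 1:4), input, row.names = FALSE)
first <- run_if_outdated(input, hash_file, build_summary)   # nothing built yet
second <- run_if_outdated(input, hash_file, build_summary)  # input unchanged

write.csv(data.frame(x = 1:5), input, row.names = FALSE)    # the data changes
third <- run_if_outdated(input, hash_file, build_summary)   # output is stale
```

Only the first and third calls rerun the summary step; the second call detects that the stored hash still matches the input and does nothing.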

Package Reproducibility

R has various tools for ensuring that specific package versions can be required for analyses. As an example, the renv package installs packages in a project-specific directory, records “snapshots” of the current package versions in a “lockfile”, and restores the package setup on a different machine.
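The lockfile idea can be illustrated with a small base R sketch. This is not how renv is implemented: the helpers record_versions() and check_versions() and the CSV layout are hypothetical, and the sketch only records and compares version numbers without installing anything. In renv itself, the corresponding steps are performed by renv::snapshot() and renv::restore().

```r
# Hypothetical helper: record the installed version of each package in a file.
record_versions <- function(pkgs, lockfile) {
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  write.csv(
    data.frame(package = pkgs, version = unname(versions)),
    lockfile,
    row.names = FALSE
  )
  invisible(lockfile)
}

# Hypothetical helper: compare currently installed versions with the record.
check_versions <- function(lockfile) {
  locked <- read.csv(lockfile, colClasses = "character")
  installed <- vapply(
    locked$package,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  locked$matches <- unname(installed) == locked$version
  locked
}

# Packages that ship with R are used so the sketch runs without installing anything.
lockfile <- tempfile(fileext = ".csv")
record_versions(c("stats", "utils", "tools"), lockfile)
status <- check_versions(lockfile)
```

A mismatch in the matches column would signal that the current library differs from the one the analysis was originally run with.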

Project Workflows

Successfully completing a data analysis project often requires much more than statistics and visualizations. Efficiently managing the code, data, and results as the project matures helps reduce stress and errors. The following “workflow” packages assist the R programmer by managing project infrastructure and/or facilitating a reproducible workflow.

Workflow utility packages provide single-use functions to implement project infrastructure or solve a specific problem. As a typical example, usethis::use_git() initializes a Git repository, ignores common R files, and commits all project files.

Workflow framework packages provide an organized directory structure and helper functions to assist during the development of the project. As a typical example, ProjectTemplate::create.project() creates an organized setup with many subdirectories, and ProjectTemplate::run.project() executes each R script that is saved in the src/ subdirectory.
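The framework pattern described above (an organized directory layout plus a function that runs the scripts in src/) can be sketched in base R. This is not ProjectTemplate's implementation: the helpers create_project() and run_project(), the subdirectory names other than src/, and the script names are hypothetical.

```r
# Hypothetical helper: create a project root with a fixed set of subdirectories.
create_project <- function(root, subdirs = c("data", "src", "reports")) {
  for (d in subdirs) {
    dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(root)
}

# Hypothetical helper: run every R script in src/ in alphabetical order,
# sharing one environment so later scripts can use earlier results.
run_project <- function(root) {
  scripts <- sort(list.files(
    file.path(root, "src"),
    pattern = "\\.R$",
    full.names = TRUE
  ))
  env <- new.env()
  for (s in scripts) {
    source(s, local = env)
  }
  env
}

root <- tempfile("project-demo-")
create_project(root)
writeLines("x <- 1:10", file.path(root, "src", "01-load.R"))
writeLines("total <- sum(x)", file.path(root, "src", "02-analyze.R"))

result <- run_project(root)
result$total
```

Numbering the scripts makes the execution order explicit, which is one common convention for keeping a multi-step analysis reproducible.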

Formatting Tools

formatR and styler can be used to format R code.

highlight and highr can be used to color R code.

Packages humanFormat, lubridate, prettyunits, and rprintf have functions to better format data.

Format Convertors

pander can be used for rendering R objects into Pandoc’s markdown. knitr has the function pandoc that can call an installed version of Pandoc to convert documents between formats such as Markdown, HTML, LaTeX, PDF and Word. tth facilitates TeX to HTML/MathML conversions.

Object Caching Packages

When using Sweave and knitr, it can be advantageous to cache the results of time-consuming code chunks if the document will be re-processed (e.g., during debugging). knitr facilitates object caching and the Bioconductor package weaver can be used with Sweave.

Non-literate programming packages to facilitate caching/archiving are archivist, R.cache, reproducible, and storr.
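The basic caching pattern outside of a literate document can be sketched with base R's saveRDS() and readRDS(). The helper cached(), the cache key, and the counter used to show that the computation runs only once are hypothetical; the packages listed above add features such as automatic keys and cache invalidation that this sketch omits.

```r
# Hypothetical helper: return the stored value for `key` if one exists,
# otherwise compute it, store it on disk, and return it.
cached <- function(key, compute, cache_dir) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, path)
  value
}

cache_dir <- tempfile("cache-demo-")
dir.create(cache_dir)

# Stand-in for a time-consuming computation; `calls` counts how often it runs.
calls <- 0
slow_mean <- function() {
  calls <<- calls + 1
  mean(1:100)
}

a <- cached("mean-1-100", slow_mean, cache_dir)  # computed and stored
b <- cached("mean-1-100", slow_mean, cache_dir)  # read back from disk
```

Because the cache lives on disk, the stored result also survives restarting the R session, which is what makes this useful while repeatedly re-processing a document.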

CRAN packages

Core: Hmisc, knitr, R2HTML, rms, xtable.
Regular: animation, archivist, ascii, bibtex, blogdown, bookdown, brew, cabinets, checkpoint, codebook, codebookr, dateback, distill, drake, DT, exams, exreport, flextable, formatR, formattable, groundhog, gt, gtsummary, here, highlight, highr, htmlTable, htmltools, HTMLUtils, humanFormat, huxtable, hwriter, kfigr, knitcitations, knitLatex, latex2exp, lazyWeave, liftr, lubridate, madrat, maestro, makeit, makepipe, makeProject, markdown, memisc, miniCRAN, mschart, NMOF, officer, openxlsx, orderly, packrat, pander, papeR, parameters, pharmaRTF, pipeflow, prettyunits, prodigenr, projects, ProjectTemplate, quantreg, R.cache, R.rsp, R2PPT, r2rtf, R2wd, rang, rapport, rcompendium, readODS, RefManageR, renv, repo, RepoGenerator, reportfactory, reporttools, represtools, reproducible, Require, rix, rmarkdown, rprintf, rtf, RweaveExtra, sparktex, stargazer, starter, storr, styler, suRtex, switchr, table1, tables, TAF, targets, texreg, tikzDevice, tinyProject, trackdown, tth, tufterhandout, unrtf, usethis, worcs, workflowr, xaringan, ztable.

Other resources