The {targets} R package user manual

Introduction

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The targets package is a Make-like pipeline tool for statistics and data science in R. The package skips the costly runtime of tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise.
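To make this concrete, here is a minimal sketch of a pipeline, assuming targets is installed. It uses tar_dir() to work in a throwaway temporary directory, tar_script() to write the _targets.R file that defines the pipeline, tar_make() to run it, and tar_read() to load a finished result. The target names raw_data, model, and slope are hypothetical choices for illustration only.

```r
library(targets)

# tar_dir() runs the code in a fresh temporary directory, so the example
# leaves no _targets.R file or _targets/ data store behind.
slope_value <- tar_dir({
  # tar_script() writes the pipeline definition to _targets.R.
  # Each tar_target() is one step; a target whose command mentions
  # another target depends on it.
  tar_script({
    list(
      tar_target(raw_data, data.frame(x = 1:10, y = (1:10)^2)),
      tar_target(model, lm(y ~ x, data = raw_data)),
      tar_target(slope, unname(coef(model)["x"]))
    )
  }, ask = FALSE)

  # Run the pipeline: raw_data is built first, then model, then slope.
  tar_make()

  # Load the finished slope target from the data store.
  tar_read(slope)
})

print(slope_value)
```

Nothing here tells targets the order of the steps; the dependency graph is inferred from which target names appear in each command.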

Motivation

Data analysis can be slow. A round of scientific computation can take several minutes, hours, or even days to complete. After it finishes, if you update your code or data, your hard-earned results may no longer be valid. Unchecked, this invalidation creates a chronic Sisyphean loop:

  1. Launch the code.
  2. Wait while it runs.
  3. Discover an issue.
  4. Restart from scratch.

Pipeline tools

Pipeline tools like GNU Make break the cycle. They watch the dependency graph of the whole workflow and skip steps, or “targets”, whose code, data, and upstream dependencies have not changed since the last run of the pipeline. When all targets are up to date, this is evidence that the results match the underlying code and data, which helps us trust the results and confirm the computation is reproducible.
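The skipping and invalidation behavior described above can be sketched with a small experiment. This is a minimal example under stated assumptions: summarize_values(), numbers, and total are hypothetical names chosen for illustration. After a first tar_make(), tar_outdated() should report no outdated targets, so a second tar_make() has nothing to rebuild. Editing only the body of summarize_values() then invalidates total, which calls it, while numbers stays up to date.

```r
library(targets)

status <- tar_dir({
  # Version 1: a user-defined (hypothetical) function feeds one target.
  tar_script({
    summarize_values <- function(x) sum(x)
    list(
      tar_target(numbers, 1:10),
      tar_target(total, summarize_values(numbers))
    )
  }, ask = FALSE)

  tar_make()                # first run builds numbers and total
  before <- tar_outdated()  # names of outdated targets after the first run
  tar_make()                # second run: both targets are skipped

  # Version 2: only the function body changes; the target commands do not.
  tar_script({
    summarize_values <- function(x) 2 * sum(x)
    list(
      tar_target(numbers, 1:10),
      tar_target(total, summarize_values(numbers))
    )
  }, ask = FALSE)

  after <- tar_outdated()   # targets invalidated by the code change
  list(before = before, after = after)
})

print(status)
```

Because the function is part of the code that total depends on, changing it is enough to mark total as outdated, even though the text of the tar_target() call is identical in both versions.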

Unlike most pipeline tools, which are language-agnostic or Python-focused, the targets package allows data scientists and researchers to work entirely within R. targets implicitly nudges users toward a clean, function-oriented programming style that fits the intent of the R language and helps practitioners maintain their data analysis projects.

About this manual

This manual is a step-by-step user guide to targets. The most important chapters are the walkthrough, help guide, and debugging guide. Subsequent chapters explain how to write code, manage projects, utilize high-performance computing, transition from drake, and more. See the documentation website for most other major resources, including installation instructions, links to example projects, and a reference page with all user-side functions.

What about drake?

drake is an older R-focused pipeline tool, and targets is drake’s long-term successor. A dedicated chapter explains why targets was created, what this means for drake’s future, how drake users can transition to targets, and the main technical advantages of targets over drake.