submission: piggyback · Issue #220 · ropensci/software-review

Summary

Package: piggyback
Version: 0.0.0.9000
Title: Managing Larger Data on a GitHub Repository
Description: Because larger (> 50 MB) data files cannot easily be committed to git,
  a different approach is required to manage data associated with an analysis in a
  GitHub repository.  This package provides a simple work-around by allowing larger
  (up to 2 GB) data files to piggyback on a repository as assets attached to individual
  GitHub releases.  These files are not handled by git in any way, but instead are
  uploaded, downloaded, or edited directly by calls through the GitHub API. These
  data files can be versioned manually by creating different releases.  This approach
  works equally well with public or private repositories.  Data can be uploaded
  and downloaded programmatically from scripts. No authentication is required to
  download data from public repositories.
Authors@R: person("Carl", "Boettiger",
                  email = "cboettig@gmail.com",
                  role = c("aut", "cre", "cph"),
                  comment=c(ORCID = "0000-0002-1642-628X"))
URL: https://github.com/cboettig/piggyback
BugReports: https://github.com/cboettig/piggyback/issues
License: GPL-3
Encoding: UTF-8
LazyData: true
ByteCompile: true
Imports:
    gh,
    httr,
    jsonlite,
    git2r,
    fs,
    usethis,
    crayon,
    clisymbols
Suggests:
    readr,
    covr,
    testthat,
    datasets,
    knitr,
    rmarkdown
VignetteBuilder: knitr
RoxygenNote: 6.0.1.9000
Roxygen: list(markdown = TRUE)
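
The Description above says files ride along as assets attached to a release, and that public repositories need no authentication for downloads. As a rough illustration of why that works, here is a minimal sketch in base R that builds the public download URL GitHub serves for a release asset. The helper `release_asset_url()` is hypothetical and not part of piggyback, and the tag `v0.0.1` and file name `mtcars.tsv.gz` are placeholder values, not real assets of this repository.

```r
# Hypothetical helper (NOT part of piggyback): build the public download URL
# for a file attached to a GitHub release.
# GitHub serves release assets at:
#   https://github.com/<owner>/<repo>/releases/download/<tag>/<asset name>
release_asset_url <- function(repo, tag, asset) {
  # Expect the repository in "owner/repo" form.
  stopifnot(grepl("^[^/]+/[^/]+$", repo))
  sprintf(
    "https://github.com/%s/releases/download/%s/%s",
    repo,
    utils::URLencode(tag, reserved = TRUE),    # escape unusual characters in the tag
    utils::URLencode(asset, reserved = TRUE)   # escape unusual characters in the name
  )
}

# Placeholder tag and asset name, for illustration only.
url <- release_asset_url("cboettig/piggyback", "v0.0.1", "mtcars.tsv.gz")
print(url)

# For a public repository, such a URL could be fetched without a token, e.g.:
# utils::download.file(url, destfile = "mtcars.tsv.gz", mode = "wb")
```

Uploading, by contrast, always goes through the authenticated GitHub API, which is the part the package is meant to wrap.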

https://github.com/cboettig/piggyback

Reproducibility, because access to the data being analyzed is essential for reproducible workflows, yet we have no good way to provide it for workflows with unpublished data, or for private workflows, once the data is too large for version control (e.g. files > 50 MB).

The target audience is anyone working with data files on GitHub.

datastorr on ropenscilabs is the closest match. It takes a very different approach (from the user's perspective; on the back end both store data as GitHub release assets) to essentially the same problem. The Intro vignette discusses at greater length many of the alternative strategies and why I feel they have all fallen short of my needs, which led me to create this package.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

No errors, notes, or warnings.

Rich FitzJohn, @richfitz, would be great based on his experience in this area and with datastorr. Jenny Bryan, @jennybc, since this package makes heavy use of usethis and GitHub interactions.