submission: piggyback · Issue #220 · ropensci/software-review

Summary

Allow large and binary data files to "piggyback" on top of your existing repositories. Push and pull large-ish (< 2 GB) data files to and from GitHub repositories as attachments to a GitHub release.
Paste the full DESCRIPTION file inside a code block below:

Package: piggyback
Version: 0.0.0.9000
Title: Managing Larger Data on a GitHub Repository
Description: Because larger (> 50 MB) data files cannot easily be committed to git,
  a different approach is required to manage data associated with an analysis in a
  GitHub repository.  This package provides a simple work-around by allowing larger
  (up to 2 GB) data files to piggyback on a repository as assets attached to individual
  GitHub releases.  These files are not handled by git in any way, but instead are
  uploaded, downloaded, or edited directly by calls through the GitHub API. These
  data files can be versioned manually by creating different releases.  This approach
  works equally well with public or private repositories.  Data can be uploaded
  and downloaded programmatically from scripts. No authentication is required to
  download data from public repositories.
Authors@R: person("Carl", "Boettiger",
                  email = "cboettig@gmail.com",
                  role = c("aut", "cre", "cph"),
                  comment=c(ORCID = "0000-0002-1642-628X"))
URL: https://github.com/cboettig/piggyback
BugReports: https://github.com/cboettig/piggyback/issues
License: GPL-3
Encoding: UTF-8
LazyData: true
ByteCompile: true
Imports:
    gh,
    httr,
    jsonlite,
    git2r,
    fs,
    usethis,
    crayon,
    clisymbols
Suggests:
    readr,
    covr,
    testthat,
    datasets,
    knitr,
    rmarkdown
VignetteBuilder: knitr
RoxygenNote: 6.0.1.9000
Roxygen: list(markdown = TRUE)

URL for the package (the development repository, not a stylized html page):

https://github.com/cboettig/piggyback

Please indicate which category or categories from our package fit policies this package falls under and why (e.g., data retrieval, reproducibility). If you are unsure, we suggest you make a pre-submission inquiry:

Reproducibility, because access to the data being analyzed is essential for reproducible workflows, yet there is no good solution for workflows with unpublished data, or for private workflows, once the data are too large for version control (e.g. files > 50 MB).
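
As a rough illustration of the size threshold mentioned above, the following sketch uses base R to list the files in a directory that exceed a given size and would therefore be candidates for attaching to a release rather than committing to git. The helper `find_large_files()` and the exact 50 MB cutoff are assumptions made for this example and are not part of the package. The commented-out calls at the end use `pb_upload()` and `pb_download()` as they appear in released versions of piggyback; the version under review may differ, and `"user/repo"` and the tag are placeholders.

```r
# Approximate size above which committing to git becomes impractical.
# The 50 MB figure comes from the submission text; the exact value is an assumption.
size_limit_bytes <- 50 * 1024^2

# Hypothetical helper (not part of piggyback): return the paths of all files
# under `dir` whose size exceeds `limit` bytes.
find_large_files <- function(dir, limit = size_limit_bytes) {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(paths)
  paths[!is.na(sizes) & sizes > limit]
}

# Demonstrate on a temporary directory, using a tiny limit so it runs anywhere.
demo_dir <- tempfile("piggyback-demo-")
dir.create(demo_dir)
writeLines("small", file.path(demo_dir, "small.txt"))  # a few bytes
writeBin(raw(2048), file.path(demo_dir, "big.bin"))    # 2048 bytes

large <- find_large_files(demo_dir, limit = 1024)
print(basename(large))

# With a real repository and a GitHub token, the large files could then be
# attached to a release and retrieved elsewhere (placeholders, not run here):
# piggyback::pb_upload(large, repo = "user/repo", tag = "v0.0.1")
# piggyback::pb_download(repo = "user/repo", tag = "v0.0.1", dest = demo_dir)
```

Only the file-size scan runs as written; the upload and download calls are left commented out because they need network access and, for uploads, authentication.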

Who is the target audience and what are scientific applications of this package?

The target audience is anyone working with data files on GitHub.

Are there other R packages that accomplish the same thing? If so, how does
yours differ or meet our criteria for best-in-category?

datastorr on ropenscilabs is the closest match. It takes a very different approach from the user's perspective (on the back end, both store data as GitHub assets) to essentially the same problem. The Intro vignette discusses at greater length many of the alternative strategies and why I feel they have all fallen short of my needs, which led me to create this package.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

Do you intend for this package to go on CRAN?
Do you wish to automatically submit to the Journal of Open Source Software? If so:
- The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

No errors, notes, or warnings.

Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Rich FitzJohn, @richfitz, would be great based on his experience in this area and with datastorr. Jenny Bryan, @jennybc, since this package makes heavy use of usethis and GitHub interactions.