Bioconda: sustainable and comprehensive software distribution for the life sciences

To the Editor: Bioinformatics software comes in a variety of programming languages and requires diverse installation methods. This heterogeneity makes management of a software stack complicated, error-prone, and inordinately time-consuming. Whereas software deployment has traditionally been handled by administrators, ensuring the reproducibility of data analyses [1,2,3] requires that the researcher be able to maintain full control of the software environment, rapidly modify it without administrative privileges, and reproduce the same software stack on different machines.

The Conda package manager (https://conda.io) has become an increasingly popular means to overcome these challenges for all major operating systems. Conda normalizes software installations across language ecosystems by describing each piece of software with a human-readable ‘recipe’ that defines meta-information and dependencies, as well as a simple ‘build script’ that performs the steps necessary to build and install the software. Conda builds software packages in an isolated environment, transforming them into relocatable binaries. Importantly, it obviates reliance on system-wide administration privileges by allowing users to generate isolated software environments in which they can manage software versions by project, without generating incompatibilities and side effects (Supplementary Results). These environments support reproducibility, as they can be rapidly exchanged via files that describe their installation state. Conda is tightly integrated into popular solutions for reproducible data analysis such as Galaxy [4], bcbio-nextgen (https://github.com/chapmanb/bcbio-nextgen), and Snakemake [5]. To further enhance reproducibility guarantees, Conda can be combined with container or virtual machine-based approaches and archive facilities such as Zenodo (Supplementary Results). Finally, although Conda provides many commonly used packages by default, it also allows users to optionally include additional, community-managed repositories of packages (termed channels).
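
As a rough illustration of how an environment can be exchanged via a file that describes its installation state, the following Python sketch renders a per-project environment description in the layout of a Conda environment file and builds (but does not run) the command that would recreate it on another machine. The project name, the package names and version pins, and the channel order are placeholder assumptions chosen for illustration; they are not taken from the article. The helper functions `render_environment` and `create_command` are hypothetical and not part of Conda.

```python
# Illustrative sketch only: describe a per-project Conda environment as data,
# render it in the layout of an environment file, and build (not execute)
# the command that would recreate the environment from that file.


def render_environment(name, channels, dependencies):
    """Return text laid out like a Conda environment file.

    `dependencies` maps package names to pinned versions.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {pkg}={version}" for pkg, version in dependencies.items()]
    return "\n".join(lines) + "\n"


def create_command(env_file):
    """Return the argument list for `conda env create` with a given file."""
    return ["conda", "env", "create", "--file", env_file]


# Hypothetical project: all names and version pins below are placeholders.
env_text = render_environment(
    name="example-project",
    channels=["conda-forge", "bioconda"],
    dependencies={"samtools": "1.9", "snakemake": "5.4.0"},
)

print(env_text)
print(" ".join(create_command("environment.yml")))
```

Saving the rendered text as `environment.yml` and running the printed command on a second machine would, under these assumptions, produce an isolated environment with the same pinned versions, without requiring administrative privileges.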

References

  1. Mesirov, J. P. Science 327, 415–416 (2010).
  2. Baker, M. Nature 533, 452–454 (2016).
  3. Munafò, M. R. et al. Nat. Hum. Behav. 1, 0021 (2017).
  4. Afgan, E. et al. Nucleic Acids Res. 44, W3–W10 (2016).
  5. Köster, J. & Rahmann, S. Bioinformatics 28, 2520–2522 (2012).
  6. Field, D. et al. Nat. Biotechnol. 24, 801–803 (2006).

Acknowledgements

We thank all contributors, the conda-forge team, and Anaconda Inc. for excellent cooperation. Further, we thank Travis CI (https://travis-ci.com) and Circle CI (https://circleci.com) for providing free Linux and macOS computing capacity. Finally, we thank ELIXIR (https://www.elixir-europe.org) for constant support and donation of staff. This work was supported by the Intramural Program of the National Institute of Diabetes and Digestive and Kidney Diseases, US National Institutes of Health (R.D.), the Netherlands Organisation for Scientific Research (NWO) (VENI grant 016.Veni.173.076 to J.K.), the German Research Foundation (SFB 876 to J.K.), and the NYU Abu Dhabi Research Institute for the NYU Abu Dhabi Center for Genomics and Systems Biology, program number CGSB1 (grant to J.R. and A. Yousif).

Author information

Author notes

  1. These authors contributed equally: Björn Grüning and Ryan Dale.
  2. A full list of authors and affiliations is available as Supplementary Table 1.

Authors and Affiliations

  1. Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
    Björn Grüning
  2. Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, US National Institutes of Health, Bethesda, MD, USA
    Ryan Dale
  3. Division of CBRN Security and Defence, FOI–Swedish Defence Research Agency, Umeå, Sweden
    Andreas Sjödin
  4. Department of Chemistry, Computational Life Science Cluster (CLiC), Umeå University, Umeå, Sweden
    Andreas Sjödin
  5. Harvard T.H. Chan School of Public Health, Boston, MA, USA
    Brad A. Chapman
  6. Center for Genomics and Systems Biology, Genomics Core, NYU Abu Dhabi, Abu Dhabi, United Arab Emirates
    Jillian Rowe
  7. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
    Christopher H. Tomkins-Tinch
  8. Broad Institute of MIT and Harvard, Cambridge, MA, USA
    Christopher H. Tomkins-Tinch
  9. Laboratory of Bioinformatics and Computational Biology, A. C. Camargo Cancer Center, São Paulo, Brazil
    Renan Valieris
  10. Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg–Essen, Essen, Germany
    Johannes Köster
  11. Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
    Johannes Köster

Authors

  1. Björn Grüning
  2. Ryan Dale
  3. Andreas Sjödin
  4. Brad A. Chapman
  5. Jillian Rowe
  6. Christopher H. Tomkins-Tinch
  7. Renan Valieris
  8. Johannes Köster

Consortia

The Bioconda Team

Contributions

J.K. and R.D. wrote the manuscript and conducted the data analysis. K. Beauchamp, C. Brueffer, B.A.C., F. Eggenhofer, B.G., E. Pruesse, M. Raden, J.R., D. Ryan, I. Shlyakter, A.S., C.H.T.-T., and R.V. (in alphabetical order) contributed to the writing of the manuscript. D.A. Søndergaard supervised student programmers on writing Conda package recipes and maintaining the connection with ELIXIR. All other members of the Bioconda Team contributed or maintained recipes (author order was determined by the number of commits in October 2017).

Corresponding author

Correspondence to Johannes Köster.

Ethics declarations

Competing interests

The authors declare no competing interests.

Cite this article

Grüning, B., Dale, R., Sjödin, A. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods 15, 475–476 (2018). https://doi.org/10.1038/s41592-018-0046-7
