Bioconda: sustainable and comprehensive software distribution for the life sciences

To the Editor: Bioinformatics software comes in a variety of programming languages and requires diverse installation methods. This heterogeneity makes management of a software stack complicated, error-prone, and inordinately time-consuming. Whereas software deployment has traditionally been handled by administrators, ensuring the reproducibility of data analyses [1–3] requires that the researcher be able to maintain full control of the software environment, rapidly modify it without administrative privileges, and reproduce the same software stack on different machines.

The Conda package manager (https://conda.io) has become an increasingly popular means to overcome these challenges for all major operating systems. Conda normalizes software installations across language ecosystems by describing each software package with a human-readable ‘recipe’ that defines meta-information and dependencies, as well as a simple ‘build script’ that performs the steps necessary to build and install the software. Conda builds software packages in an isolated environment, transforming them into relocatable binaries. Importantly, it obviates reliance on system-wide administration privileges by allowing users to generate isolated software environments in which they can manage software versions by project, without generating incompatibilities and side effects (Supplementary Results). These environments support reproducibility, as they can be rapidly exchanged via files that describe their installation state. Conda is tightly integrated into popular solutions for reproducible data analysis such as Galaxy [4], bcbio-nextgen (https://github.com/chapmanb/bcbio-nextgen), and Snakemake [5]. To further enhance reproducibility guarantees, Conda can be combined with container- or virtual-machine-based approaches and archive facilities such as Zenodo (Supplementary Results). Finally, although Conda provides many commonly used packages by default, it also allows users to optionally include additional, community-managed repositories of packages (termed channels).
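
As a rough illustration of how an environment can be exchanged via a file that describes its installation state, the following minimal Python sketch renders such a description as YAML text with the usual name, channels, and dependencies entries. The function name `render_environment`, the project name, the package pins, and the channel order are all assumptions chosen for illustration; none of them is taken from the article.

```python
# Minimal sketch: render a Conda environment description as YAML text.
# The environment name, package pins, and channel order below are
# illustrative assumptions, not values from the article.


def render_environment(name, channels, dependencies):
    """Return environment-file text with name, channels, and pinned dependencies."""
    lines = [f"name: {name}", "channels:"]
    # Channels are listed in the order in which they should be consulted.
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Each dependency is pinned to an exact version so the stack can be rebuilt.
    lines += [f"  - {package}={version}" for package, version in dependencies.items()]
    return "\n".join(lines) + "\n"


env_text = render_environment(
    name="rnaseq-project",  # hypothetical project name
    channels=["conda-forge", "bioconda"],  # community-managed channels (assumed order)
    dependencies={"samtools": "1.9", "bwa": "0.7.17"},  # example version pins
)
print(env_text)
```

Saved as `environment.yml`, text of this shape can be passed to `conda env create -f environment.yml` on another machine to recreate the same software stack without administrative privileges.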

References

1. Mesirov, J. P. Science 327, 415–416 (2010).
2. Baker, M. Nature 533, 452–454 (2016).
3. Munafò, M. R. et al. Nat. Hum. Behav. 1, 0021 (2017).
4. Afgan, E. et al. Nucleic Acids Res. 44, W3–W10 (2016).
5. Köster, J. & Rahmann, S. Bioinformatics 28, 2520–2522 (2012).
6. Field, D. et al. Nat. Biotechnol. 24, 801–803 (2006).

Acknowledgements

We thank all contributors, the conda-forge team, and Anaconda Inc. for excellent cooperation. Further, we thank Travis CI (https://travis-ci.com) and Circle CI (https://circleci.com) for providing free Linux and macOS computing capacity. Finally, we thank ELIXIR (https://www.elixir-europe.org) for constant support and donation of staff. This work was supported by the Intramural Program of the National Institute of Diabetes and Digestive and Kidney Diseases, US National Institutes of Health (R.D.), the Netherlands Organisation for Scientific Research (NWO) (VENI grant 016.Veni.173.076 to J.K.), the German Research Foundation (SFB 876 to J.K.), and the NYU Abu Dhabi Research Institute for the NYU Abu Dhabi Center for Genomics and Systems Biology, program number CGSB1 (grant to J.R. and A. Yousif).

Author information

Author notes

These authors contributed equally: Björn Grüning and Ryan Dale.
A full list of authors and affiliations is available as Supplementary Table 1.

Authors and Affiliations

Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
Björn Grüning
Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, US National Institutes of Health, Bethesda, MD, USA
Ryan Dale
Division of CBRN Security and Defence, FOI–Swedish Defence Research Agency, Umeå, Sweden
Andreas Sjödin
Department of Chemistry, Computational Life Science Cluster (CLiC), Umeå University, Umeå, Sweden
Andreas Sjödin
Harvard T.H. Chan School of Public Health, Boston, MA, USA
Brad A. Chapman
Center for Genomics and Systems Biology, Genomics Core, NYU Abu Dhabi, Abu Dhabi, United Arab Emirates
Jillian Rowe
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
Christopher H. Tomkins-Tinch
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Christopher H. Tomkins-Tinch
Laboratory of Bioinformatics and Computational Biology, A. C. Camargo Cancer Center, São Paulo, Brazil
Renan Valieris
Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg–Essen, Essen, Germany
Johannes Köster
Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
Johannes Köster

Authors

Björn Grüning
Ryan Dale
Andreas Sjödin
Brad A. Chapman
Jillian Rowe
Christopher H. Tomkins-Tinch
Renan Valieris
Johannes Köster

Consortia

The Bioconda Team

Contributions

J.K. and R.D. wrote the manuscript and conducted the data analysis. K. Beauchamp, C. Brueffer, B.A.C., F. Eggenhofer, B.G., E. Pruesse, M. Raden, J.R., D. Ryan, I. Shlyakter, A.S., C.H.T.-T., and R.V. (in alphabetical order) contributed to writing of the manuscript. D.A. Søndergaard supervised student programmers on writing Conda package recipes and maintaining the connection with ELIXIR. All other members of the Bioconda Team contributed or maintained recipes (author order was determined by the number of commits in October 2017).

Corresponding author

Correspondence to Johannes Köster.

Ethics declarations

Competing interests

The authors declare no competing interests.

Cite this article

Grüning, B., Dale, R., Sjödin, A. et al. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods 15, 475–476 (2018). https://doi.org/10.1038/s41592-018-0046-7

Published: 02 July 2018
Issue Date: July 2018
DOI: https://doi.org/10.1038/s41592-018-0046-7