Accessible, curated metagenomic data through ExperimentHub

To the Editor:

The microbiome has emerged as a key aspect of human biology and has been implicated in many disease etiologies. Shotgun metagenomic sequencing is an approach with the highest resolution currently available for studying the taxonomic composition and functional potential of the human microbiome. The increase in publicly available shotgun data theoretically enables hypothesis testing for specific diseases and environmental niches as well as meta-analysis across related studies. However, several factors prevent the research community from taking full advantage of these resources. Barriers include the need for substantial investments of time, computational resources and specialized bioinformatic expertise as well as inconsistencies in annotation and formatting between individual studies.

References

1. Huber, W. et al. Nat. Methods 12, 115–121 (2015).
2. Truong, D.T. et al. Nat. Methods 12, 902–903 (2015).
3. Abubucker, S. et al. PLoS Comput. Biol. 8, e1002358 (2012).
4. Human Microbiome Project Consortium. Nature 486, 207–214 (2012).
5. Koren, O. et al. PLoS Comput. Biol. 9, e1002863 (2013).
6. Arumugam, M. et al. Nature 473, 174–180 (2011).

Acknowledgements

This work was made possible by the CUNY High Performance Computing Center, College of Staten Island, funded in part by the City and State of New York, CUNY Research Foundation, and National Science Foundation Grants CNS-0958379, CNS-0855217 and ACI 1126113. Support was provided by the European Union H2020 Marie Curie grant (707345) to E.P., the European Research Council (ERC-STG project MetaPG), MIUR “Futuro in Ricerca” RBFR13EWWI_001, the People Programme (Marie Curie Actions) of the European Union Seventh Framework Programme (FP7/2007-2013) under REA grant agreement no. PCIG13-GA-2013-618833, the LEO Pharma Foundation, and by Fondazione CARITRO fellowship Rif.Int.2013.0239 to N.S., the National Institute of Allergy and Infectious Diseases (1R21AI121784-01 to J.B.D. and L.W.) and the US National Cancer Institute (U24CA180996 to M.M. and L.W.).

Author information

Author notes

Edoardo Pasolli, Lucas Schiffer and Paolo Manghi: These authors contributed equally to this work.

Authors and Affiliations

Centre for Integrative Biology, University of Trento, Trento, Italy
Edoardo Pasolli, Paolo Manghi, Duy Tin Truong, Francesco Beghini & Nicola Segata
Graduate School of Public Health and Health Policy, City University of New York, New York, New York, USA
Lucas Schiffer, Audrey Renson, Faizan Malik, Marcel Ramos, Jennifer B Dowd & Levi Waldron
Institute for Implementation Science and Population Health, City University of New York, New York, New York, USA
Lucas Schiffer, Audrey Renson, Valerie Obenchain, Marcel Ramos & Levi Waldron
Roswell Park Cancer Institute, University of Buffalo, Buffalo, New York, USA
Marcel Ramos & Martin Morgan
Department of Global Health and Social Medicine, King's College London, London, UK
Jennifer B Dowd
Biostatistics Department, Harvard School of Public Health, Boston, Massachusetts, USA
Curtis Huttenhower
The Broad Institute, Cambridge, Massachusetts, USA
Curtis Huttenhower

Authors

Edoardo Pasolli
Lucas Schiffer
Paolo Manghi
Audrey Renson
Valerie Obenchain
Duy Tin Truong
Francesco Beghini
Faizan Malik
Marcel Ramos
Jennifer B Dowd
Curtis Huttenhower
Martin Morgan
Nicola Segata
Levi Waldron

Corresponding authors

Correspondence to Nicola Segata or Levi Waldron.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Clustering scores for enterotypes in gut WGS samples.

Consistent with Koren et al. (ref. 5), these plots indicate weak support for any discrete clustering in the data and confirm that the three-enterotype hypothesis is likely an oversimplification that does not hold when considering a large set of biogeographically diverse populations. Thresholds for significance of clustering are shown as dashed lines and are the same thresholds used by Koren et al. (ref. 5). Each plot line represents an analysis that can be accomplished with one line of code using the R packages 'fpc' (prediction strength and Calinski-Harabasz) and 'cluster' (silhouette index), as provided in the curatedMetagenomicData package examples.
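
To make the silhouette index mentioned in this caption concrete, here is a minimal pure-Python sketch of the mean silhouette width. It is an illustration only, not the R code from the curatedMetagenomicData examples: the function names, the Euclidean distance and the toy points are all assumptions introduced here.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_silhouette(points, labels):
    """Mean silhouette width for a given cluster assignment.

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data (not from the paper): two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_score = mean_silhouette(toy_points, [0, 0, 1, 1])  # matches the groups
poor_score = mean_silhouette(toy_points, [0, 1, 0, 1])  # splits each group
```

A labeling that follows real structure gives a score near 1, while an arbitrary split gives a low or negative score. Weak support for discrete clustering, as reported in the caption, corresponds to scores that stay below the chosen threshold for every number of clusters tried.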

Supplementary Figure 2 Health status classification from species abundance.

Six different health-status classification problems were attempted using a random forest algorithm, with cross-validation to estimate prediction accuracy. Plots show ROC curves obtained using species abundance as microbiome features, one of the five data types considered in Example 1 of Figure 1. Results are consistent with the meta-analysis conducted in ref. 32.
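
As a rough illustration of how an ROC curve like those in this figure is derived from classifier output, the sketch below computes ROC points and the area under the curve from predicted scores and true labels. It does not train a random forest or perform cross-validation; the function names and the toy labels and scores are assumptions for illustration, not results from the paper.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    labels: 1 for cases, 0 for controls; scores: higher means more case-like.
    Samples with tied scores are consumed together so ties give one point.
    """
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical held-out predictions (not data from the paper).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc(roc_points(toy_labels, toy_scores))
```

In a cross-validated setting, the scores would be the predictions made for each sample while it was held out of training, so that the curve estimates performance on unseen data.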

Supplementary Figure 4 Top correlations between metabolic pathways and genera.

Pearson correlation was calculated between each individual pathway (HUMAnN2 pathways from the full UniRef90 database) and each of the top 20 most abundant microbial genera, in a combined dataset obtained by merging 20 studies of gut specimens. The top correlations are 1) ornithine de novo biosynthesis: Bacteroides (r = 0.86), an activity that has been confirmed in cultures of this organism (ref. 33), and 2) superpathway of allantoin degradation in yeast: Escherichia (r = 0.95). Although this superpathway has been associated with yeast, it includes subpathways (such as allantoin degradation to glyoxylate I and allantoin degradation to ureidoglycolate I) that are common in Escherichia, which is known to be an allantoin utilizer under anaerobic conditions (ref. 34). Of note, the top 100 correlations have adjusted p < 0.001.
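
The pairwise screen described in this caption can be sketched as follows: compute Pearson's r for every pathway-genus pair and rank the pairs. This is a minimal pure-Python illustration under stated assumptions; the pathway and genus names and abundance values are hypothetical placeholders, and the multiple-testing adjustment of p values mentioned in the caption is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_correlations(pathways, genera):
    """Return (r, pathway, genus) tuples sorted from highest to lowest r."""
    pairs = [
        (pearson_r(p_values, g_values), p_name, g_name)
        for p_name, p_values in pathways.items()
        for g_name, g_values in genera.items()
    ]
    return sorted(pairs, key=lambda item: item[0], reverse=True)


# Hypothetical abundances across four samples (not data from the paper).
toy_pathways = {
    "pathway_A": [1.0, 2.0, 3.0, 4.0],
    "pathway_B": [4.0, 1.0, 3.0, 2.0],
}
toy_genera = {
    "genus_X": [2.0, 4.0, 6.0, 8.0],
    "genus_Y": [1.0, 3.0, 2.0, 5.0],
}
ranked = rank_correlations(toy_pathways, toy_genera)
top_r, top_pathway, top_genus = ranked[0]
```

With many pathways and 20 genera, a screen of this kind produces a large number of tests, which is why the caption reports adjusted rather than raw p values.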

Supplementary Figure 5 Alpha diversity of taxa from 22 studies of the gut microbiome.

Shannon alpha diversity was calculated for each individual sample within each human gut microbiome study. The median diversity varies by a maximum factor of 1.5 between studies; however, the variability within studies, as measured by interquartile range, varies by more than threefold.
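
For readers unfamiliar with the metric, the following minimal sketch computes the Shannon index, H = -sum(p_i ln p_i), for each sample and then summarizes each study by its median and interquartile range, mirroring the comparison in this caption. The study names and abundance profiles are hypothetical, and the use of the natural logarithm is an assumption, since the caption does not state the base.

```python
import math
import statistics


def shannon_diversity(abundances):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def median_and_iqr(values):
    """Median and interquartile range (Q3 - Q1) of a list of values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


# Hypothetical per-sample abundance profiles (not data from the paper).
toy_studies = {
    "study_A": [[25, 25, 25, 25], [40, 30, 20, 10], [70, 10, 10, 10]],
    "study_B": [[97, 1, 1, 1], [50, 50, 0, 0], [25, 25, 25, 25]],
}

summary = {
    name: median_and_iqr([shannon_diversity(sample) for sample in samples])
    for name, samples in toy_studies.items()
}
```

An evenly distributed community of four taxa reaches the maximum ln(4), whereas a community dominated by one taxon scores close to zero, so a wide interquartile range within a study indicates samples that differ markedly in evenness or richness.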

About this article

Cite this article

Pasolli, E., Schiffer, L., Manghi, P. et al. Accessible, curated metagenomic data through ExperimentHub. Nat Methods 14, 1023–1024 (2017). https://doi.org/10.1038/nmeth.4468

Published: 31 October 2017
Issue date: 01 November 2017
DOI: https://doi.org/10.1038/nmeth.4468