Accessible, curated metagenomic data through ExperimentHub
To the Editor:
The microbiome has emerged as a key aspect of human biology and has been implicated in many disease etiologies. Shotgun metagenomic sequencing offers the highest resolution currently available for studying the taxonomic composition and functional potential of the human microbiome. The growing volume of publicly available shotgun data in principle enables hypothesis testing for specific diseases and environmental niches, as well as meta-analysis across related studies. However, several factors prevent the research community from taking full advantage of these resources. Barriers include the need for substantial investments of time, computational resources and specialized bioinformatic expertise, as well as inconsistencies in annotation and formatting between individual studies.
References
- Huber, W. et al. Nat. Methods 12, 115–121 (2015).
- Truong, D.T. et al. Nat. Methods 12, 902–903 (2015).
- Abubucker, S. et al. PLoS Comput. Biol. 8, e1002358 (2012).
- Human Microbiome Project Consortium. Nature 486, 207–214 (2012).
- Koren, O. et al. PLoS Comput. Biol. 9, e1002863 (2013).
- Arumugam, M. et al. Nature 473, 174–180 (2011).
Acknowledgements
This work was made possible by the CUNY High Performance Computing Center, College of Staten Island, funded in part by the City and State of New York, CUNY Research Foundation, and National Science Foundation grants CNS-0958379, CNS-0855217 and ACI 1126113. Support was provided by the European Union H2020 Marie Curie grant (707345) to E.P.; by the European Research Council (ERC-STG project MetaPG), MIUR “Futuro in Ricerca” RBFR13EWWI_001, the People Programme (Marie Curie Actions) of the European Union Seventh Framework Programme (FP7/2007-2013) under REA grant agreement no. PCIG13-GA-2013-618833, the LEO Pharma Foundation, and Fondazione CARITRO fellowship Rif.Int.2013.0239 to N.S.; by the National Institute of Allergy and Infectious Diseases (1R21AI121784-01 to J.B.D. and L.W.); and by the US National Cancer Institute (U24CA180996 to M.M. and L.W.).
Author information
Author notes
- Edoardo Pasolli, Lucas Schiffer and Paolo Manghi: These authors contributed equally to this work.
Authors and Affiliations
- Centre for Integrative Biology, University of Trento, Trento, Italy
Edoardo Pasolli, Paolo Manghi, Duy Tin Truong, Francesco Beghini & Nicola Segata
- Graduate School of Public Health and Health Policy, City University of New York, New York, New York, USA
Lucas Schiffer, Audrey Renson, Faizan Malik, Marcel Ramos, Jennifer B Dowd & Levi Waldron
- Institute for Implementation Science and Population Health, City University of New York, New York, New York, USA
Lucas Schiffer, Audrey Renson, Valerie Obenchain, Marcel Ramos & Levi Waldron
- Roswell Park Cancer Institute, University of Buffalo, Buffalo, New York, USA
Marcel Ramos & Martin Morgan
- Department of Global Health and Social Medicine, King's College London, London, UK
Jennifer B Dowd
- Biostatistics Department, Harvard School of Public Health, Boston, Massachusetts, USA
Curtis Huttenhower
- The Broad Institute, Cambridge, Massachusetts, USA
Curtis Huttenhower
Authors
- Edoardo Pasolli
- Lucas Schiffer
- Paolo Manghi
- Audrey Renson
- Valerie Obenchain
- Duy Tin Truong
- Francesco Beghini
- Faizan Malik
- Marcel Ramos
- Jennifer B Dowd
- Curtis Huttenhower
- Martin Morgan
- Nicola Segata
- Levi Waldron
Corresponding authors
Correspondence to Nicola Segata or Levi Waldron.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Clustering scores for enterotypes in gut WGS samples.
Consistent with Koren et al. (ref. 5), these plots indicate weak support for any discrete clustering in the data and confirm that the three-enterotype hypothesis is likely an oversimplification that does not hold across a large set of biogeographically diverse populations. Thresholds for significance of clustering are shown as dashed lines and are the same thresholds used by Koren et al. (ref. 5). Each plotted line represents an analysis that can be accomplished with one line of code using the R packages 'fpc' (prediction strength and Calinski-Harabasz) and 'cluster' (silhouette index); examples are provided in the curatedMetagenomicData package.
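To make the silhouette index concrete, the following is a minimal Python sketch of the average silhouette width for a given cluster assignment. It is not the authors' code, which uses the R package 'cluster'. The Euclidean distance, the function name `mean_silhouette`, and the toy points are assumptions made only for illustration; enterotype analyses often use other distance measures, and the significance thresholds from the figure are not reproduced here.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width for points assigned to clusters.

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    silhouette width is (b - a) / max(a, b).
    Assumption: Euclidean distance (math.dist); swap in another metric as needed.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster contributes a width of 0.
            widths.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


if __name__ == "__main__":
    # Hypothetical toy data: two tight, well-separated groups.
    toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(mean_silhouette(toy_points, [0, 0, 1, 1]))
```

Values near 1 indicate compact, well-separated clusters, while values near 0 or below suggest that a discrete partition is poorly supported.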
Supplementary Figure 2 Health status classification from species abundance.
Six different health-status classification problems were attempted using a random forest algorithm, with cross-validation to estimate prediction accuracy. Plots show ROC curves obtained with species abundance as the microbiome features, one of the five data types considered in Example 1 of Figure 1. Results are consistent with the meta-analysis conducted in ref. 32.
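As an illustration of how an ROC curve is derived from held-out predictions, here is a minimal Python sketch. It covers only the evaluation step: it assumes that a classifier (a random forest in the caption above) has already produced one score per cross-validated sample. The function names `roc_points` and `auc` and the toy labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for cases, 0 for controls; scores: higher means more case-like.
    The decision threshold is swept from the highest score to the lowest, and
    tied scores are processed together so they yield a single point.
    """
    positives = sum(1 for lab in labels if lab == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Hypothetical held-out predictions pooled across cross-validation folds.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    print(auc(roc_points(toy_labels, toy_scores)))
```

Pooling held-out scores across folds before drawing a single curve is one common choice; averaging per-fold curves is another, and the caption does not say which was used.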
Supplementary Figure 4 Top correlations between metabolic pathways and genera.
Pearson correlation was calculated between each individual pathway (HUMAnN2 pathways from the full UniRef90 database) and each of the 20 most abundant microbial genera, in a combined dataset obtained by merging 20 studies of gut specimens. The top correlations are (1) ornithine de novo biosynthesis with Bacteroides (r = 0.86), an activity that has been confirmed in cultures of this organism (ref. 33), and (2) superpathway of allantoin degradation in yeast with Escherichia (r = 0.95). Although this superpathway has been associated with yeast, it includes subpathways (such as allantoin degradation to glyoxylate I and allantoin degradation to ureidoglycolate I) that are common in Escherichia, which is known to utilize allantoin under anaerobic conditions (ref. 34). Of note, the top 100 correlations have adjusted p < 0.001.
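The pairwise screen described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' pipeline: the function names `pearson_r` and `top_correlations`, the dictionary layout (feature name mapped to per-sample abundances), and the toy values are all assumptions, and the multiple-testing adjustment mentioned in the caption is not included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def top_correlations(pathways, genera, k=2):
    """Rank every pathway-genus pair by r, highest first, and keep the top k.

    pathways, genera: dicts mapping a feature name to per-sample abundances,
    with samples in the same order in both dicts (an assumed layout).
    """
    pairs = [
        (pearson_r(p_values, g_values), p_name, g_name)
        for p_name, p_values in pathways.items()
        for g_name, g_values in genera.items()
    ]
    return sorted(pairs, key=lambda item: item[0], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical abundances for four samples; not data from the study.
    toy_pathways = {"pathway_A": [1.0, 2.0, 3.0, 4.0], "pathway_B": [4.0, 1.0, 3.0, 2.0]}
    toy_genera = {"genus_X": [2.0, 4.1, 5.9, 8.0], "genus_Y": [1.0, 3.0, 2.0, 2.5]}
    for r, pathway, genus in top_correlations(toy_pathways, toy_genera):
        print(f"{pathway} vs {genus}: r = {r:.2f}")
```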
Supplementary Figure 5 Alpha diversity of taxa from 22 studies of the gut microbiome.
Shannon alpha diversity was calculated for each individual sample within each human gut microbiome study. The median diversity varies by a maximum factor of 1.5 between studies; however, the within-study variability, measured by the interquartile range, varies by more than 3-fold.
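For reference, the Shannon index of a sample with taxon proportions $p_i$ is $H = -\sum_i p_i \ln p_i$. The Python sketch below computes it per sample and summarizes one study by its median and interquartile range. The function names and toy abundances are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the quantile definition used for the figure.

```python
import math
from statistics import median, quantiles


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def summarize_study(samples):
    """Return (median, interquartile range) of Shannon diversity across samples.

    samples: list of per-sample abundance vectors; at least two are needed
    for the quartile computation.
    """
    values = [shannon_diversity(sample) for sample in samples]
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q3 - q1


if __name__ == "__main__":
    # Hypothetical abundance vectors for one study; not data from the paper.
    toy_study = [[10, 10], [1, 99], [5, 5, 5, 5], [30, 60, 10]]
    study_median, study_iqr = summarize_study(toy_study)
    print(f"median H = {study_median:.3f}, IQR = {study_iqr:.3f}")
```

Comparing these two summaries across studies gives the between-study contrast in median and spread that the caption describes.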
About this article
Cite this article
Pasolli, E., Schiffer, L., Manghi, P. et al. Accessible, curated metagenomic data through ExperimentHub. Nat Methods 14, 1023–1024 (2017). https://doi.org/10.1038/nmeth.4468
- Published: 31 October 2017
- Issue date: 01 November 2017
- DOI: https://doi.org/10.1038/nmeth.4468