Microbial community resemblance methods differ in their ability to detect biologically relevant patterns
Justin Kuczynski et al. Nat Methods. 2010 Oct.
Abstract
High-throughput sequencing methods enable characterization of microbial communities in a wide range of environments on an unprecedented scale. However, insight into microbial community composition is limited by our ability to detect patterns in this flood of sequences. Here we compare the performance of 51 analysis techniques using real and simulated bacterial 16S rRNA pyrosequencing datasets containing either clustered samples or samples arrayed across environmental gradients. We found that many diversity patterns were evident with severely undersampled communities and that methods varied widely in their ability to detect gradients and clusters. Chi-squared distances and Pearson correlation distances performed especially well for detecting gradients, whereas Gower and Canberra distances performed especially well for detecting clusters. These results also provide a basis for understanding tradeoffs between number of samples and depth of coverage, tradeoffs that are important to consider when designing studies to characterize microbial communities.
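Two of the distances named in the abstract can be written down in a few lines. The following is a minimal sketch using textbook definitions, not necessarily the exact variants the authors benchmarked; the function names and the toy abundance vectors are illustrative assumptions, and counts are assumed to be nonnegative.

```python
import math

def pearson_distance(x, y):
    """Pearson correlation distance: 1 - r between two abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / (sd_x * sd_y)

def canberra_distance(x, y):
    """Canberra distance: sum of |a - b| / (a + b), skipping species absent from both samples."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b > 0)

# Hypothetical species counts for three samples (not data from the paper).
sample_a = [10, 20, 30, 0]
sample_b = [20, 40, 60, 0]   # same proportions as sample_a, twice the depth
sample_c = [0, 5, 0, 35]     # different composition

print(pearson_distance(sample_a, sample_b))
print(canberra_distance(sample_a, sample_b))
print(canberra_distance(sample_a, sample_c))
```

In this toy case the Pearson distance ignores the difference in depth between `sample_a` and `sample_b`, while the Canberra distance on raw counts does not, which is one reason counts are often normalized or subsampled to a common depth before comparison.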
Figures
Figure 1
Schematic of simulations and analysis of data. (a) Six stages for the analysis of a simulated environmental gradient. (b) Clustered samples. A hypothetical sample is formed at the root of a hierarchy that defines the relatedness of samples both inter- and intra-cluster (d1 and d2; stage 1). The species abundances at the root node (stage 2) are perturbed by an amount proportional to d1, and the results are renormalized to form the species abundances at each cluster (stage 3). The cluster nodes are then perturbed by d2 to produce species abundances at each sample (stage 4). Sample data are generated and analyzed as in (a), and the analysis methods are then evaluated based on their ability to reveal the underlying cluster structure of the samples (stages 5–8).
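The cluster hierarchy in panel (b) can be sketched roughly as follows. The caption does not specify the form of the perturbation or the root rank-abundance distribution, so the uniform multiplicative noise, the 1/rank root profile, and all function and parameter names here are assumptions made for illustration only.

```python
import random

def perturb(abundances, scale, rng):
    """Scale each abundance by a random factor in [1 - scale, 1 + scale], then renormalize.

    Assumes 0 <= scale < 1 so that every abundance stays positive.
    """
    noisy = [a * (1.0 + rng.uniform(-scale, scale)) for a in abundances]
    total = sum(noisy)
    return [v / total for v in noisy]

def simulate_clusters(n_species, n_clusters, samples_per_cluster, d1, d2, depth, seed=0):
    """Return a list of (cluster_label, species_counts) pairs."""
    rng = random.Random(seed)
    # Stage 2: root abundances (assumed 1/rank profile, normalized to sum to 1).
    raw = [1.0 / (rank + 1) for rank in range(n_species)]
    total = sum(raw)
    root = [v / total for v in raw]
    samples = []
    for label in range(n_clusters):
        # Stage 3: perturb the root by d1 to obtain the cluster-level abundances.
        cluster_profile = perturb(root, d1, rng)
        for _ in range(samples_per_cluster):
            # Stage 4: perturb the cluster by d2 to obtain the sample-level abundances.
            sample_profile = perturb(cluster_profile, d2, rng)
            # Draw `depth` sequences from the sample's abundance profile.
            draws = rng.choices(range(n_species), weights=sample_profile, k=depth)
            counts = [0] * n_species
            for species in draws:
                counts[species] += 1
            samples.append((label, counts))
    return samples

# 90 samples in 3 clusters at 1,000 sequences per sample; d1 and d2 are arbitrary here.
samples = simulate_clusters(n_species=20, n_clusters=3, samples_per_cluster=30,
                            d1=0.5, d2=0.1, depth=1000)
```

Making d1 large relative to d2 yields prominent clusters in this sketch, and shrinking the ratio makes them subtle.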
Figure 2
Comparison of different gradient methods on the soil dataset and on a simulated gradient dataset with or without noise. Axes represent the first two principal coordinates maximizing the variance in the data, obtained via PCoA (the percentage of the total variance explained by each axis is shown in parentheses). Each data point is a microbial community sample, colored according to either a real gradient (soil pH) or a simulated gradient (arbitrary units). For simulated data, sequencing depth was 1,000 sequences per sample, and species rank-abundance distributions were fit from empirical data.
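To make the axis labels concrete, the sketch below computes the first principal coordinate of a distance matrix and the fraction of variance it explains, using double-centering and power iteration in plain Python. It is a simplified illustration rather than the software used in the study: it recovers only the first axis, and it assumes the distances are Euclidean-embeddable so that all eigenvalues are nonnegative and the trace equals the total variance. All names are hypothetical.

```python
import math

def double_center(dist):
    """Gower-centered matrix B = -1/2 * J D^2 J, where D^2 holds squared distances."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]   # equals the column means because a is symmetric
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]

def leading_axis(b, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector) for the dominant axis of b."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]     # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(x * y for x, y in zip(v, bv))   # Rayleigh quotient
    return eigenvalue, v

def pcoa_first_axis(dist):
    """Return (coordinates on axis 1, fraction of total variance explained by axis 1)."""
    b = double_center(dist)
    eigenvalue, v = leading_axis(b)
    trace = sum(b[i][i] for i in range(len(b)))      # sum of all eigenvalues
    coords = [math.sqrt(max(eigenvalue, 0.0)) * x for x in v]
    return coords, eigenvalue / trace

# Four samples spaced evenly along a one-dimensional gradient.
positions = [0.0, 1.0, 2.0, 3.0]
dist = [[abs(p - q) for q in positions] for p in positions]
coords, fraction = pcoa_first_axis(dist)
```

For this perfectly linear toy gradient the first axis recovers the sample spacing and accounts for all of the variance; with real community distances the variance is spread across many axes.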
Figure 3
Choice of analysis method reveals or obscures clusters. Keyboard data, simulated data resembling the keyboard data (distinct clusters), and simulated data representing less prominent sample clusters (subtle clusters) were analyzed by the indicated techniques. All simulated data shown in this figure had 90 samples divided into 3 clusters, with 1,000 sequences per sample. Axes are labeled as in Figure 2.
Figure 4
Deep sequencing is superfluous when clusters are prominent, but critical when clusters are subtle. Data representing either prominent or subtle clusters were generated (see Methods) with varying sequencing depths. (a–c) Jaccard distance followed by PCoA was applied to prominent cluster data with 10, 1,000, or 100,000 sequences per sample. No substantial improvement in the effectiveness of the method was found above 1,000 sequences per sample. (d–f) Gower distance followed by PCoA was applied to the same data. (g–i) Gower distance followed by PCoA was applied to more subtle clusters. (j–l) Morisita-Horn distance followed by PCoA was applied to the subtle clusters. Although substantially more of the variance is explained by this method, the clusters are not easily interpretable; this situation persists even with 10 million sequences per sample (data not shown).
Figure 5
Tradeoff between number of samples and number of sequences per sample with prominent and subtle gradients and clusters. Panels show (a) subtle clusters, (b) prominent clusters, (c) subtle gradients, and (d) prominent gradients, with a survey budget of 500,000 sequences allocated to varying numbers of samples, and thus an inversely varying number of sequences per sample. Insets show examples of the gradients and clusters at 5, 100, and 2,000 samples, corresponding to 100,000, 5,000, and 250 sequences per sample, respectively (arranged right to left in each panel). All comparisons use the Pearson distance + PCoA ordination method. Note that the fraction of the variance explained by the PCoA decreases as the number of samples increases, even when the patterns are clearer with more samples. Error bars represent ± s.e.m. of 12 simulations.
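The fixed-budget arithmetic behind the insets can be checked directly, and a deep sample can be subsampled to mimic a shallower allocation. This is a small illustrative sketch; the helper names and the example counts are assumptions, and the subsampling shown (random draws without replacement) is one common way to reduce depth, not necessarily the procedure used for this figure.

```python
import random

def sequences_per_sample(budget, n_samples):
    """Sequences available per sample when a fixed budget is split evenly."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return budget // n_samples

def rarefy(counts, depth, rng):
    """Subsample a count vector to `depth` sequences, drawing without replacement."""
    reads = [species for species, c in enumerate(counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of sequences in the sample")
    kept = rng.sample(reads, depth)
    out = [0] * len(counts)
    for species in kept:
        out[species] += 1
    return out

BUDGET = 500_000
allocation = {n: sequences_per_sample(BUDGET, n) for n in (5, 100, 2000)}
print(allocation)

# Reduce a hypothetical 1,000-sequence sample to the 250-sequence depth.
deep_sample = [600, 300, 100]
shallow_sample = rarefy(deep_sample, allocation[2000], random.Random(0))
```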
Grants and funding
- P01 DK078669-030003/DK/NIDDK NIH HHS/United States
- P01 DK078669/DK/NIDDK NIH HHS/United States
- HG4872/HG/NHGRI NIH HHS/United States
- U01 HG004866/HG/NHGRI NIH HHS/United States
- R01 HG004872/HG/NHGRI NIH HHS/United States
- HG4866/HG/NHGRI NIH HHS/United States
- R01 HG004872-03/HG/NHGRI NIH HHS/United States
- HHMI/Howard Hughes Medical Institute/United States
- DK78669/DK/NIDDK NIH HHS/United States