A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

Patrick D Schloss et al. BMC Bioinformatics. 2008.

Abstract

Background: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data.

Results: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities, these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments.

Conclusion: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses.

Figures

Figure 1

Analysis of the richness and community membership when peptide fragments identified in individual sequence reads were used to assemble the Bacillus anthracis str. Ames genome sequence. (A) The collector's curves for three non-parametric richness estimators and observed richness using individual sequence reads compared to the OPF richness of the assembled genome (horizontal black line). The solid lines represent the richness of non-merged OPFs and the dashed lines represent the richness of merged OPFs with a penalty of 0.15. (B) Collector's curves of parameters describing the similarity between two randomly selected subsets of peptide fragments.
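The collector's curves described in this caption can be illustrated with a minimal sketch: fragments are shuffled, and observed richness and a non-parametric estimate are recomputed as more fragments are sampled. The sketch assumes the bias-corrected form of the Chao1 estimator (the source does not say which form or which three estimators were used), and the function names, the `step` parameter, and the toy OPF labels are hypothetical.

```python
import random
from collections import Counter


def chao1(labels):
    """Bias-corrected Chao1 richness estimate for a list of group labels."""
    counts = Counter(labels)
    s_obs = len(counts)                              # observed richness
    f1 = sum(1 for c in counts.values() if c == 1)   # groups seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)   # groups seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def collectors_curve(labels, step, seed=0):
    """Return (n sampled, observed richness, Chao1) at increasing sampling depth."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)            # one random sampling order
    curve = []
    for n in range(step, len(shuffled) + 1, step):
        subset = shuffled[:n]
        curve.append((n, len(set(subset)), chao1(subset)))
    return curve


# Toy data (hypothetical): the OPF label assigned to each peptide fragment.
fragments = ["opf1"] * 5 + ["opf2"] * 3 + ["opf3"] * 2 + ["opf4", "opf5", "opf6"]

for n, observed, estimated in collectors_curve(fragments, step=4):
    print(n, observed, round(estimated, 2))
```

A curve that flattens as sampling depth grows suggests that most of the richness has been captured; averaging over many random orderings, rather than the single ordering used here, would give a smoother curve.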

Figure 2

Collector's curves for the OTU (A) and OPF (B) richness observed and estimated using DNA extracted from an AMD biofilm community.

Figure 3

Collector's curves for the OTU (A) and OPF (B) richness observed and estimated using DNA extracted from an agricultural soil in Minnesota, USA.

Figure 4

Venn diagram comparing the OPF membership found in three whalebone microbial communities (AGZO, n = 38,981 peptide fragments; AHAA, n = 36,165; and AHAI, n = 33,199). Below each community name is the Chao1 richness estimate and the 95% confidence interval for that community. We estimated the richness of the overlapping regions based on the pairwise S_{A,B Chao} shared richness estimates between the three communities and by pooling two communities and estimating the shared fraction with the third community. These estimates are provided on the right side of the figure.
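As a rough illustration of the membership comparison behind such a Venn diagram, the sketch below counts the observed OPFs in each region for three communities. It covers observed overlap only and does not implement the Chao shared richness estimator mentioned in the caption; the function name and the toy OPF identifiers are hypothetical.

```python
def venn_regions(a, b, c):
    """Count observed groups in each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A&B only": len((a & b) - c),
        "A&C only": len((a & c) - b),
        "B&C only": len((b & c) - a),
        "A&B&C": len(a & b & c),
    }


# Toy data (hypothetical): OPF identifiers observed in each community.
agzo = {1, 2, 3, 4}
ahaa = {3, 4, 5}
ahai = {4, 5, 6, 7}

regions = venn_regions(agzo, ahaa, ahai)
for name, count in regions.items():
    print(name, count)
```

The seven region counts sum to the number of distinct OPFs across all three communities, which is a quick consistency check on any membership table.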

Figure 5

Venn diagram comparing the pooled OPF membership found in the AMD (n = 99,419 peptide fragments), soil (n = 143,422), and whalebone (n = 108,345) microbial communities.
