An application of statistics to comparative metagenomics

Comparative Study

Beltran Rodriguez-Brito et al. BMC Bioinformatics. 2006.

Abstract

Background: Metagenomics, the sequence analysis of genomic DNA isolated directly from the environment, can be used to identify organisms and model the community dynamics of a particular ecosystem. Metagenomics also has the potential to reveal significant differences in metabolic potential between environments.

Results: Here we use a statistical method to compare curated subsystems and to predict physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that differ significantly between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenomes when compared to non-redundant databases were identified.

Conclusion: The methodology described herein applies statistics to the comparison of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about the physiology and metabolism of microbes from these ecosystems.

Figures

Figure 1

Effect of sample size on identifying differences between phylosubsystems. The red lines show the number of phylosubsystems overrepresented in the Sargasso Sea dataset. The blue lines show the number of phylosubsystems overrepresented in the SEED dataset. Three different confidence levels (90, 95, and 99%) are plotted.

Figure 2

Significantly different subsystems between Sargasso Sea and SEED datasets. (A) Each subsystem was bootstrapped with between 10 and 400 samples per bootstrap, and subsystems that are significantly different at the 99% confidence level with 2,000 bootstraps are highlighted. Subsystems that are significantly more prevalent in the SEED database are colored blue, and those that are significantly more prevalent in the Sargasso Sea dataset are colored red. (B) Magnified view of several different subsystems. Subsystems from amino acid synthesis, carbohydrate utilization, cofactor synthesis, fatty acids, nucleotide synthesis, and photosynthesis are shown in more detail. Colors are as described for (A). E = eukaryotic subsystem, B = bacterial subsystem, A = archaeal subsystem.
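The bootstrap comparison summarized in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each dataset is a flat list of subsystem labels (one label per sequence hit), and it swaps in a plain percentile interval on the per-subsystem count difference as the significance criterion. The function names, the default of 400 draws, and the seed are illustrative assumptions; the 2,000 bootstraps and 99% level are taken from the caption.

```python
import random
from collections import Counter


def bootstrap_differences(sample_a, sample_b, n_draws, n_boot, rng):
    """Return, per subsystem, n_boot differences in resampled counts (a - b)."""
    subsystems = sorted(set(sample_a) | set(sample_b))
    diffs = {s: [] for s in subsystems}
    for _ in range(n_boot):
        # Draw n_draws labels with replacement from each dataset.
        counts_a = Counter(rng.choices(sample_a, k=n_draws))
        counts_b = Counter(rng.choices(sample_b, k=n_draws))
        for s in subsystems:
            diffs[s].append(counts_a[s] - counts_b[s])
    return diffs


def percentile_interval(values, confidence):
    """Two-sided percentile interval of a list of numbers."""
    ordered = sorted(values)
    tail = (1.0 - confidence) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int((1.0 - tail) * last)]


def significant_subsystems(sample_a, sample_b, n_draws=400, n_boot=2000,
                           confidence=0.99, seed=0):
    """Label subsystems whose difference interval excludes zero.

    Assumption: "a" means overrepresented in sample_a, "b" in sample_b.
    Subsystems whose interval contains zero are omitted from the result.
    """
    rng = random.Random(seed)
    diffs = bootstrap_differences(sample_a, sample_b, n_draws, n_boot, rng)
    result = {}
    for subsystem, values in diffs.items():
        low, high = percentile_interval(values, confidence)
        if low > 0:
            result[subsystem] = "a"
        elif high < 0:
            result[subsystem] = "b"
    return result


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    sargasso = ["photosynthesis"] * 80 + ["other"] * 20
    seed_db = ["photosynthesis"] * 5 + ["other"] * 95
    print(significant_subsystems(sargasso, seed_db))
```

Rerunning `significant_subsystems` with `n_draws` set to values between 10 and 400 and `confidence` set to 0.90, 0.95, and 0.99 would give a rough analogue of the sample-size and confidence-level sweep described in the captions, under the same assumptions.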

Figure 3

Fraction of amino acids in metagenomes. The fraction of each amino acid in all the predicted proteins in the three data samples was calculated and compared.
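The amino acid fractions described in this caption amount to a simple frequency count over predicted protein sequences. The sketch below shows one way to compute them, assuming the proteins are available as plain one-letter strings; the function name and the choice to ignore characters outside the 20 standard amino acids (stop symbols, ambiguity codes) are assumptions, not details from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_fractions(proteins):
    """Return the fraction of each standard amino acid across all sequences."""
    counts = Counter()
    for sequence in proteins:
        # Count only standard residues; skip '*', 'X', gaps, and similar.
        counts.update(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any dataset in the paper.
    fractions = amino_acid_fractions(["MKK", "mA*"])
    for aa, value in fractions.items():
        if value > 0:
            print(aa, round(value, 3))
```

Computing this dictionary once per dataset gives per-residue values that can be compared side by side.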

Figure 4

Flow chart of methods used to identify statistical differences between phylosubsystems.

