Metagenomic sequencing of an in vitro-simulated microbial community

Jenna L Morgan et al. PLoS One. 2010.

Abstract

Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that contain microbes, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample from the frequency with which each organism's DNA is observed in the sequencing reads.

Methodology/principal findings: We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism were observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells that were known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol used.
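
As a rough illustration of the comparison described above, the sketch below converts per-organism read counts into read fractions and into genome-size-normalized abundance estimates. The organism names, genome sizes, and read counts are hypothetical placeholders, not data from the study, and dividing read counts by genome size is a common normalization assumed here rather than the authors' stated procedure.

```python
def read_fractions(read_counts):
    """Fraction of mapped reads assigned to each organism."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {name: n / total for name, n in read_counts.items()}


def cell_abundance_estimates(read_counts, genome_sizes):
    """Estimate relative cell abundance by dividing reads by genome size.

    Assumption (not from the source): extraction and sequencing are
    unbiased, so reads per organism scale with cell count * genome size.
    """
    per_genome = {
        name: read_counts[name] / genome_sizes[name] for name in read_counts
    }
    total = sum(per_genome.values())
    return {name: value / total for name, value in per_genome.items()}


# Hypothetical example values (not from the study).
genome_sizes = {"org_a": 2_000_000, "org_b": 4_000_000, "org_c": 6_000_000}
read_counts = {"org_a": 1_000, "org_b": 2_000, "org_c": 3_000}

fractions = read_fractions(read_counts)
abundances = cell_abundance_estimates(read_counts, genome_sizes)
```

In this toy input the read fractions differ across organisms while the normalized estimates come out equal, which is the pattern an unbiased protocol would produce for an equal-cell mixture. The study reports that real protocols deviated from it.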

Conclusions/significance: We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro-simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with different protocols are not suitable for comparative metagenomics.

Conflict of interest statement

Competing Interests: Jonathan Eisen is an associate with PLoS as Editor-in-Chief of PLoS Biology.

Figures

Figure 1. Phylogenetic distribution of organisms selected for the metagenomic simulation.

A phylogenetic tree of the three domains with representative groups is shown. Organisms used in this study are indicated by *. These organisms represent all known domains of life and include four bacterial phyla and a variety of genome sizes, GC compositions, and cell wall types. Large font size indicates clades where multiple isolate genomes have been collapsed into a single leaf node.

Figure 2. Outline of the steps involved in the creation and sequencing of the simulated metagenomic samples.

Figure 3. Predicted and observed frequencies of sequence reads from each organism.

The fraction of reads assigned to organisms for each sample preparation method is shown at top. The fraction expected from the measured quantities of mixed DNA from each organism, assuming unbiased library preparation and sequencing, is given as “DNA quantification”, and the fraction of reads predicted from cell count and genome size is given as “cc*gs prediction.” Sampling error was estimated assuming a multinomial distribution (not shown) and indicated that estimates of relative abundance are accurate to within ±5% for dominant organisms given the number of Sanger reads obtained, and to within ±1% for pyrosequencing reads. Note that the top two bars, labeled Enz+Pyrosequencing and Enz+Sanger, compare Sanger and pyrosequencing technology on the same extracted DNA.
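
The two quantities named in this caption can be sketched as follows. This is a minimal illustration, assuming that the “cc*gs prediction” is the product of cell count and genome size normalized across organisms, and that the sampling error is approximated by the binomial standard error of a single multinomial category. The function names, the 95% normal-approximation factor, and all input values are hypothetical and are not taken from the study.

```python
import math


def ccgs_prediction(cell_counts, genome_sizes):
    """Predicted read fraction: cell count * genome size, normalized to 1."""
    weights = {
        name: cell_counts[name] * genome_sizes[name] for name in cell_counts
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def multinomial_margin(p, n_reads, z=1.96):
    """Approximate margin of error for one multinomial category.

    Each category of a multinomial is marginally binomial, so the
    standard error of its observed fraction is sqrt(p * (1 - p) / n).
    The default z=1.96 (about 95% coverage) is an assumption.
    """
    return z * math.sqrt(p * (1.0 - p) / n_reads)


# Hypothetical inputs: equal cell counts, differing genome sizes.
cell_counts = {"org_a": 1e8, "org_b": 1e8}
genome_sizes = {"org_a": 2_000_000, "org_b": 6_000_000}
predicted = ccgs_prediction(cell_counts, genome_sizes)

# Hypothetical read counts: the margin shrinks with sqrt(number of reads).
margin_small = multinomial_margin(0.25, 400)
margin_large = multinomial_margin(0.25, 10_000)
```

Because the margin scales with the inverse square root of the read count, a run with many more reads gives a tighter estimate. This is consistent with the caption's smaller error for pyrosequencing than for Sanger reads.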

Figure 4. Additional sequence data for three of the simulated metagenomes.

Bars represent the observed frequency of organisms in sequenced metagenomes. We constructed and sequenced metagenomes according to the Enz, EnzBB, and DNeasy protocols using the long-term frozen isolate culture stocks with and without glycerol. Reads were mapped to reference genomes as described in Methods. The additional metagenomes differ somewhat from each of the original libraries. Such differences might be caused by variation across DNA preparations and sequencing runs, the age of the frozen samples, or other factors. The libraries constructed using the DNeasy Kit produced the most consistent results.
