Microbial community profiling for human microbiome projects: Tools, techniques, and challenges

Review

Micah Hamady et al. Genome Res. 2009 Jul.

Abstract

High-throughput sequencing studies and new software tools are revolutionizing microbial community analyses, yet the variety of experimental and computational methods can be daunting. In this review, we discuss some of the different approaches to community profiling, highlighting strengths and weaknesses of various experimental approaches, sequencing methodologies, and analytical methods. We also address one key question emerging from various Human Microbiome Projects: Is there a substantial core of abundant organisms or lineages that we all share? It appears that in some human body habitats, such as the hand and the gut, the diversity among individuals is so great that we can rule out the possibility that any species is at high abundance in all individuals: It is possible that the focus should instead be on higher-level taxa or on functional genes.

Figures

Figure 1.

Models of a core microbiome. The circles represent the microbial communities in different individuals and can be thought of as either representing different taxa (species, genera, etc.) or representing different genes. (A) “Substantial core” model. Most individuals share most components of the microbiota. (B) “Minimal core” model. All individuals share a few components, and any individual shares many components with a few other individuals, but very little is shared across all individuals. (C) “No core” model. Nothing is shared by all individuals, and most diversity is unique to a given individual. (D) “Gradient” model. Individuals next to each other on a gradient, for example, age or obesity, share many components, but individuals at opposite ends share little or nothing. (E) “Subpopulation” model. Different subpopulations, for example, those defined by geography or disease, have different cores, but nothing is shared across subpopulations. Scenarios C–E would represent situations in which the strategy of identifying core species for sequencing, then using these as a scaffold for “omics” studies, would be problematic.

Figure 2.

Overview of barcoded pyrosequencing workflow. The sample-specific barcodes are introduced into each sample during the PCR step (for amplicon sequencing), or through ligation (for metagenomics). After sequencing, individual sequences can then be traced back to individual samples using the barcodes they contain. The sequences from each sample are then separated, aligned, and then either used directly for taxa-based analyses or used to build trees for phylogenetic analyses. OTU, operational taxonomic unit.
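
The step of tracing sequences back to samples can be illustrated with a minimal sketch. It assumes the barcode sits at the start of each read and has a fixed length; the barcodes, sample names, and the `demultiplex` helper are hypothetical and are not taken from the review.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample map; real barcodes and their length
# depend on the study design.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
BARCODE_LEN = 4

def demultiplex(reads, barcodes=BARCODES, barcode_len=BARCODE_LEN):
    """Group reads by sample using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        prefix, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(prefix)
        if sample is None:
            # Keep the whole read so unmatched barcodes can be inspected later.
            by_sample["unassigned"].append(read)
        else:
            # Store the read with its barcode trimmed off.
            by_sample[sample].append(insert)
    return dict(by_sample)

reads = ["ACGTGGCCTTAA", "TGCAGGCCTTAG", "ACGTGGCCTTAC", "NNNNGGCCTTAA"]
result = demultiplex(reads)
```

Exact matching is the simplest possible rule. A real pipeline would probably also need to handle sequencing errors within the barcode, which this sketch ignores.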

Figure 3.

(A) Phylum-level abundance and (B) shared “species” (represented here as 97% OTUs, approximately species level) in 22 human gut samples with depth of coverage of at least 350 sequences per individual. These data are taken from a meta-analysis (Ley et al. 2008a) covering several large Sanger-sequencing studies of humans in different populations (Suau et al. 1999; Hayashi et al. 2002a,b, 2003; Eckburg et al. 2005; Ley et al. 2006c; Nagashima et al. 2006). Interestingly, the results are very consistent with results from both Sanger sequencing and pyrosequencing within a North American population of lean and obese twins (Turnbaugh et al. 2009). Note: No species-level OTUs were shared across all samples with 350 sequences per sample; 1813 OTUs were only present in one sample; the total number of OTUs was 2320.
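
The counts in the note above (OTUs shared by all samples, OTUs present in only one sample, total OTUs) can be derived from a presence table along the following lines. This is only a sketch on invented toy data; the sample names, OTU names, and the `otu_prevalence` helper are assumptions for illustration.

```python
# Toy presence data: which OTUs were observed in which sample.
# Names and memberships are made up for illustration.
samples = {
    "subject_1": {"otu_1", "otu_2", "otu_3"},
    "subject_2": {"otu_2", "otu_3", "otu_4"},
    "subject_3": {"otu_3", "otu_5"},
}

def otu_prevalence(samples):
    """Count how many samples each OTU appears in."""
    counts = {}
    for otus in samples.values():
        for otu in otus:
            counts[otu] = counts.get(otu, 0) + 1
    return counts

prevalence = otu_prevalence(samples)
total_otus = len(prevalence)
# OTUs found in every sample (a candidate "core").
core = sorted(o for o, n in prevalence.items() if n == len(samples))
# OTUs found in exactly one sample.
singletons = sorted(o for o, n in prevalence.items() if n == 1)
```

In this toy table one OTU is shared by all three samples, whereas in the 22 gut samples described above the corresponding set was empty.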

Figure 4.

Comparison of phylogenetic and nonphylogenetic methods for comparing communities. (A–D) Sequences are from stool and six different biopsy sites along the distal gut from three unrelated healthy human subjects (Eckburg et al. 2005); (E) sequences are from 162 free-living communities and 159 vertebrate gut communities (Ley et al. 2008b). Fragments are labeled as either full-length, V2 or V4 (250-nt reads ending at 338R or starting at 515F, respectively), or V6 (80-nt reads ending at 1046R). (A) Effect of fragment on phylogenetic assignment: Each circle is one of the three individual human subjects, pooling sequences from all sites. Note increase in unclassified reads produced by V6; results from V2 and V4 are very similar to those obtained from the full-length sequences. Assignments performed using RDP. (B) Effect of three different distance measures for principal coordinates analysis on the full-length 16S rRNA sequence data: UniFrac (a phylogenetic method), and Euclidean and Kulczynski distances on the sample by OTU matrix (two examples of taxon-based methods). Only the relative positions of and distances between points are relevant: The choice of direction along each axis is a mathematical artifact. Individual points are samples, colored according to the three subjects that the samples came from (i.e., the three colors represent three subjects: The same color scheme is used for panels C and D). In this data set, all methods give broadly equivalent results and cluster the samples by individual, not by sample location (stool or individual sites along the distal gut mucosa). (C) Effect of reducing the number of sequences per sample on the UniFrac clustering, comparing the results obtained using all sequences to results obtained using a random sample of sequences. (Right panel) Clustering is still good, as measured by the consistency of clustering together the samples from the same individual as in panel B, at 25 sequences per sample, although there is more scatter as the number of sequences per sample decreases. (D) Effect of the different regions on clustering with UniFrac using either (top row) all sequences or (bottom row) 25 sequences/sample. For this analysis, we take each full-length sequence, computationally clip out the part of the sequence corresponding to each region to simulate 454 data, and repeat the analysis: The analysis thus includes the effect of the region sequenced, but not the effect of primer bias that may differentially amplify specific taxa. Again, we expect the samples from each individual to cluster together, and a mixture of samples from different individuals indicates poor performance. V6 is especially affected at low sample coverage, and V2 is especially unaffected. (E) Effect of different clustering measures, indicated on each panel, on the data set from Ley et al. (2008a), showing only the (yellow) vertebrate gut and (red) free-living samples. This data set is very heterogeneous and includes many samples with low numbers of sequences per sample or where nonoverlapping regions of the 16S rRNA were chosen for sequencing. In this data set, UniFrac, which is a phylogenetic metric, performs very well, separating the samples into two groups; in contrast, the other three methods, which are all taxon based, perform poorly with obvious clustering artifacts such as spikes leading off at right angles from one another, and fail to separate the two types of samples into two discrete clusters. Note that this figure is not based on the Arb parsimony insertion tree used in Ley et al. (2008a) but rather on a tree constructed de novo from the NAST-aligned sequences using Clearcut (Sheneman et al. 2006). The artifacts in the taxon-based methods are due to lack of overlap at the species level among different kinds of samples. An exploration of primer effects in a subset of these data shows that sample type is more important than region sequenced or length of amplicon (Liu et al. 2007).
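
The effect of missing species-level overlap on taxon-based measures can be seen in a small sketch of the two taxon-based distances named in panel B. The count vectors are invented, and the Kulczynski formula is the commonly used form (one minus the mean shared fraction of each sample's counts), which is assumed here rather than taken from the caption.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two OTU count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kulczynski(x, y):
    """Kulczynski dissimilarity; assumes both samples have at least one count."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 0.5 * (shared / sum(x) + shared / sum(y))

# Toy OTU count vectors (one position per OTU); values are invented.
gut_1 = [10, 5, 0, 0]
gut_2 = [8, 7, 0, 0]
soil_1 = [0, 0, 6, 9]

within_gut = kulczynski(gut_1, gut_2)      # samples share OTUs
gut_vs_soil_1 = kulczynski(gut_1, soil_1)  # no shared OTUs
gut_vs_soil_2 = kulczynski(gut_2, soil_1)  # no shared OTUs
```

Once two samples share no OTUs, the Kulczynski value sits at its maximum of 1 regardless of how related the organisms are. A phylogenetic measure such as UniFrac can still distinguish such pairs because it uses branch lengths on a tree; that computation is not sketched here.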

Figure 5.

Different methods for selecting OTUs produce different results. (A,B) If sequences are arranged so that each sequence has a neighbor within the OTU threshold (e.g., 97%) but these neighbors are not similar to one another, that is, the variation is not in the same direction, (A) the nearest-neighbor algorithm will produce one OTU because every sequence is connected to every other sequence through a chain of neighbors within threshold, but (B) the furthest-neighbor algorithm will produce a series of OTUs where all members of each OTU are similar to one another. (OTU boundaries are indicated by dashed lines.) Note that the precise OTUs produced by the furthest-neighbor algorithm will vary every time owing to the choice of the randomly chosen seed sequence for each (gray) OTU. (C,D) If sequences are arranged so that there is one outlier that is within threshold of only one of the other sequences, (C) the nearest-neighbor algorithm will produce a single OTU, (D) but the furthest-neighbor algorithm will produce two OTUs in which one of the two most distant sequences is excluded from the main OTU at random. (E,F) If sequences are arranged so that all sequences are within threshold of a central sequence but are outside threshold from each other, (E) the nearest-neighbor algorithm will again produce one OTU, (F) but the furthest-neighbor algorithm will group one sequence with the central sequence at random and break the other sequences into their own OTUs.
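
The contrast in panels A and B can be reproduced with a small sketch. It uses standard agglomerative clustering with single linkage (nearest neighbor) and complete linkage (furthest neighbor), not the seed-based procedure the caption describes, and it stands in one-dimensional positions for sequence dissimilarities. The `cluster` function, the positions, and the threshold are all assumptions for illustration.

```python
def cluster(points, threshold, linkage):
    """Agglomerative clustering on 1-D points.

    linkage=min gives nearest-neighbor (single linkage) clustering;
    linkage=max gives furthest-neighbor (complete linkage) clustering.
    Merging stops when the closest pair of clusters exceeds the threshold.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(abs(a - b) for a in clusters[i] for b in clusters[j])
                # Ties are broken by taking the first pair encountered.
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# A chain: each point is within threshold of its neighbor,
# but the ends of the chain are far apart.
positions = [0, 2, 4, 6]
nearest = cluster(positions, threshold=3, linkage=min)
furthest = cluster(positions, threshold=3, linkage=max)
```

On this chain, nearest-neighbor clustering returns a single group, while furthest-neighbor clustering splits it into groups whose members are all within threshold of one another. Which points end up together under furthest neighbor depends on how ties are broken, which mirrors the order dependence noted in the caption.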
