Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities

Catherine A Lozupone et al. Appl Environ Microbiol. 2007 Mar.

Abstract

The assessment of microbial diversity and distribution is a major concern in environmental microbiology. There are two general approaches for measuring community diversity: quantitative measures, which use the abundance of each taxon, and qualitative measures, which use only presence/absence data. Quantitative measures are ideally suited to revealing community differences that are due to changes in relative taxon abundance (e.g., when a particular set of taxa flourish because a limiting nutrient source becomes abundant). Qualitative measures are most informative when communities differ primarily by what can live in them (e.g., at high temperatures), in part because abundance information can obscure significant patterns of variation in which taxa are present. We illustrate these principles using two 16S rRNA-based surveys of microbial populations and two phylogenetic measures of community beta diversity: unweighted UniFrac, a qualitative measure, and weighted UniFrac, a new quantitative measure, which we have added to the UniFrac website (http://bmf.colorado.edu/unifrac). These studies considered the relative influences of mineral chemistry, temperature, and geography on microbial community composition in acidic thermal springs in Yellowstone National Park and the influences of obesity and kinship on microbial community composition in the mouse gut. We show that applying qualitative and quantitative measures to the same data set can lead to dramatically different conclusions about the main factors that structure microbial diversity and can provide insight into the nature of community differences. We also demonstrate that both weighted and unweighted UniFrac measurements are robust to the methods used to build the underlying phylogeny.

Figures

FIG. 1.

Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
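The calculation described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the UniFrac website's implementation: the `Branch` tuple, the function names, and the three-branch toy tree are all hypothetical, and each branch is assumed to carry precomputed counts of descendant sequences from each environment. The weighted form shown is the raw (non-normalized) sum of branch length times the absolute difference in relative abundance; the branch length normalization mentioned for Fig. 2 is not included.

```python
from typing import List, NamedTuple


class Branch(NamedTuple):
    """One tree branch with counts of descendant sequences per environment.

    Hypothetical representation chosen for this sketch.
    """
    length: float
    count_a: int  # descendants from the "square" environment
    count_b: int  # descendants from the "circle" environment


def unweighted_unifrac(branches: List[Branch]) -> float:
    """Fraction of branch length leading to descendants from only one environment."""
    unique = sum(b.length for b in branches
                 if (b.count_a > 0) != (b.count_b > 0))
    covered = sum(b.length for b in branches
                  if b.count_a > 0 or b.count_b > 0)
    return unique / covered if covered else 0.0


def weighted_unifrac(branches: List[Branch], total_a: int, total_b: int) -> float:
    """Raw weighted value: branch length times difference in relative abundance."""
    return sum(b.length * abs(b.count_a / total_a - b.count_b / total_b)
               for b in branches)


# Toy tree (assumed data): 3 square sequences and 6 circle sequences in total,
# so each square sequence carries twice the weight of a circle sequence.
toy_tree = [
    Branch(length=1.0, count_a=2, count_b=0),  # biased toward squares
    Branch(length=1.0, count_a=0, count_b=4),  # biased toward circles
    Branch(length=0.5, count_a=1, count_b=2),  # 1/3 vs 2/6: no net weight
]

print(unweighted_unifrac(toy_tree))
print(weighted_unifrac(toy_tree, total_a=3, total_b=6))
```

In the toy tree, the third branch is shared by both environments, so it counts as non-unique length in the unweighted measure and contributes nothing to the weighted measure once the counts are converted to relative abundances.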

FIG. 2.

PCoA analysis of hot spring sediment samples with FST and unweighted, weighted, and normalized weighted UniFrac using a variety of trees. Shown is a plot of the first two principal coordinate axes (factors) for PCoA using each tree-building method and a UniFrac algorithm. Rows show the effects of different tree-building methods; columns show the effects of applying unweighted UniFrac (first column), weighted UniFrac (second column), and weighted UniFrac with the branch length normalization (third column). (a) The legend describes which symbol applies to which sample. Fe-containing springs have solid symbols; springs that contain only S have hollow symbols. Temperature (°C) is denoted by the shape of the symbol. (b) PCoA clustering using FST values as distances. (c through e) Neighbor-joining tree from NEIGHBOR. (f through h) and (i through k) Two representative parsimony trees from DNAPARS. (l through n) ARB parsimony insertion tree. (o through q) RAxML maximum likelihood tree. (r through t) RAxML parsimony guide tree, no branch lengths. (u through w) MrBayes consensus tree.

FIG. 3.

Jackknifing of PCoA analysis of hot spring sediment samples with unweighted and weighted UniFrac. Shown is a plot of the first two principal coordinate axes (factors) for PCoA with the neighbor-joining tree. Point locations are the average location in the 100 jackknife replicates. Only 50 randomly selected sequences from each sample were used in each replicate (the range of sequences per sample was 65 to 96). Gray ellipses represent the IQR for the 100 jackknife replicates. The 95% confidence intervals for the point locations were also calculated and were considerably smaller than the IQRs (data not shown). The symbols are the same as those shown in Fig. 2.
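The sequence jackknifing described in this caption (repeatedly drawing a fixed number of sequences per sample, recomputing the analysis, and summarizing the spread with an IQR) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names and toy samples are hypothetical, and a simple Jaccard-style distance on taxon labels stands in for UniFrac followed by PCoA, so the summarized quantity is a single distance rather than a point location.

```python
import random
from statistics import quantiles
from typing import Callable, Dict, List, Sequence, Tuple


def jackknife(samples: Dict[str, Sequence[str]], depth: int, replicates: int,
              statistic: Callable[[Dict[str, List[str]]], float],
              seed: int = 0) -> List[float]:
    """Recompute `statistic` on `replicates` random subsamples of `depth` sequences."""
    for name, seqs in samples.items():
        if len(seqs) < depth:
            raise ValueError(f"sample {name!r} has fewer than {depth} sequences")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = []
    for _ in range(replicates):
        # Draw `depth` sequences per sample without replacement.
        sub = {name: rng.sample(list(seqs), depth)
               for name, seqs in samples.items()}
        values.append(statistic(sub))
    return values


def interquartile_range(values: Sequence[float]) -> Tuple[float, float]:
    """Return the first and third quartiles of the replicate values."""
    q1, _, q3 = quantiles(values, n=4)
    return q1, q3


def jaccard_distance(sub: Dict[str, List[str]]) -> float:
    """Placeholder statistic (NOT UniFrac): fraction of taxa not shared by two samples."""
    a, b = (set(seqs) for seqs in sub.values())
    return 1.0 - len(a & b) / len(a | b)


# Toy samples (assumed data) sized at the ends of the caption's 65-to-96 range.
toy_samples = {
    "X": ["t1"] * 30 + ["t2"] * 25 + ["t3"] * 10,   # 65 sequences
    "Y": ["t2"] * 40 + ["t3"] * 30 + ["t4"] * 26,   # 96 sequences
}

replicate_values = jackknife(toy_samples, depth=50, replicates=100,
                             statistic=jaccard_distance)
print(interquartile_range(replicate_values))
```

Subsampling every sample to the same depth keeps differences in sequencing effort from driving the comparison, which is why the caption reports a fixed 50 sequences per replicate.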

FIG. 4.

Hierarchical clustering of hot spring sediment samples with weighted and unweighted UniFrac. The percentage support for nodes supported at least 70% of the time with sequence jackknifing is indicated. The name of each sample indicates the spring (e.g., A1, A2, and A3 are different springs from the Amphitheatre Springs area, and RM is from the Roaring Mountain area), whether the sample is sulfur rich (S), iron rich (Fe), or both (FeS), and the temperature. The names and branches are colored black for S samples and gray for Fe and FeS samples. (a) Weighted UniFrac with the neighbor-joining tree and (b) unweighted UniFrac with the neighbor-joining tree.

FIG. 5.

Analysis of mouse cecal microbial communities with weighted and unweighted UniFrac. Genotypes are ob/ob for homozygotes for the mutant leptin allele that confers obesity, ob/+ for heterozygotes, and +/+ for wild types. All mothers are ob/+. (a) Plot of the first two principal coordinate axes for PCoA with unweighted UniFrac. Symbols represent individual animals. The rectangles highlight the family of mother 2 and the families of mothers 1 and 3, who are sisters. (b) The same plot for weighted UniFrac. The rectangle highlights the majority of the ob/ob mice. The arrows point to outliers: an ob/ob mouse outside of the ob/ob cluster (black triangle) and an ob/+ mouse inside the ob/ob cluster (white square). (c) Same plot for sequence jackknifing of unweighted UniFrac with a maximum of 200 sequences from each mouse for 100 replicates. The symbols are the average values for the 100 replicates, and the gray ellipses represent the IQR of the point locations. (d) Sequence jackknifing with weighted UniFrac with a maximum of 200 sequences from each mouse for 100 replicates. (e) Hierarchical cluster diagram for unweighted UniFrac. The percentage support for nodes supported at least 70% of the time with sequence jackknifing is indicated. The main clustering is by mother. (f) Hierarchical cluster diagram for weighted UniFrac. The clustering by mother is much less clear, and there is more clustering by ob/ob genotype (and hence by obesity phenotype).
