Differential abundance analysis for microbial marker-gene surveys

Joseph N Paulson et al. Nat Methods. 2013 Dec.

Abstract

We introduce a methodology to assess differential abundance in sparse high-throughput microbial marker-gene survey data. Our approach, implemented in the metagenomeSeq Bioconductor package, relies on a novel normalization technique and a statistical model that accounts for undersampling, a common feature of large-scale marker-gene studies. Using simulated data and several published microbiota data sets, we show that metagenomeSeq outperforms the tools currently used in this field.
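
The normalization technique is cumulative-sum scaling (CSS), which Figure 1 compares against DESeq size factors, TMM and total-sum scaling. CSS divides each sample's counts by the sum of its counts up to a sample-specific quantile of its nonzero counts. As a result, a handful of very abundant features cannot dominate the scaling factor the way they do under total-sum scaling. The Python sketch below is illustrative only. The fixed percentile p = 0.5 and the constant scale = 1000 are assumptions, since the paper describes choosing the percentile from the data. The helper names are hypothetical and are not metagenomeSeq's API.

```python
from typing import List, Sequence


def quantile_linear(values: Sequence[float], p: float) -> float:
    """Linear-interpolation quantile (the default rule in R and NumPy)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of an empty sample")
    h = (len(xs) - 1) * p
    lo = int(h)                      # floor, since h >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def css_scaling_factor(sample: Sequence[float], p: float = 0.5) -> float:
    """Sum of the counts at or below the p-th quantile of the nonzero counts.

    p = 0.5 is an assumed default for illustration, not a value from the paper.
    """
    nonzero = [c for c in sample if c > 0]
    q = quantile_linear(nonzero, p)
    return sum(c for c in sample if c <= q)


def css_normalize(samples: Sequence[Sequence[float]], p: float = 0.5,
                  scale: float = 1000.0) -> List[List[float]]:
    """Divide each sample by its CSS factor and multiply by a constant.

    `samples` is sample-major: each inner list is one sample's counts across
    features. `scale` (assumed 1000 here) only fixes the units.
    """
    out = []
    for sample in samples:
        s = css_scaling_factor(sample, p)
        out.append([c / s * scale for c in sample])
    return out


if __name__ == "__main__":
    samples = [[10, 20, 30, 1000], [0, 5, 5, 200]]
    print([css_scaling_factor(s) for s in samples])  # [30, 10]
```

For the first toy sample, the median of the nonzero counts is 25, so the CSS factor is 10 + 20 = 30. Total-sum scaling would instead divide by 1060, a total driven almost entirely by the single feature with 1000 counts.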

Conflict of interest statement

Competing financial interests

The authors declare no competing financial interests.

Figures

Figure 1. Clustering analysis is improved substantially by CSS normalization

We plot the first two principal coordinates from a multidimensional scaling analysis of mouse stool data normalized by (A) cumulative-sum scaling (CSS), (B) DESeq size factors, (C) trimmed mean of M-values (TMM) and (D) total-sum scaling. Colors indicate clinical phenotype (diet). CSS-normalized data separate samples by diet while controlling within-group variability. (E) Class posterior probability log-ratio for the Western diet, obtained from linear discriminant analysis (LDA). Each box shows the distribution, for one normalization method, of the leave-one-out posterior probability of assignment to the “Western” cluster (whiskers extend to 1.5 times the interquartile range). Samples were best distinguished by phenotype using CSS normalization.
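
The caption does not state which between-sample distance underlies the multidimensional scaling. The sketch below is therefore an illustration under an assumption: Euclidean distance between the rows of a samples-by-features matrix, for example log-transformed normalized counts. It implements classical (Torgerson) MDS by double-centering the squared distance matrix and keeping the eigenvectors of the k largest eigenvalues. The function names are hypothetical.

```python
import numpy as np


def pairwise_euclidean(points: np.ndarray) -> np.ndarray:
    """Euclidean distances between the rows of `points` (samples x features)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def principal_coordinates(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed a distance matrix in k dimensions."""
    n = dist.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    # Double-center the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)    # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    # Negative eigenvalues (from non-Euclidean distances) are clipped to zero.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for log2(normalized count + 1): 6 samples x 50 features.
    log_counts = np.log2(rng.poisson(5.0, size=(6, 50)) + 1.0)
    coords = principal_coordinates(pairwise_euclidean(log_counts))
    print(coords.shape)  # (6, 2)
```

Running this on each normalized matrix and plotting the two returned columns, colored by diet, would give a panel in the style of Figure 1. Differences between panels would then come only from the normalization step.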

Figure 2. Simulation results indicate that metagenomeSeq has greater sensitivity and specificity in a variety of settings

We use the area under the receiver operating characteristic curve (AUC) to compare metagenomeSeq with Metastats, Xipe, the Kruskal-Wallis test as used in LEfSe, a non-zero-inflated log-normal model, edgeR and DESeq. (A) AUC as dataset sparsity decreases. metagenomeSeq achieves larger AUC values than any other method on highly sparse datasets (the vertical dashed line marks the least sparse metagenomic dataset). (B) AUC as the effect size between the two conditions increases. metagenomeSeq and LEfSe are both better at detecting features with small effect sizes. (C) AUC as the variability in sequencing depth increases. metagenomeSeq and the Kruskal-Wallis test are robust to high variability in sequencing depth. (D) AUC as average sequencing depth increases. All methods except the non-zero-inflated log-normal model and Xipe perform similarly well at sufficient sequencing depth.
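
An AUC like the one in Figure 2 can be computed without tracing the ROC curve. It equals the probability that a randomly chosen truly differentially abundant feature scores higher than a randomly chosen null feature, with ties counted as one half (the Mann-Whitney formulation). The sketch below makes two assumptions. First, each method's output has been turned into a per-feature score where larger means stronger evidence, for example the negative of its p-value. Second, the simulation labels mark which features were truly differentially abundant. `roc_auc` is a hypothetical helper and is not part of any of the compared tools.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], truth: Sequence[bool]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    Runs in O(P * N) time, which is fine for a few thousand features.
    """
    if len(scores) != len(truth):
        raise ValueError("scores and truth must have the same length")
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative feature")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical p-values for four features; the first two are truly
    # differentially abundant. A smaller p-value is stronger evidence, so negate.
    pvalues = [0.001, 0.30, 0.04, 0.60]
    truth = [True, True, False, False]
    print(roc_auc([-p for p in pvalues], truth))  # prints 0.75
```

Because only the ranking of scores matters, any monotone decreasing transform of the p-values (such as 1 - p or -log10 p) gives the same AUC.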
