metaSNV: A tool for metagenomic strain level analysis

Paul Igor Costea et al. PLoS One. 2017.

Abstract

We present metaSNV, a tool for single nucleotide variant (SNV) analysis in metagenomic samples, capable of comparing populations of thousands of bacterial and archaeal species. The tool takes nucleotide sequence alignments to reference genomes in standard SAM/BAM format as input, performs SNV calling for individual samples and across the whole data set, and generates various statistics for individual species, including allele frequencies and nucleotide diversity per sample as well as distances and fixation indices across samples. Using published data from 676 metagenomic samples from different sites in the oral cavity, we show that the results of metaSNV are comparable to those of MIDAS, an alternative implementation for metagenomic SNV analysis, while data processing is faster and has a smaller storage footprint. Moreover, we implement a set of distance measures that allow the comparison of genomic variation across metagenomic samples and delineate sample-specific variants to enable the tracking of specific strain populations over time. The implementation of metaSNV is available at: http://metasnv.embl.de/.
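
To make the per-sample allele frequencies and across-sample distances mentioned above concrete, the following is a minimal sketch, not metaSNV's own code. It assumes that each position is summarised by a hypothetical pair of read counts (reads supporting the non-reference allele, total reads) and uses a mean absolute frequency difference as an illustrative distance; the function names, the input layout, and the choice of distance are assumptions made for this example.

```python
from typing import List, Tuple


def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at one position that support the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("position has no coverage")
    return alt_reads / total_reads


def mean_abs_distance(freqs_a: List[float], freqs_b: List[float]) -> float:
    """Average absolute allele-frequency difference over shared positions.

    Illustrative distance only; it is an assumption of this sketch, not
    necessarily one of the measures implemented in metaSNV.
    """
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("samples must share the same non-empty set of positions")
    return sum(abs(a - b) for a, b in zip(freqs_a, freqs_b)) / len(freqs_a)


# Hypothetical (alt_reads, total_reads) counts at three shared positions.
sample_1: List[Tuple[int, int]] = [(0, 20), (10, 20), (20, 20)]
sample_2: List[Tuple[int, int]] = [(0, 10), (10, 10), (5, 10)]

f1 = [allele_frequency(a, n) for a, n in sample_1]  # 0.0, 0.5, 1.0
f2 = [allele_frequency(a, n) for a, n in sample_2]  # 0.0, 1.0, 0.5

distance = mean_abs_distance(f1, f2)  # (0 + 0.5 + 0.5) / 3
print(round(distance, 4))
```

In practice, positions with low coverage would usually be filtered before frequencies are compared; that filtering is left out here to keep the sketch short.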

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Overview of analysis pipeline and example results.

(A) shows the SNV calling and analysis workflow, consisting of an optional pre-processing step, which splits the computational load into subsets of similar size based on genome coverage, the main SNV calling step, and further post-processing of the raw output, which can be tailored to the aim of the analysis. (B) shows the Principal Coordinate Analysis projection of pairwise distances between oral samples, based on population SNVs, which clearly separates strain populations in tongue dorsum samples from those in supra-gingival plaque samples. (C) shows the tracking of individual SNV frequencies within an individual over a period of 384 days. Each line represents one variant position, and its colour encodes the amount by which the allele frequency of that position changed over time: red marks stable variants that maintain their frequency, while blue marks positions whose frequency in the population changes dramatically. Only a small number of positions vary over the measured period, with most remaining at approximately the same population frequency, suggesting great stability of strain populations within the individual.
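
The idea behind panel (C), separating stable positions from those whose frequency shifts over time, can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's implementation: the position names, the frequency values, and the 0.5 threshold are hypothetical, and the change is measured simply as the range (maximum minus minimum) of each position's frequency across time points.

```python
from typing import Dict, List


def frequency_range(trajectory: List[float]) -> float:
    """Largest change in allele frequency observed across the time points."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one time point")
    return max(trajectory) - min(trajectory)


# Hypothetical allele frequencies of three positions at three time points.
trajectories: Dict[str, List[float]] = {
    "pos_101": [0.90, 0.92, 0.88],   # stays near the same frequency
    "pos_2050": [0.10, 0.55, 0.95],  # shifts strongly over time
    "pos_7733": [0.40, 0.45, 0.42],  # stays near the same frequency
}

# Hypothetical cut-off for calling a position "variable".
THRESHOLD = 0.5

variable_positions = sorted(
    name for name, traj in trajectories.items()
    if frequency_range(traj) > THRESHOLD
)
stable_positions = sorted(set(trajectories) - set(variable_positions))

print("variable:", variable_positions)
print("stable:", stable_positions)
```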

Fig 2. Comparison of metaSNV and MIDAS results.

Correlation coefficient (R², Mantel) for the pairwise distance matrices generated by MIDAS and metaSNV (top). Only the sample intersections for species examined with both methods are compared. Jaccard indices for the sample overlap per species were computed (bottom). The average sample number and the average Jaccard index over all sample intersections are shown in the legend.
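
The two quantities in this caption can be sketched in a few lines. The code below is a simplified illustration, not the analysis used for the figure: it computes the Jaccard index of two sample sets and the Mantel statistic as the Pearson correlation between the upper triangles of two distance matrices, and it omits the permutation test that normally accompanies a Mantel test. The sample identifiers and matrices are hypothetical.

```python
import math
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def upper_triangle(matrix: List[List[float]]) -> List[float]:
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def mantel_r(m1: List[List[float]], m2: List[List[float]]) -> float:
    """Mantel statistic: correlation of matching off-diagonal distances."""
    return pearson(upper_triangle(m1), upper_triangle(m2))


# Hypothetical sets of samples retained for one species by each tool.
samples_tool_a = {"s1", "s2", "s3"}
samples_tool_b = {"s2", "s3", "s4"}
overlap = jaccard(samples_tool_a, samples_tool_b)  # 2 shared / 4 total

# Hypothetical distance matrices over the same three samples.
d1 = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
d2 = [[0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0]]
r = mantel_r(d1, d2)

print(overlap, round(r, 6), round(r * r, 6))
```

Both matrices must list the samples in the same order, which is why a comparison like the one in this figure is restricted to samples present in both outputs.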

