Consistent Metagenome-Derived Metrics Verify and Delineate Bacterial Species Boundaries

Matthew R Olm et al. mSystems. 2020.

Abstract

Longstanding questions relate to the existence of naturally distinct bacterial species and genetic approaches to distinguish them. Bacterial genomes in public databases form distinct groups, but these databases are subject to isolation and deposition biases. To avoid these biases, we compared 5,203 bacterial genomes from 1,457 environmental metagenomic samples to test for distinct clouds of diversity and evaluated metrics that could be used to define the species boundary. Bacterial genomes from the human gut, soil, and the ocean all exhibited gaps in whole-genome average nucleotide identities (ANI) near the previously suggested species threshold of 95% ANI. While genome-wide ratios of nonsynonymous and synonymous nucleotide differences (dN/dS) decrease until ANI values approach ∼98%, two methods for estimating homologous recombination approached zero at ∼95% ANI, supporting breakdown of recombination due to sequence divergence as a species-forming force. We evaluated 107 genome-based metrics for their ability to distinguish species when full genomes are not recovered. Full-length 16S rRNA genes were least useful, in part because they were underrecovered from metagenomes. However, many ribosomal proteins displayed both high metagenomic recoverability and species discrimination power. Taken together, our results verify the existence of sequence-discrete microbial species in metagenome-derived genomes and highlight the usefulness of ribosomal genes for gene-level species discrimination.

IMPORTANCE There is controversy about whether bacterial diversity is clustered into distinct species groups or exists as a continuum. To address this issue, we analyzed bacterial genome databases and reports from several previous large-scale environment studies and identified clear discrete groups of species-level bacterial diversity in all cases. Genetic analysis further revealed that quasi-sexual reproduction via horizontal gene transfer is likely a key evolutionary force that maintains bacterial species integrity. We next benchmarked over 100 metrics to distinguish these bacterial species from each other and identified several genes encoding ribosomal proteins with high species discrimination power. Overall, the results from this study provide best practices for bacterial species delineation based on genome content and insight into the nature of bacterial species population genetics.
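To make the 95% ANI species threshold concrete, the following is a minimal sketch of grouping genomes into species-level clusters from pairwise ANI values. It assumes single-linkage clustering via union-find, which is an illustrative choice and not necessarily the procedure used in the study; the function name `cluster_by_ani`, the genome labels, and the ANI values are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_by_ani(
    genomes: Sequence[str],
    pairwise_ani: Dict[Tuple[str, str], float],
    threshold: float = 95.0,
) -> List[List[str]]:
    """Group genomes so that any pair with ANI >= threshold shares a cluster.

    Single-linkage is an assumption made for this sketch; other linkage
    rules would give different clusters when ANI values straddle the gap.
    """
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genomes and ANI values (percent), for illustration only.
example_genomes = ["g1", "g2", "g3", "g4"]
example_ani = {("g1", "g2"): 98.7, ("g2", "g3"): 96.1, ("g3", "g4"): 88.0}
example_clusters = cluster_by_ani(example_genomes, example_ani)
```

With these made-up inputs, g1, g2, and g3 fall into one cluster and g4 remains alone, because only the g3/g4 comparison lies below the threshold.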

Keywords: bacterial species; bioinformatics; metagenomics; microbial genetics; species.

Copyright © 2020 Olm et al.

Figures

FIG 1

Average nucleotide identity gaps exist near ∼95% ANI in all tested genome sets. Each plot is a histogram of average nucleotide identity and genome alignment percentage values resulting from pairwise comparison within a genome set. Higher-intensity colors represent a higher density of comparisons with that particular ANI and genome alignment percentage. The top row contains data from three sets of metagenome-assembled genomes (MAGs) from different environments. The bottom row displays data from NCBI RefSeq (rarefied to reduce taxonomic bias; see Materials and Methods), RefSeq with only comparisons between genomes annotated as the same species included, and RefSeq with only comparisons between genomes annotated as different species included.

FIG 2

Metrics of recombination and selection follow patterns related to the proposed 95% ANI species threshold. Each plot displays a histogram of ANI values resulting from pairwise comparison within a genome set (light gray bars), the median dN/dS ratio at each ANI level (orange line), and the median estimated recombination rate at each ANI level determined using two criteria, namely, the percentage of enriched identical genes (purple line; see Materials and Methods for details) and length bias (green line), as measured using the program PopCOGent. A dotted line is drawn at 95% ANI to mark the commonly proposed threshold for species delineation, and 95% confidence intervals are shown shaded around orange, green, and purple lines. Color coding corresponds to y-axis labels.
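The "median at each ANI level" summaries described in this caption can be sketched as a simple binning step. The example below assumes fixed-width ANI bins of 1 percentage point and uses invented dN/dS values; the bin width, the function name `median_by_ani_bin`, and the data are assumptions for illustration, and the same function would apply equally to a recombination estimate.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def median_by_ani_bin(
    records: Iterable[Tuple[float, float]], bin_width: float = 1.0
) -> Dict[float, float]:
    """Return the median metric value per ANI bin.

    Each record is (ani_percent, metric_value). Bins are labeled by their
    lower edge, so ANI 95.2 and 95.8 both land in the 95.0 bin when
    bin_width is 1.0.
    """
    bins = defaultdict(list)
    for ani, value in records:
        bin_start = math.floor(ani / bin_width) * bin_width
        bins[bin_start].append(value)
    return {start: median(values) for start, values in sorted(bins.items())}


# Hypothetical (ANI, dN/dS) pairs, for illustration only.
example_records = [
    (95.2, 0.10),
    (95.8, 0.20),
    (95.5, 0.30),
    (98.1, 0.05),
    (98.9, 0.15),
]
example_medians = median_by_ani_bin(example_records)
```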

FIG 3

Whole-genome alignment outperforms analysis of marker genes for species discrimination. (a) Histograms of ANI values from comparisons between bacteria from RefSeq annotated as belonging to the same species (green) or different species (red). Each row represents a different method of nucleotide sequence alignment, and vertical black lines indicate the ANI value with the highest F1 score for the corresponding method. (b) Comparison of optimal species discrimination threshold to F1 score for reconstruction of species-level clusters from RefSeq. Whole-genome comparison algorithms, a 16S rRNA alignment, and single-copy gene alignments were tested. (c) Accuracy of marker genes for reconstruction of species clusters based on 95% ANI whole-genome alignments of genomes from metagenomes (dots; left y axis) and recoverability of marker genes from metagenomic data from different environments (lines; right y axis). A horizontal dotted line marks a recoverability level of 1, meaning equal numbers of marker genes and genomes were assembled from the environment.
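Selecting the identity value with the highest F1 score, as described for panels (a) and (b), can be sketched as a scan over candidate thresholds. The example assumes each pairwise comparison carries an identity value and a same-species label, and that a pair is called "same species" when its identity is at or above the threshold; the function names, candidate thresholds, and data are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[float, bool]  # (identity_percent, annotated_same_species)


def f1_at_threshold(pairs: List[Pair], threshold: float) -> float:
    """F1 score when pairs with identity >= threshold are called same-species."""
    tp = sum(1 for ident, same in pairs if ident >= threshold and same)
    fp = sum(1 for ident, same in pairs if ident >= threshold and not same)
    fn = sum(1 for ident, same in pairs if ident < threshold and same)
    if tp == 0:
        return 0.0  # Precision and recall are both zero or undefined.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(
    pairs: List[Pair], candidates: Iterable[float]
) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate with the highest F1.

    Ties are resolved in favor of the earliest candidate in the iterable.
    """
    scored = [(t, f1_at_threshold(pairs, t)) for t in candidates]
    return max(scored, key=lambda item: item[1])


# Hypothetical comparisons, for illustration only.
example_pairs = [
    (99.0, True),
    (97.0, True),
    (96.0, True),
    (95.5, False),
    (94.0, False),
    (90.0, False),
]
example_best = best_threshold(example_pairs, [90.0, 93.0, 95.0, 96.0, 97.0])
```

In this toy data set a threshold of 96 separates the two labels perfectly, whereas 95 admits one false positive and 97 misses one true pair, so the scan settles on 96.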

