Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer

Guillaume Bernard et al. Sci Rep. 2016.

Abstract

Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is poorly understood. Here, using simulated microbial genome data, we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Of the three, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
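To illustrate what a word-count AF comparison looks like in practice, here is a minimal sketch, assuming a plain Euclidean distance between relative k-mer frequencies. It is not one of the nine methods assessed in the study (the abstract does not name them); the function names `kmer_profile`, `euclidean_distance` and `distance_matrix` are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq (no alignment needed)."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean_distance(p: Counter, q: Counter) -> float:
    """Euclidean distance between two relative k-mer frequency vectors.

    Assumption: this simple measure stands in for the word-count
    statistics used in the paper, which are not given in this excerpt.
    """
    n_p, n_q = sum(p.values()), sum(q.values())
    words = set(p) | set(q)
    return sqrt(sum((p[w] / n_p - q[w] / n_q) ** 2 for w in words))


def distance_matrix(seqs: dict, k: int) -> dict:
    """Pairwise distances keyed by (name_a, name_b), names in sorted order."""
    profiles = {name: kmer_profile(s, k) for name, s in seqs.items()}
    return {
        (a, b): euclidean_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    toy = {"g1": "ACGTACGTAC", "g2": "ACGTACGTTC", "g3": "TTTTGGGGCC"}
    for pair, d in distance_matrix(toy, k=3).items():
        print(pair, round(d, 3))
```

A pairwise matrix of this kind could then be passed to a distance-based tree-building step such as neighbour joining; that step is omitted here.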

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Alignment-free methods classification.

Classification of alignment-free methods, modified following Haubold. LZ: Lempel-Ziv.

Figure 2. Accuracy of AF methods based on m.

RF distances are shown for (a) word-count methods and (b) match-length methods at m = 0.1, 0.5 and 0.9. Error bars indicate standard deviation from the mean across 50 replicates.
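The accuracy measure in these captions, the RF distance (read here as Robinson-Foulds, the usual meaning in phylogenetics), counts the bipartitions present in one tree but not the other. The sketch below is a minimal illustration, assuming trees are given as nested Python tuples with string leaf names and are compared as unrooted topologies. The representation and the function names are hypothetical, not taken from the paper.

```python
def leaf_sets(tree, out):
    """Return the leaf set under `tree`, appending each internal node's set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Return (all leaves, set of non-trivial splits) for a nested-tuple tree.

    Each split is stored as the side that does NOT contain an anchor leaf,
    so the same split gets the same key however the tree is rooted.
    """
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)
    splits = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
    return all_leaves, splits


def rf_distance(t1, t2):
    """Unnormalised RF distance: size of the symmetric difference of the split sets."""
    leaves1, splits1 = bipartitions(t1)
    leaves2, splits2 = bipartitions(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must have identical leaf sets")
    return len(splits1 ^ splits2)


if __name__ == "__main__":
    reference = ((("A", "B"), "C"), ("D", "E"))
    inferred = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(reference, inferred))
```

An RF distance of 0 means the two topologies are identical; larger values mean more splits differ. Whether the paper reports raw or normalised values is not stated in this excerpt.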

Figure 3. Accuracy of AF methods based on l.

RF distances are shown for (a) word-count methods and (b) match-length methods at l = 0, 5, 25, 125, 250 and 500. Error bars indicate standard deviation from the mean across 50 replicates.

Figure 4. Accuracy of AF methods based on d.

RF distances are shown for (a) word-count methods and (b) match-length methods at d = 200, 1000, 3000 and 5000. Error bars indicate standard deviation from the mean across 50 replicates.

Figure 5. Accuracy of AF methods based on r.

RF distances are shown for (a) word-count methods and (b) match-length methods at r = 0.00, 0.01, 0.10 and 1.00. Error bars indicate standard deviation from the mean across 50 replicates.

Figure 6. AF phylogeny of 143 prokaryote genomes.

Phylogenetic tree of 143 prokaryote genomes using [formula image not recovered] at k = 24, supported by JK values. The 15 phylum-level backbone nodes of Beiko et al. are marked with solid circles.

Figure 7. Phylogenetic trees of 27 E. coli and Shigella species.

(a) Tree generated using co-phylog at K = 8, supported by JK values. (b) Tree generated using [formula image not recovered] at k = 26, supported by JK values. (c) MRP tree constructed from 5282 Bayesian protein trees. Taxa labeled with an asterisk in each AF tree (a, b) are positioned differently from the reference (c). ECOR groups and Shigella (S) are indicated.

Figure 8. Phylogenetic trees of 8 Yersinia genomes.

(a) Tree generated using [formula image not recovered] at k = 7, supported by JK values. (b) Tree generated using [formula image not recovered] at k = 9, supported by JK values.
