The Impact of Cross-Species Gene Flow on Species Tree Estimation

Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow

National Science Review

The multispecies coalescent (MSC) is the extension of the single-population coalescent model to multiple species. It integrates the phylogenetic process of species divergence and the population genetic process of coalescence, and provides a powerful framework for a number of inference problems using genomic sequence data from multiple species, including estimation of species divergence times and population sizes, estimation of species trees while accommodating discordant gene trees, inference of cross-species gene flow, and species delimitation. In this review, we introduce the major features of the MSC model, discuss full-likelihood and heuristic methods of species tree estimation, and summarize recent methodological advances in the inference of cross-species gene flow. We discuss the statistical and computational challenges in the field and research directions where breakthroughs may be likely in the next few years.
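
As an illustration of why gene trees can disagree with the species tree under the MSC, the following minimal sketch considers the simplest case: three species with species tree ((A,B),C), one sequence sampled per species, no gene flow, and an internal branch of length t measured in coalescent units. Under these assumptions the standard result is that a gene tree is discordant with probability (2/3)e^{-t}. The function names, the seed, and the simulation design are illustrative assumptions, not taken from the review.

```python
import math
import random


def discordance_probability(t):
    """Analytic probability that a three-species gene tree disagrees with
    the species tree ((A,B),C), given internal branch length t in
    coalescent units (assumes one sample per species and no gene flow)."""
    return (2.0 / 3.0) * math.exp(-t)


def simulate_discordance(t, n_loci, seed=0):
    """Monte Carlo estimate of the same probability (illustrative sketch)."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_loci):
        # Waiting time for the A and B lineages to coalesce in their
        # common ancestral population: exponential with rate 1.
        if rng.expovariate(1.0) < t:
            continue  # coalesced before the root: gene tree is concordant
        # Otherwise three lineages enter the root population and the first
        # pair to coalesce is chosen uniformly among the 3 possible pairs.
        # Pair index 0 stands for (A,B); the other two are discordant.
        if rng.randrange(3) != 0:
            discordant += 1
    return discordant / n_loci


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        print(t, discordance_probability(t), simulate_discordance(t, 100_000))
```

Short internal branches (small t) push the discordance probability toward 2/3, which is why recent, rapid divergences are the hard case for species tree estimation.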

The Multi-species Coalescent Model and Species Tree Inference

2020

The multispecies coalescent (MSC) is an extension of the single-population coalescent model of population genetics to the case of multiple species. The MSC naturally accommodates speciation events (with subsequent genetic isolation between species) and the coalescent process within each species. It provides a framework for analysis of multilocus genomic sequence data from multiple species in a number of inference problems including species tree estimation, accounting for ancestral polymorphism and deep coalescence. Within this framework, the genealogical fluctuations across genes or genomic regions (and the gene tree/species tree conflicts that may result) are not seen as a problem but rather as a source of information for estimating important parameters such as species divergence times, ancestral population sizes, and the timings, directions, and intensities of cross-species introgression or hybridisation events. This chapter outlines the basic theory of the MSC and its important a...

Assessing approaches for inferring species trees from multi-copy genes

Systematic biology, 2015

With the availability of genomic sequence data, there is increasing interest in using genes with a possible history of duplication and loss for species tree inference. Here we assess the performance of both nonprobabilistic and probabilistic species tree inference approaches using gene duplication and loss and coalescence simulations. We evaluated the performance of gene tree parsimony (GTP) based on duplication (Only-dup), duplication and loss (Dup-loss), and deep coalescence (Deep-c) costs, the NJst distance method, the MulRF supertree method, and PHYLDOG, which jointly estimates gene trees and species tree using a hierarchical probabilistic model. We examined the effects of gene tree and species sampling, gene tree error, and duplication and loss rates on the accuracy of phylogenetic estimates. In the 10-taxon duplication and loss simulation experiments, MulRF is more accurate than the other methods when the duplication and loss rates are low, and Dup-loss is generally the most a...

Full modeling versus summarizing gene-tree uncertainty: method choice and species-tree accuracy

Molecular Phylogenetics and Evolution, 2012

With the proliferation of species-tree methods, empiricists now have to confront the daunting task of method choice. Such decisions might be made based on theoretical considerations alone. However, the messiness of real data means that theoretical ideals may not hold in practice (e.g., with convergence of complicated MCMC algorithms and computational times that limit analyses to small data sets). On the other hand, simplifying assumptions made by some approaches may compromise the accuracy of species-tree estimates. Here we examine the purported tradeoff between accuracy and computational simplicity for species-tree analysis, focusing on the different ways the approaches treat gene-tree uncertainty. By considering a diversity of species trees, as well as different sampling designs and total sampling efforts, we not only compare the accuracy of species-tree estimates across methods, but we also partition the variation in accuracy across factors to identify their relative importance. This analysis shows that although the method of analysis affects accuracy, other factors – namely, the history of species divergence and aspects of the sampling design – have a larger impact. Despite a full modeling of gene tree uncertainty (e.g., using a Bayesian framework), species-tree estimates may not be accurate, particularly for recent diversification histories. Nevertheless, we demonstrate how factors within the control of the empirical investigator (e.g., decisions about sampling) improve the accuracy of species tree estimates, and more so than the method of analysis. 
Lastly, with much of the attention on species-tree analyses focused on the discord among loci arising from the coalescent, this work also highlights a previously overlooked key determinant of species-tree accuracy for recent divergences – the level of genetic variation at a locus, which has important implications for improving species-tree estimates in practice.

► Method of analysis is not the primary determinant of the accuracy of species trees.
► Methods that fully model gene-tree uncertainty are not necessary when loci are informative.
► Limited genetic variation is a key factor determining species-tree accuracy.
► Modeling gene-tree uncertainty improves accuracy, but species trees may be inaccurate.