Phylogenomics provides robust support for a two-domains tree of life

Tom A Williams et al. Nat Ecol Evol. 2020 Jan.

Abstract

Hypotheses about the origin of eukaryotic cells are classically framed within the context of a universal 'tree of life' based on conserved core genes. Vigorous ongoing debate about eukaryote origins is based on assertions that the topology of the tree of life depends on the taxa included and the choice and quality of genomic data analysed. Here we reanalyse the evidence underpinning those claims and bring more data to bear on the question, using supertree and coalescent methods to interrogate >3,000 gene families in archaea and eukaryotes. We find that eukaryotes consistently originate from within the archaea in a two-domains tree when due consideration is given to the fit between model and data. Our analyses support a close relationship between eukaryotes and Asgard archaea and identify the Heimdallarchaeota as the current best candidate for the closest archaeal relatives of the eukaryotic nuclear lineage.

Conflict of interest statement

Competing interests statement: The authors declare they have no competing interests.

Figures

Figure 1. The 35-gene matrix of Da Cunha et al. favours a two-domains tree using the best-fitting models in both maximum likelihood and Bayesian analyses.

The eukaryotes (green) group with the sampled Asgard archaea (orange) with maximum posterior support. Bacteria are in grey, TACK Archaea in yellow, Euryarchaeota in blue. This is a consensus tree inferred under the CAT+GTR+G4 model in PhyloBayes-MPI; branch lengths are proportional to the expected number of substitutions per site, as indicated by the scale bar. A 2D topology was obtained under a variety of other models in ML analyses (LG+G4+F, LG+PMSF+G4, LG+C60+G4+F; Supplementary Figure 1), and also with 4-state Susko-Roger recoding under the CAT+GTR+G4 and NDCH2 models (Supplementary Figure 2).

Figure 2. Evidence that the three-domains tree is an artifact of long branch attraction.

(a) Da Cunha et al. analysed a dataset of 35 core protein-coding genes under the LG+G4+F model and obtained a 3D tree; the better-fitting (Supplementary Table 4) CAT+GTR+G4 model recovers a 2D tree. Bootstrap support (a) and Bayesian posterior probability (b) are indicated for the key nodes defining the 3D and 2D trees. “Asgard” refers to a clade of Heimdallarchaeota and Lokiarchaeum. Plotting these trees to the same scale (in terms of substitutions per site) illustrates major differences in these analyses. The 3D/LG+G4+F analysis suggests that, on average, 30.77 changes have taken place per site; the 2D/CAT+GTR+G4 analysis suggests that 47.4 changes per site have occurred. This difference amounts to ~128,511 additional substitutions in total inferred under the CAT+GTR+G4 model. (b) Posterior predictive tests indicate that CAT+GTR+G4 performs significantly better than LG+G4+F in capturing the site-specific evolutionary constraints (reflected by lower biochemical diversity approaching that of the empirical data). This results in more realistic estimates of the substitutional saturation and convergence present in the data. The longest branches on both the 3D and 2D trees are the stems leading to the bacteria and eukaryotes (in blue and green, respectively). CAT+GTR+G4 identifies many more convergent substitutions on these branches than does LG+G4+F, as can be seen by comparing the branch lengths in (a). This failure to detect convergent substitutions under LG+G4+F has the effect of drawing the bacterial and eukaryotic branches together, because convergences are mistaken for homologies (synapomorphies), resulting in a 3D tree.
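
The relationship between the two tree lengths and the quoted total can be checked with simple arithmetic. The sketch below is an illustration only: the caption does not state the alignment length, so the code back-calculates the number of sites implied by the three quoted figures. The function and variable names are hypothetical, and the quoted values are rounded, so the result is approximate.

```python
def implied_alignment_length(len_3d, len_2d, extra_total):
    """Return (extra substitutions per site, implied number of sites).

    len_3d, len_2d: total tree lengths in expected substitutions per site.
    extra_total: total additional substitutions quoted in the caption.
    """
    extra_per_site = len_2d - len_3d
    implied_sites = extra_total / extra_per_site
    return extra_per_site, implied_sites


# Values quoted in the Figure 2 caption.
extra_per_site, implied_sites = implied_alignment_length(30.77, 47.4, 128_511)

print(f"extra substitutions per site: {extra_per_site:.2f}")
print(f"implied alignment length: {implied_sites:.0f} sites (approximate)")
```

Under these figures, CAT+GTR+G4 infers roughly 16.6 more changes per site than LG+G4+F, which corresponds to an alignment of several thousand sites. The alignment length is an inference from the quoted numbers, not a value reported in the caption.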

Figure 3. An expanded sampling of microbial diversity supports a two-domains tree.

(a) Bayesian phylogeny of 21 concatenated proteins conserved across bacteria, archaea and eukaryotes under the CAT+GTR+G4 model, rooted on the branch separating bacteria and archaea. Eukaryotes group with Asgard archaea with maximum posterior support. (b) Bayesian phylogeny of 43 genes conserved between archaea and eukaryotes under CAT+GTR+G4. Eukaryotes group with, or within, Heimdallarchaeota. All support values are Bayesian posterior probabilities, and branch lengths are proportional to the expected number of substitutions per site, as indicated by the scale bars. The Euryarchaeota are paraphyletic in the consensus tree in (a), consistent with some recent analyses using bacterial outgroups, although the relevant support values are low and the analysis does not robustly exclude the alternative hypothesis of a monophyletic Euryarchaeota. The tree in (b) is formally unrooted because it does not include a bacterial outgroup. Based on (a) and published analyses, the root may lie between the Euryarchaeota and the other taxa, or within the Euryarchaeota. Amino acid data were recoded using the 4-state scheme of Susko and Roger, which our posterior predictive simulations (Supplementary Table 7) suggest improved model fit by ameliorating substitutional saturation and compositional heterogeneity; phylogenies inferred on the original amino acid data are provided in Supplementary Figure 7.
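
To make the recoding step concrete, the following is a minimal sketch of 4-state recoding, not the authors' pipeline. It assumes the grouping commonly cited as SR4 (AGNPST, CHWY, DEKQR, FILMV); the caption does not list the groups, so they should be checked against Susko and Roger before use. The state labels A, C, G and T and the function name `recode_sr4` are arbitrary choices made for illustration.

```python
# Assumed SR4 grouping: 20 amino acids collapsed into 4 states.
# The state labels (A/C/G/T) are arbitrary placeholders.
SR4_GROUPS = {
    "A": "AGNPST",
    "C": "CHWY",
    "G": "DEKQR",
    "T": "FILMV",
}

# Invert the grouping into a per-residue lookup table.
SR4_MAP = {aa: state for state, members in SR4_GROUPS.items() for aa in members}


def recode_sr4(sequence):
    """Recode an aligned protein sequence into 4 states.

    Gaps and any character outside the 20 standard amino acids
    (for example X or B) are written as '-'.
    """
    return "".join(SR4_MAP.get(residue, "-") for residue in sequence.upper())


# Two sequences that differ only by within-group exchanges (I/L, K/R)
# become identical after recoding.
print(recode_sr4("MKIL-S"))
print(recode_sr4("MRLI-T"))
```

Because exchanges within a group are no longer visible after recoding, frequent substitutions between similar residues stop contributing to saturation, and lineage-specific biases among residues of the same group no longer affect composition. The cost is a loss of phylogenetic signal carried by those within-group changes.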
