Integrative modeling of gene and genome evolution roots the archaeal tree of life

Tom A Williams et al. Proc Natl Acad Sci U S A. 2017.

Abstract

A root for the archaeal tree is essential for reconstructing the metabolism and ecology of early cells and for testing hypotheses that propose that the eukaryotic nuclear lineage originated from within the Archaea; however, published studies based on outgroup rooting disagree regarding the position of the archaeal root. Here we constructed a consensus unrooted archaeal topology using protein concatenation and a multigene supertree method based on 3,242 single gene trees, and then rooted this tree using a recently developed model of genome evolution. This model uses evidence from gene duplications, horizontal transfers, and gene losses contained in 31,236 archaeal gene families to identify the most likely root for the tree. Our analyses support the monophyly of DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaea), a recently discovered cosmopolitan and genetically diverse lineage, and, in contrast to previous work, place the tree root between DPANN and all other Archaea. The sister group to DPANN comprises the Euryarchaeota and the TACK Archaea, including Lokiarchaeum, which our analyses suggest are monophyletic sister lineages. Metabolic reconstructions on the rooted tree suggest that early Archaea were anaerobes that may have had the ability to reduce CO2 to acetate via the Wood-Ljungdahl pathway. In contrast to proposals suggesting that genome reduction has been the predominant mode of archaeal evolution, our analyses infer a relatively small-genomed archaeal ancestor that subsequently increased in complexity via gene duplication and horizontal gene transfer.
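The protein concatenation step mentioned above can be illustrated with a minimal Python sketch. It assumes each gene has already been aligned and that taxa missing from a gene are padded with gap characters so that every row stays in register. The function name `concatenate` and the toy sequences are hypothetical. This page does not name the authors' software, and real pipelines also handle alignment trimming and model selection, which are not shown.

```python
from typing import Dict, List

def concatenate(alignments: List[Dict[str, str]], taxa: List[str]) -> Dict[str, str]:
    """Join per-gene alignments into one supermatrix row per taxon.

    Each alignment maps taxon name -> aligned sequence; all sequences in
    one alignment must share a length. Taxa absent from a gene are padded
    with gaps ('-') so every row stays in register. (Hypothetical helper.)
    """
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each alignment needs >=1 sequence, all of equal length")
        width = widths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in rows.items()}

# Toy alignments (hypothetical): taxon B lacks gene 2.
gene1 = {"A": "MKV-", "B": "MKIL", "C": "MRVL"}
gene2 = {"A": "GT", "C": "GS"}
supermatrix = concatenate([gene1, gene2], ["A", "B", "C"])
for taxon, row in supermatrix.items():
    print(taxon, row)
```

Padding with gaps rather than dropping taxa keeps every gene's columns aligned across the supermatrix. Likelihood-based tree inference typically treats such gaps as missing data.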

Keywords: Archaea; evolution; phylogenetics.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

A rooted tree of the Archaea. This rooted phylogeny summarizes inferences from three analyses: a concatenation of 45 protein-coding genes under CAT+GTR, an MRP (matrix representation with parsimony) supertree of 3,242 single-copy, lineage-specific archaeal gene families, and modeling of gene duplication, transfer, and loss (DTL) in archaeal gene families using the ALE method. The concatenation and supertree analyses recovered the same unrooted topology except for the placement of the Thermococcales, which grouped at the base of the TACK/Lokiarchaeum clade in the concatenated protein analysis but at the base of the Euryarchaeota in the supertree. DTL modeling gave significantly better likelihoods for a Thermococcales+Euryarchaeota clade, so that is the topology depicted here. Support values are Bayesian posterior probabilities (PPs) from the CAT+GTR+Dayhoff analysis, and branch lengths are expected numbers of substitutions per site under CAT+GTR+Dayhoff. The tree is rooted at the maximum-likelihood (ML) root position obtained in the DTL analysis, as discussed in the text.
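An MRP supertree encodes each input gene tree as binary characters, one per clade, and then searches for the most parsimonious tree for the resulting matrix. The minimal sketch below covers only the encoding step. It assumes each gene tree is already summarized as its leaf set plus its non-trivial clades; for unrooted gene trees, this means one side of each split after arbitrary rooting. The function `mrp_matrix` and the toy trees are hypothetical, and the parsimony search itself is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

# A gene tree summarized as (its leaf set, its non-trivial clades).
GeneTree = Tuple[FrozenSet[str], List[FrozenSet[str]]]

def mrp_matrix(gene_trees: List[GeneTree], taxa: List[str]) -> Dict[str, str]:
    """Build an MRP character matrix: one binary column per gene-tree clade."""
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    for leaves, clades in gene_trees:
        for clade in clades:
            for t in taxa:
                if t not in leaves:
                    rows[t].append("?")  # taxon missing from this gene family
                elif t in clade:
                    rows[t].append("1")  # taxon inside the clade
                else:
                    rows[t].append("0")  # in the gene tree, outside the clade
    return {t: "".join(chars) for t, chars in rows.items()}

# Two toy gene trees over taxa A-E (hypothetical data).
g1 = (frozenset("ABCD"), [frozenset("AB")])
g2 = (frozenset("ABCDE"), [frozenset("CD"), frozenset("CDE")])
matrix = mrp_matrix([g1, g2], list("ABCDE"))
for taxon, row in matrix.items():
    print(taxon, row)
```

Coding absent taxa as `?` rather than `0` lets gene families with incomplete taxon sampling contribute without implying that the missing taxa fall outside the clade.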

Fig. 2.

Using gene duplication, transfer, and loss (DTL) to root the species tree. Different roots (denoted by an asterisk) on the species tree imply different scenarios of gene family evolution, and thus lead to different gene family likelihoods under the probabilistic gene tree-species tree reconciliation model implemented in ALE (67); here we provide a simple illustration of the approach. (A) The evolutionary history of a gene family present in two copies in species C and D, but in only a single copy in A and B. Solid lines indicate the branches of the inferred gene tree, and red highlights represent discordance with the species tree. The number of gene transfers needed to explain this gene tree depends on the root of the species tree. (B and C) A root between species AB and CD would require one transfer (B), whereas a root between ABC and D would require three transfers (C), providing some support for the root depicted in B. Other reconciliations (e.g., gene duplications above the root followed by a series of losses) are also possible; ALE integrates over these possibilities to calculate a likelihood for each gene family under each root position. Rooting hypotheses can then be statistically distinguished from one another based on these likelihoods.
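Because ALE yields one likelihood per gene family under each candidate root, comparing roots comes down to combining those per-family values. The sketch below assumes that gene families are independent given the species tree, so the total log-likelihood of a root is the sum over families. It then ranks the roots by that total. Root labels follow the panels above (e.g., `AB|CD` for a root between AB and CD). The log-likelihood values are invented for illustration, and `rank_roots` is a hypothetical helper. The statistical test used to distinguish roots is not reproduced.

```python
from typing import Dict, List, Tuple

def rank_roots(family_loglik: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank candidate roots by their summed per-family log-likelihoods.

    family_loglik maps a root label to one log-likelihood per gene family
    (same family order for every root). Treating families as independent,
    a root's total log-likelihood is the sum of its per-family values.
    """
    n_fam = {len(v) for v in family_loglik.values()}
    if len(n_fam) != 1:
        raise ValueError("every root needs one value per gene family")
    totals = {root: sum(lls) for root, lls in family_loglik.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical log-likelihoods for three gene families under three roots.
loglik = {
    "AB|CD": [-10.2, -7.5, -12.0],
    "ABC|D": [-13.9, -7.6, -12.4],
    "A|BCD": [-11.0, -8.1, -12.2],
}
ranking = rank_roots(loglik)
best_root, best_ll = ranking[0]
for root, ll in ranking:
    print(f"{root}\ttotal lnL = {ll:.1f}\tdelta = {best_ll - ll:.1f}")
```

The ML root is simply the top of the ranking. The `delta` column is the quantity a formal test would assess for significance.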

Fig. 3.

An ML reconstruction of archaeal gene family evolution. We used the DTL model implemented in ALE (67) to perform gene tree-species tree reconciliation using the rooted tree shown in Fig. 1 and the set of homologous gene families from our sample of archaeal genomes. The diameters of the circles at each node are proportional to inferred gene content (1,090 gene families at the root) and to the number of originations, or new genes. Branch colors denote the number of gene losses, and the areas of the bars above and below each branch correspond to the numbers of gene duplications and horizontal gene transfers (HGTs). The complete data underlying this figure are provided in SI Appendix, Table S8. The analysis was performed with (A) and without (B) the inclusion of the DPANN lineages. In contrast to scenarios in which a complex archaeal common ancestor gave rise to modern lineages by streamlining (89), the fitted DTL models imply a common ancestor whose genome was moderately smaller than those of modern lineages, with ongoing genome expansion via gene duplication, de novo gene origination, and HGT throughout archaeal evolution. The Haloarchaea (green) and the Thaumarchaeota (blue) are the two stem lineages that experienced the greatest number of gene acquisitions, whether by de novo innovation or by HGT.
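Per-branch summaries like those plotted here amount to summing each family's inferred events across all families. This sketch assumes that the reconciliation output has already been parsed into (family, branch, event, expected count) records. That record layout, the branch labels, and the counts are hypothetical and do not reflect ALE's actual output format.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

EVENTS = ("duplications", "transfers", "losses", "originations")

def branch_totals(records: Iterable[Tuple[str, str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Sum expected event counts per species-tree branch across gene families."""
    totals: DefaultDict[str, Dict[str, float]] = defaultdict(
        lambda: {event: 0.0 for event in EVENTS}
    )
    for _family, branch, event, count in records:
        if event not in EVENTS:
            raise ValueError(f"unknown event type: {event!r}")
        totals[branch][event] += count
    return dict(totals)

# Hypothetical (family, branch, event, expected_count) records.
records = [
    ("fam1", "branch_X", "transfers", 0.9),
    ("fam2", "branch_X", "transfers", 0.6),
    ("fam2", "branch_X", "losses", 0.2),
    ("fam3", "branch_Y", "duplications", 1.4),
]
totals = branch_totals(records)
for branch, counts in sorted(totals.items()):
    print(branch, {e: round(c, 2) for e, c in counts.items()})
```

Summing expected (fractional) counts instead of thresholding each family keeps the uncertainty of individual reconciliations in the branch totals.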

Fig. 4.

Inference of ancestral archaeal metabolisms under the DTL model. The reconstruction is based on genes that could be mapped with P > 0.5 to a series of key nodes on the archaeal tree under the ML reconstruction of gene family evolution displayed in Fig. 3. The presence of a gene at a node is indicated by the symbols shown in the key; partially filled symbols indicate that only some of the subunits composing a particular enzyme were present. Because gene families occasionally go extinct during evolution, and because DTL scenarios are more uncertain in the early regions of the tree, reconstructions of gene content become increasingly incomplete at deeper nodes. Nonetheless, the reconstruction supports the proposal that the ancestral archaeon was an anaerobe that encoded a subunit (cdhC) of CO dehydrogenase/acetyl-CoA synthase, the key enzyme of the Wood–Ljungdahl pathway. Aerobic metabolisms evolved later and independently in several different archaeal lineages, perhaps in association with the rise in atmospheric oxygen that began 2.5–2.3 Gya (82). Eury, Euryarchaeota including Thermococcales; Eury w/o Thermococcales, Euryarchaeota without Thermococcales; TACKL, TACK and Lokiarchaeum; B, nuoB/NiFe-hydrogenase III small subunit/coenzyme F420-reducing hydrogenase, gamma subunit; D, nuoD/NiFe-hydrogenase III large subunit and subunit G/coenzyme F420-reducing hydrogenase, alpha subunit; FpoF, coenzyme F420-reducing hydrogenase, beta subunit. ᵃThe bifunctional fructose-1,6-bisphosphate aldolase/phosphatase FBPA/FBPase (arCOG04180) (98) was not predicted to be present in any of the ancestors. ᵇPyruvate kinase is a glycolytic enzyme only. ᶜA tetrameric protein complex with α, δ, β, and γ subunits, which in Pyrococcus functions as both a sulfur reductase (α, δ) and a hydrogenase (β, γ) (99); the ancestral enzyme also might have been bifunctional.
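The P > 0.5 mapping rule and the partially filled symbols can be expressed as a small classification function. This sketch rests on several assumptions. It takes per-gene presence probabilities at one node as given, and it counts a subunit as present only when its probability is strictly above 0.5. The subunit names (`subA`, `subB`, `subC`) and their probabilities are hypothetical placeholders, not values from the study.

```python
from typing import Dict, List

def enzyme_status(presence: Dict[str, float], subunits: List[str],
                  threshold: float = 0.5) -> str:
    """Classify an enzyme at one ancestral node from subunit presence probabilities.

    A subunit counts as present when its probability is strictly greater
    than `threshold`; subunits missing from `presence` are treated as P = 0.
    Returns 'full', 'partial', or 'absent', analogous to filled, partially
    filled, and empty symbols in a figure key.
    """
    if not subunits:
        raise ValueError("an enzyme needs at least one subunit")
    present = [s for s in subunits if presence.get(s, 0.0) > threshold]
    if len(present) == len(subunits):
        return "full"
    return "partial" if present else "absent"

# Hypothetical presence probabilities for one ancestral node.
node_presence = {"subA": 0.31, "subB": 0.42, "subC": 0.77}
print(enzyme_status(node_presence, ["subA", "subB", "subC"]))  # partial
```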

Fig. 5.

Distributions of gene acquisition, duplication, and loss rates across the archaeal tree. Each distribution has clear outliers. The most prominent outliers are the branch leading to Lokiarchaeum (gene duplications) and the branches leading to the Haloarchaea (gene acquisitions and gene losses) and the Thaumarchaeota (gene acquisitions).
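The caption does not state how outliers were identified. One common convention is Tukey's fences, which flag a branch when its rate falls more than 1.5 interquartile ranges below the first quartile or above the third. That convention is assumed here purely for illustration. The branch names and counts are hypothetical.

```python
import statistics
from typing import Dict, List

def tukey_outliers(rates: Dict[str, float], k: float = 1.5) -> List[str]:
    """Return branches whose rate lies outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = statistics.quantiles(rates.values(), n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(b for b, r in rates.items() if r < low or r > high)

# Hypothetical per-branch duplication counts.
dup_counts = {"b1": 10, "b2": 12, "b3": 11, "b4": 13, "b5": 9, "b6": 12, "b7": 60}
print(tukey_outliers(dup_counts))  # ['b7']
```

Quartile-based fences are robust to the outliers themselves, which a mean-and-standard-deviation cutoff would not be on a skewed distribution of branch rates.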

References

1. Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA. 1990;87:4576–4579.
2. Brochier-Armanet C, Boussau B, Gribaldo S, Forterre P. Mesophilic Crenarchaeota: Proposal for a third archaeal phylum, the Thaumarchaeota. Nat Rev Microbiol. 2008;6:245–252.
3. Pester M, Schleper C, Wagner M. The Thaumarchaeota: An emerging view of their phylogeny and ecophysiology. Curr Opin Microbiol. 2011;14:300–306.
4. Rinke C, et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature. 2013;499:431–437.
5. Castelle CJ, et al. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr Biol. 2015;25:690–701.
