Origins of major archaeal clades correspond to gene acquisitions from bacteria


Acknowledgements

We gratefully acknowledge funding from the European Research Council (ERC 232975 to W.F.M.), the graduate school E-Norm of the Heinrich-Heine University (W.F.M.), the DFG (Scho 316/11-1 to P.S.; SI 642/10-1 to B.S.), and the BMBF (0316188A, B.S.). G.L. is supported by an ERC grant (281357 to Tal Dagan). D.B. thanks the Alexander von Humboldt Foundation for a Fellowship. Computational support of the Zentrum für Informations- und Medientechnologie (ZIM) at the Heinrich-Heine University is gratefully acknowledged.

Author information

Authors and Affiliations

  1. Institute of Molecular Evolution, Heinrich-Heine University, 40225 Düsseldorf, Germany
    Shijulal Nelson-Sathi, Filipa L. Sousa, Mayo Roettger, Nabor Lozada-Chávez, Thorsten Thiergart & William F. Martin
  2. Mathematisches Institut, Heinrich-Heine University, 40225 Düsseldorf, Germany
    Arnold Janssen
  3. Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
    David Bryant
  4. Genomic Microbiology Group, Institute of Microbiology, Christian-Albrechts-Universität Kiel, 24118 Kiel, Germany
    Giddy Landan
  5. Institut für Allgemeine Mikrobiologie, Christian-Albrechts-Universität Kiel, 24118 Kiel, Germany
    Peter Schönheit
  6. Faculty of Chemistry, Biofilm Centre, Molecular Enzyme Technology and Biochemistry, University of Duisburg-Essen, 45117 Essen, Germany
    Bettina Siebers
  7. Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland
    James O. McInerney
  8. Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal
    William F. Martin

Authors

  1. Shijulal Nelson-Sathi
  2. Filipa L. Sousa
  3. Mayo Roettger
  4. Nabor Lozada-Chávez
  5. Thorsten Thiergart
  6. Arnold Janssen
  7. David Bryant
  8. Giddy Landan
  9. Peter Schönheit
  10. Bettina Siebers
  11. James O. McInerney
  12. William F. Martin

Contributions

S.N.-S., F.L.S., M.R., N.L.-C. and T.T. performed bioinformatic analyses; A.J., D.B. and G.L. performed statistical analyses; P.S., B.S., J.O.M. and W.F.M. interpreted results; S.N.-S., F.L.S., G.L., J.O.M. and W.F.M. wrote the paper; S.N.-S., G.L. and W.F.M. designed the study. All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to William F. Martin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Inter-domain gene sharing network.

Each cell in the matrix indicates the number of genes (_e_-value ≤ 10⁻¹⁰ and ≥ 25% global identity) shared between 134 archaeal and 1,847 bacterial genomes in each pairwise inter-domain comparison (scale bar at lower right). Archaeal genomes are listed as in Fig. 1. Bacterial genomes are presented in 23 groups corresponding to phylum or class in the GenBank nomenclature: a = Clostridia; b = Erysipelotrichi, Negativicutes; c = Bacilli; d = Firmicutes; e = Chlamydia; f = Verrucomicrobia, Planctomycetes; g = Spirochaetes; h = Gemmatimonadetes, Synergistetes, Elusimicrobia, Dictyoglomi, Nitrospirae; i = Actinobacteria; j = Fibrobacter, Chlorobi; k = Bacteroidetes; l = Fusobacteria; Thermotogae, Aquificae, Chloroflexi; m = Deinococcus-Thermus; n = Cyanobacteria; o = Acidobacteria; δ, ε, α, β, γ = Delta, Epsilon, Alpha, Beta and Gamma proteobacteria; P = Thermodesulfobacteria, Caldiserica, Chrysiogenetes, Ignavibacteria. Bacterial genome size in number of proteins is indicated at the top.
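The thresholds in this caption can be illustrated with a minimal sketch. The hit tuples, genome names and the function `shared_gene_counts` are hypothetical and only show how an inclusive cut-off of _e_-value ≤ 10⁻¹⁰ and ≥ 25% global identity would populate one cell per archaeon–bacterium pair; each tuple is assumed to stand for one distinct shared gene, and this is not the authors' pipeline.

```python
from collections import Counter

# Hypothetical hits: (archaeal genome, bacterial genome, e-value, global identity in %).
# Each tuple is assumed to represent one distinct shared gene.
hits = [
    ("arch_A", "bact_X", 1e-30, 42.0),
    ("arch_A", "bact_X", 1e-12, 31.5),
    ("arch_A", "bact_Y", 1e-5, 60.0),   # fails the e-value threshold
    ("arch_B", "bact_X", 1e-40, 20.0),  # fails the identity threshold
    ("arch_B", "bact_Y", 1e-10, 25.0),  # passes: both thresholds are inclusive
]

E_VALUE_MAX = 1e-10    # threshold taken from the caption
IDENTITY_MIN = 25.0    # threshold taken from the caption


def shared_gene_counts(hits):
    """Count hits that pass both thresholds for every inter-domain genome pair."""
    counts = Counter()
    for archaeon, bacterium, e_value, identity in hits:
        if e_value <= E_VALUE_MAX and identity >= IDENTITY_MIN:
            counts[(archaeon, bacterium)] += 1
    return counts


counts = shared_gene_counts(hits)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Each key of `counts` corresponds to one cell of the matrix; pairs with no qualifying hit are simply absent.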

Extended Data Figure 2 Presence–absence patterns of archaeal genes with sparse distribution among bacteria sampled.

Archaeal export families are sorted according to the reference tree on the left. The figure shows the 391 cases of archaea-to-bacteria export (≥ 2 archaea and ≥ 2 bacteria from one phylum only) and the 662 cases of bacterial singleton trees (≥ 3 archaea, one bacterium). The 25,762 clusters were classified into the following categories (Supplementary Table 2): 16,983 archaeal specific, 3,315 imports, 391 exports, 662 cases of bacterial singletons with ≥ 3 archaea in the tree, 308 cases with three sequences (a bacterial singleton and 2 archaea) in the cluster, 4,074 trees in which archaea were non-monophyletic, and 29 ambiguous cases among trees showing archaeal monophyly. The bacterial taxonomic distribution is shown in the lower panel. Gene identifiers and trees are given in Supplementary Table 3.
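As a quick consistency check, the seven category counts quoted in this caption can be tallied against the stated total of 25,762 clusters. The dictionary keys below are shorthand labels chosen for this sketch, not the authors' terminology.

```python
# Category counts as quoted in the caption; the labels are shorthand for this sketch.
categories = {
    "archaeal specific": 16_983,
    "imports": 3_315,
    "exports": 391,
    "bacterial singletons (>= 3 archaea)": 662,
    "three-sequence clusters": 308,
    "archaea non-monophyletic": 4_074,
    "ambiguous": 29,
}

total = sum(categories.values())
print(total)  # 25762, matching the number of clusters stated in the caption

# Share of each category as a percentage of all clusters.
shares = {name: round(100 * n / total, 1) for name, n in categories.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```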

Extended Data Figure 3 Comparison of sets of trees for single-copy genes in 11 archaeal groups.

Cumulative distribution functions for scores of tree compatibility with the recipient data set. Values are P values of the two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit test in the comparison of the recipient (blue) data sets against the imports (green) data set and three synthetic data sets, one-LGT (red), two-LGT (pink) and random (cyan). a, Thermoproteales. b, Desulfurococcales. c, Sulfolobales. d, Thermococcales. e, Methanobacteriales. f, Methanococcales. g, Thermoplasmatales. h, Archaeoglobales. i, Methanococcales. j, Methanosarcinales. k, Haloarchaea.
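The statistic behind the two-sample KS test is the largest vertical gap between two empirical cumulative distribution functions. The sketch below computes only that statistic on hypothetical score lists; the caption does not name the software used, and in practice a library routine such as `scipy.stats.ks_2samp` would also return the P value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions, so the maximum gap is
    # reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical compatibility scores for two sets of trees.
recipient_scores = [0.1, 0.4, 0.7]
import_scores = [0.2, 0.5, 0.9]
print(ks_statistic(recipient_scores, import_scores))
```

A statistic of 0 means the two empirical distributions coincide; a statistic of 1 means the samples do not overlap at all.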

Extended Data Figure 4 Presence–absence patterns of all archaeal non-monophyletic genes.

Archaeal families that did not generate monophyly for archaeal sequences in ML trees are plotted according to the reference tree on the left; the distribution across bacterial genome groups is shown in the lower panel. These trees include 693 cases in which archaea showed non-monophyly by the misplacement of a single archaeal branch. Gene identifiers and trees are given in Supplementary Tables 4 and 5.

Extended Data Figure 5 Sorting by bacterial presence–absence patterns for archaeal imports, exports and archaeal non-monophyletic families.

Archaeal families and their homologue distribution in 1,847 bacterial genomes are sorted by archaeal (top) and bacterial (bottom) gene distributions for direct comparison. a–f, Distributions of archaeal imports sorted by archaeal groups (a) and by bacterial groups (b); distributions of archaeal exports sorted by archaeal groups (c) and by bacterial groups (d); distributions of archaeal non-monophyletic gene families sorted by archaeal groups (e) and by bacterial groups (f).

Extended Data Figure 6 Testing for evidence of higher order archaeal relationships using a permutation tail probability (PTP) test.

Comparison of pairwise Euclidean distance distributions between real archaeal gene family patterns and conditional random patterns using the two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit test. a, Archaeal-specific families: distribution of 2,471 archaeal-specific families present in at least 2 and fewer than 11 groups (top); comparison between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately; comparison between real data and conditional random patterns generated by including others (Nanoarchaea, Thaumarchaea and Korarchaeota) into Crenarchaeota (mean P = 0.0071, middle) or into Euryarchaeota (mean P = 0.02591, bottom). b, Archaeal import families: distribution of 989 archaeal import families present in at least 2 and fewer than 11 groups (top). Comparison between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately by including others (Nanoarchaea, Thaumarchaea and Korarchaeota) into Crenarchaeota (mean P = 0.0795, middle); comparison between real data and random patterns generated by including others (Nanoarchaea, Thaumarchaea and Korarchaeota) into Euryarchaeota (mean P = 0.0098, bottom).
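One way to read the conditional randomization described in this caption is sketched below. It assumes that each family's presence–absence entries are permuted among the groups of the same phylum, so that the number of crenarchaeal and euryarchaeal groups carrying the family is preserved; the caption does not spell out this detail. The matrix, the column assignments and all function names are hypothetical.

```python
import math
import random
from itertools import combinations

# Hypothetical layout: one row per gene family, one column per archaeal group.
CREN = [0, 1, 2]       # columns assumed to be crenarchaeal groups
EURY = [3, 4, 5, 6]    # columns assumed to be euryarchaeal groups

patterns = [
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1],
]


def shuffle_within(row, columns, rng):
    """Permute the entries of one row among the given columns, in place."""
    values = [row[c] for c in columns]
    rng.shuffle(values)
    for c, v in zip(columns, values):
        row[c] = v


def conditional_random(patterns, rng):
    """Return a shuffled copy in which each phylum is permuted separately."""
    shuffled = [row[:] for row in patterns]
    for row in shuffled:
        shuffle_within(row, CREN, rng)
        shuffle_within(row, EURY, rng)
    return shuffled


def pairwise_distances(patterns):
    """Euclidean distance between every pair of family patterns."""
    return [math.dist(p, q) for p, q in combinations(patterns, 2)]


rng = random.Random(0)
real = pairwise_distances(patterns)
null = pairwise_distances(conditional_random(patterns, rng))
```

The two lists `real` and `null` are the kind of distance distributions that a two-sample KS test would then compare; repeating the shuffle 100 times would give the 100 conditional random patterns mentioned in the caption.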

Extended Data Figure 7 Archaeal specific and import gene counts on a reference tree.

Numbers of archaeal-specific and import families corresponding to each node in the reference tree are shown in the order of ‘specific/imports’. Numbers at internal nodes indicate the number of archaeal-specific families and families with bacterial homologues that correspond to the reference tree topology. Values at the far left indicate the number of archaeal-specific families and families with bacterial homologues that are present in all archaeal groups.

Extended Data Figure 8 Non tree-like structure of archaeal protein families.

Proportion of archaeal families whose distributions are congruent with the reference tree and with all possible trees. Filled circles indicate the proportion of archaeal families that are congruent with the reference tree allowing no losses (with a single origin) and with different increments of losses allowed. Red, blue, green, magenta and black circles represent the proportion of families that can be explained using a single origin (849, 11.5%), single origin plus 1 loss (22.4%), single origin plus 2 losses (15%), single origin plus 3 losses (13%) and single origin plus ≥ 4 losses (38%), respectively. Lines indicate the proportion of families that can be explained by each of the 6,081,075 possible trees that preserve euryarchaeote and crenarchaeote monophyly. Note that on average, any given tree can explain 569 (8%) of the archaeal families using a single origin event in the tree, and the best tree can explain only 1,180 families (16%). In the present data, 208,019 trees explain the gene distributions better than the archaeal reference tree without loss events, underscoring the discordance between core gene phylogeny and gene distributions in the remainder of the genome.
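Two ideas in this caption lend themselves to a short sketch. First, the number of rooted binary trees on n labelled leaves is (2n − 3)!!; assuming 3 crenarchaeal and 9 euryarchaeal groups (an assumption, since the caption does not state the breakdown), the product 3 × 2,027,025 reproduces the quoted 6,081,075 trees. Second, for a given tree, the minimum number of losses under a single origin can be counted by placing the origin at the smallest clade containing every group that carries the family. The tree, the leaf names and the function names below are hypothetical.

```python
def rooted_binary_trees(n_leaves):
    """Number of rooted binary trees on n labelled leaves: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


def leaves(tree):
    """Set of leaf names in a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def losses_below(tree, present):
    """Losses needed in a subtree that inherited the gene from its parent."""
    tips = leaves(tree)
    if tips <= present:
        return 0
    if not (tips & present):
        return 1  # the whole subtree lost the gene in a single event
    return sum(losses_below(child, present) for child in tree)


def min_losses_single_origin(tree, present):
    """Place the origin at the smallest clade holding all carriers, then count losses.

    `present` is assumed to be a non-empty set of leaf names.
    """
    if not isinstance(tree, str):
        for child in tree:
            if present <= leaves(child):
                return min_losses_single_origin(child, present)
    return losses_below(tree, present)


# Assumed breakdown: 3 crenarchaeal and 9 euryarchaeal groups.
n_trees = rooted_binary_trees(3) * rooted_binary_trees(9)
print(n_trees)  # 6081075

# Hypothetical five-group tree and presence patterns.
tree = ((("A", "B"), "C"), ("D", "E"))
print(min_losses_single_origin(tree, {"A", "B"}))  # 0: single origin, no loss
print(min_losses_single_origin(tree, {"A", "C"}))  # 1: one loss, in B
print(min_losses_single_origin(tree, {"A", "D"}))  # 3: losses in B, C and E
```

A family counts as congruent with the tree "with a single origin" when this function returns 0, and the loss increments in the caption correspond to return values of 1, 2, 3 and ≥ 4.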

Extended Data Table 1 Comparison of sets of trees for single-copy genes in 11 archaeal groups

Extended Data Table 2 Functional annotations for archaeal genes according to gene family distribution and phylogeny

Supplementary information

Supplementary Information

This file contains Supplementary Methods and Supplementary References. (PDF 728 kb)

Supplementary Data

This file contains Supplementary Tables 1-8 and a Supplementary Table Guide. (ZIP 32480 kb)

About this article

Cite this article

Nelson-Sathi, S., Sousa, F., Roettger, M. et al. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517, 77–80 (2015). https://doi.org/10.1038/nature13805