Ancestral genome sizes specify the minimum rate of lateral gene transfer during prokaryote evolution

Tal Dagan et al. Proc Natl Acad Sci U S A. 2007.

Abstract

The amount of lateral gene transfer (LGT) that has occurred in microbial evolution is heavily debated. Efforts to quantify LGT through gene-tree comparisons have delivered estimates that between 2% and 60% of all prokaryotic genes have been affected by LGT, the 30-fold discrepancy reflecting differences among gene samples studied and uncertainties inherent in phylogenetic reconstruction. Here we present a simple method that is independent of gene-tree comparisons to estimate the LGT rate among sequenced prokaryotic genomes. If little or no LGT has occurred during evolution, ancestral genome sizes would become unrealistically large, whereas too much LGT would render them far too small. We determine the amount of LGT that is necessary and sufficient to bring the distribution of inferred ancestral genome sizes into agreement with that observed among modern microbes. Rather than testing for phylogenetic congruence or lack thereof across genes, we assume that all gene trees are compatible; hence, our method delivers very conservative lower-bound estimates of the average LGT rate. The results indicate that among 57,670 gene families distributed across 190 sequenced genomes, at least two-thirds, and probably all, have been affected by LGT at some time in their evolutionary past. A component of common ancestry nonetheless remains detectable in gene distribution patterns. We estimate the lower bound for the average LGT rate across all genes as 1.1 LGT events per gene family and gene family lifespan, and this minimum rate increases sharply when genes present in only a few genomes are excluded from the analysis.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

The distribution of genes across genomes. (a) Presence (black) and absence (white) patterns for representative segments of the data comprising widely (present in 100–190 genomes), intermediately (60–80 and 10–20 genomes), and sparsely distributed genes (two genomes). (Note the scale bar.) (b) Color-coded matrix of the proportion of shared genes for all genome pairs, with genomes grouped by taxonomical classification. For the same matrix using random genome order, see SI Fig. 4. The proportion of shared genes for a genome pair x,y is calculated as the number of genes in genomes x,y that are found in shared clusters, divided by the total genes in genomes x,y. The color scale indicates the shared proportions of genes in percent. For example, archaebacterial genome pairs share 32 ± 16% (mean ± SD) families on average, whereas archaebacterial vs. eubacterial genome pairs share only 7 ± 3% of their families. For cyanobacteria, 61 ± 10% of each genome consists of families shared with another cyanobacterium, as opposed to 18 ± 5% in comparisons to noncyanobacteria. For proteobacteria, γ-proteobacteria share 38 ± 13% common families with other γ-proteobacteria, 26 ± 7% with other proteobacteria, and 18 ± 8% with nonproteobacteria.
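The shared-gene proportion defined in this caption can be illustrated with a minimal Python sketch. The function name, the dictionary representation (family identifier mapped to the number of genes of that family in a genome), and the example counts are assumptions made for illustration only; they are not taken from the paper.

```python
def shared_gene_proportion(counts_x, counts_y):
    """Proportion of genes in genomes x and y that fall in shared families.

    counts_x, counts_y: dict mapping a family id to the number of genes
    of that family in the genome (hypothetical input format).
    """
    # A family is shared if both genomes hold at least one member.
    shared = {f for f, n in counts_x.items() if n > 0 and counts_y.get(f, 0) > 0}
    # Genes of x and y that belong to shared families.
    genes_in_shared = sum(counts_x[f] + counts_y[f] for f in shared)
    # All genes of x and y.
    total_genes = sum(counts_x.values()) + sum(counts_y.values())
    return genes_in_shared / total_genes if total_genes else 0.0


# Made-up toy genomes: only famA is present in both.
genome_x = {"famA": 2, "famB": 1, "famC": 1}
genome_y = {"famA": 1, "famD": 3}
proportion = shared_gene_proportion(genome_x, genome_y)
print(f"{100 * proportion:.1f}% shared")  # 3 of 8 genes lie in shared families
```

Counting genes rather than families keeps the numerator and denominator in the same unit, which matches the wording "number of genes ... divided by the total genes".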

Fig. 2.

Gene loss and LGT can both account for patchy gene distributions. Schematic representation of four different LGT allowances. (a) In the loss-only model, all genes are assumed to have originated at the root of the tree; presence/absence patterns (PAPs) are attributed to gene loss only. (b) Introducing a gene origin in the single-origin (SO) model disperses gene origins over internal nodes of the tree according to their first occurrence. (c) In the LGT≤1 model, each gene is allowed to have two origins, where one is an LGT. This model results in further dispersal of gene origins across the tree, hence smaller ancestral genomes. (d) Two additional LGTs are allowed in the LGT≤3 model. Allowances of up to 7, 15, and 31 LGTs were also tested.
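The difference between panels (a) and (b) can be made concrete with a small Python sketch that counts the losses implied by one presence/absence pattern on a reference tree. This is an illustrative reading, not the authors' implementation: placing the single origin at the last common ancestor of all genomes carrying the gene, and charging one loss per maximal gene-free subtree, are assumptions, as are the nested-tuple tree encoding and the toy tree itself. The pattern is assumed to contain at least one genome.

```python
def leaves(tree):
    """Set of leaf names in a tree encoded as nested 2-tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def count_losses(tree, present):
    """Losses needed below a node that is assumed to carry the gene."""
    if not (leaves(tree) & present):
        return 1  # the whole subtree lacks the gene: one loss on its branch
    if isinstance(tree, str):
        return 0  # a leaf that still has the gene
    left, right = tree
    return count_losses(left, present) + count_losses(right, present)


def single_origin_subtree(tree, present):
    """Smallest subtree containing every genome that carries the gene."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if present <= leaves(left):
        return single_origin_subtree(left, present)
    if present <= leaves(right):
        return single_origin_subtree(right, present)
    return tree


# Hypothetical eight-genome reference tree and one gene family.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
present = {"A", "B", "C"}

loss_only = count_losses(tree, present)  # origin fixed at the root
so_losses = count_losses(single_origin_subtree(tree, present), present)
print(loss_only, so_losses)
```

Moving the origin from the root to an internal node removes the loss that the loss-only model must place on the gene-free half of the tree, which is why ancestral genomes shrink as more origins are allowed.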

Fig. 3.

Ancestral genome sizes reconstructed under the various reconstruction models. The colors of nodes and branches correspond to the inferred ancestral genome size, as indicated in the scale. a–e correspond to the SO, LGT≤1, LGT≤3, LGT≤7, and LGT≤15 models, respectively (see SI Figs. 7 and 8 for the same analysis using a reference tree reconstructed by neighbor joining and a random reference tree, respectively). To calculate the genome size in each hypothetical taxonomic unit, a binary recursive algorithm scans the reference tree from root to tips; the genome size of each hypothetical taxonomic unit is calculated as the cumulative sum of the origins minus the cumulative sum of losses inferred for previous nodes and the node itself.
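The root-to-tip accumulation described in this caption can be sketched in Python as follows. The tree encoding (a name plus a list of children), the function name, and all origin and loss counts are hypothetical values chosen for illustration; the per-node origins and losses are taken as given inputs rather than inferred here.

```python
def genome_sizes(node, origins, losses, inherited=0, sizes=None):
    """Walk the tree from root to tips, accumulating origins minus losses.

    node: (name, [children]) tuple; origins/losses: dict mapping a node
    name to the number of gene origins or losses assigned to that node.
    """
    if sizes is None:
        sizes = {}
    name, children = node
    # Size = everything inherited from ancestors, plus genes that
    # originate at this node, minus genes lost at this node.
    sizes[name] = inherited + origins.get(name, 0) - losses.get(name, 0)
    for child in children:
        genome_sizes(child, origins, losses, sizes[name], sizes)
    return sizes


# Hypothetical reference tree with made-up event counts.
tree = ("root", [("n1", [("A", []), ("B", [])]), ("C", [])])
origins = {"root": 1000, "n1": 200, "A": 50, "C": 300}
losses = {"n1": 100, "A": 150, "B": 400, "C": 250}

sizes = genome_sizes(tree, origins, losses)
for name, size in sizes.items():
    print(name, size)
```

Because each node adds only its own origins and losses to the size passed down from its parent, the value at any node equals the cumulative origins minus the cumulative losses along the path from the root.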
