The evolution of domain-content in bacterial genomes

Nacho Molina et al. Biol Direct. 2008.

Abstract

Background: Across all sequenced bacterial genomes, the number of domains $n_c$ in different functional categories $c$ scales as a power law in the total number of domains $n$, i.e. $n_c \propto n^{\alpha_c}$, with exponents $\alpha_c$ that vary across functional categories. Here we investigate the implications of these scaling laws for the evolution of domain-content in bacterial genomes and derive the simplest evolutionary model consistent with these scaling laws.

Results: We show that, using only an assumption of time invariance, the scaling laws uniquely determine the relative rates of domain additions and deletions across all functional categories and evolutionary lineages. In particular, the model predicts that the rate of additions and deletions of domains of category $c$ is proportional to the number of domains $n_c$ currently in the genome and we discuss the implications of this observation for the role of horizontal transfer in genome evolution. Second, in addition to being proportional to $n_c$, the rate of additions and deletions of domains of category $c$ is proportional to a category-dependent constant $\rho_c$, which is the same for all evolutionary lineages. This 'evolutionary potential' $\rho_c$ represents the relative probability for additions/deletions of domains of category $c$ to be fixed in the population by selection and is predicted to equal the scaling exponent $\alpha_c$. By comparing the domain content of 93 pairs of closely-related genomes from all over the phylogenetic tree of bacteria, we demonstrate that the model's predictions are supported by available genome-sequence data.
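
The predicted equality between the evolutionary potential and the scaling exponent can be sketched in a few lines. This is an illustrative derivation under stated assumptions, not a reproduction of the paper's full argument: the lineage- and time-dependent rate factor $\beta(t)$ and the domain-weighted average $\langle\rho\rangle$ are symbols introduced here, the net change of $n_c$ is assumed to follow the same proportionality as the rate of additions and deletions, and each domain is assumed to be counted in exactly one category so that $n = \sum_c n_c$.

```latex
\begin{align}
\frac{dn_c}{dt} &= \beta(t)\,\rho_c\,n_c
  && \text{(rate proportional to } \rho_c \text{ and to } n_c\text{)} \\
\frac{dn}{dt} &= \sum_c \frac{dn_c}{dt} = \beta(t)\,\langle\rho\rangle\,n,
  && \langle\rho\rangle \equiv \frac{1}{n}\sum_c \rho_c\,n_c \\
\frac{d\ln n_c}{d\ln n} &= \frac{\rho_c}{\langle\rho\rangle}
\end{align}
```

A power law $n_c \propto n^{\alpha_c}$ has $d\ln n_c / d\ln n = \alpha_c$, so under these assumptions $\rho_c = \alpha_c \langle\rho\rangle$. Because the potentials are relative quantities, they can be expressed in units where $\langle\rho\rangle = 1$, which gives $\rho_c = \alpha_c$. The factor $\beta(t)$ cancels, which is why only relative rates are constrained.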

Conclusion: Our results establish a direct quantitative connection between the scaling of domain numbers with genome size and the rates of domain additions and deletions during short evolutionary time intervals.

Figures

Figure 1

Scaling laws. The number of protein domains associated with the functional categories 'translation' (green), 'metabolic process' (blue), and 'regulation of transcription' (red) as a function of the total number of domains in the genome for which a functional annotation is available. Each dot corresponds to a fully-sequenced microbial genome, with the total number of domains on the horizontal axis and the number of domains in a particular functional category on the vertical axis. Both axes are shown on a logarithmic scale. The straight lines show power-law fits.
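
A power-law fit of this kind is a straight-line fit on log-log axes. The following is a minimal sketch, assuming ordinary least squares on the logarithms; the abstract does not state which fitting procedure the authors used, and the function name `fit_power_law` and the synthetic counts are made up for illustration.

```python
import numpy as np


def fit_power_law(n_total, n_category):
    """Fit n_c = exp(beta) * n**alpha by least squares in log-log space.

    Returns (alpha, beta): the slope and intercept of log n_c versus log n.
    """
    log_n = np.log(np.asarray(n_total, dtype=float))
    log_nc = np.log(np.asarray(n_category, dtype=float))
    # A degree-1 polynomial fit returns [slope, intercept].
    alpha, beta = np.polyfit(log_n, log_nc, 1)
    return alpha, beta


# Synthetic, noise-free counts (NOT data from the paper): five hypothetical
# genomes whose category counts follow n_c = 1e-4 * n**2 exactly.
n_total = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
n_category = 1e-4 * n_total**2.0

alpha, beta = fit_power_law(n_total, n_category)
print(f"fitted exponent alpha = {alpha:.3f}, prefactor = {np.exp(beta):.2e}")
```

With real genome counts the points scatter around the line, and categories with zero counts in some genomes would need to be handled before taking logarithms.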

Figure 2

Evolutionary histories and time invariance. Evolutionary histories of different organisms. The scaling laws constrain integrals of domain-count changes over long evolutionary times, i.e. from the common ancestor up to the present (left panel). Our assumption of time invariance now implies relations between the domain-count changes during short time intervals which can be tested by comparing domain-counts in closely-related genomes (right panel).

Figure 3

Linear dependence of domain-count changes on domain occurrence. Linear dependence of the fraction of domain-count changes on the domain count itself. Left panel: For each genome pair $i$, the fraction $\Delta n_{ci}/\Delta n_i$ of domain-count changes that involve domains of category $c$ is shown (vertical axis) as a function of the fraction $n_{ci}/n_i$ of all domains in the genome that are associated with category $c$ (horizontal axis) for the categories 'metabolic process' (blue), 'regulation of transcription' (red), and 'protein kinase activity' (green). Each dot corresponds to the data for one pair $i$ of closely-related genomes. Both axes are shown on a logarithmic scale. The straight lines show least-squares fits of the form $\log[\Delta n_{ci}/\Delta n_i] = \gamma_c \log[n_{ci}/n_i] + \delta_c$. The fitted slopes for the three categories are $\gamma_{\text{prot. kin. activity}} = 0.56 \pm 0.46$, $\gamma_{\text{reg. transcr.}} = 0.95 \pm 0.20$, and $\gamma_{\text{met. proc.}} = 1.48 \pm 0.31$. For comparison, the dotted lines show linear scaling. Right panel: A 99% posterior probability interval for the slope $\gamma_c$ was estimated for all selected GO categories (Methods). The fitted slopes were ordered from high to low and are shown from left to right, with the vertical bars corresponding to the 99% posterior probability interval for each slope $\gamma_c$. The slope $\gamma = 1$, corresponding to a linear dependence, is shown as a horizontal dotted line.

Figure 4

Evolutionary potentials across different lineages. Distribution of inferred evolutionary potentials $\rho_{ci}$ for the categories 'translation' (left panel), 'metabolic process' (middle panel), and 'regulation of transcription' (right panel) across all genome pairs $i$. Each panel shows the 99% posterior probability intervals $[l_{ci}, h_{ci}]$ for the potentials $\rho_{ci}$ as vertical bars (sorted from left to right by their means). The dotted horizontal lines show the average of $\rho_{ci}$ over all pairs $i$.
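
To make the quantity in this figure concrete: if the rate of changes in category c is proportional to both its potential and its domain count, then the share of changes falling in c, divided by the share of domains in c, gives the potential relative to the genome-wide average. The sketch below computes that ratio as a simple point estimate. It is not the authors' procedure, which reports posterior probability intervals (Methods); the function name `relative_potential` and the example counts are hypothetical.

```python
def relative_potential(delta_nc, delta_n, nc, n):
    """Point estimate of a category's potential relative to the average.

    delta_nc: domain-count changes in category c between the two genomes
    delta_n:  domain-count changes summed over all categories
    nc:       domains of category c in the genome
    n:        total domains in the genome
    Assumes changes occur at a rate proportional to potential * nc.
    """
    if delta_n <= 0 or nc <= 0 or n <= 0:
        raise ValueError("delta_n, nc and n must be positive")
    share_of_changes = delta_nc / delta_n
    share_of_domains = nc / n
    return share_of_changes / share_of_domains


# Hypothetical genome pair (made-up numbers): 12 of 100 observed changes
# fall in a category that holds 300 of 5000 domains.
estimate = relative_potential(delta_nc=12, delta_n=100, nc=300, n=5000)
print(f"relative potential is about {estimate:.2f}")
```

A ratio above 1 means the category changes more often than its size alone would predict. With only a handful of changes per pair this ratio is very noisy, which is presumably why the figure shows intervals rather than single values.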

Figure 5

Correlation between exponents $\alpha_c$ and evolutionary potentials $\rho_c$. Correlation between the inferred evolutionary potentials $\rho_c$ (vertical axis) and the exponents $\alpha_c$ (horizontal axis) of the scaling laws. Each dot corresponds to one of the 156 selected GO categories. The line shows the linear fit $\rho_c = 0.71\,\alpha_c + 0.1$ with squared correlation coefficient $r^2 = 0.80$.
