Genome rhetoric and the emergence of compositional bias

Kalin Vetsigian et al. Proc Natl Acad Sci U S A. 2009.

Abstract

Genomes exhibit diverse patterns of species-specific GC content, GC and AT skews, codon bias, and mutation bias. Despite intensive investigations and the rapid accumulation of sequence data, the causes of these a priori different genome biases have not been agreed on and seem multifactorial and idiosyncratic. We show that these biases can arise generically from an instability of the coevolutionary dynamics between genome composition and resource allocation for translation, transcription, and replication. Thus, we offer a unifying framework for understanding and analyzing different genome biases. We develop a test of multistability of nucleotide composition of completely sequenced genomes and reveal a bistability for Borrelia burgdorferi, a genome with pronounced replication-related biases. These results indicate that evolution generates rhetoric: it improves the efficiency of the genome's communication with the cell without modifying the message, and this leads to bias.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Contrasting frameworks for studying genome biases. (A) Mutation–selection–drift framework. Selection and mutation pressure are treated as static external variables, and for given values of these inputs the genome composition relaxes to a single state. The variety of genome biases must arise from exogenous mutation and selection pressures. (B) Coevolutionary framework of template-directed synthesis. Resource allocation is introduced as a dynamic degree of freedom. It is optimized to increase the speed, accuracy, and/or energy efficiency for a given template composition and in turn controls the mutation and selection pressures affecting template composition. The feedback loops lead to multistability and diversity of genome biases.

Fig. 2.

Evolutionary instability of template-directed synthesis leading to genome biases. (A) Each template letter selects for an increase of its cognate (same color) adaptor concentration; for an optimal adaptor pool, more abundant cognate adaptors correspond to more popular letters. The time a polymerase waits for an adaptor to bind is inversely proportional to the adaptor concentration. The accuracy of synthesis also depends on the relative concentrations because the polymerase discriminates between correct and incorrect adaptors imperfectly. (B) Coevolution between regulation of adaptor abundance and genome composition leads to multistability at low mutation rates. Presented is a symmetric case with two synonymous template letters (dark and light squares) and their corresponding adaptors (dark and light circles). We follow the fate of the symmetric state after a fluctuation increasing the dark letters. The excess selects for an increase of dark adaptors (see A). This, in turn, selects for dark letters at all sites, whereas mutation tries to restore the letter balance. At low mutation rate, selection increases the dark letters, promoting a further increase of dark adaptors. The cycle continues until balanced by mutation pressure. Because of symmetry, there is an alternative state biased toward light letters and adaptors. The system is bistable: there is selection on the bias but not on its direction.
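The feedback loop in panel B can be illustrated with a deliberately simplified toy model. The sketch below is not the paper's model: it assumes a fixed total adaptor budget split between the two adaptors, a waiting time of 1/concentration per letter (as in panel A), an exponential fitness penalty with a hypothetical strength `s`, and a symmetric per-site mutation rate `mu`. All function names and parameter values are illustrative assumptions.

```python
import math


def optimal_dark_adaptor_share(x):
    """Share c of the adaptor budget given to dark adaptors.

    Assumption: the pool minimizes the mean waiting time
    x / c + (1 - x) / (1 - c), which gives c proportional to sqrt(x).
    """
    a, b = math.sqrt(x), math.sqrt(1.0 - x)
    return a / (a + b)


def step(x, mu, s):
    """One generation: adaptors re-optimize, then selection, then mutation."""
    c = optimal_dark_adaptor_share(x)
    c = min(max(c, 1e-12), 1.0 - 1e-12)  # avoid division by zero
    # Hypothetical fitness: a letter is penalized by its waiting time 1/c.
    w_dark = math.exp(-s / c)
    w_light = math.exp(-s / (1.0 - c))
    x_sel = x * w_dark / (x * w_dark + (1.0 - x) * w_light)
    # Symmetric mutation pulls the letter usage back toward 1/2.
    return x_sel * (1.0 - mu) + (1.0 - x_sel) * mu


def equilibrium(x0, mu, s=0.1, steps=5000):
    """Iterate the map from dark-letter fraction x0 until it settles."""
    x = x0
    for _ in range(steps):
        x = step(x, mu, s)
    return x


if __name__ == "__main__":
    for mu in (0.001, 0.2):
        up = equilibrium(0.51, mu)
        down = equilibrium(0.49, mu)
        print(f"mu={mu}: from 0.51 -> {up:.4f}, from 0.49 -> {down:.4f}")
```

In this toy map, linearizing around the symmetric state shows that it loses stability when `mu < s / (2 * (1 + s))`. Below that threshold a small excess of dark letters grows into a dark-biased state, and the mirror-image perturbation grows into a light-biased one. Above it, both starting points relax back to equal usage.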

Fig. 3.

Different instantiations of genome biases emerging from the universal mechanism in Fig. 2B. The first two columns list the template letters and adaptors. dN denotes a dNTP; (d)N is a dNTP for replication or an NTP for transcription. The third column specifies an environmentally independent selection pressure that drives the optimization of resources and shapes letter usage. This picture is a simplification: GC content and skew evolution are coupled, as reflected in the full model; nucleotide composition and codon biases also influence each other. More than two states are stable in general.

Fig. 4.

Pattern of equilibrium solutions reached using a fixed set of parameters and the same initial condition as a function of the mutation rate $\mu$ ($M = I + \mu M^{(0)}$). The four panels present different projections of the same data. Outcomes are marked in black if GC and AT skews are in-phase (same sign) and in green if out-of-phase (opposite signs). Grayed out are solutions in which the lagging strand is no longer limiting (see SI Text). (A) Skew magnitude as a function of the mutations per genome per generation (same for the AT and GC skew because of parameter symmetry). (B) GC content as a function of the mutations per genome per generation. (C) Correlation between AT and GC skews. (D) Correlation between GC content and GC skew. The following parameters were used: symmetric mutation matrix $M^{(0)}$ with transition/transversion bias $k = 7$, $M^{(0)}_{GC} = M^{(0)}_{GT} = M^{(0)}_{AC} = M^{(0)}_{AT} = 1/(k + 2)$, $M^{(0)}_{AG} = M^{(0)}_{CT} = k/(k + 2)$. Redundancy structure $\{R_{si}\}$: 9.09% fixed sites (evenly distributed among A, G, C, and T) and, from the rest of the genome, 5% each of GC and AT redundant sites, 20% each of AG and CT redundant sites, and 50% GCAT redundant sites; genome length $L = 110{,}000$; $\alpha_i = 1$; $\Theta = 10^6 = 9.09L$. $\Delta L_{AG} = \Theta/2$ was used to gray out solutions. Initial conditions: $c_i = 1$ and letter usage equilibrated to it.
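The mutation matrix described in this caption can be written out explicitly. The sketch below builds the off-diagonal entries from the stated values (1/(k + 2) for transversions, k/(k + 2) for transitions, with k = 7). The caption does not give the diagonal; setting it to -1 is an assumption made here so that each row of the full matrix (identity plus the mutation rate times the base matrix) sums to 1. Function names are hypothetical.

```python
BASES = "ACGT"
# Transitions are purine<->purine (A, G) and pyrimidine<->pyrimidine (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def base_mutation_matrix(k=7):
    """Symmetric base matrix with transition/transversion bias k."""
    m0 = {}
    for a in BASES:
        for b in BASES:
            if a == b:
                # Assumption (not stated in the caption): diagonal chosen
                # so that every row of the base matrix sums to zero.
                m0[a, b] = -1.0
            elif (a, b) in TRANSITIONS:
                m0[a, b] = k / (k + 2)
            else:
                m0[a, b] = 1 / (k + 2)
    return m0


def mutation_matrix(mu, k=7):
    """Full matrix: identity plus mu times the base matrix."""
    m0 = base_mutation_matrix(k)
    return {
        (a, b): (1.0 if a == b else 0.0) + mu * m0[a, b]
        for a in BASES
        for b in BASES
    }


if __name__ == "__main__":
    m = mutation_matrix(mu=1e-3)
    for a in BASES:
        print(a, [round(m[a, b], 6) for b in BASES])
```

With this choice, each base has one transition partner and two transversion partners, so the off-diagonal entries of every row of the base matrix add up to (k + 2)/(k + 2) = 1, and the mutation rate is the total per-site probability of change per generation. The stated genome length is also consistent with the other parameters: 10^6 / 110,000 is approximately 9.09.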

Fig. 5.

Bistability and robustness of adaptor bias (ratio of adaptor concentrations) for the two-letter model. The equilibrium adaptor bias, $\psi_c$, is plotted as a function of the number of mutations per genome per generation, $\mu L$, for different model parameters. All curves are for $\Theta = 0$. Thick blue line, exact solution of the symmetric case $m = 0$, $\alpha = 1$, with the two letters being synonymous at all genome sites. Solid red line, same as the above except for the presence of strong mutational asymmetry, $m = 0.9$; the dotted red line indicates the unstable equilibria. Red balls, simulation of the previous case ($m = 0.9$), but with 4.55% of the sites fixed to letter 1 and 4.55% fixed to letter 2. Green line, bias of the two solutions (Top and Bottom branches for the same $\mu L$) at the onset of bistability for different $m > 0$ in the synonymous model.

Fig. 6.

Predicted bistability of the genome composition of B. burgdorferi caused by selection on the speed of replication. The actual genome has GC and TA skews indicated by blue and red stars. Simulations reveal an additional stable state with skews of opposite sign and higher GC content. The location of the second stable state (but not its existence) depends on the free parameter κ ∈ [0, 1] (see Materials and Methods). Blue and red balls correspond to κ = 0.5; blue and red lines correspond to κ ∈ [0.001, 0.99].
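For readers who want to compute the quantities plotted here from a sequence, the following is a minimal sketch using the conventional definitions: GC content = (G + C)/N, GC skew = (G - C)/(G + C), and TA skew = (T - A)/(T + A), all measured along one strand. The function name is hypothetical, and the source does not state which strand or windowing the authors used.

```python
from collections import Counter


def composition_stats(seq):
    """Return GC content, GC skew and TA skew for one DNA strand."""
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")

    def skew(x, y):
        # Normalized difference; defined as 0.0 when both counts are zero.
        return (x - y) / (x + y) if (x + y) else 0.0

    total = a + c + g + t  # ambiguous symbols such as N are ignored
    return {
        "gc_content": (g + c) / total if total else 0.0,
        "gc_skew": skew(g, c),
        "ta_skew": skew(t, a),
    }


if __name__ == "__main__":
    print(composition_stats("GGGCTTTA"))
```

Skews that share a sign under these definitions correspond to the in-phase case; the alternative stable state described in the caption would show both skews with the opposite sign.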
