Large scale comparative codon-pair context analysis unveils general rules that fine-tune evolution of mRNA primary structure

Gabriela Moura et al. PLoS One. 2007.

Abstract

Background: Codon usage and codon-pair context are important features of gene primary structure that influence mRNA decoding fidelity. To identify general rules that shape codon-pair context and minimize mRNA decoding errors, we carried out a large-scale comparative codon-pair context analysis of 119 fully sequenced genomes.

Methodologies/principal findings: We have developed mathematical and software tools for large-scale comparative codon-pair context analysis. These methodologies unveiled general and species-specific codon-pair context rules that govern the evolution of mRNAs in the three domains of life. We show that the evolution of bacterial and archaeal mRNA primary structure depends mainly on constraints imposed by the translational machinery, whereas in eukaryotes DNA methylation and trinucleotide repeats impose strong biases on codon-pair context.

Conclusions: The data highlight fundamental differences between prokaryotic and eukaryotic mRNA decoding rules, which are partially independent of codon usage.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Flowchart of the codon-pair context analysis performed by ANACONDA.

A) The software selects valid ORFs from the total set available for each species (the ORFeome) and counts every combination of two consecutive codons (codon-pair context) present in the sequences. B) The observed counts are entered into a contingency table whose rows correspond to the 5′ codon (ribosomal P-site) and whose columns correspond to the 3′ codon (ribosomal A-site) of each pair. C) This table of observed values is then compared with a table of the values expected under independence. Each codon-pair cell is colored green for preferred contexts or red for rejected ones, producing a color-coded map of the 61×64 two-codon contexts of one ORFeome. D) To allow simultaneous comparison of many ORFeomes, the 61×64 map is automatically converted into a single column of 3904 rows, one for each codon pair. E) Finally, the columns representing the two-codon context bias of each ORFeome are placed side by side, yielding a large-scale codon context comparison map. Both kinds of map, i.e. the 61×64 map for a single species and the large-scale comparison map, can be rearranged with clustering methods that highlight similar codon-pair context patterns. For a detailed description of the statistics and software, see Methods and the cited references.
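Steps A to D can be sketched in a few lines of Python. This is a minimal sketch, not ANACONDA's implementation: it assumes ORFs are supplied as clean nucleotide strings, counts in-frame consecutive codon pairs, and scores each cell of the 61×64 table with the standard adjusted residual, (O - E) / sqrt(E (1 - r_i/N)(1 - c_j/N)), where E = r_i c_j / N is the count expected under independence. The function names are hypothetical, and the toy ORFs are not data from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt

CODONS = ["".join(p) for p in product("TCAG", repeat=3)]  # all 64 codons (3' codon, A-site)
STOPS = {"TAA", "TAG", "TGA"}
SENSE = [c for c in CODONS if c not in STOPS]  # 61 sense codons (5' codon, P-site)


def count_codon_pairs(orfs):
    """Count consecutive in-frame codon pairs (5' codon, 3' codon) over all ORFs."""
    counts = Counter()
    for seq in orfs:
        seq = seq.upper().replace("U", "T")
        # Split into complete codons only; a trailing partial codon is ignored.
        codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
        for c5, c3 in zip(codons, codons[1:]):
            counts[(c5, c3)] += 1
    return counts


def adjusted_residuals(counts):
    """Adjusted residual for every cell of the 61 x 64 contingency table.

    Pairs whose 5' codon is a stop codon fall outside the table and are ignored.
    """
    rows = {a: sum(counts[(a, b)] for b in CODONS) for a in SENSE}
    cols = {b: sum(counts[(a, b)] for a in SENSE) for b in CODONS}
    n = sum(rows.values())
    residuals = {}
    for a in SENSE:
        for b in CODONS:
            expected = rows[a] * cols[b] / n if n else 0.0  # value under independence
            variance = expected * (1 - rows[a] / n) * (1 - cols[b] / n) if n else 0.0
            observed = counts[(a, b)]
            residuals[(a, b)] = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
    return residuals


def to_column(residuals):
    """Flatten the 61 x 64 map into a single 3904-row column (step D)."""
    return [residuals[(a, b)] for a in SENSE for b in CODONS]


orfs = ["ATGGCTGCTTAA", "ATGAAATAA"]  # toy ORFs
res = adjusted_residuals(count_codon_pairs(orfs))
print(len(to_column(res)))  # 3904
```

Positive residuals correspond to the green (preferred) cells and negative residuals to the red (rejected) cells. Placing several such columns side by side gives the large-scale comparison map of step E.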

Figure 2. Codon-pair context is species specific.

A) Individual codon-pair context maps built for various genomes mirrored phylogeny, indicating that codon-pair context is species specific. For instance, the human ORFeome map is more similar to that of the chimpanzee (Pan troglodytes) than to that of the mouse (Mus musculus). B) This result was confirmed using differential display maps (DDMs), which subtract one codon-pair context map from another, for example H. sapiens – M. musculus (H.s vs M.m) and H. sapiens – P. troglodytes (H.s vs P.t). In these DDMs, major codon-pair context differences (above 15) are shown in light blue, so darker maps correspond to species with more similar codon-pair context biases. In the present example, the H.s vs M.m and H.s vs P.t maps have 6% and 1% blue cells, respectively. C and D) The same phylogenetic relationship was detected for bacterial ORFeomes, as exemplified by Escherichia coli, Bacillus cereus and Salmonella typhi. The DDMs built with these species have 55% (E.c vs B.c) and 20% (E.c vs S.t) blue cells. E) Finally, the phylogenetic relationship was maintained when the above species were clustered according to the similarity of their codon-pair context maps. The yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe were added to include an intermediate group of lower eukaryotes in the tree. Adjusted residuals are colored according to the color scale shown, so that green cells correspond to preferred and red cells to rejected contexts.
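A differential display map of the kind described in panels B and D can be sketched as a cell-by-cell comparison of two flattened residual maps. This sketch assumes each map is a list of adjusted residuals in the same codon-pair order and that "above 15" refers to the absolute difference between residuals. The function names are hypothetical, and the ranking helper only illustrates similarity-based comparison; it is not the clustering method used in the paper. The toy 4-cell maps are invented values (real maps have 3904 cells).

```python
def ddm_flags(map_a, map_b, threshold=15.0):
    """Mark cells whose residuals differ by more than the threshold (the 'light blue' cells)."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must cover the same codon pairs in the same order")
    return [abs(a - b) > threshold for a, b in zip(map_a, map_b)]


def percent_blue(map_a, map_b, threshold=15.0):
    """Percentage of cells with major differences; lower values mean more similar maps."""
    flags = ddm_flags(map_a, map_b, threshold)
    return 100.0 * sum(flags) / len(flags)


def closest_species(reference, maps, threshold=15.0):
    """Name of the map in `maps` (dict: name -> residual column) most similar to `reference`."""
    return min(maps, key=lambda name: percent_blue(reference, maps[name], threshold))


# Toy 4-cell maps with invented values.
ref = [0.0, 20.0, -20.0, 5.0]
others = {"x": [0.0, 0.0, 0.0, 0.0], "y": [1.0, 18.0, -19.0, 4.0]}
print(percent_blue(ref, others["x"]))  # 50.0
print(closest_species(ref, others))  # y
```

Using the fraction of strongly differing cells as a distance mirrors how the caption compares species (for example 6% versus 1% blue cells); a full analysis would feed such pairwise distances into a proper clustering routine.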

Figure 3. Nucleotide context preferences can be detected in total genome sequences.

A large-scale codon-pair context map was produced using either the ORFeome (panel A) or the total genome (panel B) sequences of 119 species (see Figure 1 and Methods). The resulting patterns are either universal, i.e. present in every species, or visible only in particular phylogenetic groups. Surprisingly, most ORFeome patterns were also present in total genome sequences, implying that the major forces driving the evolution of coding sequences are not necessarily connected to mRNA translation. Moreover, a differential display map (DDM) comparing the two maps (panel C) showed that eukaryotes behave more heterogeneously: they showed greater resemblance between coding and non-coding sequences (darker pattern in the DDM), but they also produced the largest differences found in the DDM (*). These differences correspond either to two-codon context rules imposed by the translational machinery, and hence specific to ORFeomes, or to genome biases that are strongly repressed in coding sequences, where they are probably associated with increased decoding error rates. ORFeomes were arranged in the map by domain of life (Eukaryota, Archaea and Eubacteria from left to right) and sorted as shown in Figure S2. Adjusted residuals are colored so that green cells correspond to preferred and red cells to rejected contexts, while in the DDM major differences (above 15) between the residuals of the two maps are shown in light blue.

Figure 4. Influence of dinucleotide bias on the codon-pair context preferences.

A) To highlight the influence of dinucleotide bias on codon-pair contexts, the maps of H. sapiens, M. musculus, S. cerevisiae and E. coli were arranged according to their N3-N1 context, i.e. the dinucleotide formed by the last nucleotide of the 5′ codon and the first nucleotide of the 3′ codon. These two positions produced a high degree of context discrimination in higher eukaryotes, especially for the dinucleotide CpG (blue square); this effect was weak in yeast, and E. coli showed an opposite preference pattern (green). Adjusted residuals are colored so that green cells correspond to preferred and red cells to rejected contexts. B) To further evaluate the role of N3-N1 dinucleotide bias in codon-pair context biases, dinucleotide preferences were determined using total genome sequences. The most biased dinucleotides are displayed in green (preferred) or red (rejected) and correspond to dinucleotides that occur 1% above or below the expected level, respectively. The UpA dinucleotide is strongly repressed throughout all domains of life. Other constraints imposed on ORFeomes by genome biases include the rejection of CpG dinucleotides and the accumulation of CpA and UpG in higher eukaryotes, and the accumulation of UpU and ApA in almost all organisms. The latter preference is related to the high number of tandem repeats of more than three consecutive Us or As (Figure S4). ORFeomes were arranged in both maps by domain of life (Eukaryota, Archaea and Eubacteria from left to right) and sorted as shown in Figure S2.
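The dinucleotide classification in panel B can be approximated with the common observed/expected ratio, rho(XY) = f(XY) / (f(X) f(Y)). This is a sketch under stated assumptions, not the paper's exact procedure: expected counts are derived from single-nucleotide frequencies on one strand, "1% above or below the expected level" is read as a ratio above 1.01 or below 0.99, and DNA letters are used, so TA corresponds to UpA and CG to CpG. The function name and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_bias(seq, tolerance=0.01):
    """Return {dinucleotide: (observed/expected ratio, label)} for one sequence."""
    seq = seq.upper().replace("U", "T")
    mono = Counter(b for b in seq if b in BASES)
    # Count only adjacent pairs in which both positions are unambiguous bases.
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in BASES and seq[i + 1] in BASES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    result = {}
    for x, y in product(BASES, repeat=2):
        expected = n_di * (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        if expected == 0:
            result[x + y] = (float("nan"), "undefined")  # base absent: ratio undefined
            continue
        ratio = di[x + y] / expected
        if ratio > 1 + tolerance:
            label = "preferred"
        elif ratio < 1 - tolerance:
            label = "rejected"
        else:
            label = "neutral"
        result[x + y] = (ratio, label)
    return result


bias = dinucleotide_bias("AAAATTTTCCCCGGGG")  # toy sequence, not genome data
print(bias["TA"])  # (0.0, 'rejected')
```

The same ratio could be computed restricted to the N3-N1 junction of consecutive codons to compare junction dinucleotides against genome-wide ones, which is the comparison panel A motivates.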

Figure 5. Genome dinucleotide bias has a strong influence on codon-pair context.

Because the most widespread negative codon-pair context rule detected corresponds to the general expression NNU3-A1NN, which includes the out-of-frame translation termination contexts NNU3-A1A2N and NNU3-A1G2N, other contexts containing out-of-frame stop codons were analyzed separately. For this, the adjusted residuals of these contexts were included in an ORFeome comparison map. NNU3-A1GN and NNU3-A1AN were clearly the most negative codon-pair contexts bearing out-of-frame stops, followed by NUA3-G1NN. The other groups of contexts tested did not generate codon-pair context rules, although some of them contained the strongly repressed UpA dinucleotide. The hypothesis that rejection of codon-pair contexts containing out-of-frame stop codons, namely NNU3-A1A2N and NNU3-A1G2N, evolved to avoid premature termination was partially contradicted by similar patterns in NNU3-A1NN-type contexts that contain no out-of-frame stop, namely NNU3-A1C2N and NNU3-A1U2N. ORFeomes were arranged in the map by domain of life (Eukaryota, Archaea and Eubacteria from left to right) and sorted as shown in Figure S2. Adjusted residuals are colored so that green cells correspond to preferred and red cells to rejected contexts.
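The context notation used here (for example NNU3-A1G2N, where the digit gives the position within the 5′ or 3′ codon and N is any nucleotide) can be made concrete with a helper that reports which out-of-frame stop codons span the junction of a codon pair, plus a wildcard matcher for grouping pairs into such patterns. This is an illustrative sketch with hypothetical function names; it uses DNA letters (T for U) and checks only the +1 and +2 reading frames across the junction.

```python
STOPS = {"TAA", "TAG", "TGA"}


def _pair(c5, c3):
    """Join a codon pair into one 6-nt DNA string."""
    return (c5 + c3).upper().replace("U", "T")


def junction_stops(c5, c3):
    """Out-of-frame stops spanning the junction: +1 frame reads N2N3|N1, +2 frame reads N3|N1N2."""
    pair = _pair(c5, c3)
    frames = {"+1": pair[1:4], "+2": pair[2:5]}
    return {frame: codon for frame, codon in frames.items() if codon in STOPS}


def matches(pattern, c5, c3):
    """True if the pair fits a pattern such as 'NNU-AGN' (N = any nucleotide)."""
    pat = pattern.upper().replace("U", "T").replace("-", "")
    pair = _pair(c5, c3)
    return len(pat) == 6 and all(p in ("N", b) for p, b in zip(pat, pair))


print(junction_stops("GCU", "AAG"))  # {'+2': 'TAA'}  (NNU3-A1A2N)
print(junction_stops("CUA", "GCC"))  # {'+1': 'TAG'}  (NUA3-G1NN)
print(junction_stops("GCU", "ACC"))  # {}  (NNU3-A1C2N: UpA junction, no stop)
print(matches("NNU-AGN", "GCU", "AGC"))  # True
```

The third call illustrates the caption's point: NNU3-A1C2N contexts carry the UpA junction dinucleotide without forming any out-of-frame stop.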

Figure 6. Some codon-pair context patterns are associated with mRNA primary structure biases.

A) To identify ORFeome-specific codon-pair context biases, the two large-scale context maps were filtered so that only cells with residual differences above 15 between the ORFeome and total genome sequences were shown. All other cells were colored black (see Figure S5 for other display thresholds). Codon-pair context patterns specific to ORFeomes are highlighted beside panel A. B) To visualize the patterns that appear in genomes but are absent from ORFeomes, the large-scale comparison maps obtained with total genomes and ORFeomes were subtracted, and only the cells with differences above 15 were displayed. This highlighted patterns that are strongly preferred or repressed in coding sequences and may correspond to mistranslation hot spots.
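The filtering in panels A and B amounts to masking a residual map wherever it does not differ strongly from the other map. A minimal sketch, assuming both maps are flattened residual columns in the same codon-pair order, that "above 15" means an absolute difference, that panel B subtracts the ORFeome map from the genome map, and that None stands for the black cells; the function names are hypothetical.

```python
def orfeome_specific(orf_map, genome_map, threshold=15.0):
    """Panel A style: keep ORFeome residuals that differ from the genome map by more than threshold.

    Cells that do not pass the filter are returned as None (drawn black in the map).
    """
    if len(orf_map) != len(genome_map):
        raise ValueError("maps must cover the same codon pairs in the same order")
    return [o if abs(o - g) > threshold else None for o, g in zip(orf_map, genome_map)]


def genome_minus_orfeome(genome_map, orf_map, threshold=15.0):
    """Panel B style: genome minus ORFeome residual difference where it exceeds threshold, else None."""
    if len(orf_map) != len(genome_map):
        raise ValueError("maps must cover the same codon pairs in the same order")
    diffs = [g - o for g, o in zip(genome_map, orf_map)]
    return [d if abs(d) > threshold else None for d in diffs]


print(orfeome_specific([20.0, 3.0, -30.0], [0.0, 2.0, -10.0]))  # [20.0, None, -30.0]
```

Varying `threshold` corresponds to the alternative display thresholds mentioned for Figure S5.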
