Codon usage between genomes is constrained by genome-wide mutational processes
Comparative Study
Proc Natl Acad Sci U S A. 2004 Mar 9;101(10):3480-5.
doi: 10.1073/pnas.0307827100. Epub 2004 Feb 27.
- PMID: 14990797
- PMCID: PMC373487
- DOI: 10.1073/pnas.0307827100
Swaine L Chen et al. Proc Natl Acad Sci U S A. 2004.
Abstract
Analysis of genome-wide codon bias shows that only two parameters effectively differentiate the genome-wide codon bias of 100 eubacterial and archaeal organisms. The first parameter correlates with genome GC content, and the second parameter correlates with context-dependent nucleotide bias. Both of these parameters may be calculated from intergenic sequences. Therefore, genome-wide codon bias in eubacteria and archaea may be predicted from intergenic sequences that are not translated. When these two parameters are calculated for genes from nonmammalian eukaryotic organisms, genes from the same organism again have similar values, and genome-wide codon bias may also be predicted from intergenic sequences. In mammals, genes from the same organism are similar only in the second parameter, because GC content varies widely among isochores. Our results suggest that, in general, genome-wide codon bias is determined primarily by mutational processes that act throughout the genome, and only secondarily by selective forces acting on translated sequences.
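The analysis starts from a per-gene codon bias measure and from the base composition of intergenic DNA. The sketch below makes two assumptions that the abstract does not spell out: a gene's codon bias is taken to be the relative frequency of each codon within its synonymous family under the standard genetic code, and the first parameter is approximated by plain GC fraction. The helper names `gc_content` and `synonymous_codon_frequencies` are hypothetical. The context-dependent second parameter is not reproduced, because its definition is not given here.

```python
from collections import Counter

# Standard genetic code (NCBI table 1) in TCAG order; "*" marks stop codons.
# Assumption: the abstract does not state which code table was used.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))


def gc_content(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in acgt) / len(acgt) if acgt else 0.0


def synonymous_codon_frequencies(cds):
    """Hypothetical per-gene codon bias: for each amino acid observed in an
    in-frame coding sequence, the fraction of its codons that are each
    synonymous codon. Stop codons and ambiguous codons are ignored."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    freqs = {}
    for aa in set(AMINO_ACIDS) - {"*"}:
        family = [c for c in CODONS if CODON_TO_AA[c] == aa]
        total = sum(counts[c] for c in family)
        if total:  # skip amino acids absent from this gene
            for c in family:
                freqs[c] = counts[c] / total
    return freqs


# Usage: two GCT and one GCC for alanine, one AAA for lysine.
freqs = synonymous_codon_frequencies("GCTGCTGCCAAA")
print(round(freqs["GCT"], 3), freqs["AAA"], gc_content("ATGCNN"))
```

Single-codon amino acids (Met, Trp) always come out as 1.0 under this measure, which is one reason such codons carry no information about bias.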
Figures
Fig. 1.
(a) Scree plot of singular values. Singular values (σ_j) were obtained from an SVD of 400 genes from each of 100 genomes. (b) Contribution of the between-genome variance, var(u_j)_between, to the overall variance. Overall variance is scaled to 1 in each dimension; the remainder is the within-genome variance, var(u_j)_within. Only in two dimensions, j = 1 and 2, is var(u_j)_between the major source of variance.
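The two quantities in Fig. 1 can be illustrated with a small pure-Python sketch. Power iteration on X^T X gives the leading singular value (the first point of a scree plot) and one coordinate u_1 per gene, and the law of total variance splits the variance of such a coordinate into between-genome and within-genome parts. The matrix layout (one row per gene, one column per codon), the absence of centering, and the helper names are assumptions for illustration. Later singular values would need deflation or a full SVD routine.

```python
import math


def leading_singular(rows, iters=500):
    """Leading singular value, right singular vector v, and left singular
    vector u (one coordinate per row/gene) of X, by power iteration on X^T X.
    Assumes X is nonzero and the all-ones start vector is not orthogonal to v."""
    n_rows, n_cols = len(rows), len(rows[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        xv = [sum(r[k] * v[k] for k in range(n_cols)) for r in rows]  # X v
        w = [sum(rows[i][k] * xv[i] for i in range(n_rows)) for k in range(n_cols)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    xv = [sum(r[k] * v[k] for k in range(n_cols)) for r in rows]
    sigma = math.sqrt(sum(x * x for x in xv))
    u = [x / sigma for x in xv]
    return sigma, v, u


def between_genome_fraction(coords_by_genome):
    """Share of the total (population) variance of one coordinate u_j that lies
    between genome means; the remainder is the within-genome variance."""
    values = [x for xs in coords_by_genome.values() for x in xs]
    n = len(values)
    grand = sum(values) / n
    between = within = 0.0
    for xs in coords_by_genome.values():
        mean = sum(xs) / len(xs)
        between += len(xs) * (mean - grand) ** 2
        within += sum((x - mean) ** 2 for x in xs)
    return (between / n) / ((between + within) / n)
```

Because the two parts sum to the total, scaling the overall variance to 1 (as in panel b) makes the between-genome fraction and the within-genome fraction add up to 1 in each dimension.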
Fig. 2.
(a) Usage of the first eigencodon for each organism plotted against genome GC content. Usage of the first eigencodon correlates with genome GC content (R² = 0.961). (b) Usage of the second eigencodon plotted against intergenic bias. The second eigencodon correlates with a model constructed as a linear combination of intergenic bias parameters (R² = 0.669). In both plots, open boxes are data points for A. thaliana, C. elegans, E. cuniculi, P. falciparum, S. cerevisiae, and S. pombe.
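Panel (a) reports R² for a relationship with a single predictor. A minimal ordinary-least-squares sketch follows, assuming R² is the usual 1 − SS_res/SS_tot of such a fit; panel (b) would instead need a multiple regression on several intergenic bias parameters, which is not shown. The name `linear_fit_r2` is hypothetical.

```python
def linear_fit_r2(x, y):
    """Least-squares fit y ~ intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Usage: x would be genome GC content and y the per-genome eigencodon usage.
print(linear_fit_r2([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))
```

With one predictor, this R² equals the squared Pearson correlation between x and y.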
Fig. 3.
Eukaryotic genomes have low variance in usage of the second eigencodon. Expanded view of box-and-whisker plots of the variance in usage of eigencodon v_j, for j = 1, ..., 8, over all prokaryotic genomes g, with values for eukaryotic genomes superimposed. A full diagram can be found in Fig. 5. Box-and-whisker plots are drawn in gray. Asterisks indicate outlying prokaryotic values. Values for eukaryotic organisms are drawn individually with the symbols indicated in the upper left corner. Compared with prokaryotic genomes, many eukaryotic genomes have large variance in the usage of eigencodon v1 but relatively small variance in the usage of eigencodon v2. In general, variance is smaller for eukaryotic genomes than for prokaryotic genomes because eukaryotic genes tend to be longer than prokaryotic genes and hence provide less noisy samples of codon bias. Considering only long prokaryotic genes does not change the results qualitatively (see Figs. 7–9, which are published as supporting information on the PNAS web site).
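Fig. 3 summarizes one variance value per genome and marks outlying prokaryotic values. The sketch below computes the within-genome (population) variance of one eigencodon's usage for each genome, then flags outliers with Tukey's 1.5 × IQR fences. The caption does not say which outlier rule was used, so the Tukey rule, the "inclusive" quartile method, and the helper names are assumptions.

```python
import statistics


def per_genome_variance(usage_by_genome):
    """Population variance of one eigencodon's usage u_j across the genes of each genome."""
    return {g: statistics.pvariance(u) for g, u in usage_by_genome.items()}


def tukey_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; an assumed outlier rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Usage: variances for a few hypothetical prokaryotic genomes, then outlier flags.
variances = per_genome_variance({"g1": [1.0, 3.0], "g2": [2.0, 2.0]})
print(variances, tukey_outliers([1.0, 2.0, 3.0, 4.0, 100.0]))
```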
Fig. 4.
Components of the predicted genome-wide codon bias vector ĉ_g, computed from intergenic nucleotide sequences, plotted against components of the actual genome-wide codon bias vector c̄_g. Each point represents a coordinate pair for some organism g and some codon m(w): one coordinate is a component of c̄_g and the other is the corresponding component of ĉ_g. Different organisms and codons are not distinguished in these plots. Stop codons (TAA, TAG, and TGA) and the single codons for methionine (ATG) and tryptophan (TGG) were excluded. (a) Prokaryotes. Overall R² = 0.858; the average for individual genomes is R² = 0.840. (b) Data for the following eukaryotes: A. thaliana, C. elegans, E. cuniculi, P. falciparum, S. cerevisiae, and S. pombe. Overall R² = 0.847. R² values for the individual genomes are given in the text.
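Fig. 4 reports both an overall R² pooled across all organisms and codons and an average of per-genome R² values. The sketch below shows that distinction, assuming R² here is the squared Pearson correlation between actual and predicted components (the caption does not specify) and applying the caption's exclusion of stop codons, ATG, and TGG. The input layout and function names are hypothetical.

```python
# Codons excluded in Fig. 4: stop codons plus the single codons for Met and Trp.
EXCLUDED = {"TAA", "TAG", "TGA", "ATG", "TGG"}


def squared_correlation(xs, ys):
    """Squared Pearson correlation; assumed here to be the reported R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def pooled_and_mean_r2(components):
    """components: genome -> {codon: (actual c_bar, predicted c_hat)} (hypothetical layout).
    Returns (R^2 over all pooled points, mean of the per-genome R^2 values)."""
    pooled_actual, pooled_pred, per_genome = [], [], []
    for by_codon in components.values():
        kept = [pair for codon, pair in by_codon.items() if codon not in EXCLUDED]
        actual = [a for a, _ in kept]
        pred = [p for _, p in kept]
        per_genome.append(squared_correlation(actual, pred))
        pooled_actual.extend(actual)
        pooled_pred.extend(pred)
    overall = squared_correlation(pooled_actual, pooled_pred)
    return overall, sum(per_genome) / len(per_genome)
```

The pooled value can be higher or lower than the per-genome average, because pooling adds between-genome spread in codon usage that the predictions may or may not track.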
Grants and funding
- T32 GM007365/GM/NIGMS NIH HHS/United States
- HG00044/HG/NHGRI NIH HHS/United States
- 2T32GM07365/GM/NIGMS NIH HHS/United States
- GM51426/GM/NIGMS NIH HHS/United States
- T32 HG000044/HG/NHGRI NIH HHS/United States
- R01 GM051426/GM/NIGMS NIH HHS/United States
- K22 HG000044/HG/NHGRI NIH HHS/United States