Over- and under-representation of short oligonucleotides in DNA sequences

Abstract

Strand-symmetric relative abundance functionals for di-, tri-, and tetranucleotides are introduced and applied to sequences encompassing a broad phylogenetic range to discern tendencies and anomalies in the occurrences of these short oligonucleotides within and between genomic sequences. For dinucleotides, TA is almost universally under-represented, with the exception of vertebrate mitochondrial genomes, and CG is strongly under-represented in vertebrates and in mitochondrial genomes. The traditional methylation/deamination/mutation hypothesis for the rarity of CG does not adequately account for the observed deficiencies in certain sequences, notably the mitochondrial genomes, yeast, and Neurospora crassa, which lack the standard CpG methylase. Homodinucleotides (AA.TT, CC.GG) and larger homooligonucleotides are over-represented in many organisms, perhaps due to polymerase slippage events. For trinucleotides, GCA.TGC tends to be under-represented in phage, human viral, and eukaryotic sequences, and CTA.TAG is strongly under-represented in many prokaryotic, eukaryotic, and viral sequences. The CCA.TGG triplet is ubiquitously over-represented in human viral and eukaryotic sequences. Among the tetranucleotides, several four-base-pair palindromes tend to be under-represented in phage sequences, probably as a means of restriction avoidance. The tetranucleotide CTAG is observed to be rare in virtually all bacterial genomes and some phage genomes. Explanations for these over- and under-representations in terms of DNA/RNA structures and regulatory mechanisms are considered.
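The abstract names the strand-symmetric relative abundance functionals but does not define them. The sketch below is an illustration only. It assumes the commonly used dinucleotide form rho*_XY = f*_XY / (f*_X f*_Y), where f* denotes frequencies counted over a sequence together with its reverse complement. Counting both strands is what makes the measure strand-symmetric: a word and its reverse complement always receive the same value, which matches the pooled notation used above (e.g. AA.TT). The tri- and tetranucleotide functionals in the paper involve additional corrections that are not reproduced here. The function names (`symmetrized_frequencies`, `dinucleotide_relative_abundance`) are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
_VALID = set(BASES)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (non-ACGT symbols kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def symmetrized_frequencies(seq: str, k: int) -> dict[str, float]:
    """Frequencies f*(w) of every k-mer w, counted on both strands.

    The two strands are scanned separately (not concatenated) so that no
    spurious word spans the junction. Words containing symbols other than
    A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts: Counter = Counter()
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            word = strand[i:i + k]
            if set(word) <= _VALID:
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no valid {k}-mers in sequence")
    words = ("".join(w) for w in product(BASES, repeat=k))
    return {w: counts[w] / total for w in words}


def dinucleotide_relative_abundance(seq: str) -> dict[str, float]:
    """Assumed form rho*_XY = f*_XY / (f*_X * f*_Y) for each dinucleotide XY.

    Values below 1 mean XY is rarer than expected from base composition
    alone; values above 1 mean it is more common.
    """
    f1 = symmetrized_frequencies(seq, 1)
    f2 = symmetrized_frequencies(seq, 2)
    return {
        xy: f2[xy] / (f1[xy[0]] * f1[xy[1]])
        for xy in f2
        if f1[xy[0]] > 0 and f1[xy[1]] > 0  # skip words whose bases are absent
    }


if __name__ == "__main__":
    # Toy input for illustration only; real use would read a genome sequence.
    demo = "ATGCGTACGTTAGCAAATTTCCGGA"
    rho = dinucleotide_relative_abundance(demo)
    for xy in sorted(rho, key=rho.get):
        print(f"{xy}  rho* = {rho[xy]:.3f}")
```

Scanning the reverse complement as a separate strand, rather than appending it to the input, avoids counting a dinucleotide that straddles the artificial junction. On a sequence with no CpG steps, for instance, the sketch reports rho*_CG = 0. A strong deficit of CG in a real genome would appear as a value well below 1.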
