Compositional biases of bacterial genomes and evolutionary implications
S Karlin et al. J Bacteriol. 1997 Jun.
Abstract
We compare and contrast genome-wide compositional biases and distributions of short oligonucleotides across 15 diverse prokaryotes that have substantial genomic sequence collections. These include seven complete genomes (Escherichia coli, Haemophilus influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, Synechocystis sp. strain PCC6803, Methanococcus jannaschii, and Pyrobaculum aerophilum). A key observation concerns the constancy of the dinucleotide relative abundance profiles over multiple 50-kb disjoint contigs within the same genome. (The profile is rhoXY* = fXY*/(fX* fY*) for all XY, where fX* denotes the frequency of the nucleotide X and fXY* denotes the frequency of the dinucleotide XY, both computed from the sequence concatenated with its inverted complementary sequence.) On the basis of this constancy, we refer to the collection [rhoXY*] as the genome signature. We establish that the differences between [rhoXY*] vectors of 50-kb sample contigs of different genomes virtually always exceed the differences between those of the same genomes. Various di- and tetranucleotide biases are identified. In particular, we find that the dinucleotide CpG=CG is underrepresented in many thermophiles (e.g., M. jannaschii, Sulfolobus sp., and M. thermoautotrophicum) but overrepresented in halobacteria. TA is broadly underrepresented in prokaryotes and eukaryotes, but normal counts appear in Sulfolobus and P. aerophilum sequences. More than for any other bacterial genome, palindromic tetranucleotides are underrepresented in H. influenzae. The M. jannaschii sequence is unprecedented in its extreme underrepresentation of CTAG tetranucleotides and in the anomalous distribution of CTAG sites around the genome. Comparative analysis of numbers of long tetranucleotide microsatellites distinguishes H. influenzae. Dinucleotide relative abundance differences between bacterial sequences are compared.
For example, in these assessments of differences, the cyanobacteria Synechocystis, Synechococcus, and Anabaena do not form a coherent group and are as far from each other as general gram-negative sequences are from general gram-positive sequences. The difference of M. jannaschii from low-G+C gram-positive proteobacteria is one-half of the difference from gram-negative proteobacteria. Interpretations and hypotheses center on the role of the genome signature in highlighting similarities and dissimilarities across different classes of prokaryotic species, possible mechanisms underlying the genome signature, the form and level of genome compositional flux, the use of the genome signature as a chronometer of molecular phylogeny, and implications with respect to the three putative eubacterial, archaeal, and eukaryote domains of life and to the origin and early evolution of eukaryotes.
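The genome signature defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (reverse_complement, genome_signature, signature_difference) are hypothetical, and two choices are assumptions: each strand is counted separately rather than literally concatenated, so that no artificial dinucleotide arises at the junction, and the difference between two signatures is taken as the mean absolute difference over the 16 dinucleotides, since the abstract does not state the measure.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverted complementary sequence of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def genome_signature(seq):
    """Return a dict mapping each dinucleotide XY to rho*_XY.

    Frequencies are pooled over the sequence and its inverted complement.
    Assumption: the two strands are counted separately, so the junction
    between them contributes no dinucleotide.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        # Skip ambiguous symbols such as N.
        mono.update(b for b in strand if b in BASES)
        di.update(
            strand[i:i + 2]
            for i in range(len(strand) - 1)
            if strand[i] in BASES and strand[i + 1] in BASES
        )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no usable dinucleotides")
    signature = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        # rho*_XY = f*_XY / (f*_X f*_Y); undefined if a base is absent.
        signature[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return signature


def signature_difference(sig_a, sig_b):
    """Mean absolute difference between two signatures (assumed measure)."""
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / len(sig_a)
```

Because both strands are pooled, a dinucleotide and its inverted complement (for example AC and GT) always receive the same value. To mirror the abstract's comparison, one could cut each genome into 50-kb pieces, compute genome_signature for each piece, and compare signature_difference within and between genomes.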
Similar articles
- Microbial genome analyses: global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities.
Paulsen IT, Sliwinski MK, Saier MH Jr. J Mol Biol. 1998 Apr 3;277(3):573-92. doi: 10.1006/jmbi.1998.1609. PMID: 9533881
- Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea.
Koonin EV, Mushegian AR, Galperin MY, Walker DR. Mol Microbiol. 1997 Aug;25(4):619-37. doi: 10.1046/j.1365-2958.1997.4821861.x. PMID: 9379893
- Frequent oligonucleotides and peptides of the Haemophilus influenzae genome.
Karlin S, Mrázek J, Campbell AM. Nucleic Acids Res. 1996 Nov 1;24(21):4263-72. doi: 10.1093/nar/24.21.4263. PMID: 8932382 Free PMC article.
- Comparative DNA analysis across diverse genomes.
Karlin S, Campbell AM, Mrázek J. Annu Rev Genet. 1998;32:185-225. doi: 10.1146/annurev.genet.32.1.185. PMID: 9928479 Review.
- Global dinucleotide signatures and analysis of genomic heterogeneity.
Karlin S. Curr Opin Microbiol. 1998 Oct;1(5):598-610. doi: 10.1016/s1369-5274(98)80095-7. PMID: 10066522 Review.
Cited by
- Resolving prokaryotic taxonomy without rRNA: longer oligonucleotide word lengths improve genome and metagenome taxonomic classification.
Alsop EB, Raymond J. PLoS One. 2013 Jul 1;8(7):e67337. doi: 10.1371/journal.pone.0067337. PMID: 23840870 Free PMC article.
- Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering.
Bohlin J, Skjerve E, Ussery DW. BMC Genomics. 2009 Oct 21;10:487. doi: 10.1186/1471-2164-10-487. PMID: 19845945 Free PMC article.
- Genome dynamics of short oligonucleotides: the example of bacterial DNA uptake enhancing sequences.
Bakkali M. PLoS One. 2007 Aug 15;2(8):e741. doi: 10.1371/journal.pone.0000741. PMID: 17710141 Free PMC article.
- Horizontal gene transfer and bacterial diversity.
Dutta C, Pan A. J Biosci. 2002 Feb;27(1 Suppl 1):27-33. doi: 10.1007/BF02703681. PMID: 11927775 Review.
- Alignment-free supervised classification of metagenomes by recursive SVM.
Cui H, Zhang X. BMC Genomics. 2013 Sep 22;14:641. doi: 10.1186/1471-2164-14-641. PMID: 24053649 Free PMC article.