Operons in Escherichia coli: genomic analyses and predictions

H Salgado et al. Proc Natl Acad Sci U S A. 2000.

Abstract

The rich knowledge of operon organization in Escherichia coli, together with the completed chromosomal sequence of this bacterium, enabled us to perform an analysis of distances between genes and of functional relationships of adjacent genes in the same operon, as opposed to adjacent genes in different transcription units. We measured and demonstrated the expected tendencies of genes within operons to have much shorter intergenic distances than genes at the borders of transcription units. A clear peak at short distances between genes in the same operon contrasts with a flat frequency distribution of genes at the borders of transcription units. Also, genes in the same operon tend to have the same physiological functional class. The results of these analyses were used to implement a method to predict the genomic organization of genes into transcription units. The method has a maximum accuracy of 88% correct identification of pairs of adjacent genes to be in an operon, or at the borders of transcription units, and correctly identifies around 75% of the known transcription units when used to predict the transcription unit organization of the E. coli genome. Based on the frequency distance distributions, we estimated a total of 630 to 700 operons in E. coli. This step opens the possibility of predicting operon organization in other bacteria whose genome sequences have been finished.
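The final prediction step described above, grouping adjacent genes into transcription units, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `predict_transcription_units`, the `(name, strand)` gene representation, the `pair_score` callable, and the default threshold of 0 are all assumptions made for the example. It assumes that each pair of adjacent genes on the same strand already has a score (for instance a log-likelihood of being in the same operon) and that a unit is split wherever the strand changes or the score falls below the threshold.

```python
def predict_transcription_units(genes, pair_score, threshold=0.0):
    """Group genes into predicted transcription units.

    genes      : list of (name, strand) tuples in chromosomal order
                 (hypothetical representation for this sketch).
    pair_score : callable taking two adjacent genes and returning a score;
                 higher means "more likely in the same operon".
    threshold  : assumed cut-off; pairs scoring below it are treated as
                 a transcription unit boundary.
    """
    units = []
    current = []
    for gene in genes:
        if current:
            previous = current[-1]
            same_strand = previous[1] == gene[1]
            # Close the current unit on a strand change or a low-scoring pair.
            if not (same_strand and pair_score(previous, gene) >= threshold):
                units.append([g[0] for g in current])
                current = []
        current.append(gene)
    if current:
        units.append([g[0] for g in current])
    return units
```

Under these assumptions a gene with no qualifying neighbour forms a single-gene transcription unit, and a run of same-strand genes is only ever split, never merged across a strand change.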

Figures

Figure 1

Size distribution, in number of genes contained, of transcription units in RegulonDB, and size distribution of directons in the M54 version of the E. coli genome.

Figure 2

Frequency distance distributions of pairs of adjacent genes in operons versus those of pairs of adjacent genes at the boundaries between transcription units (t.u.). There are clear differences between both distributions, with genes in operons having peaks very near to distance 0. The highest peaks correspond to the −4 and −1 overlaps.

Figure 3

Data used to estimate the total number of operons in the entire E. coli genome. (a) Distance distributions at 10-bp intervals. (b) Frequency distance distributions. (c) Frequency distance distributions of adjacent genes in directons versus the average of those in operons and at transcription unit (t.u.) boundaries. Notice the nice correspondence of the peaks in c, which also confirms how well the sample (operons and transcription unit borders) represents the population (directons, or total adjacent genes transcribed in the same direction). The estimated total operons, as extrapolated from these data, goes from 630 to 700.

Figure 4

Frequency distance distributions as obtained by adding the frequencies at 10-bp intervals, and the log-likelihoods for a pair of genes to be in an operon at each distance interval.
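A distance log-likelihood of the kind this caption refers to can be sketched as below. The sketch assumes the log-likelihood for a 10-bp interval is the log of the ratio between the fraction of known operon pairs in that interval and the fraction of known boundary pairs in that interval. The function names, the bin convention (each bin labelled by its lower edge), and the pseudocount used to avoid division by zero in empty bins are assumptions of this example, not details taken from the source.

```python
import math
from collections import Counter


def distance_bin(distance_bp, width=10):
    """Return the lower edge of the bin containing an intergenic distance.

    Negative distances (overlapping genes) fall in negative bins,
    e.g. -4 -> -10 with the default width.
    """
    return (distance_bp // width) * width


def distance_log_likelihoods(operon_distances, border_distances,
                             width=10, pseudocount=1.0):
    """Log-likelihood, per distance bin, that a gene pair lies in an operon.

    operon_distances : intergenic distances (bp) of known within-operon pairs
    border_distances : intergenic distances (bp) of known boundary pairs
    pseudocount      : assumed smoothing term added to every bin count
    """
    op_counts = Counter(distance_bin(d, width) for d in operon_distances)
    bd_counts = Counter(distance_bin(d, width) for d in border_distances)
    bins = sorted(set(op_counts) | set(bd_counts))
    op_total = len(operon_distances) + pseudocount * len(bins)
    bd_total = len(border_distances) + pseudocount * len(bins)
    return {
        b: math.log(((op_counts[b] + pseudocount) / op_total)
                    / ((bd_counts[b] + pseudocount) / bd_total))
        for b in bins
    }
```

A positive value for a bin means pairs at that distance are relatively more frequent among known operon pairs than among boundary pairs; a negative value means the opposite.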

Figure 5

Discrimination of known pairs of genes in operons by the use of distance log-likelihoods alone (dllh), and of distance and functional class log-likelihoods (tllh), at different thresholds. (a) Fraction of right and wrong positives at different thresholds. (b) Sensitivity (right pairs in operons detected/total pairs in operons), specificity (right pairs at borders/total pairs at borders), and accuracy (average of sensitivity and specificity) at different thresholds. The correct identifications are slightly better when functional classes are used.
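The three measures defined in this caption can be computed at a given threshold as in the sketch below. The function name `evaluate_threshold` and the decision rule (a pair is called "in an operon" when its score is greater than or equal to the threshold) are assumptions of this example; the definitions of sensitivity, specificity, and accuracy follow the caption.

```python
def evaluate_threshold(operon_scores, border_scores, threshold):
    """Return (sensitivity, specificity, accuracy) at one threshold.

    operon_scores : log-likelihood scores of known within-operon pairs
    border_scores : log-likelihood scores of known boundary pairs
    A pair is predicted to be in an operon when score >= threshold
    (assumed decision rule for this sketch).
    """
    if not operon_scores or not border_scores:
        raise ValueError("both score lists must be non-empty")
    # Known operon pairs correctly called as operon pairs.
    right_operon = sum(1 for s in operon_scores if s >= threshold)
    # Known boundary pairs correctly called as boundaries.
    right_border = sum(1 for s in border_scores if s < threshold)
    sensitivity = right_operon / len(operon_scores)
    specificity = right_border / len(border_scores)
    accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, accuracy
```

Sweeping `threshold` over a range of values and recording the three numbers would give curves of the kind the caption describes for panel b.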

Figure 6

Size distribution of known and predicted transcription units. As expected, the number of transcription units diminishes with their size in genes in a Poisson distribution style.
