Rigorous pattern-recognition methods for DNA sequences

Non-canonical sequence elements in the promoter structure. Cluster analysis of promoters recognized by Escherichia coli RNA polymerase

Nucleic Acids Research, 1997

The C-terminal domain (CTD) downstream from residue 235 of Escherichia coli RNA polymerase α subunit is involved in recognition of the promoter UP element. Here we have demonstrated, by DNase I and hydroxyl radical mapping, the presence of two UP element subsites on the promoter D of phage T7, each located half and one-and-a-half helix turns, respectively, upstream from the promoter -35 element. This non-typical UP element retained its αCTD-binding capability when transferred into the genetic environment of the rrnBP1 basic promoter, leading to transcription stimulation as high as the typical rrnBP1 UP element. Chemical protease FeBABE conjugated to αCTD S309C efficiently attacked the T7D UP element but not the rrnBP1 UP element. After alanine scanning, most of the amino acid residues that were involved in rrnBP1 interaction were also found to be involved in T7D UP element recognition, but alanine substitution at three residues had the opposite effect on the transcription activation between rrnBP1 and T7D promoters. Mutation E286A stimulated T7D transcription but inhibited rrnBP1 RNA synthesis, while L290A and K304A stimulated transcription from rrnBP1 but not the T7D promoter. Taken together, we conclude that although the overall sets of amino acid residues responsible for interaction with the two UP elements overlap, the mode of αCTD interaction with T7D UP element is different from that with rrnBP1 UP element, involving different residues on helices III and IV.

Generalized analysis of promoters (GAP): A method for DNA sequence description

2001

Recent advances in the accessibility of databases containing representations of complex objects (exemplified by repositories of time-series data, information about biological macromolecules, or knowledge about metabolic pathways) have not been matched by the availability of tools that facilitate the retrieval of objects of particular interest while helping to understand their structure and relations. In applications such as the analysis of DNA sequences, on the other hand, requirements to retrieve objects on the basis of qualitative characteristics are poorly met by descriptions that emphasize precision and detail rather than structural features. This paper presents a method for identification of interesting qualitative features in biological sequences. Our approach relies on a generalized clustering methodology, where the features being sought correspond to the solutions of a multivariable, multiobjective optimization problem and generally correspond to fuzzy subsets of the object being represented. Foremost among the optimization objectives being considered are measures of the degree by which features resemble prototypical...

Promoter recognition by Escherichia coli RNA polymerase

Journal of Molecular Biology, 1989

The available evidence suggests that during the process of formation of a functional or "open" complex at a promoter, Escherichia coli RNA polymerase transiently realigns the two contacted regions of the promoter, thus stressing the intervening spacer DNA. We tested the possibility that this process plays an active role in the formation of an open complex. Two series of promoters were examined: one with spacer DNAs of 15 to 19 basepairs and a derivative for which the promoters additionally contained a one-base gap in the spacer, so as to relieve any stress imposed on the DNA. Consistent with an active role for the stressed DNA in driving open complex formation, we have found that for promoters with a 17-base-pair spacer, the presence of a gap leads to a delay in the formation of an open complex, at a step subsequent to the initial binding of RNA polymerase to the promoter. The results with the other gapped promoters rule out direct binding of RNA polymerase to the region of the gap and indicate an increased flexibility in the gapped DNA. As not all observations with the spacer length series of gapped and ungapped promoters can be interpreted in terms of an active role of the spacer DNA without additional assumptions, such a role must still be considered tentative.

DNA structural and physical properties reveal peculiarities in promoter sequences of the bacterium Escherichia coli K-12

SN Applied Sciences, 2021

Bacterial gene transcription starts when a promoter sequence is recognized by a transcription factor found in the RNAP enzyme; this process is assisted by the conservation of nucleotides as well as other factors governing these intergenic regions. Given this, the encoding of genetic information into physical aspects of the DNA such as enthalpy, stability, and base-pair stacking could indicate promoter activity and help differentiate promoter from non-promoter data. In this work, a total of 3131 promoter sequences associated with six different sigma factors in the bacterium E. coli were converted into numeric attributes; a strong set of control sequences, consisting of shuffled versions of the original sequences as well as coding regions, is also provided. The parameterized genetic information was then normalized and exhaustively analyzed through statistical tests. The results suggest that strong signals in the promoter sequences match the binding site of transcri...
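The conversion step described in this abstract (sequence to numeric attributes, shuffled controls, normalization) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the actual parameter tables, so the per-base hydrogen-bond count (2 for A/T, 3 for G/C) stands in for properties such as enthalpy or stacking energy, and the function names and example sequence are hypothetical.

```python
import random

# Watson-Crick hydrogen bonds per base pair; used here only as a simple
# stand-in for the physical properties named in the abstract.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def dinucleotide_profile(seq):
    """Map a DNA sequence to one numeric value per dinucleotide step."""
    seq = seq.upper()
    return [H_BONDS[a] + H_BONDS[b] for a, b in zip(seq, seq[1:])]


def shuffled_control(seq, rng):
    """Return a control with the same base composition but scrambled order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def min_max_normalize(values):
    """Rescale values to the range [0, 1]; a constant profile maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    example = "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC"  # made-up sequence
    control = shuffled_control(example, random.Random(0))
    print(min_max_normalize(dinucleotide_profile(example)))
    print(min_max_normalize(dinucleotide_profile(control)))
```

Shuffling preserves base composition, so any difference between the two profiles reflects the order of the bases rather than their frequencies.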

Promoter recognition by Escherichia coli RNA polymerase. Effects of single base pair deletions and insertions in the spacer DNA separating the -10 and -35 regions are dependent on spacer DNA sequence

Biochemistry, 1993

Escherichia coli promoters for transcription of ribosomal and tRNAs are greatly activated by an A+T-rich "UP" element upstream of the -35 region. These same promoters have also been found to otherwise deviate in several respects from the consensus promoter sequence. Here we present the results of a kinetic characterization of the interaction of Escherichia coli RNA polymerase with UP element-containing promoters which by virtue of consensus or near-consensus sequence features should be among the most optimal that can be encountered by Escherichia coli RNA polymerase. We show that for such promoters, (1) the second-order rate constant describing formation of the initial (closed) complex is close to that expected for a diffusion-limited process, (2) the extent of activation by the UP element is temperature-sensitive, (3) the UP element accelerates a process after DNA binding by RNA polymerase, and (4) the presence of the UP element delays promoter clearance upon addition of nucleoside triphosphates to preformed RNA polymerase-promoter complexes. Finally, we provide evidence in support of models which describe the DNA melting process accompanying open complex formation as initiating in the -10 promoter region and progressing in the downstream direction.

Promoters Recognized by Escherichia coli RNA Polymerase Selected by Function: Highly Efficient Promoters from Bacteriophage T5

Journal of Bacteriology, Oct. 1985, p. 70-77

Highly efficient promoters of coliphage T5 were identified by selecting for functional properties. Eleven such promoters belonging to all three expression classes of the phage were analyzed. Their average AT content was 75% and reached 83% in subregions of the sequences. Besides the well-known conserved sequences around -10 and -33, they exhibited homologies outside the region commonly considered to be essential for promoter function. Interestingly, the consensus hexamers around -10 (TATAAT) and -35 (TTGACA) were never found simultaneously within the sequence of highly efficient promoters. Several of these promoters compete extremely well for Escherichia coli RNA polymerase and can be used for the efficient in vitro synthesis of defined RNA species. In addition, some of these promoters accept 7-mGpppA as the starting dinucleotide, thus producing capped mRNA in vitro which can be utilized in various eucaryotic translation systems.
Promoters of the Escherichia coli system start synthesis of functional RNAs with vastly different efficiencies. Little is known, however, about the rules by which functional parameters are implemented within a promoter sequence. Despite our knowledge of more than 150 promoter sequences (10) and a wealth of genetic and biochemical data (19), we are still unable to make reasonable predictions on functional properties of a promoter from structural information alone. Consensus sequences of E. coli promoters, derived from sequence compilations, have elucidated some important general features. However, synthesis of consensus promoters (5) has resulted in signals which are, at most, average in function (U. Deuschle and M. Kammerer, personal communication). This is not surprising if one considers the complexity of the process programmed by a promoter sequence as well as the fact that in the derivation of consensus sequences there is usually no value describing functional parameters given to individual sequences.
We approached this problem in a different way. By selecting for the most efficient unregulated promoters in the E. coli system, we expected to reveal sequences which would exhibit pertinent structural features most clearly. The selection principles utilized for the identification of efficient promoters were the determination of (i) the rate of complex formation between RNA polymerase and promoter in vitro, (ii) the relative efficiency of RNA synthesis in vitro under competitive conditions, and (iii) the relative promoter strength in vivo. The in vitro analysis of promoter-carrying DNA fragments has been described previously (6, 7). For the in vivo study of promoters we developed cloning systems which allow the stable integration of strong promoters as well as the precise determination of their in vivo function (9, 21; U. Deuschle, M.S. thesis, University of Heidelberg, 1984). Of about 60 promoters tested (including those of coliphages T7, fd, and λ), some of the most efficient signals were found in the genome of coliphage T5. Here we describe the application of the pDS1 vector system (21; Fig. 1) for the selective cloning of strong promoters, the identification and structural analysis of 11 promoters of the phage T5 genome, and some of the functional properties of these promoters. As can be seen from the results of this and previous studies (9), several promoters described here appear especially useful for the efficient in vitro synthesis of defined RNA species, and as some of the promoters accept 7-mGpppA as the starting dinucleotide, capped RNAs can be directly obtained in vitro. This transcription-coupled capping allows an efficient and selective expression of cloned DNA sequences in vitro which has been found to be especially useful in studying the translocation of proteins into or through membranes (11, 23).
MATERIALS AND METHODS
Enzymes and chemicals. Restriction enzymes, T4 DNA ligase, calf intestinal alkaline phosphatase, and RNase T1 were purchased from Bethesda Research Laboratories, Gaithersburg, Md.; New England Biolabs, Inc., Beverly, Mass.; or Boehringer Mannheim Biochemicals, Indianapolis, Ind.; and T4 DNA kinase was obtained from H. Schaller (University of Heidelberg). Reactions were carried out as recommended by the supplier. The isolation of bacteriophage T5 DNA and E. coli RNA polymerase has been described previously (7). XhoI synthetic linkers were obtained from Collaborative Research, Inc. (Waltham, Mass.) and were present in ligation assays in a 20-fold molar excess relative to that of the various DNA fragments. [γ-32P]ATP and [α-32P]UTP were from Amersham & Buchler (Braunschweig, Federal Republic of Germany), and 7-mGpppA was obtained from P-L Biochemicals, Milwaukee, Wis.
Plasmids and their nomenclature. The basic pDS1 vector system has been described previously, and here we follow previously proposed nomenclature (21). The identity of the promoters and terminators which have been integrated can be derived from the designation of the plasmid: pDS1/PH207,to1 describes a plasmid carrying promoter PH207 in front of the coding sequence (dhfr) for dihydrofolate reductase (DHFR) and terminator to from phage lambda at site 1 (Fig. 1). Another terminator used was tfd from coliphage fd (9).

DNA sequence elements located immediately upstream of the -10 hexamer in Escherichia coli promoters: a systematic study

Nucleic Acids Research, 2000

We have made a systematic study of how the activity of an Escherichia coli promoter is affected by the base sequence immediately upstream of the -10 hexamer. Starting with an activator-independent promoter, with a 17 bp spacing between the -10 and -35 hexamer elements, we constructed derivatives with all possible combinations of bases at positions -15 and -14. Promoter activity is greatest when the 'non-template' strand carries T and G at positions -15 and -14, respectively. Promoter activity can be further enhanced by a second T and G at positions -17 and -16, respectively, immediately upstream of the first 'TG motif'. Our results show that the base sequence of the DNA segment upstream of the -10 hexamer can make a significant contribution to promoter strength. Using published collections of characterised E. coli promoters, we have studied the frequency of occurrence of 'TG motifs' upstream of the promoters' -10 elements. We conclude that correctly placed 'TG motifs' are found at over 20% of E. coli promoters.
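As a rough illustration of what "correctly placed" means here, the following sketch checks a non-template-strand sequence for T and G at positions -15/-14 and at -17/-16. It assumes the common convention that the -10 hexamer spans positions -12 to -7, which the abstract does not state explicitly; the function name and the example sequences are hypothetical.

```python
def tg_motifs(seq, hexamer_start):
    """Report TG motifs upstream of the -10 hexamer.

    seq            non-template strand, 5' to 3'
    hexamer_start  0-based index of the first base of the -10 hexamer,
                   assumed to correspond to position -12

    Returns (first_tg, second_tg): whether T,G occupy positions -15,-14
    and positions -17,-16, respectively.
    """
    seq = seq.upper()

    def dinucleotide_at(position):
        # Convert a promoter coordinate (e.g. -15) to a string index.
        start = hexamer_start + (position + 12)
        if start < 0:
            return ""  # sequence does not extend that far upstream
        return seq[start:start + 2]

    first_tg = dinucleotide_at(-15) == "TG"
    second_tg = dinucleotide_at(-17) == "TG"
    return first_tg, second_tg


if __name__ == "__main__":
    example = "CCTGTGCTATAAT"  # made-up sequence with two TG motifs
    print(tg_motifs(example, example.index("TATAAT")))
```

Applying such a check over a promoter collection and counting the hits would give a motif frequency of the kind the abstract reports, provided the -10 hexamer positions are already annotated.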

Identification and annotation of promoter regions in microbial genome sequences on the basis of DNA stability

Journal of Biosciences, 2007

Analysis of various predicted structural properties of promoter regions in prokaryotic as well as eukaryotic genomes had earlier indicated that they have several common features, such as lower stability, higher curvature and less bendability, when compared with their neighboring regions. Based on the difference in stability between neighboring upstream and downstream regions in the vicinity of experimentally determined transcription start sites, a promoter prediction algorithm has been developed to identify prokaryotic promoter sequences in whole genomes. The average free energy (E) over known promoter sequences and the difference (D) between E and the average free energy over the entire genome (G) are used to search for promoters in the genomic sequences. Using these cutoff values to predict promoter regions across the entire Escherichia coli genome, we achieved a reliability of 70% when the predicted promoters were cross-verified against the 960 transcription start sites (TSSs) listed in the EcoCyc database. Annotation of the whole E. coli genome for promoter regions could be carried out with 49% accuracy. The method is quite general and it can be used to annotate the promoter regions of other prokaryotic genomes.
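The idea of flagging positions where the upstream region is less stable than the downstream region can be sketched as a sliding-window comparison. This is a simplified illustration, not the published algorithm: the abstract does not give the free-energy parameters or cutoff values, so GC fraction is used below as a crude stability proxy, and the window size, cutoff, and function names are hypothetical choices.

```python
def stability(window):
    """Proxy stability score: fraction of G or C bases (higher = more stable).

    Stand-in for an average free energy computed from nearest-neighbor
    parameters, which are not reproduced here.
    """
    window = window.upper()
    return sum(base in "GC" for base in window) / len(window)


def candidate_positions(seq, window=10, cutoff=0.3):
    """Return indices where downstream stability exceeds upstream by >= cutoff.

    Index i splits the sequence into an upstream window seq[i-window:i]
    and a downstream window seq[i:i+window].
    """
    hits = []
    for i in range(window, len(seq) - window + 1):
        upstream = stability(seq[i - window:i])
        downstream = stability(seq[i:i + window])
        if downstream - upstream >= cutoff:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Made-up sequence: an AT-rich block flanked by GC-rich blocks.
    example = "GCGCGCGCGC" + "ATATATATAT" + "GCGCGCGCGC"
    print(candidate_positions(example, window=10, cutoff=0.9))
```

A lower cutoff flags more positions and a higher one fewer, which is the same trade-off between sensitivity and reliability that the reported figures reflect.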