Generalized analysis of promoters (GAP): A method for DNA sequence description
Related papers
A Hybrid Promoter Analysis Methodology for Prokaryotic Genomes
2009
One of the biggest challenges in genomics is the elucidation of the design principles controlling gene expression. Current approaches examine promoter sequences for particular features, such as the presence of binding sites for a transcriptional regulator, and identify recurrent relationships among these features termed network motifs. To define the expression dynamics of a group of genes, the strength of the connections in a network must be specified, and these are determined by the cis-promoter features participating in the regulation. Approaches that homogenize features among promoters (e.g., relying on consensuses to describe the various promoter features) and even across species hamper the discovery of the key differences that distinguish promoters that are co-regulated by the same transcriptional regulator. Thus, we have developed a model-based approach to analyze proteobacterial genomes for promoter features that is specifically designed to account for the variability in sequence, location and topology intrinsic to differential gene expression. We applied our method to characterize network motifs controlled by the PhoP/PhoQ regulatory system of Escherichia coli and Salmonella enterica serovar Typhimurium. We identify key features that enable the PhoP protein to produce distinct kinetic patterns in target genes, which could not have been uncovered just by inspecting network motifs.
A novel sequence and context based method for promoter recognition
Bioinformation, 2014
Identification of promoters in DNA sequences using computational techniques is a significant research area because of its direct association with transcription regulation. A wide range of algorithms is available for promoter prediction. Most of them are polymerase dependent and cannot handle eukaryotes and prokaryotes alike. This study proposes a polymerase-independent algorithm that predicts whether a given DNA fragment is a promoter, based on sequence features and statistical elements. The algorithm considers all possible pentamers formed from the nucleotides A, C, G, and T, along with CpG islands, the TATA box, initiator elements, and downstream promoter elements. The highlight of the algorithm is that it is not polymerase specific and handles both eukaryotes and prokaryotes in the same computational manner, even though the underlying biological mechanisms of promoter recognition differ greatly. The proposed method, Promoter Prediction System-PPS-CBM, achieved sensitivity, specificity, and accuracy of 75.08%, 83.58%, and 79.33% on an E. coli data set and 86.67%, 88.41%, and 87.58% on a human data set. We have developed a tool based on PPS-CBM with which multiple sequences of varying lengths can be tested simultaneously; the results are reported in a comprehensive tabular format, and the tool also reports the strength of each prediction.
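The pentamer features mentioned in this abstract can be illustrated with a minimal sketch. It only counts every possible 5-mer over A, C, G, and T in a fragment; the function name `pentamer_counts` is hypothetical, and the way PPS-CBM weights or combines these counts with the other features is not described in the abstract and is not modeled here.

```python
from itertools import product

def pentamer_counts(seq, k=5):
    """Count every possible k-mer over ACGT in seq (k=5 gives pentamers)."""
    seq = seq.upper()
    # Start from all 4**k possible k-mers so the feature vector has a fixed length.
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    return counts

features = pentamer_counts("TTGACATATAATGC")
print(len(features))        # number of possible pentamers
print(features["TATAA"])    # occurrences of one pentamer in the fragment
```

Keeping a fixed-length vector of 4^5 = 1024 entries makes fragments of different lengths directly comparable, which is one plausible reason to enumerate all pentamers up front.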
Nucleotide patterns aiding in prediction of eukaryotic promoters
PloS one, 2017
Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into "promoters" and "non-promoters" even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, and transcription factor binding sites, as well as the relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription facto...
Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes
BMC Bioinformatics, 2008
BACKGROUND: Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. RESULTS: We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I sigma70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the alpha subunit, and then optimally separated patterns of -35 and -10 boxes, required for interaction with the sigma70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. CONCLUSION: The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression and individual strong promoters in bacteria of medical and/or economic significance.
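The idea of consecutively matching three patterns can be sketched as follows. This is not the published algorithm: the abstract does not give its patterns, so the A/T-rich stand-in for the UP-element, the exact-match TTGACA and TATAAT consensus boxes, the 0 to 5 bp gap, and the 16 to 18 bp spacer are all assumptions chosen for illustration, as is the function name `find_triads`.

```python
import re

# Hypothetical, simplified patterns (assumptions, not taken from the paper).
UP_ELEMENT = r"[AT]{6,}"   # assumed A/T-rich stretch standing in for the UP-element
MINUS_35 = r"TTGACA"       # sigma70 -35 consensus, exact match only
MINUS_10 = r"TATAAT"       # sigma70 -10 consensus, exact match only

# UP-element, then the -35 box within 0-5 bp, then the -10 box after a 16-18 bp spacer.
TRIAD = re.compile(
    rf"(?P<up>{UP_ELEMENT})[ACGT]{{0,5}}"
    rf"(?P<m35>{MINUS_35})[ACGT]{{16,18}}"
    rf"(?P<m10>{MINUS_10})"
)

def find_triads(seq):
    """Return (UP start, -35 start, -10 start) for each non-overlapping triad match."""
    seq = seq.upper()
    return [(m.start("up"), m.start("m35"), m.start("m10"))
            for m in TRIAD.finditer(seq)]

spacer = "GC" * 8 + "G"  # 17 bp spacer
seq = "GC" + "AAATTTAAAT" + "GC" + "TTGACA" + spacer + "TATAAT" + "GC"
print(find_triads(seq))
```

A practical scanner would tolerate mismatches in each box and score candidates rather than require exact consensus matches; the regular expression here only shows the ordering and spacing constraints.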
Characterization of Prokaryotic and Eukaryotic Promoters Using Hidden Markov Models
1996
In this paper we utilize hidden Markov models (HMMs) and information theory to analyze prokaryotic and eukaryotic promoters. We perform this analysis with special emphasis on the fact that promoters are divided into a number of different classes, depending on which polymerase-associated factors bind to them. We find that HMMs trained on such subclasses of Escherichia coli promoters (specifically, the so-called σ70 and σ54 classes) give an excellent classification of unknown promoters with respect to sigma class.
Curvature and flexibility as promoter regions classifiers in gram-negative bacteria
F1000Research, 2012
The control of gene expression is a fundamental cellular process, carried out through the interaction of multiple regulatory mechanisms. Proper regulation of transcription is crucial for a single-celled prokaryote, since its environment can change dramatically and instantly. Promoters are among the regulatory regions of transcription: they recruit the transcriptional machinery through the binding of regulatory proteins to their DNA sequences. Characterizing promoter regions in silico is difficult because these elements are short and degenerate, so similar sequences are likely to be found elsewhere in the genome. Incorporating structural characteristics can therefore increase the accuracy of prediction methods [1-2]. In bacteria, the RNA polymerase holoenzyme is responsible for promoter recognition and the initiation of gene expression. This enzyme consists of five subunits (2α, β, β', ω) and an additional sigma (σ) factor subunit. A collection of different σ subunits acts as key regulators of bacterial gene expression; substituting one σ factor for another can initiate the transcription of a different group of genes [3]. A promoter sequence is characterized by two conserved DNA elements, called -10 and -35, located upstream of the transcription start site. These elements are named for their distance from the transcriptional start site (position +1) and are represented by the nucleotide sequences TATAAT and TTGACA, respectively [4]. The upstream region (promoter) has distinct sequence properties compared with the downstream region (non-promoter), such as differences in the structural characteristics of flexibility, stability and curvature [5]. Artificial neural networks (ANNs) have been widely used in nucleic acid sequence analysis because of their ability to recognize and classify quantitative and qualitative patterns in data [6].
This work aims to predict, recognize and characterize promoter regions recognized by sigma factor 28 (σ28), using artificial neural networks that take curvature and flexibility data of the sequence as input parameters.
Journal of Biosciences, 2007
Analysis of various predicted structural properties of promoter regions in prokaryotic as well as eukaryotic genomes had earlier indicated that they have several common features, such as lower stability, higher curvature and less bendability, when compared with their neighboring regions. Based on the difference in stability between neighboring upstream and downstream regions in the vicinity of experimentally determined transcription start sites, a promoter prediction algorithm has been developed to identify prokaryotic promoter sequences in whole genomes. The average free energy (E) over known promoter sequences and the difference (D) between E and the average free energy over the entire genome (G) are used to search for promoters in the genomic sequences. Using these cutoff values to predict promoter regions across the entire Escherichia coli genome, we achieved a reliability of 70% when the predicted promoters were cross-verified against the 960 transcription start sites (TSSs) listed in the Ecocyc database. Annotation of the whole E. coli genome for promoter regions could be carried out with 49% accuracy. The method is quite general and can be used to annotate the promoter regions of other prokaryotic genomes.
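The quantities E, G, and D named in this abstract can be illustrated with a small sketch. It averages dinucleotide stacking free energies over a sequence and takes D = E - G. The energy table is an assumption: it uses the unified nearest-neighbor values of SantaLucia (1998) as a stand-in, since the abstract does not say which parameters, window sizes, or cutoffs the method uses, and the function names are hypothetical.

```python
# Nearest-neighbor free energies (kcal/mol), 5'->3' dinucleotide notation.
# Assumed stand-in table (SantaLucia 1998 unified values); not confirmed by the abstract.
STACKING_DG = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}

def average_free_energy(seq, table=STACKING_DG):
    """Mean free energy per dinucleotide step; steps with unknown symbols are skipped."""
    seq = seq.upper()
    steps = [table[seq[i:i + 2]] for i in range(len(seq) - 1)
             if seq[i:i + 2] in table]
    if not steps:
        raise ValueError("sequence has no scorable dinucleotide steps")
    return sum(steps) / len(steps)

def stability_difference(promoter_seqs, genome_seq):
    """Return (E, G, D): mean promoter energy, genome-wide energy, and D = E - G."""
    E = sum(average_free_energy(s) for s in promoter_seqs) / len(promoter_seqs)
    G = average_free_energy(genome_seq)
    return E, G, E - G

E, G, D = stability_difference(["TATAAT", "TTGACA"], "GCGCGCATGCGCGC")
print(round(E, 3), round(G, 3), round(D, 3))
```

With these values, less negative averages mean lower duplex stability, so a positive D indicates promoter sequences that are less stable than the genome average, in line with the "lower stability" feature the abstract describes.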
A novel method SEProm for prokaryotic promoter prediction based on DNA structure and energetics
Bioinformatics
Motivation: Despite conservation in the general architecture of promoters and the protein–DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is yet to be understood fully. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions, and that the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. Results: Here, we introduce SEProm, a novel tool that is developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes in...