Discovering Cis-Regulatory Modules by Optimizing Barbecues
Related papers
Computational detection of cis-regulatory modules
2003
The transcriptional regulation of a metazoan gene depends on the cooperative action of multiple transcription factors that bind to cis-regulatory modules (CRMs) located in the neighborhood of the gene. By integrating multiple signals, CRMs confer an organism-specific spatial and temporal rate of transcription. Results: Based on the hypothesis that genes that are needed in exactly the same conditions might share similar regulatory switches, we have developed a novel methodology to find CRMs in a set of coexpressed or coregulated genes. The ModuleSearcher algorithm finds, for a given gene set, the best-scoring combination of transcription factor binding sites within a sequence window using an A* procedure for tree searching. To keep the level of noise low, we use DNA sequences that are most likely to contain functional cis-regulatory information, namely conserved regions between human and mouse orthologous genes. The ModuleScanner performs genomic searches with a predicted CRM or with a user-defined CRM known from the literature to find possible target genes. The validity of a set of putative targets is checked using Gene Ontology annotations. We demonstrate the use and effectiveness of the ModuleSearcher and ModuleScanner algorithms and test their specificity and sensitivity on semi-artificial data. Next, we search for a module in a cluster of gene expression profiles of human cell cycle genes. Availability: The ModuleSearcher is available as a web service within the TOUCAN workbench for regulatory sequence analysis, which can be downloaded from http://www.esat.kuleuven.ac.be/~dna/BioI.
Stubb: a program for discovery and analysis of cis-regulatory modules
Nucleic Acids Research, 2006
Given the DNA-binding specificities (motifs) of one or more transcription factors, an important bioinformatics problem is to discover significant clusters of binding sites for the transcription factor(s). Such clusters often correspond to cis-regulatory modules mediating regulation of an adjacent gene. In earlier work, we developed the Stubb program that uses a probabilistic model and a maximum likelihood approach to efficiently detect cis-regulatory modules over genomic scales. It may optionally exploit a second related genome to improve module prediction accuracy. We describe here the use of a web-based interface for the Stubb program. The interface is equipped with a special post-processing step for in-depth analysis of specific modules, in order to reveal individual binding sites predicted in the module. The web server may be accessed at the
Nucleic Acids Research, 2013
In higher organisms, gene regulation is controlled by the interplay of non-random combinations of multiple transcription factors (TFs). Although numerous attempts have been made to identify these combinations, important details, such as the mutual positioning of the factors that have an important role in the TF interplay, are still missing. The goal of the present work is in silico mapping of some of these associating factors based on their mutual positioning, using computational screening. We have selected the process of myogenesis as a study case, and we focused on TF combinations involving the master myogenic TF Myogenic differentiation (MyoD) with other factors situated at specific distances from it. The results of our work show that some muscle-specific factors occur together with MyoD within the range of ±100 bp in a large number of promoters. We confirm the co-occurrence of MyoD with muscle-specific factors as described in earlier studies. However, we have also found novel relationships of MyoD with other factors not specific for muscle. Additionally, we have observed that MyoD tends to associate with different factors in proximal and distal promoter areas. The major outcome of our study is establishing the genome-wide connection between biological interactions of TFs and close co-occurrence of their binding sites.
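The ±100 bp co-occurrence criterion in this abstract can be illustrated with a small sketch. This is not the authors' pipeline; it only assumes that binding-site positions per promoter are already available, and the function name, the gene names and the positions below are all hypothetical.

```python
from typing import Dict, List


def promoters_with_cooccurrence(
    anchor_sites: Dict[str, List[int]],
    partner_sites: Dict[str, List[int]],
    max_distance: int = 100,
) -> List[str]:
    """Return promoters where some partner site lies within
    max_distance bp of some anchor site (e.g. a MyoD site)."""
    hits = []
    for promoter, anchors in anchor_sites.items():
        partners = partner_sites.get(promoter, [])
        # A promoter counts once, however many site pairs qualify.
        if any(abs(a - p) <= max_distance for a in anchors for p in partners):
            hits.append(promoter)
    return hits


# Made-up positions relative to the transcription start site.
anchor = {"geneA": [-250, -40], "geneB": [-900], "geneC": [-120]}
partner = {"geneA": [-310], "geneB": [-500], "geneD": [-100]}
print(promoters_with_cooccurrence(anchor, partner))  # ['geneA']
```

Counting such promoters for each candidate partner factor, and comparing against a background, would be one way to screen for associated factors; the choice of background is left open here.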
CREME: Cis-Regulatory Module Explorer for the human genome
Nucleic Acids Research, 2004
The binding of transcription factors to specific regulatory sequence elements is a primary mechanism for controlling gene transcription. Eukaryotic genes are often regulated by several transcription factors whose binding sites are tightly clustered and form cis-regulatory modules. In this paper, we present a web server, CREME, for identifying and visualizing cis-regulatory modules in the promoter regions of a given set of potentially co-regulated genes. CREME relies on a database of putative transcription factor binding sites that have been annotated across the human genome using a library of position weight matrices and evolutionary conservation with the mouse and rat genomes. A search algorithm is applied to this data set to identify combinations of transcription factors whose binding sites tend to co-occur in close proximity in the promoter regions of the input gene set. The identified cis-regulatory modules are statistically scored and significant combinations are reported and graphically visualized. Our web server is available at http://creme.dcode.org.
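As a rough illustration of "binding sites that co-occur in close proximity", the sketch below finds the shortest stretch of one promoter that contains at least one site for every factor in a combination. It is not CREME's search algorithm or scoring; the site representation (start position plus factor name, ignoring site width) and all names are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

Site = Tuple[int, str]  # (start position in the promoter, factor name)


def smallest_window_with_all(
    sites: List[Site], factors: Set[str]
) -> Optional[Tuple[int, int]]:
    """Shortest (start, end) span holding a site for every factor, or None."""
    relevant = sorted(s for s in sites if s[1] in factors)
    counts: Dict[str, int] = {}
    best: Optional[Tuple[int, int]] = None
    left = 0
    for pos_r, tf_r in relevant:
        counts[tf_r] = counts.get(tf_r, 0) + 1
        # Shrink from the left while the window still covers all factors.
        while factors and len(counts) == len(factors):
            pos_l, tf_l = relevant[left]
            if best is None or pos_r - pos_l < best[1] - best[0]:
                best = (pos_l, pos_r)
            counts[tf_l] -= 1
            if counts[tf_l] == 0:
                del counts[tf_l]
            left += 1
    return best


def has_module(sites: List[Site], factors: Set[str], max_span: int) -> bool:
    """True if all factors have sites within max_span bp of each other."""
    span = smallest_window_with_all(sites, factors)
    return span is not None and span[1] - span[0] <= max_span


# Hypothetical annotated sites for one promoter.
sites = [(10, "X"), (50, "Y"), (300, "X"), (320, "Z"), (340, "Y")]
print(smallest_window_with_all(sites, {"X", "Y", "Z"}))  # (300, 340)
print(has_module(sites, {"X", "Y", "Z"}, max_span=50))   # True
```

Applying `has_module` to every promoter in the input gene set and in a background set gives the counts that a statistical score for the combination could then be built on.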
Identifying the conserved network of cis-regulatory sites of a eukaryotic genome
Proceedings of the National Academy of Sciences, 2005
A major focus of genome research has been to decipher the cis-regulatory code that governs complex transcriptional regulation. We report a computational approach for identifying conserved regulatory motifs of an organism directly from whole genome sequences of several related species without reliance on additional information. We first construct phylogenetic profiles for each promoter, then use a BLAST-like algorithm to efficiently search through the entire profile space of all of the promoters in the genome to identify conserved motifs and the promoters that contain them. Statistical significance is estimated by modified Karlin–Altschul statistics. We applied this approach to the analysis of 3,524 Saccharomyces cerevisiae promoters and identified a highly organized regulatory network involving 3,315 promoters and 296 motifs. This network includes nearly all of the currently known motifs and covers >90% of known transcription factor binding sites. Most of the predicted coregulat...
2011
Gene expression regulation is an intricate, dynamic phenomenon essential for all biological functions. The necessary instructions for gene expression are encoded in cis-regulatory elements that work together and interact with the RNA polymerase to confer spatial and temporal patterns of transcription. Therefore, the identification of these elements is currently an active area of research in computational analysis of regulatory sequences. However, the problem is difficult since the combinatorial interactions between the regulating factors can be very complex. Here we present a web server that identifies cis-regulatory modules given a set of transcription factor binding sites and, additionally, RNA polymerase sites for a group of genes.
Genome Research, 2001
We report a simple new algorithm, cis/TF, that uses genomewide expression data and the full genomic sequence to match transcription factors to their binding sites. Most previous computational methods discovered binding sites by clustering genes having similar expression patterns and then identifying over-represented subsequences in the promoter regions of those genes. By contrast, cis/TF asserts that B is a likely binding site of a transcription factor T if the expression pattern of T is correlated to the composite expression patterns of all genes containing B, even when those genes are not mutually correlated. Thus, our method focuses on binding sites rather than genes. The algorithm has successfully identified experimentally-supported transcription factor binding relationships in tests on several data sets from Saccharomyces cerevisiae.
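The core test described in this abstract, correlating a factor's expression with the composite expression of all genes that contain a site, can be sketched as follows. The abstract does not say how the composite pattern is formed, so taking the per-condition mean is an assumption here, as are the function names and the toy expression values.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def composite_profile(expr: Dict[str, List[float]], genes: List[str]) -> List[float]:
    """Assumed composite: mean expression across genes, per condition."""
    return [sum(col) / len(col) for col in zip(*(expr[g] for g in genes))]


def site_tf_score(expr: Dict[str, List[float]], tf: str, genes_with_site: List[str]) -> float:
    """Correlation between the TF's profile and the composite profile."""
    return pearson(expr[tf], composite_profile(expr, genes_with_site))


# Made-up expression values over four conditions.
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 1.0, 4.0, 3.0],
    "g2": [0.0, 3.0, 2.0, 5.0],
}
print(round(pearson(expr["g1"], expr["g2"]), 3))            # 0.124
print(round(site_tf_score(expr, "TF1", ["g1", "g2"]), 3))   # 1.0
```

The toy data show the point made in the abstract: g1 and g2 are only weakly correlated with each other, yet their composite profile tracks the factor's expression closely.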
Nucleic Acids Research, 2004
information and high-throughput methods to measure gene expression levels open the door to explore transcriptional regulation using computational tools. Combinatorial regulation and sparseness of regulatory elements throughout the genome allow organisms to control the spatial and temporal patterns of gene expression. Here we study the organization of cis-regulatory elements in sets of co-regulated genes. We build an algorithm to search for combinations of transcription factor binding sites that are enriched in a set of potentially co-regulated genes with respect to the whole genome. No knowledge is assumed about involvement of specific sets of transcription factors. Instead, the search is exhaustively conducted over combinations of up to four binding sites obtained from databases or motif search algorithms. We evaluate the performance on random sets of genes as a negative control and on three biologically validated sets of co-regulated genes in yeasts, flies and humans. We show that we can detect DNA regions that play a role in the control of transcription. These results shed light on the structure of transcription regulatory regions in eukaryotes and can be directly applied to clusters of co-expressed genes obtained in gene expression studies. Supplementary information is available at
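An exhaustive search over combinations of up to four binding sites, scored for enrichment in a gene set relative to the whole genome, might look like the sketch below. The abstract does not name the enrichment statistic, so the hypergeometric tail probability used here is a substituted, commonly used choice, and positional constraints between sites are ignored. All names and the toy data are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the combination."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def enriched_combinations(
    sites_by_gene: Dict[str, Set[str]], gene_set: Set[str], max_size: int = 4
) -> List[Tuple[float, Tuple[str, ...], int]]:
    """Rank motif combinations by enrichment p-value (smallest first)."""
    motifs = sorted(set().union(*sites_by_gene.values()))
    N, n = len(sites_by_gene), len(gene_set)
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(motifs, size):
            # Genes in the whole genome that carry every motif of the combination.
            carriers = {g for g, s in sites_by_gene.items() if set(combo) <= s}
            k = len(carriers & gene_set)
            if k == 0:
                continue
            results.append((hypergeom_tail(k, n, len(carriers), N), combo, k))
    return sorted(results)


# Toy "genome" of six genes with made-up motif annotations.
genome = {
    "g1": {"A", "B"}, "g2": {"A", "B"}, "g3": {"A"},
    "g4": {"B"}, "g5": {"C"}, "g6": {"A", "C"},
}
best_p, best_combo, best_k = enriched_combinations(genome, {"g1", "g2"})[0]
print(best_combo, best_k)  # ('A', 'B') 2
```

Because the number of combinations grows quickly with the number of motifs, a real run would also need a multiple-testing correction and a negative control such as the random gene sets the abstract mentions.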
Statistical significance of cis-regulatory modules
BMC bioinformatics, 2007
It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome-scale scanning. We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to...
In Silico Analysis of Cis-Regulatory Elements on Co-Expressed Genes
Cis-regulatory elements (CREs) are regions of non-coding DNA that regulate the transcription of nearby genes. CREs typically regulate gene transcription by functioning as binding sites for transcription factors. Publicly available databases of co-expressed gene sets would be valuable tools for a wide variety of experimental designs, including targeting of genes for functional identification or for regulatory investigation. Studying the effect of CREs on expression can improve our understanding of co-expressed genes and gene networks. In the present study, we compared the correlation between expression and CREs in genes co-expressed with LTP5, using the Genevestigator database, which provides co-expressed genes deduced from microarray data, and SCOPE, which was used to find CREs in the 800 bp upstream of the DNA sequences of the co-expressed genes. The results revealed that three motifs (TGSCAB, ATWTGYMG and CBTATC) from the PRISM algorithm, the GCCAC motif from BEAM and the ATTGNVANNYGG motif from the SPACER algorithm are distributed in the promoters of the co-expressed genes, and the LUX transcription factor was identified using the UniPROBE database. We present here a new comparison method for detecting key cis-regulatory elements that affect co-expressed genes, to help clarify the function and regulation of particular genes and gene networks under stress conditions.