Speeding Cis-Trans Regulation Discovery by Phylogenomic Analyses Coupled with Screenings of an Arrayed Library of Arabidopsis Transcription Factors

PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences

Nucleic Acids Research, 2002

PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.
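As a rough illustration of what searching a query sequence for cis-acting elements by consensus sequence involves, the sketch below translates an IUPAC consensus into a regular expression and reports every match position. It is a minimal sketch and not PlantCARE's implementation: the function names `consensus_to_regex` and `find_sites` are hypothetical, and the motifs in the usage lines are illustrative examples rather than entries taken from the database.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(sequence, consensus):
    """Return 0-based start positions of all matches on the given strand.

    A lookahead is used so that overlapping matches are reported too.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Illustrative usage with made-up query sequences.
tata_hits = find_sites("GGTATAAATACC", "TATAWA")
gbox_hits = find_sites("ttCACGTGaa", "CACGTG")
```

Only the forward strand is scanned here; a fuller search would also scan the reverse complement and could score sites against positional matrices instead of exact consensus matches.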

Computational Approaches to Identify Promoters and cis-Regulatory Elements in Plant Genomes

PLANT PHYSIOLOGY, 2003

The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
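The "search by content" idea can be sketched as a sliding-window scan that computes GC content and the observed/expected CpG ratio, (number of CG dinucleotides × window length) / (number of C × number of G). This is a minimal sketch and not the procedure used in the paper: the function names are hypothetical, and the default thresholds are the commonly quoted vertebrate-style values, kept only as placeholders because the abstract states that vertebrate parameters do not carry over to plants.

```python
def gc_fraction(window):
    """Fraction of G and C bases in the window."""
    window = window.upper()
    if not window:
        return 0.0
    return (window.count("G") + window.count("C")) / len(window)


def cpg_obs_exp(window):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    window = window.upper()
    c, g = window.count("C"), window.count("G")
    if c == 0 or g == 0:
        return 0.0
    return window.count("CG") * len(window) / (c * g)


def cpnpg_count(window):
    """Number of CpNpG trinucleotides (C, any base, G) in the window."""
    window = window.upper()
    return sum(
        1
        for i in range(len(window) - 2)
        if window[i] == "C" and window[i + 2] == "G"
    )


def scan_islands(sequence, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Return start positions of windows passing both thresholds.

    The defaults are placeholder values, not plant-specific parameters.
    """
    hits = []
    for start in range(0, len(sequence) - window + 1, step):
        w = sequence[start:start + window]
        if gc_fraction(w) >= min_gc and cpg_obs_exp(w) >= min_ratio:
            hits.append(start)
    return hits
```

A plant-oriented variant would need thresholds fitted to the compositional landscape around the transcription start site, and could apply the same window logic to CpNpG counts.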

Genome wide analysis of Arabidopsis core promoters

BMC Genomics, 2005

Background: Core promoters are the gene regulatory regions most proximal to the transcription start site (TSS), central to the formation of pre-initiation complexes and for combinatorial gene regulation. The DNA elements required for core promoter function in plants are poorly understood. To establish the sequence motifs that characterize plant core promoters and to compare them to the corresponding sequences in animals, we took advantage of available full-length cDNAs (FL-cDNAs) and predicted upstream regulatory sequences to carry out the analysis of 12,749 Arabidopsis core promoters. Results: Using a combination of expectation maximization and Gibbs sampling methods, we identified several motifs overrepresented in Arabidopsis core promoters. One of them corresponded to the TATA element, for which an in-depth analysis resulted in the generation of robust TATA Nucleotide Frequency Matrices (NFMs) capable of predicting Arabidopsis TATA elements with a high degree of confidence. We established that approximately 29% of all Arabidopsis promoters contain TATA motifs, clustered around position -32 with respect to the TSS. The presence of TATA elements was associated with genes represented more frequently in EST collections and with shorter 5' UTRs. No cis-elements were found over-represented in TATA-less, compared to TATA-containing promoters. Conclusion: Our studies provide a first genome-wide illustration of the composition and structure of core Arabidopsis promoters. The percentage of TATA-containing promoters is much lower than commonly recognized, yet comparable to the number of Drosophila promoters containing a TATA element. Although several other DNA elements were identified as over-represented in Arabidopsis promoters, they are present in only a small fraction of the genes and they represent elements not previously described in animals, suggesting a distinct architecture of the core promoters of plant and animal genes.
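To make the use of a nucleotide frequency matrix for predicting TATA elements more concrete, the sketch below converts per-position base frequencies into log2-odds scores against a uniform background and returns the best-scoring window in a sequence. It is a minimal sketch under stated assumptions: the matrix `TOY_TATA` is invented for illustration and is not one of the paper's NFMs, and the function names, the pseudocount, and the uniform background are all assumptions.

```python
import math

BASES = "ACGT"

# Hypothetical 6-column frequency matrix (NOT taken from the paper).
TOY_TATA = [
    {"T": 0.9, "A": 0.1},
    {"A": 0.9, "T": 0.1},
    {"T": 0.9, "A": 0.1},
    {"A": 0.9, "T": 0.1},
    {"A": 0.6, "T": 0.4},
    {"A": 0.9, "T": 0.1},
]


def log_odds_matrix(freqs, background=0.25, pseudo=0.01):
    """Convert per-position frequency dicts into log2-odds scores."""
    matrix = []
    for column in freqs:
        # Pseudocounts avoid log(0) for bases never seen at a position.
        total = sum(column.get(b, 0.0) + pseudo for b in BASES)
        matrix.append({
            b: math.log2(((column.get(b, 0.0) + pseudo) / total) / background)
            for b in BASES
        })
    return matrix


def best_hit(sequence, matrix):
    """Return (score, 0-based start) of the top window, or None."""
    sequence = sequence.upper()
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(col[base] for col, base in zip(matrix, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


hit = best_hit("GCGCTATAAAGCGC", log_odds_matrix(TOY_TATA))
```

In practice a score cutoff and a positional prior (the abstract reports clustering around position -32 relative to the TSS) would be needed before calling a window a TATA element.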

A comprehensive map of preferentially located motifs reveals distinct proximal cis-regulatory sequences in plants

Frontiers in Plant Science, 2022

Identification of cis-regulatory sequences controlling gene expression is an arduous challenge that is being actively explored to discover key genetic factors responsible for traits of agronomic interest. Here, we used a genome-wide de novo approach to investigate preferentially located motifs (PLMs) in the proximal cis-regulatory landscape of Arabidopsis thaliana and Zea mays. We report three groups of PLMs in both the 5'- and 3'-gene-proximal regions and emphasize conserved PLMs in both species, particularly in the 3'-gene-proximal region. Comparison with resources from transcription factor and microRNA binding sites shows that 79% of the identified PLMs are unassigned, although some are supported by MNase-defined cistrome occupancy analysis. Enrichment analyses further reveal that unassigned PLMs provide functional predictions that differ from those derived from transcription factor and microRNA binding sites. Our study provides a comprehensive map of PLMs and demonstrates their potential utility for future characterization of orphan genes in plants.

READS - A Resource for Plant Non-coding Regulatory Sequence Analysis

Plant Tissue Culture and Biotechnology, 2011

Identification and analysis of regulatory sequences that control gene expression can be greatly facilitated by database-assisted bioinformatic approaches. READS (Regulatory Element Analysis DatabaSe) has been created as a web-accessible, freely available database of plant non-coding regulatory sequences. It currently contains more than 300 known and putative promoters of constitutive as well as stress-inducible genes belonging to diverse plants. The database has been manually curated with promoters collected mainly from scientific publications and then cross-referenced with other resources (NCBI database, PubMed, PubMed Central). A user-friendly interface has been provided to allow easy access and analysis of data using different query options. A BLAST utility has also been provided, allowing users to search against all entries in the database. For each promoter, features such as expression data, GC content, and core elements are provided to assist in characterization of the regulatory sequences. To our knowledge, READS is the first plant promoter database that allows retrieval of sequences based on expression pattern. Thus the database can be utilized as a useful resource for identification of important putative regulatory cis-elements in promoters by analysis of upstream regions of hundreds of coregulated or co-expressed genes. Such knowledge can also be of use for identifying minimal or stress-inducible promoters for effective transgene expression. We aim to provide the most up-to-date collection of promoters of well-characterized stress-inducible and constitutively expressed genes from many plant species. Hence, this resource will be updated regularly to incorporate new sequences. READS is available at http://www.pbtlabdu.net/READS/.

Athena: a resource for rapid visualization and systematic analysis of Arabidopsis promoter sequences

Bioinformatics, 2005

To better understand the regulatory networks that control plant gene expression, tools are needed to systematically analyze and visualize promoter regulatory sequences in Arabidopsis thaliana. We have developed the Athena database, which contains 30 067 predicted Arabidopsis promoter sequences and consensus sequences for 105 previously characterized transcription factor (TF) binding sites. Athena provides four novel tools to facilitate the analysis of promoter sequences: a promoter visualization tool to enable the rapid inspection of key regulatory sequences in multiple promoters; a TF binding site enrichment tool to identify statistically over-represented TF sites occurring in a user-selected subset of promoters; a data-mining tool to rapidly select promoter sequences containing the specified combination of TF binding sites; and a tool to display the distribution of TF binding site positions in a selected set of promoter sequences.
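One common way to test whether a TF binding site is statistically over-represented in a user-selected subset of promoters is a one-sided hypergeometric test. The sketch below shows that calculation using only the standard library. The abstract does not state which statistic Athena uses, so the choice of test is an assumption, and the function name and all counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, total_with_site, subset, subset_with_site):
    """P(X >= subset_with_site) under sampling without replacement.

    total            -- promoters in the whole collection
    total_with_site  -- promoters in the collection carrying the site
    subset           -- promoters selected by the user
    subset_with_site -- selected promoters carrying the site
    """
    upper = min(subset, total_with_site)
    numerator = sum(
        comb(total_with_site, k) * comb(total - total_with_site, subset - k)
        for k in range(subset_with_site, upper + 1)
    )
    return numerator / comb(total, subset)


# Hypothetical counts: 10 promoters, 5 carry the site, 2 selected, both hits.
p_value = hypergeom_enrichment_p(10, 5, 2, 2)
```

When many binding sites are tested against the same promoter subset, the resulting p-values would normally be corrected for multiple testing.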

AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors

BMC Bioinformatics, 2003

The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002).

In Silico Prediction of Regulatory Elements and Corresponding Protein-Dna Interactions in Plant Promoters

2013

The importance of cis- or trans-acting regulatory elements in gene regulation is well established. Exploring these elements in vivo demands extensive experimentation and is time-intensive, so in silico methods of predicting them have been developed. In the present study, around 300 promoters belonging to monocots, dicots and algae were analysed with the ConSite tool for prediction of regulatory elements. Many putative regulatory elements of diverse functions were found in these promoters. TATA-binding proteins (TBP) in monocots, hunchback in dicots, and aryl hydrocarbon receptor nuclear translocator (ARNT) in algae were abundantly represented, at 55, 33 and 86% respectively. All three plant groups exhibited different families of transcription factors, such as basic helix-loop-helix (bHLH), basic helix-loop-helix leucine zipper (bHLH-ZIP), Forkhead, RUNT, HOMEO-ZIP, zinc finger (ZN-FINGER), REL, Nuclear receptor, MADS, bZIP and TATA-box. Moreover, selec...

Unraveling Transcriptional Control in Arabidopsis Using cis-Regulatory Elements and Coexpression Networks

PLANT PHYSIOLOGY, 2009

Analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has demonstrated that genes with an overall similar expression pattern are often enriched for similar functions. This guilt-by-association principle can be applied to define modular gene programs, identify cis-regulatory elements, or predict gene functions for unknown genes based on their coexpression neighborhood. We evaluated the potential to use Gene Ontology (GO) enrichment of a gene's coexpression neighborhood as a tool to predict its function but found overall low sensitivity scores (13%-34%). This indicates that for many functional categories, coexpression alone performs poorly to infer known biological gene functions. However, integration of cis-regulatory elements shows that 46% of the gene coexpression neighborhoods are enriched for one or more motifs, providing a valuable complementary source to functionally annotate genes. Through the integration of coexpression data, GO annotations, and a set of known cis-regulatory elements combined with a novel set of evolutionarily conserved plant motifs, we could link many genes and motifs to specific biological functions. Application of our coexpression framework extended with cis-regulatory element analysis on transcriptome data from the cell cycle-related transcription factor OBP1 yielded several coexpressed modules associated with specific cis-regulatory elements. Moreover, our analysis strongly suggests a feed-forward regulatory interaction between OBP1 and the E2F pathway. The ATCOECIS resource (http://bioinformatics.psb.ugent.be/ATCOECIS/) makes it possible to query coexpression data and GO and cis-regulatory element annotations and to submit user-defined gene sets for motif analysis, providing an access point to unravel the regulatory code underlying transcriptional control in Arabidopsis (Arabidopsis thaliana).
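A gene's coexpression neighborhood can be sketched as the set of genes whose expression profiles correlate with it above some cutoff. The example below is a minimal sketch and not the ATCOECIS pipeline: the use of Pearson correlation, the 0.8 threshold, the function names, and the toy expression values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpression_neighborhood(expression, gene, threshold=0.8):
    """Return sorted gene IDs whose profile correlates with `gene`."""
    target = expression[gene]
    return sorted(
        other
        for other, profile in expression.items()
        if other != gene and pearson(target, profile) >= threshold
    )


# Toy expression matrix with hypothetical gene IDs and values.
toy_expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 1.0, 1.0, 1.0],
}
neighbors = coexpression_neighborhood(toy_expression, "g1")
```

The resulting neighborhood is the gene set that would then be tested for GO-term or motif enrichment.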