A comprehensive map of preferentially located motifs reveals distinct proximal cis-regulatory sequences in plants

Plant-PLMview: a database for identifying cis-regulatory sequences with preferential positions in gene-proximal regions of plants

Background: Establishing relationships between transcription factors and target genes is essential for understanding the mechanisms that regulate gene expression, which play a fundamental role in plant adaptation to the local environment. Despite the importance of this research area and the tremendous progress in sequencing methods such as ChIP-seq and DAP-seq, we are still far from a complete reconstruction of the cis-regulatory landscape. Only a small number of transcription factors can be assessed from experimental data to identify their cis-regulatory binding sites. This highlights the role that in silico approaches can play in complementing experimental data. Results: We have developed Plant-PLMview, a web-accessible database for detecting preferentially located cis-regulatory sequences in the gene-proximal regions of 20 plant species. Users of Plant-PLMview can (i) access the proximal regions of controlled genes from 20 plant species, (ii) query their own DNA motifs or access 840 cis-regulatory sequen...
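The truncated abstract does not say how Plant-PLMview decides that a motif is preferentially located. As an illustration only, the Python sketch below shows one simple version of the idea: bin motif hit positions relative to the TSS and compare the busiest bin with the count expected if hits were spread uniformly. The function names, the -1000 to +500 bp region and the 50 bp bin size are hypothetical choices, not the database's parameters.

```python
from collections import Counter

def positional_profile(positions, region_start, region_end, bin_size):
    """Count motif hits per bin; positions are relative to the TSS (negative = upstream)."""
    n_bins = (region_end - region_start) // bin_size
    counts = Counter()
    for pos in positions:
        if region_start <= pos < region_start + n_bins * bin_size:
            counts[(pos - region_start) // bin_size] += 1
    return [counts[i] for i in range(n_bins)]

def preferred_window(positions, region_start=-1000, region_end=500, bin_size=50):
    """Return (window_start, window_end, observed/expected) for the busiest bin.

    Illustrative sketch: region bounds and bin size are hypothetical defaults,
    and no significance test is performed.
    """
    profile = positional_profile(positions, region_start, region_end, bin_size)
    total = sum(profile)
    if total == 0:
        return None
    expected = total / len(profile)  # uniform expectation per bin
    best = max(range(len(profile)), key=lambda i: profile[i])
    start = region_start + best * bin_size
    return start, start + bin_size, profile[best] / expected

# Made-up hit positions clustered just upstream of the TSS.
print(preferred_window([-80, -75, -70, -60, -55, 100, -900]))
```

A high ratio only flags a candidate window; a real analysis would also need a significance test and a background model that accounts for sequence composition.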

Speeding Cis-Trans Regulation Discovery by Phylogenomic Analyses Coupled with Screenings of an Arrayed Library of Arabidopsis Transcription Factors

PLoS ONE, 2011

Transcriptional regulation is an important mechanism underlying gene expression and has played a crucial role in evolution. The number, position and interactions between cis-elements and transcription factors (TFs) determine the expression pattern of a gene. To identify functionally relevant cis-elements in gene promoters, a phylogenetic shadowing approach with a lipase gene (LIP1) was used. As a proof of concept, in silico analyses of several Brassicaceae LIP1 promoters identified a highly conserved sequence (LIP1 element) that is sufficient to drive strong expression of a reporter gene in planta. A collection of ca. 1,200 Arabidopsis thaliana TF open reading frames (ORFs) was arrayed in a 96-well format (RR library), and a convenient mating-based yeast one-hybrid (Y1H) screening procedure was established. We constructed an episomal plasmid (pTUY1H) to clone the LIP1 element and used it as bait for Y1H screenings. A novel interaction with an HD-ZIP (AtML1) TF was identified and abolished by a 2 bp mutation in the LIP1 element. A role of this interaction in transcriptional regulation was confirmed in planta. In addition, we validated our strategy by reproducing the previously reported interaction of a MYB-CC (PHR1) TF, a central regulator of phosphate starvation responses, with a conserved promoter fragment (IPS1 element) containing its cognate binding sequence. Finally, we established that the LIP1 and IPS1 elements were differentially bound by HD-ZIP and MYB-CC family members, in agreement with their genetic redundancy in planta. In conclusion, combining in silico analyses of orthologous gene promoters with Y1H screening of the RR library represents a powerful approach to decipher cis- and trans-regulatory codes.
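Phylogenetic shadowing works on multiple alignments of orthologous promoters. As a much cruder, alignment-free stand-in (an assumption, not the method used in the study), the Python sketch below reports the k-mers present in every orthologous promoter; such shared words are the kind of candidate element that could then be tested in a reporter assay or used as Y1H bait. The sequences are invented, not real LIP1 promoters.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(promoters, k):
    """K-mers present in every promoter: an alignment-free proxy for conservation."""
    if not promoters:
        return []
    common = kmers(promoters[0], k)
    for seq in promoters[1:]:
        common &= kmers(seq, k)
    return sorted(common)

# Hypothetical orthologous promoter fragments.
orthologs = ["AACGTTGCAT", "GGACGTTGCC", "TTACGTTGAA"]
print(shared_kmers(orthologs, 6))  # ['ACGTTG']
```

Unlike true shadowing, this ignores position, strand and tolerated substitutions, so it is at best a first filter.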

Systematic identification of functional modules and cis-regulatory elements in Arabidopsis thaliana

BMC Bioinformatics, 2011

Background: Several large-scale gene co-expression networks have been constructed successfully for predicting gene functional modules and cis-regulatory elements in Arabidopsis (Arabidopsis thaliana). However, these networks are usually constructed and analyzed in an ad hoc manner. In this study, we propose a completely parameter-free and systematic method for constructing gene co-expression networks and predicting functional modules as well as cis-regulatory elements. Results: Our novel method consists of an automated network construction algorithm, a parameter-free procedure to predict functional modules, and a strategy for finding known cis-regulatory elements that is suitable for consensus scanning without prior knowledge of the allowed extent of degeneracy of the motif. We apply the method to study a large collection of gene expression microarray data in Arabidopsis. We estimate that our co-expression network has ~94% accuracy and has topological properties similar to other biological networks, such as being scale-free and having a high clustering coefficient. Remarkably, among the ~300 predicted modules whose sizes are at least 20, 88% have at least one significantly enriched function, including a few extremely significant ones (ribosome, p < 1E-300; photosynthetic membrane, p < 1.3E-137; proteasome complex, p < 5.9E-126). In addition, we are able to predict cis-regulatory elements for 66.7% of the modules, and the association between the enriched cis-regulatory elements and the enriched functional terms can often be confirmed by the literature. Overall, our results are much more significant than those reported by several previous studies on similar data sets. Finally, we utilize the co-expression network to dissect the promoters of 19 Arabidopsis genes involved in the metabolism and signaling of the important plant hormone gibberellin, and achieve promising results that reveal interesting insights into the biosynthesis and signaling of gibberellin.
Conclusions: The results show that our method is highly effective in finding functional modules from real microarray data. Our application on Arabidopsis leads to the discovery of the largest number of annotated Arabidopsis functional modules in the literature. Given the high statistical significance of functional enrichment and the agreement between cis-regulatory and functional annotations, we believe our Arabidopsis gene modules can be used to predict the functions of unknown genes in Arabidopsis, and to understand the regulatory mechanisms of many genes.
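Consensus scanning usually means matching a degenerate IUPAC consensus on both DNA strands. The Python sketch below shows that baseline using regular expressions; it does not reproduce the paper's strategy for choosing the allowed degeneracy, and the example sequences are invented.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}

# Complement table covering degenerate codes (R<->Y, K<->M, B<->V, D<->H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def scan_consensus(seq, consensus):
    """Return sorted (start, strand) hits of a degenerate consensus on both strands."""
    seq, consensus = seq.upper(), consensus.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", revcomp(consensus))):
        regex = "".join(IUPAC[c] for c in motif)
        # A zero-width lookahead lets overlapping occurrences be reported.
        for m in re.finditer("(?=" + regex + ")", seq):
            hits.append((m.start(), strand))
    return sorted(hits)

print(scan_consensus("TTCACGTGTT", "CACGTG"))   # [(2, '+'), (2, '-')]
print(scan_consensus("AACGTGTCAA", "ACGTGKC"))  # [(1, '+')]
```

A palindromic consensus such as CACGTG is reported on both strands at the same position; deduplicate by start if only sites, not strands, matter.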

PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences

Nucleic Acids Research, 2002

PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC and MEDLINE databases are provided when available. Data about the transcription sites are extracted mainly from the literature, supplemented with an increasing number of in silico predicted data. Apart from a general description for specific transcription factor sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are given as well. New features have been implemented to search for plant cis-acting regulatory elements in a query sequence. Furthermore, links are now provided to a new clustering and motif search method to investigate clusters of co-expressed genes. New regulatory elements can be sent automatically and will be added to the database after curation. The PlantCARE relational database is available via the World Wide Web at http://sphinx.rug.ac.be:8080/PlantCARE/.
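PlantCARE represents some elements as positional matrices. A minimal Python sketch of how a count matrix is commonly turned into a log-odds position weight matrix and used to scan a sequence follows, assuming a uniform background and a pseudocount of 0.5 (both hypothetical choices; PlantCARE's own search scoring is not described in the abstract).

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=0.5, background=None):
    """Convert per-position base counts into a log2-odds PWM (uniform background by default)."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm

def best_hit(seq, pwm):
    """Return (start, score) of the highest-scoring forward-strand window, or None."""
    width, seq = len(pwm), seq.upper()
    best = None
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if best is None or score > best[1]:
            best = (i, score)
    return best

# Toy count matrix for a hypothetical 3 bp element "CAC".
pwm = log_odds_matrix([{"C": 10}, {"A": 10}, {"C": 10}])
print(best_hit("GGCACGG", pwm))
```

Only the forward strand is scanned here; scanning the reverse complement as well covers the other strand.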

Stable unmethylated DNA demarcates expressed genes and their cis-regulatory space in plant genomes

bioRxiv (Cold Spring Harbor Laboratory), 2020

The genomic sequences of crops continue to be produced at a frenetic pace. However, it remains challenging to develop complete annotations of functional genes and regulatory elements in these genomes. Here, we explore the potential to use DNA methylation profiles to develop more complete annotations. Using leaf tissue in maize, we define ~100,000 unmethylated regions (UMRs) that account for 5.8% of the genome; 33,375 UMRs are located more than 2 kilobase pairs from genes. UMRs are highly stable in multiple vegetative tissues, and they capture the vast majority of accessible chromatin regions from leaf tissue. However, many UMRs are not accessible in leaf (leaf-iUMRs), and these represent a set of genomic regions with the potential to become accessible in specific cell types or developmental stages. Leaf-iUMRs often occur near genes that are expressed in other tissues and are enriched for binding sites of transcription factors (TFs) that are also not expressed in leaf tissue. The leaf-iUMRs exhibit unique chromatin modification patterns and are enriched for chromatin interactions with nearby genes. The total UMR space in four additional monocots ranges from 80 to 120 megabases, which is remarkably similar considering the range in genome size from 271 megabases to 4.8 gigabases. In summary, based on the profile from a single tissue, DNA methylation signatures pinpoint both accessible regions and regions poised to become accessible or expressed in other tissues. UMRs provide powerful filters to distill large genomes down to the small fraction of putative functional genes and regulatory elements.
Significance Statement: Crop genomes can be very large, with many repetitive elements and pseudogenes. Distilling a genome down to the relatively small fraction of regions that are functionally valuable for trait variation can be like looking for needles in a haystack. The unmethylated regions in a genome are highly stable during vegetative development and can reveal the locations of potentially expressed genes or cis-regulatory elements. This approach provides a framework for complete annotation of genes and discovery of cis-regulatory elements using methylation profiles from only a single tissue.
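The abstract does not give the UMR calling procedure. Assuming methylation has already been summarised as a fraction per fixed-size tile, a minimal Python sketch of the general idea merges consecutive low-methylation tiles into regions and discards short ones. The 0.1 threshold and 300 bp minimum length are illustrative values, not those used for maize.

```python
def call_umrs(tiles, max_methylation=0.1, min_length=300):
    """Merge adjacent low-methylation tiles into candidate unmethylated regions.

    tiles: (start, end, methylation_fraction) tuples for one chromosome, sorted by
    start; a fraction of None marks a tile without data and breaks a region.
    Thresholds are hypothetical defaults for illustration.
    """
    regions, current = [], None
    for start, end, level in tiles:
        if level is not None and level <= max_methylation:
            if current is not None and start == current[1]:
                current[1] = end  # extend a contiguous run
            else:
                if current is not None:
                    regions.append(tuple(current))
                current = [start, end]
        else:
            if current is not None:
                regions.append(tuple(current))
            current = None
    if current is not None:
        regions.append(tuple(current))
    return [(s, e) for s, e in regions if e - s >= min_length]

def genome_fraction(regions, genome_size):
    """Fraction of the genome covered by non-overlapping regions."""
    return sum(e - s for s, e in regions) / genome_size

# Made-up 100 bp tiles.
tiles = [(0, 100, 0.9), (100, 200, 0.05), (200, 300, 0.02), (300, 400, 0.0),
         (400, 500, 0.8), (500, 600, 0.01), (600, 700, 0.03)]
umrs = call_umrs(tiles)
print(umrs, genome_fraction(umrs, 700))
```

The second low-methylation run (500 to 700) is dropped because it is shorter than the minimum length.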

Unraveling Transcriptional Control in Arabidopsis Using cis-Regulatory Elements and Coexpression Networks

PLANT PHYSIOLOGY, 2009

Analysis of gene expression data generated by high-throughput microarray transcript profiling experiments has demonstrated that genes with an overall similar expression pattern are often enriched for similar functions. This guilt-by-association principle can be applied to define modular gene programs, identify cis-regulatory elements, or predict gene functions for unknown genes based on their coexpression neighborhood. We evaluated the potential to use Gene Ontology (GO) enrichment of a gene's coexpression neighborhood as a tool to predict its function but found overall low sensitivity scores (13%-34%). This indicates that for many functional categories, coexpression alone performs poorly at inferring known biological gene functions. However, integration of cis-regulatory elements shows that 46% of the gene coexpression neighborhoods are enriched for one or more motifs, providing a valuable complementary source to functionally annotate genes. Through the integration of coexpression data, GO annotations, and a set of known cis-regulatory elements combined with a novel set of evolutionarily conserved plant motifs, we could link many genes and motifs to specific biological functions. Application of our coexpression framework extended with cis-regulatory element analysis on transcriptome data from the cell cycle-related transcription factor OBP1 yielded several coexpressed modules associated with specific cis-regulatory elements. Moreover, our analysis strongly suggests a feed-forward regulatory interaction between OBP1 and the E2F pathway. The ATCOECIS resource (http://bioinformatics.psb.ugent.be/ATCOECIS/) makes it possible to query coexpression data and GO and cis-regulatory element annotations and to submit user-defined gene sets for motif analysis, providing an access point to unravel the regulatory code underlying transcriptional control in Arabidopsis (Arabidopsis thaliana).
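GO enrichment of a coexpression neighborhood is commonly scored with a one-sided hypergeometric test. A minimal standard-library Python sketch follows; the gene identifiers are hypothetical, no multiple-testing correction is applied, and it is not claimed to be the ATCOECIS implementation.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = sum(comb(successes, i) * comb(population - successes, draws - i)
                for i in range(k, upper + 1))
    return total / comb(population, draws)

def go_enrichment(neighborhood, annotations, genome_size):
    """Uncorrected enrichment p-value per GO term with at least one neighborhood gene.

    neighborhood: set of gene IDs; annotations: {go_term: set of gene IDs}.
    """
    results = {}
    n = len(neighborhood)
    for term, genes in annotations.items():
        k = len(neighborhood & genes)
        if k:
            results[term] = hypergeom_sf(k, genome_size, len(genes), n)
    return results

# Toy example: a 4-gene neighborhood in a 10-gene "genome".
neighborhood = {"g1", "g2", "g3", "g4"}
annotations = {"GO:A": {"g1", "g2", "g3"}, "GO:B": {"g9"}}
print(go_enrichment(neighborhood, annotations, 10))
```

With thousands of GO terms tested per neighborhood, the raw p-values would need a correction such as Benjamini-Hochberg before being used to assign functions.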

Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics

Genome biology, 2006

Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation. Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be...
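The modules described above are combinations of two or three elements. As a hypothetical illustration (not the authors' detection strategy, which also uses comparative footprinting), the Python sketch below counts, for each pair of distinct motifs, how many promoters contain both within a fixed distance; the 100 bp window and the motif and gene names are assumptions.

```python
from collections import Counter
from itertools import combinations

def module_counts(hits_by_gene, max_gap=100):
    """Count genes in which each unordered pair of distinct motifs co-occurs within max_gap bp.

    hits_by_gene: {gene: [(motif_name, position), ...]}; each gene counts at most once per pair.
    """
    counts = Counter()
    for hits in hits_by_gene.values():
        pairs = set()
        for (m1, p1), (m2, p2) in combinations(hits, 2):
            if m1 != m2 and abs(p1 - p2) <= max_gap:
                pairs.add(tuple(sorted((m1, m2))))
        counts.update(pairs)
    return counts

hits = {"g1": [("A", 10), ("B", 50), ("C", 400)],
        "g2": [("A", 200), ("B", 260)],
        "g3": [("A", 0), ("B", 500)]}
print(module_counts(hits))
```

Pairs that co-occur in many co-regulated promoters, but rarely in a background set, would be the candidate modules.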

Large-Scale cis-Element Detection by Analysis of Correlated Expression and Sequence Conservation between Arabidopsis and Brassica oleracea

PLANT PHYSIOLOGY, 2006

The rapidly increasing amount of plant genomic sequences allows for the detection of cis-elements through comparative methods. In addition, large-scale gene expression data for Arabidopsis (Arabidopsis thaliana) have recently become available. Coexpression and evolutionarily conserved sequences are criteria widely used to identify shared cis-regulatory elements. In our study, we employ an integrated approach to combine two sources of information, coexpression and sequence conservation. Best-candidate orthologous promoter sequences were identified by a bidirectional best BLAST hit strategy in genome survey sequences from Brassica oleracea. The analysis of 779 microarrays from 81 different experiments provided detailed expression information for Arabidopsis genes coexpressed in multiple tissues and under various conditions and developmental stages. We discovered candidate transcription factor binding sites in 64% of the Arabidopsis genes analyzed. Among them, we detected experimentally verified binding sites and showed strong enrichment of shared cis-elements within functionally related genes. This study demonstrates the value of partially shotgun-sequenced genomes and their combinatorial use with functional genomics data to address complex questions in comparative genomics.
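The bidirectional best hit strategy keeps a pair only when each sequence is the other's top-scoring hit. A minimal Python sketch, assuming the hits have already been parsed into (query, subject, bit score) tuples, for instance from BLAST tabular output; the gene names are placeholders.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject; hits are (query, subject, bitscore)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

a_vs_b = [("At1", "Bo1", 300), ("At1", "Bo2", 250), ("At2", "Bo2", 200), ("At3", "Bo3", 50)]
b_vs_a = [("Bo1", "At1", 310), ("Bo2", "At1", 260), ("Bo2", "At2", 190), ("Bo3", "At4", 80)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('At1', 'Bo1')]
```

At2 is not paired with Bo2 because Bo2's best hit back is At1, which is exactly the one-to-many case the reciprocal check is meant to filter out.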

ATTED-II: a database of co-expressed genes and cis elements for identifying co-regulated gene groups in Arabidopsis

Nucleic Acids Research, 2007

A publicly available database of co-expressed gene sets would be a valuable tool for a wide variety of experimental designs, including targeting of genes for functional identification or for regulatory investigation. Here, we report the construction of an Arabidopsis thaliana trans-factor and cis-element prediction database (ATTED-II) that provides co-regulated gene relationships based on co-expressed genes deduced from microarray data and the predicted cis elements. ATTED-II (http://www.atted.bio.titech.ac.jp) includes the following features: (i) lists and networks of co-expressed genes calculated from 58 publicly available experimental series, which are composed of 1388 GeneChip arrays in A. thaliana; (ii) prediction of cis-regulatory elements in the 200 bp region upstream of the transcription start site to predict co-regulated genes amongst the co-expressed genes; and (iii) visual representation of expression patterns for individual genes. ATTED-II can thus help researchers to clarify the function and regulation of particular genes and gene networks.
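Restricting prediction to the 200 bp upstream of the transcription start site requires strand-aware extraction. A minimal Python sketch, assuming 0-based coordinates with the TSS given as the first transcribed base and the chromosome held as a plain string (hypothetical conventions, not ATTED-II's code).

```python
COMP = str.maketrans("ACGTN", "TGCAN")

def upstream_region(chrom_seq, tss, strand, length=200):
    """Return up to `length` bp upstream of the TSS, 5'->3' on the gene's strand.

    Assumes 0-based coordinates; tss is the first transcribed base.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chrom_seq[start:tss].upper()
    # Minus strand: upstream lies at higher coordinates and must be reverse-complemented.
    end = min(len(chrom_seq), tss + 1 + length)
    return chrom_seq[tss + 1:end].upper().translate(COMP)[::-1]

chrom = "ACGTACGGATCCTTAG"
print(upstream_region(chrom, 8, "+", length=4))  # ACGG
print(upstream_region(chrom, 5, "-", length=3))  # TCC
```

Near chromosome ends the returned region is simply shorter than requested rather than padded.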