Improved protein-binding microarrays for the identification of DNA-binding specificities of transcription factors

footprintDB: a database of transcription factors with annotated cis elements and binding interfaces

Bioinformatics, 2013

Motivation: Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. Results: FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value.

MotifAdjuster: a tool for computational reassessment of transcription factor binding site annotations

Genome Biology, 2009

Valuable binding-site annotation data are stored in databases. However, several types of errors can, and do, occur in the process of manually incorporating annotation data from the scientific literature into these databases. Here, we introduce MotifAdjuster http://dig.ipk-gatersleben.de/MotifAdjuster.html, a tool that helps to detect these errors, and we demonstrate its efficacy on public data sets. Rationale: The regulation of gene expression involves a complex system of interacting components in all living organisms [1] and is of fundamental interest, for instance, for cell maintenance and development. One level of regulation is realized by DNA-binding transcription factors (TFs). The DNA-binding domain of a TF is capable of recognizing specific binding sites (BSs) in the promoter regions of its target genes [2]. Binding of a TF can induce (activator) or inhibit (repressor) the transcription of its target genes. The general ability to control a target gene may depend on the BS itself, its strand orientation, and its position with respect to the transcription start site. If other BSs are present, the ability of a TF to bind the DNA may additionally depend on strand orientations and positions of these BSs. One important prerequisite for research on gene regulation is the reliable annotation of BSs. The approximate regions on the double-stranded DNA sequence bound by TFs can be determined by wet-lab experiments such as electrophoretic mobility shift assays (EMSAs) [3], DNase footprinting [4], enzyme-linked immunosorbent assay (ELISA) [5,6], ChIP-chip [7], or mutations of the putative BS and subsequent expression studies. Because TFs bind to double-stranded DNA, the strand annotations of nonpalindromic BSs in the databases are either missing or added based on manual inspection or predictions from bioinformatics tools such as MEME [8], Gibbs Sampler [9,10], Improbizer [11], SeSiM-CMC [12], or A-GLAM [13].
After wet-lab identification, data about transcriptional gene regulatory interactions, including the annotated BSs, are published in the scientific literature. Subsequently, these data are extracted by curation teams and manually entered into databases on transcriptional gene regulation such as CoryneRegNet [14], PRODORIC [15], or RegulonDB [16] for prokaryotes, and AGRIS [17], AthaMap [18], CTCFBSDB [19], JASPAR [20], OregAnno [21], SCPD [22], TRANSFAC [23],

Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays

Nature Genetics, 2004

We have developed a new DNA microarray-based technology, termed protein binding microarrays (PBMs), that allows rapid, high-throughput characterization of the in vitro DNA binding site sequence specificities of transcription factors in a single day. Using PBMs, we identified the DNA binding site sequence specificities of the yeast transcription factors Abf1, Rap1, and Mig1. Comparison of these proteins' in vitro binding sites versus their in vivo binding sites indicates that PBM-derived sequence specificities can accurately reflect in vivo DNA sequence specificities. In addition to previously identified targets, Abf1, Rap1, and Mig1 bound to 107, 90, and 75 putative new target intergenic regions, respectively, many of which were upstream of previously uncharacterized open reading frames (ORFs). Comparative sequence analysis indicates that many of these newly identified sites are highly conserved across five sequenced sensu stricto yeast species and thus are likely to be functional in vivo binding sites that potentially are utilized in a condition-specific manner. Similar PBM experiments will likely be useful in identifying novel cis regulatory elements and transcriptional regulatory networks in various genomes.

cis Element/Transcription Factor Analysis (cis/TF): A Method for Discovering Transcription Factor/cis Element Relationships

Genome Research, 2001

We report a simple new algorithm, cis/TF, that uses genomewide expression data and the full genomic sequence to match transcription factors to their binding sites. Most previous computational methods discovered binding sites by clustering genes having similar expression patterns and then identifying over-represented subsequences in the promoter regions of those genes. By contrast, cis/TF asserts that B is a likely binding site of a transcription factor T if the expression pattern of T is correlated to the composite expression patterns of all genes containing B, even when those genes are not mutually correlated. Thus, our method focuses on binding sites rather than genes. The algorithm has successfully identified experimentally-supported transcription factor binding relationships in tests on several data sets from Saccharomyces cerevisiae.
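The core idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the composite expression pattern is built or which correlation measure is used, so the sketch assumes a per-condition mean over all genes whose promoter contains the site and a Pearson correlation. The function names, the toy promoters and the expression values are all hypothetical, and only one DNA strand is searched.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def composite_expression(site, promoters, expression):
    """Assumed composite: per-condition mean over genes whose promoter contains `site`."""
    carriers = [g for g, seq in promoters.items() if site in seq and g in expression]
    if not carriers:
        return None
    n_cond = len(expression[carriers[0]])
    return [sum(expression[g][i] for g in carriers) / len(carriers) for i in range(n_cond)]


def site_tf_score(site, tf_profile, promoters, expression):
    """Correlation between a TF's expression and the composite profile of a site."""
    comp = composite_expression(site, promoters, expression)
    return None if comp is None else pearson(tf_profile, comp)


# Toy data (hypothetical): g1 and g2 both carry the site ACGCGT.
promoters = {"g1": "TTACGCGTAA", "g2": "GGACGCGTCC", "g3": "TTTTTTTTTT"}
expression = {
    "g1": [1.0, 3.0, 2.0, 6.0],
    "g2": [3.0, 1.0, 6.0, 2.0],
    "g3": [5.0, 5.0, 5.0, 5.0],
}
tf_profile = [1.0, 1.0, 2.0, 2.0]

score = site_tf_score("ACGCGT", tf_profile, promoters, expression)
gene_gene = pearson(expression["g1"], expression["g2"])
print(f"site vs TF: {score:.2f}, g1 vs g2: {gene_gene:.2f}")
```

In this toy example the two genes carrying the site are negatively correlated with each other, yet their composite profile tracks the TF profile, which is the situation the abstract says a gene-clustering approach would miss.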

Identification of functional cis-regulatory elements by sequential enrichment from a randomized synthetic DNA library

BMC Plant Biology, 2013

Background: The identification of endogenous cis-regulatory DNA elements (CREs) responsive to endogenous and environmental cues is important for studying gene regulation and for biotechnological applications but is labor- and time-intensive. Alternatively, by taking a synthetic biology approach, small specific DNA binding sites tailored to the needs of the scientist can be generated and rapidly identified. Results: Here we report a novel approach to identify stimulus-responsive synthetic CREs (SynCREs) from an unbiased random synthetic element (SynE) library. Functional SynCREs were isolated by screening the SynE library for elements mediating transcriptional activity in plant protoplasts. Responsive elements were chromatin immunoprecipitated by targeting the active Ser-5 phosphorylated RNA polymerase II CTD (Pol II ChIP). Using sequential enrichment, deep sequencing and a bioinformatics pipeline, candidate responsive SynCREs were identified within a pool of constitutively active DNA elements and further validated. These included bona fide biotic/abiotic stress-responsive motifs along with novel SynCREs. We tested several SynCREs in Arabidopsis and confirmed their response to biotic stimuli. Conclusions: Successful isolation of synthetic stress-responsive elements from our screen illustrates the power of the described methodology. This approach can be applied to any transfectable eukaryotic system since it exploits a universal feature of the eukaryotic Pol II.

Using protein-binding microarrays to study transcription factor specificity: homologs, isoforms and complexes

Briefings in Functional Genomics, 2014

Protein-DNA binding is central to specificity in gene regulation, and methods for characterizing transcription factor (TF)-DNA binding remain crucial to studies of regulatory specificity. High-throughput (HT) technologies have revolutionized our ability to characterize protein-DNA binding by significantly increasing the number of binding measurements that can be performed. Protein-binding microarrays (PBMs) are a robust and powerful HT platform for studying DNA-binding specificity of TFs. Analysis of PBM-determined DNA-binding profiles has provided new insight into the scope and mechanisms of TF binding diversity. In this review, we focus specifically on the PBM technique and discuss its application to the study of TF specificity, in particular, the binding diversity of TF homologs and multi-protein complexes.

ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites

Nucleic Acids Research, 2015

Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoters as well as distal enhancers. TFs recognize short but specific binding sites (TFBSs) that are located within the promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution, leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent tool to detect the phylogenetic signal, the current MSA implementations are optimized to align the maximum number of identical nucleotides. This approach might result in the omission of conserved motifs that contain interchangeable nucleotides such as the ETS motif (IUPAC code: GGAW). Here, we introduce ConBind, a novel method to enhance alignment of short motifs, even if their mutual sequence similarity is only partial. ConBind improves the identification of conserved TFBSs by improving the alignment accuracy of TFBS families within orthologous DNA sequences. Functional validation of the Gfi1b +13 enhancer reveals that ConBind identifies additional functionally important ETS binding sites that were missed by all other tested alignment tools. In addition to the analysis of known regulatory regions, our web tool is useful for the analysis of TFBSs on so far unknown DNA regions identified through ChIP-sequencing.
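To make the point about interchangeable nucleotides concrete, the following Python sketch scans a sequence for a degenerate IUPAC motif such as GGAW, where W stands for A or T. It only illustrates why two instances of the same motif need not be identical at every position; it is not the ConBind alignment method, and the function names and example sequence are hypothetical. Only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'GGAW' into a regular expression."""
    return "".join(IUPAC[c] for c in motif.upper())


def find_sites(motif, sequence):
    """Return (start, matched_text) for every occurrence, overlapping ones included."""
    # The lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence: GGAA and GGAT both satisfy GGAW, GGAC does not.
sites = find_sites("GGAW", "ccGGAAgtGGATcaGGAC")
print(sites)  # [(2, 'GGAA'), (8, 'GGAT')]
```

An aligner that rewards only identical nucleotides would treat the GGAA and GGAT instances as a mismatch at the last position, even though both are valid instances of the same motif.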

MATCH™: a tool for searching transcription factor binding sites in DNA sequences

Nucleic Acids Research, 2003

Match™ is a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences. Match™ is closely interconnected and distributed together with the TRANSFAC® database. In particular, Match™ uses the matrix library collected in TRANSFAC® and therefore provides the possibility to search for a great variety of different transcription factor binding sites. Several sets of optimised matrix cutoff values are built into the system to provide a variety of search modes of different stringency. The user may construct and save his/her specific user profiles, which are selected subsets of matrices including default or user-defined cutoff values. Furthermore, a number of tissue-specific profiles are provided that were compiled by the TRANSFAC®