Unraveling transcriptional regulatory programs by integrative analysis of microarray and transcription factor binding data

Inferring transcriptional regulatory networks from high-throughput data

Bioinformatics, 2007

Motivation: Inferring the relationships between transcription factors (TFs) and their targets is of utmost importance for understanding the complex regulatory mechanisms in cellular systems. However, transcription factor activities (TFAs) cannot be measured directly by standard microarray experiments owing to various posttranslational modifications. In particular, cooperative mechanisms and combinatorial control are common in gene regulation, e.g. TFs usually recruit other proteins cooperatively to facilitate transcriptional reaction processes. Results: In this article, we propose a novel method for inferring transcriptional regulatory networks (TRNs) from gene expression data based on protein transcription complexes and the mass action law. With gene expression data and TFAs estimated from transcription complex information, the inference of a TRN is formulated as a linear programming (LP) problem which has a globally optimal solution in terms of L1-norm error. The proposed method not only can easily incorporate ChIP-chip data as prior knowledge, but also can integrate multiple gene expression datasets from different experiments simultaneously. A unique feature of our method is that it takes into account protein cooperation in the transcription process. We tested our method by using both synthetic data and several experimental datasets in yeast. The extensive results illustrate the effectiveness of the proposed method for predicting transcription regulatory relationships between TFs with co-regulators and target genes. Availability: The software TRNinfer is available from
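
The LP formulation mentioned in this abstract can be illustrated with the standard reduction of L1-norm fitting to a linear program. This is a generic sketch and not necessarily the paper's exact model: the symbols x_i(t) (expression of target gene i in sample t), a_j(t) (estimated activity of TF j), w_ij (regulatory weights) and the slack variables u_i(t) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{w,\,u}\quad & \sum_{t=1}^{T} u_i(t) \\
\text{s.t.}\quad & -u_i(t) \;\le\; x_i(t) - \sum_{j=1}^{m} w_{ij}\, a_j(t) \;\le\; u_i(t), \qquad t = 1,\dots,T, \\
& u_i(t) \ge 0 .
\end{aligned}
```

At an optimum each slack equals the absolute residual, u_i(t) = |x_i(t) - \sum_j w_{ij} a_j(t)|, so the objective is the L1-norm error. Because the problem is linear, any optimum an LP solver returns is global.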

A new framework for identifying combinatorial regulation of transcription factors: A case study of the yeast cell cycle

Journal of Biomedical Informatics, 2007

By integrating heterogeneous functional genomic datasets, we have developed a new framework for detecting combinatorial control of gene expression, which includes estimating transcription factor activities using a singular value decomposition method and reducing high-dimensional input gene space by considering genomic properties of gene clusters. The prediction of cooperative gene regulation is accomplished by either Gaussian Graphical Models or Pairwise Mixed Graphical Models. The proposed framework was tested on yeast cell cycle datasets: (1) 54 known yeast cell cycle genes with 9 cell cycle regulators and (2) 676 putative yeast cell cycle genes with 9 cell cycle regulators. The new framework gave promising results on inferring TF-TF and TF-gene interactions. It also revealed several interesting mechanisms such as negatively correlated protein-protein interactions and low affinity protein-DNA interactions that may be important during the yeast cell cycle. The new framework may easily be extended to study other higher eukaryotes.

Learning transcriptional regulation on a genome scale: a theoretical analysis based on gene expression data

Briefings in bioinformatics, 2012

The recent advent of high-throughput microarray data has enabled the global analysis of the transcriptome, driving the development and application of computational approaches to study transcriptional regulation on the genome scale by reconstructing in silico the regulatory interactions of the gene network. Although there are many in-depth reviews of such 'reverse-engineering' methodologies, most have focused on the practical aspect of data mining, and few on the biological problem and the biological relevance of the methodology. Therefore, in this review, from a biological perspective, we used a set of yeast microarray data as a working example to evaluate the fundamental assumptions implicit in associating transcription factor (TF)-target gene expression levels and estimating TFs' activity, and to further explore cooperative models. Finally, we confirm that the detailed transcription mechanism is too complex for expression data alone to reveal; nevertheless, future network reconstruction studies could benefit from the incorporation of context-specific information, the modeling of multiple layers of regulation (e.g. micro-RNA), or the development of approaches for context-dependent analysis, to uncover the mechanisms of gene regulation.

Computational Reconstruction of Transcriptional Relationships from ChIP-Chip Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2000

Background: A transcriptional regulatory module (TRM) is a set of genes that is regulated by a common set of transcription factors (TFs). By organizing the genome into TRMs, a living cell can coordinate the activities of many genes and carry out complex functions. Therefore, identifying TRMs is helpful for understanding gene regulation. Results: Integrating gene expression and ChIP-chip data, we develop a method, called MOdule Finding Algorithm (MOFA), for reconstructing TRMs of the yeast cell cycle. MOFA identified 87 TRMs, which together contain 336 distinct genes regulated by 40 TFs. Using various kinds of data, we validated the biological relevance of the identified TRMs. Our analysis shows that different combinations of a fairly small number of TFs are responsible for regulating a large number of genes involved in different cell cycle phases and that there may exist crosstalk between the cell cycle and other cellular processes. MOFA is capable of finding many novel TF-target gene relationships and can determine whether a TF is an activator and/or a repressor. Finally, MOFA refines some clusters proposed by previous studies and provides a better understanding of how the complex expression program of the cell cycle is regulated. Conclusion: MOFA was developed to reconstruct TRMs of the yeast cell cycle. Many of these TRMs are in agreement with previous studies. Further, MOFA inferred many interesting modules and novel TF combinations. We believe that computational analysis of multiple types of data will be a powerful approach to studying complex biological systems when more and more genomic resources such as genome-wide protein activity data and protein-protein interaction data become available.

Identifying regulatory networks by combinatorial analysis of promoter elements

Nature Genetics, 2001

Several computational methods based on microarray data are currently used to study genome-wide transcriptional regulation. Few studies, however, address the combinatorial nature of transcription, a well-established phenomenon in eukaryotes. Here we describe a new approach using microarray data to uncover novel functional motif combinations in the promoters of Saccharomyces cerevisiae. In addition to identifying novel motif combinations that affect expression patterns during the cell cycle, sporulation and various stress responses, we observed regulatory cross-talk among several of these processes. We have also generated motif-association maps that provide a global view of transcription networks. The maps are highly connected, suggesting that a small number of transcription factors are responsible for a complex set of expression patterns in diverse conditions. This approach may be useful for modeling transcriptional regulatory networks in more complex eukaryotes.

Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

BMC Bioinformatics, 2010

Background: The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results: In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information on yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion: High-quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high-throughput gene expression data.

Integrative analysis of time course microarray data and DNA sequence data via log-linear models for identifying dynamic transcriptional regulatory networks

International journal of data mining and bioinformatics, 2013

Since eukaryotic transcription is regulated by sets of Transcription Factors (TFs) having various transcriptional time delays, identification of temporal combinations of activated TFs is important to reconstruct Transcriptional Regulatory Networks (TRNs). Our methods combine time course microarray data, information on physical binding between the TFs and their targets and the regulatory sequences of genes using a log-linear model to reconstruct dynamic functional TRNs of the yeast cell cycle and human apoptosis. In conclusion, our results suggest that the proposed dynamic motif search method is more effective in reconstructing TRNs than the static motif search method.

BRNI: Modular analysis of transcriptional regulatory programs

BMC Bioinformatics, 2009

Background: Transcriptional responses often consist of regulatory modules, that is, sets of genes with a shared expression pattern that are controlled by the same regulatory mechanisms. Previous methods allow dissecting regulatory modules from genomics data, such as expression profiles, protein-DNA binding, and promoter sequences. In cases where physical protein-DNA data are lacking, such methods are essential for the analysis of the underlying regulatory program.

Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data

Nucleic Acids Research, 2010

Deciphering transcription factor networks from microarray data remains difficult. This study presents a simple method to infer the regulation of transcription factors from microarray data based on well-characterized target genes. We generated a catalog containing transcription factors associated with 2720 target genes and 6401 experimentally validated regulations. When it was available, a distinction between transcriptional activation and inhibition was included for each regulation. Next, we built a tool (www.tfacts.org) that compares submitted gene lists with target genes in the catalog to detect regulated transcription factors. TFactS was validated with published lists of regulated genes in various models and compared to tools based on in silico promoter analysis. We next analyzed the NCI60 cancer microarray data set and showed the regulation of SOX10, MITF and JUN in melanomas. We then performed microarray experiments comparing gene expression response of human fibroblasts stimulated by different growth factors. TFactS predicted the specific activation of Signal transducer and activator of transcription factors by PDGF-BB, which was confirmed experimentally. Our results show that the expression levels of transcription factor target genes constitute a robust signature for transcription factor regulation, and can be efficiently used for microarray data mining.
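
The idea of comparing a submitted gene list with catalogued target genes can be sketched as an overlap test. The abstract does not say which statistic the tool uses, so the hypergeometric tail probability below is an assumption. The catalog, gene names, universe size and function names are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k): chance of at least k targets among n genes drawn
    from a universe of N genes that contains K targets."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def score_tfs(catalog, gene_list, universe_size):
    """Rank TFs by how strongly their known targets overlap the query list."""
    query = set(gene_list)
    results = []
    for tf, targets in catalog.items():
        hits = query & targets
        p = hypergeom_tail(len(hits), len(targets), len(query), universe_size)
        results.append((tf, len(hits), p))
    # Smallest p-value first: strongest evidence of regulation on top.
    return sorted(results, key=lambda r: r[2])


# Toy, made-up catalog (NOT taken from the TFactS catalog).
TOY_CATALOG = {
    "TF_A": {"g1", "g2", "g3", "g4"},
    "TF_B": {"g5", "g6", "g7", "g8"},
}

RANKED = score_tfs(TOY_CATALOG, ["g1", "g2", "g3", "g9"], universe_size=20)
for tf, n_hits, p_value in RANKED:
    print(tf, n_hits, round(p_value, 4))
```

In practice the p-values would also need a multiple-testing correction across all TFs in the catalog, and activation versus inhibition annotations could be scored separately against up- and down-regulated lists.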

Inference of transcriptional regulation relationships from gene expression data

Bioinformatics, 2003

Motivation: In order to find gene regulatory networks from microarray data, it is important to first find direct regulatory relationships between pairs of genes. Results: We propose a new method for finding potential regulatory relationships between pairs of genes from microarray time series data and apply it to expression data for cell-cycle related genes in yeast. We compare our algorithm, dubbed the event method, with the earlier correlation method and the edge detection method by Filkov et al. When tested on known transcriptional regulation genes, all three methods are able to find similar numbers of true positives. The results indicate that our algorithm is able to identify true positive pairs that are different from those found by the two other methods. We also compare the correlation and the event methods using synthetic data and find that typically, the event method obtains better results. Availability: Software is available upon request. Contact:
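
As a point of reference for the correlation baseline mentioned here, a generic lagged Pearson correlation between two expression time series can be sketched as follows. This is not the paper's event method, whose details the abstract does not give. The function names, the `max_lag` parameter and the toy series are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lagged_correlation(regulator, target, max_lag=2):
    """Return (lag, r) with the largest |r|, where the target is assumed
    to follow the regulator by `lag` time points."""
    best = (0, pearson(regulator, target))
    for lag in range(1, max_lag + 1):
        r = pearson(regulator[:-lag], target[lag:])
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


# Toy series: TGT is REG delayed by one time point (made-up values).
REG = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
TGT = [-1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]

LAG, R = best_lagged_correlation(REG, TGT)
print(LAG, round(R, 3))
```

A strongly positive r at a positive lag is consistent with activation and a strongly negative one with repression, but correlation alone cannot separate direct from indirect regulation.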