Identification of cis-regulatory modules in promoters of human genes exploiting mutual positioning of transcription factors

Global and gene-specific analyses show distinct roles for Myod and Myog at a common set of promoters

The EMBO Journal, 2006

We used a combination of genome-wide and promoter-specific DNA binding and expression analyses to assess the functional roles of Myod and Myog in regulating the program of skeletal muscle gene expression. Our findings indicate that Myod and Myog have distinct regulatory roles at a similar set of target genes. At genes expressed throughout the program of myogenic differentiation, Myod can bind and recruit histone acetyltransferases. At early targets, Myod is sufficient for near full expression, whereas, at late expressed genes, Myod initiates regional histone modification but is not sufficient for gene expression. At these late genes, Myog does not bind efficiently without Myod; however, transcriptional activation requires the combined activity of Myod and Myog. Therefore, the role of Myog in mediating terminal differentiation is, in part, to enhance expression of a subset of genes previously initiated by Myod.

Genome-wide patterns of promoter sharing and co-expression in bovine skeletal muscle

BMC Genomics, 2011

Background: Gene regulation by transcription factors (TF) is species-, tissue- and time-specific. To better understand how the genetic code controls gene expression in bovine muscle, we associated gene expression data from developing Longissimus thoracis et lumborum skeletal muscle with bovine promoter sequence information.

Spatial re-organization of myogenic regulatory sequences temporally controls gene expression

Nucleic Acids Research, 2015

During skeletal muscle differentiation, the activation of some tissue-specific genes occurs immediately while others are delayed. The molecular basis controlling temporal gene regulation is poorly understood. We show that the regulatory sequences, but not other regions of genes expressed at late times of myogenesis, are in close physical proximity in differentiating embryonic tissue and in differentiating culture cells, despite these genes being located on different chromosomes. Formation of these inter-chromosomal interactions requires the lineage-determinant MyoD and functional Brg1, the ATPase subunit of SWI/SNF chromatin remodeling enzymes. Ectopic expression of myogenin and a specific Mef2 isoform induced myogenic differentiation without activating endogenous MyoD expression. Under these conditions, the regulatory sequences of late gene loci were not in close proximity, and these genes were prematurely activated. The data indicate that the spatial organization of late genes con...

Identification of regulatory regions which confer muscle-specific gene expression

Journal of Molecular Biology, 1998

For many newly sequenced genes, sequence analysis of the putative protein yields no clue on function. It would be beneficial to be able to identify in the genome the regulatory regions that confer temporal and spatial expression patterns for the uncharacterized genes. Additionally, it would be advantageous to identify regulatory regions within genes of known expression pattern without performing the costly and time-consuming laboratory studies now required. To achieve these goals, the wealth of case studies performed over the past 15 years will have to be collected into predictive models of expression. Extensive studies of genes expressed in skeletal muscle have identified specific transcription factors which bind to regulatory elements to control gene expression. However, potential binding sites for these factors occur with sufficient frequency that it is rare for a gene to be found without one. Analysis of experimentally determined muscle regulatory sequences indicates that muscle expression requires multiple elements in close proximity. A model is generated with predictive capability for identifying these muscle-specific regulatory modules. Phylogenetic footprinting, the identification of sequences conserved between distantly related species, complements the statistical predictions. Through the use of logistic regression analysis, the model promises to be easily modified to take advantage of the elucidation of additional factors, cooperation rules, and spacing constraints.

Tissue-specific regulatory elements in mammalian promoters

Molecular Systems Biology, 2007

Transcription factor-binding sites and the cis-regulatory modules they compose are central determinants of gene expression. We previously showed that binding site motifs and modules in proximal promoters can be used to predict a significant portion of mammalian tissue-specific transcription. Here, we report on a systematic analysis of promoters controlling tissue-specific expression in heart, kidney, liver, pancreas, skeletal muscle, testis and CD4 T cells, for both human and mouse. We integrated multiple sources of expression data to compile sets of transcripts with strong evidence for tissue-specific regulation. The analysis of the promoters corresponding to these sets produced a catalog of predicted tissue-specific motifs and modules, and cis-regulatory elements. Predicted regulatory interactions are supported by statistical evidence, and provide a foundation for targeted experiments that will improve our understanding of tissue-specific regulatory networks. In a broader context, methods used to construct the catalog provide a model for the analysis of genomic regions that regulate differentially expressed genes.

Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers

Genome Research, 2010

Clustering of multiple transcription factor binding sites (TFBSs) for the same transcription factor (TF) is a common feature of cis-regulatory modules in invertebrate animals, but the occurrence of such homotypic clusters of TFBSs (HCTs) in the human genome has remained largely unknown. To explore whether HCTs are also common in human and other vertebrates, we used known binding motifs for vertebrate TFs and a hidden Markov model-based approach to detect HCTs in the human, mouse, chicken, and fugu genomes, and examined their association with cis-regulatory modules. We found that evolutionarily conserved HCTs occupy nearly 2% of the human genome, with experimental evidence for individual TFs supporting their binding to predicted HCTs. More than half of the promoters of human genes contain HCTs, with a distribution around the transcription start site in agreement with the experimental data from the ENCODE project. In addition, almost half of the 487 experimentally validated developmental enhancers contain them as well, a number more than 25-fold larger than expected by chance. We also found evidence of negative selection acting on TFBSs within HCTs, as the conservation of TFBSs is stronger than the conservation of sequences separating them. The important role of HCTs as components of developmental enhancers is additionally supported by a strong correlation between HCTs and the binding of the enhancer-associated coactivator protein Ep300 (also known as p300). Experimental validation of HCT-containing elements in both zebrafish and mouse suggests that HCTs could be used to predict both the presence of enhancers and their tissue specificity, and are thus a feature that can be effectively used in deciphering the gene regulatory code. In conclusion, our results indicate that HCTs are a pervasive feature of human cis-regulatory modules and suggest that they play an important role in gene regulation in the human and other vertebrate genomes.

Predicting transcription factor synergism

Nucleic Acids Research, 2002

Transcriptional regulation is mediated by a battery of transcription factor (TF) proteins that form complexes involving protein-protein and protein-DNA interactions. Individual TFs bind to their cognate cis-elements or transcription factor-binding sites (TFBS). TFBS are organized on the DNA proximal to the gene in groups confined to a few hundred base pair regions. These groups are referred to as modules. Various modules work together to provide the combinatorial regulation of gene transcription in response to various developmental and environmental conditions. The sets of modules constitute a promoter model. Determining the TFs that preferentially work in concert as part of a module is an essential component of understanding transcriptional regulation. The TFs that act synergistically in such a fashion are likely to have their cis-elements co-localized on the genome at specific distances apart. We exploit this notion to predict TF pairs that are likely to be part of a transcriptional module on the human genome sequence. The computational method is validated statistically, using known interacting pairs extracted from the literature. There are 251 TFBS pairs up to 50 bp apart and 70 TFBS pairs up to 200 bp apart that score higher than any of the known synergistic pairs. Further investigation of 50 pairs randomly selected from each of these two sets using PubMed queries provided additional supporting evidence from the existing biological literature suggesting TF synergism for these novel pairs.
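
The co-localization idea in this abstract can be illustrated with a minimal sketch that counts, for each pair of factors, how many of their predicted sites fall within a distance cutoff of each other. This is only a counting step under stated assumptions, not the scoring method of the paper: the function name `count_colocalized_pairs`, the input layout (factor name mapped to site positions on one sequence), and the example positions are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import combinations


def count_colocalized_pairs(sites, max_dist):
    """Count site pairs of two different factors lying within max_dist bp.

    sites: dict mapping a factor name to a list of site positions (bp)
           on a single sequence. Hypothetical input layout.
    Returns a dict keyed by (factor_a, factor_b) in alphabetical order.
    """
    counts = {}
    for tf_a, tf_b in combinations(sorted(sites), 2):
        pos_b = sorted(sites[tf_b])
        n = 0
        for p in sites[tf_a]:
            # Sites of tf_b in the closed interval [p - max_dist, p + max_dist]
            lo = bisect_left(pos_b, p - max_dist)
            hi = bisect_right(pos_b, p + max_dist)
            n += hi - lo
        counts[(tf_a, tf_b)] = n
    return counts


# Illustrative positions only; not taken from any dataset.
example_sites = {"MYOD": [100, 400, 900], "MEF2": [130, 880], "SRF": [600]}
pairs_50 = count_colocalized_pairs(example_sites, 50)
pairs_200 = count_colocalized_pairs(example_sites, 200)
```

The two cutoffs mirror the 50 bp and 200 bp windows mentioned in the abstract. Turning such counts into a synergy score would need a background model, which this sketch does not attempt.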

Wide-Scale Analysis of Human Functional Transcription Factor Binding Reveals a Strong Bias towards the Transcription Start Site

PLoS ONE, 2007

Background. Transcription factors (TF) regulate expression by binding to specific DNA sequences. A binding event is functional when it affects gene expression. Functionality of a binding site is reflected in conservation of the binding sequence during evolution and in over-represented binding in gene groups with coherent biological functions. Functionality is governed by several parameters such as the TF-DNA binding strength, distance of the binding site from the transcription start site (TSS), DNA packing, and more. Understanding how these parameters control functionality of different TFs in different biological contexts is a must for identifying functional TF binding sites and for understanding regulation of transcription. Methodology/Principal Findings. We introduce a novel method to screen the promoters of a set of genes with shared biological function (obtained from the functional Gene Ontology (GO) classification) against a precompiled library of motifs, and find those motifs which are statistically over-represented in the gene set. More than 8000 human (and 23,000 mouse) genes were assigned to one of 134 GO sets. Their promoters were searched (from 200 bp downstream to 1000 bp upstream of the TSS) for 414 known DNA motifs. We optimized the sequence similarity score threshold, independently for every location window, taking into account nucleotide heterogeneity along the promoters of the target genes. The method, combined with binding sequence and location conservation between human and mouse, identifies with high probability functional binding sites for groups of functionally-related genes. We found many location-sensitive functional binding events and showed that they clustered close to the TSS. Our method and findings were tested experimentally. Conclusions/Significance. We reliably identified functional TF binding sites. This is an essential step towards constructing regulatory networks. The promoter region proximal to the TSS is of central importance for regulation of transcription in human and mouse, just as it is in bacteria and yeast.
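
One common way to test whether a motif is over-represented in a gene set is a one-sided hypergeometric test. The abstract does not say which statistic the authors used, so the sketch below is an assumed stand-in; the function name and its parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, with_motif, set_size, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: number of genes in the background
    with_motif: background genes whose promoter contains the motif
    set_size:   number of genes in the functional set
    hits:       genes in the set whose promoter contains the motif
    """
    total = comb(population, set_size)
    upper = min(with_motif, set_size)
    tail = sum(
        comb(with_motif, i) * comb(population - with_motif, set_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers chosen for illustration only.
p_all = hypergeom_enrichment_p(10, 5, 5, 5)
p_none = hypergeom_enrichment_p(10, 5, 5, 0)
```

In a screen over many motifs and gene sets, such p-values would also need a multiple-testing correction, which is left out here.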

Positional distribution of human transcription factor binding sites

Nucleic Acids Research, 2008

We developed a method for estimating the positional distribution of transcription factor (TF) binding sites using ChIP-chip data, and applied it to recently published experiments on binding sites of nine TFs: OCT4, SOX2, NANOG, HNF1A, HNF4A, HNF6, FOXA2, USF1 and CREB1. The data were obtained from a genome-wide coverage of promoter regions from 8-kb upstream of the transcription start site (TSS) to 2-kb downstream. The number of target genes of each TF ranges from a few hundred to several thousand. We found that for each of the nine TFs the estimated binding site distribution is closely approximated by a mixture of two components: a narrow peak, localized within 300-bp upstream of the TSS, and a distribution of almost uniform density within the tested region. Using Gene Ontology (GO) and enrichment analysis, we were able to associate (for each of the TFs studied) the target genes of both types of binding with known biological processes. Most GO terms were enriched either among the proximal targets or among those with a uniform distribution of binding sites. For example, the three stemness-related TFs have several hundred target genes that belong to 'development' and 'morphogenesis' whose binding sites belong to the uniform distribution.
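
The two-component description can be made concrete with a small sketch. It assumes, purely for illustration, that the narrow peak is a flat density over the 300 bp upstream of the TSS; the abstract does not give the peak's actual shape, and the estimation procedure here (a basic EM update of the mixing weight with both components held fixed) is not claimed to be the authors' method. All names are hypothetical.

```python
PEAK = (-300.0, 0.0)        # assumed peak window, bp relative to the TSS
REGION = (-8000.0, 2000.0)  # tested region stated in the abstract


def mixture_density(x, w, peak=PEAK, region=REGION):
    """Density at position x of w * peak + (1 - w) * uniform background."""
    peak_d = 1.0 / (peak[1] - peak[0]) if peak[0] <= x <= peak[1] else 0.0
    unif_d = 1.0 / (region[1] - region[0]) if region[0] <= x <= region[1] else 0.0
    return w * peak_d + (1.0 - w) * unif_d


def estimate_peak_weight(positions, iters=100, w=0.5):
    """EM estimate of the peak weight w; positions must lie inside REGION."""
    for _ in range(iters):
        resp = []
        for x in positions:
            total = mixture_density(x, w)
            if total == 0.0:
                continue  # skip positions outside the tested region
            # Responsibility of the peak component for this site
            resp.append(w * mixture_density(x, 1.0) / total)
        w = sum(resp) / len(resp)
    return w


# Toy positions: five inside the assumed peak, five far upstream or downstream.
toy_positions = [-250, -200, -150, -100, -50, -7000, -5000, -3000, 1000, 1500]
w_hat = estimate_peak_weight(toy_positions)
```

The estimated weight can be read as the fraction of binding sites attributed to the TSS-proximal peak rather than to the near-uniform background.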