Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers

Transcriptional enhancers: from properties to genome-wide predictions

Nature Reviews Genetics, 2014

During animal development, a single cell, the fertilized egg, gives rise to a multitude of different cell types and organs. These acquire different morphologies and functions by expressing different sets of genes. The initial step in gene expression is the transcription of the genomic DNA of the gene into RNA by RNA polymerase II (Pol II). The genomic sequence in the immediate vicinity of the transcription start site (TSS), which is also known as the core promoter, is sufficient to assemble the Pol II machinery. However, transcription is often weak in the absence of regulatory DNA regions that are more distant from the TSS; these regions are called enhancers or cis-regulatory modules (CRMs).

Compound cis-regulatory elements with both boundary and enhancer sequences in the human genome

Bioinformatics, 2013

Motivation: It has been suggested that presumably distinct classes of genomic regulatory elements may actually share common sets of features and mechanisms. However, there has been no genome-wide assessment of the prevalence of this phenomenon. Results: To evaluate this possibility, we performed a bioinformatic screen for the existence of compound regulatory elements in the human genome. We identified numerous such colocated boundary and enhancer elements from human CD4+ T cells. We report evidence that such compound regulatory elements possess unique chromatin features and facilitate cell type-specific functions related to inflammation and immune response in CD4

Comparative genomics using teleost fish helps to systematically identify target gene bodies of functionally defined human enhancers

Background: The human genome is enriched with thousands of conserved non-coding elements (CNEs). Recently, a medium-throughput strategy was employed to analyze the ability of human CNEs to drive tissue-specific expression during mouse embryogenesis. These data led to the establishment of a publicly available genome-wide catalog of functionally defined human enhancers. The scattering of enhancers over large regions in vertebrate genomes seriously impedes attempts to pinpoint their precise target genes. Such associations are a prerequisite for exploring the significance of this in vivo characterized catalog of human enhancers in development, disease and evolution. Results: This study is an attempt to systematically identify the target gene bodies for functionally defined human CNE-enhancers. For this purpose we adopted the orthology/paralogy mapping approach and compared the CNE-induced reporter expression with the reported endogenous expression patterns of neighboring genes. This procedure pinpointed specific target gene bodies for a total of 192 human CNE-enhancers. This enables us to gauge the maximum genomic search space for enhancer hunting: 4 Mb of genomic sequence around the gene of interest (2 Mb on either side). Furthermore, we used human-rodent comparison for a set of 159 orthologous enhancer pairs to infer that central nervous system (CNS)-specific gene expression is closely associated with the cooperative interaction among at least eight distinct transcription factors: SOX5, HFH, SOX17, HNF3β, c-FOS, Tal1beta-E47S, MEF and FREAC.
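The search space quoted in this abstract (2 Mb on either side of the gene of interest) can be written as a small coordinate calculation. The sketch below is illustrative only: the function name, the 0-based half-open coordinates, the choice to measure the flank from the gene body boundaries, and the optional clipping to chromosome length are all assumptions, not details given in the abstract.

```python
def enhancer_search_window(gene_start, gene_end, flank=2_000_000, chrom_length=None):
    """Return (start, end) of the region to scan for candidate enhancers.

    Assumptions (not from the abstract): coordinates are 0-based and
    half-open, and the flank is measured from the gene body boundaries.
    """
    if gene_end < gene_start:
        raise ValueError("gene_end must not be smaller than gene_start")
    # Extend upstream, but never past the start of the chromosome.
    start = max(0, gene_start - flank)
    # Extend downstream, clipping to the chromosome end when it is known.
    end = gene_end + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end
```

For a hypothetical gene spanning 5,000,000 to 5,050,000, this yields the window 3,000,000 to 7,050,000, that is, 4 Mb of flanking sequence plus the gene body itself.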

Transcription factor binding at enhancers: shaping a genomic regulatory landscape in flux

Frontiers in Genetics, 2012

The mammalian genome is packed tightly in the nucleus of the cell. This packing is primarily facilitated by histone proteins and results in an ordered organization of the genome in chromosome territories that can be roughly divided into heterochromatic and euchromatic domains. On top of this organization, several distinct gene regulatory elements on the same chromosome or other chromosomes are thought to dynamically communicate via chromatin looping. Advances in genome-wide technologies have revealed the existence of a plethora of these regulatory elements in various eukaryotic genomes. These regulatory elements are defined by particular in vitro assays as promoters, enhancers, insulators, and boundary elements. However, recent studies indicate that the in vivo distinction between these elements is often less strict. Regulatory elements are bound by a mixture of common and lineage-specific transcription factors which mediate the long-range interactions between these elements. Inappropriate modulation of the binding of these transcription factors can alter the interactions between regulatory elements, which in turn leads to aberrant gene expression with disease as an ultimate consequence. Here we discuss the bi-modal behavior of regulatory elements that act in cis (with a focus on enhancers), how their activity is modulated by transcription factor binding and the effect this has on gene regulation.

Characterization of enhancer function from genome-wide analyses

Annual review of genomics and human genetics, 2012

There has been a recent surge in the use of genome-wide methodologies to identify and annotate the transcriptional regulatory elements in the human genome. Here we review some of these methodologies and the conceptual insights about transcription regulation that have been gained from the use of genome-wide studies. It has become clear that the binding of transcription factors is itself a highly regulated process, and binding does not always appear to have functional consequences. Numerous properties have now been associated with regulatory elements that may be useful in their identification. Several aspects of enhancer function have been shown to be more widespread than was previously appreciated, including the highly combinatorial nature of transcription factor binding, the postinitiation regulation of many target genes, and the binding of enhancers at early stages to maintain their competence during development. Going forward, the integration of multiple genome-wide data sets shou...

Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

Nucleic Acids Research, 2012

More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNAse-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10 to 100-fold lower than annotated coding or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements.

Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements

BMC Biology, 2011

Background: Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases. Results: Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups either directly involved in transcription, including miRNAs and long noncoding RNAs, or facilitating transcription by long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less with acetylation of lysine 27 on histone 3 or p300 binding.
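One common way to turn transcription factor peaks into clusters is to merge peaks that overlap or lie within a fixed distance of each other. The abstract does not say how its clusters were built, so the sketch below is a generic illustration, not the authors' procedure: the function name, the tuple layout of a peak, the `max_gap` distance and the `min_peaks` threshold are all assumed for the example.

```python
from collections import defaultdict


def cluster_peaks(peaks, max_gap=100, min_peaks=2):
    """Merge nearby peaks into clusters.

    peaks: iterable of (chrom, start, end, factor) tuples.
    max_gap and min_peaks are illustrative values, not taken from the paper.
    Returns a list of dicts with chrom, start, end, factors and n_peaks.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, factor in peaks:
        by_chrom[chrom].append((start, end, factor))

    clusters = []
    for chrom in sorted(by_chrom):
        current = None
        # Walk the peaks in coordinate order and grow the open cluster
        # while the next peak starts within max_gap of its current end.
        for start, end, factor in sorted(by_chrom[chrom]):
            if current is not None and start <= current["end"] + max_gap:
                current["end"] = max(current["end"], end)
                current["factors"].add(factor)
                current["n_peaks"] += 1
            else:
                if current is not None:
                    clusters.append(current)
                current = {"chrom": chrom, "start": start, "end": end,
                           "factors": {factor}, "n_peaks": 1}
        if current is not None:
            clusters.append(current)

    # Keep only regions supported by at least min_peaks peaks.
    return [c for c in clusters if c["n_peaks"] >= min_peaks]
```

Each cluster records the set of factors it contains, so a cluster made of several peaks for a single factor can be told apart from one bound by several different factors.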

Systematic human/zebrafish comparative identification of cis-regulatory activity around vertebrate developmental transcription factor genes

2009

Pan-vertebrate developmental cis-regulatory elements are discernible as highly conserved noncoding elements (HCNEs) and are often dispersed over large areas around the pleiotropic genes whose expression they control. On the loci of two developmental transcription factor genes, SOX3 and PAX6, we demonstrate that HCNEs conserved between human and zebrafish can be systematically and reliably tested for their regulatory function in multiple stable transgenes in zebrafish, and their genomic reach estimated with confidence using synteny conservation and HCNE density along these loci. HCNEs of both human and zebrafish function as specific developmental enhancers in zebrafish. We show that human HCNEs result in expression patterns in zebrafish equivalent to those in mouse, establishing zebrafish as a suitable model for large-scale testing of human developmental enhancers. Orthologous human and zebrafish enhancers underwent functional evolution within their sequence and often directed related but non-identical expression patterns. Despite an evolutionary distance of 450 million years, one pax6 HCNE drove expression in identical areas when comparing zebrafish vs. human HCNEs. HCNEs from the same area often drive overlapping patterns, suggesting that multiple regulatory inputs are required to achieve robust and precise complex expression patterns exhibited by developmental genes.

Identification of conserved regulatory elements by comparative genome analysis

Journal of Biology, 2003

Background: For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments.
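The idea behind phylogenetic footprinting can be illustrated with a sliding-window identity scan over a pairwise alignment: windows in which two species agree at most positions are candidate conserved segments. This is a minimal sketch of the general idea, not the method of the paper; the function name, the window size, the identity threshold and the treatment of gaps as mismatches are assumptions chosen for the example.

```python
def conserved_windows(aligned_a, aligned_b, window=10, min_identity=0.8):
    """Report windows of a pairwise alignment with high sequence identity.

    aligned_a, aligned_b: equal-length aligned sequences, '-' marks a gap.
    Returns a list of (start, end, identity) with half-open alignment
    coordinates. Gap columns count as mismatches (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # 1 where both sequences carry the same base, 0 otherwise.
    matches = [
        int(a == b and a != "-")
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]

    hits = []
    for i in range(len(matches) - window + 1):
        identity = sum(matches[i:i + window]) / window
        if identity >= min_identity:
            hits.append((i, i + window, identity))
    return hits
```

Overlapping windows are reported separately here; a real pipeline would typically merge them and would score conservation against a background rate instead of a fixed threshold.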

Whole Genome Human/Mouse Phylogenetic Footprinting of Potential Transcription Regulatory Signals

Biocomputing 2003 - Proceedings of the Pacific Symposium, 2002

Phylogenetic footprinting is an efficient approach for revealing potential transcription factor binding sites in promoter sequences. The idea is based on the assumption that functional sites in promoters should evolve much more slowly than other regions that do not bear any conserved function. Therefore, potential transcription factor (TF) binding sites that are found in the evolutionarily conserved regions of promoters are more likely to be "real" sites. The most difficult step of phylogenetic footprinting is the alignment of promoter sequences between different organisms (e.g. human and mouse). Conventional alignment methods often cannot align promoters because of the high level of sequence variability. We have developed a new alignment method that takes into account similarity in the distribution of potential binding sites (motif-based alignment). This method has been used effectively for promoter alignment and for revealing new potential binding sites for various transcription factors. We performed a systematic phylogenetic footprinting of human/mouse conserved noncoding sequences (CNS). 60 thousand potential binding sites were revealed in the human and mouse genomes. We have developed a database of the predicted potential TF binding sites.