Eukaryotic core promoters and the functional basis of transcription initiation

Author manuscript; available in PMC: 2019 Apr 1.

Published in final edited form as: Nat Rev Mol Cell Biol. 2018 Oct;19(10):621–637. doi: 10.1038/s41580-018-0028-8

Abstract

RNA polymerase II (Pol II) core promoters are specialized DNA sequences at transcription start sites of protein-coding and non-coding genes that support the assembly of the transcription machinery and transcription initiation. They enable the highly regulated transcription of genes by selectively receiving and integrating regulatory cues from distal enhancers and associated regulatory proteins. In this Review we discuss the defining properties of gene core promoters, including their sequence features, chromatin architecture, and transcription initiation patterns. We provide an overview of molecular mechanisms underlying the function and regulation of core promoters and their emerging functional diversity, which defines distinct transcription programmes. Based on the established properties of gene core promoters, we discuss transcription start sites within enhancers and integrate recent results obtained from dedicated functional assays to propose a functional model of transcription initiation. This model can explain the nature and function of transcription initiation at gene starts and at enhancers and the different functional roles of core promoters, of RNA polymerase II and its associated factors and of the activating cues provided by enhancers and the transcription factors and cofactors they recruit.

Introduction

The development of complex organisms with many morphologically and functionally diverse cell types from a single cell is largely determined by the genetic information contained within genomic DNA1,2. This genetic information includes both protein-coding sequences of genes and non-coding regulatory elements that govern when, where and to what level each gene will be expressed. Regulated gene expression is essential for the integrity of all eukaryotic cells and organisms3, has a central role in cell differentiation and metabolism, and its disruption leads to disease4.

Gene expression starts with transcription, the copying of a DNA sequence into an RNA transcript by RNA polymerase II (Pol II), which transcribes all protein-coding and many non-coding genes. Transcription typically initiates at a defined position, the transcription start site (TSS), at the 5’ end of a gene, which we refer to as the gene start. The TSS is embedded within a core promoter, which is a short sequence encompassing ~50 base-pairs (bp) upstream and ~50 bp downstream of the TSS (FIG. 1a). The core promoter serves as a binding platform for the transcription machinery, which comprises Pol II and its associated general transcription factors (GTFs)5. Core promoters are sufficient to direct transcription initiation6, but generally have low basal activity, which can be further suppressed by chromatin or activated by often more distally located regulatory elements called enhancers1,7,8. Enhancers bind regulatory proteins known as transcription factors and recruit transcription cofactors (reviewed in REFS 1,9), and can increase transcription from a core promoter independently of their relative distance and orientation1,7,8. More recently, this traditional view of gene expression and the role of enhancers and core promoters have been challenged by the observation that many genomic positions outside annotated gene starts initiate transcription, including positions within enhancers (FIG. 1b).

Figure 1. Properties and function of core promoters and enhancers.

(a) The traditional view of transcription initiation postulates that transcription initiates at gene core promoters, which recruit the transcription machinery consisting of RNA polymerase II (Pol II) and general transcription factors (GTFs), thereby leading to the formation of the pre-initiation-complex (PIC) and transcription initiation. Transcription from core promoters is activated by enhancers, which can be located distally and bind sequence-specific transcription factors (TF), which recruit cofactors (COF) that convey the activating cues to the PIC at the core promoter. (b) Active enhancers exhibit divergent transcription of short, unstable enhancer RNAs (eRNAs) from two separate transcription start sites (TSSs) located at the edges of the nucleosome-depleted region where the enhancer resides. (c) Promoters produce long, stable mRNAs from a gene core promoter in the sense direction (orientation of the gene) and short, unstable upstream antisense RNAs (uaRNAs) from the upstream edge of a nucleosome depleted region that contains the transcription factor-bound proximal promoter. Separate pre-initiation complexes drive unidirectional transcription from each of the two TSSs.

Genome-wide transcription initiation

Sites of transcription initiation can be identified using various methods that capture the 5’ ends of Pol II transcripts by exploiting their characteristic properties. For example, cap analysis of gene expression (CAGE)10 and similar 5’ end-capture approaches11,12 take advantage of the cap structure at the 5’ end of Pol II transcripts to detect the TSS and RNA abundance. Complementary methods use properties of nascent transcripts associated with Pol II to detect their TSSs and assess their transcription rates13–16, thereby distinguishing true initiation events from sites of potential post-transcriptional cleavage and recapping17.

Applying such large-scale approaches to map TSSs genome-wide in different cell types of various model organisms12,18–22 is not only building comprehensive catalogues of gene TSSs and the regulation of transcription initiation, but has also revealed the pervasive transcription of eukaryotic genomes23,24. Transcription initiation at many positions distal to annotated gene starts, especially at enhancers, is challenging the traditional model of gene expression, which has implied that transcription is initiated specifically at gene core promoters and regulated by distally located enhancers14,15,25,26 (FIG. 1a).

Transcription initiation at enhancers

Widespread transcription of mammalian enhancers was detected in many cell types14,25–28, and the production of enhancer RNAs (eRNAs) was suggested to be predictive of active enhancers26,29. Indeed, eRNA transcription correlates with target gene transcription in inducible systems30,31 and in different cell types26, and often, though not always, precedes target-gene activation29,31.

Transcription from enhancers is often bi-directional15,26 and initiates at two distinct sites, which drive divergent transcription from the edges of a nucleosome-depleted region (NDR) that is established at active enhancers (FIG. 1b). However, unlike gene core promoters, which support the production of stable transcripts, enhancers mainly produce short, unstable transcripts in both directions15,32.

Antisense transcription at promoters

Bi-directional transcription was also detected at promoters, where the transcription of protein-coding genes is often coupled with the transcription of short non-coding RNAs in the reverse orientation15,33–36. These antisense transcripts, known as promoter upstream transcripts (PROMPTs) or upstream antisense RNAs (uaRNAs), are transcribed by separate Pol II complexes from divergently oriented TSSs located at the upstream edge of the nucleosome-depleted proximal promoter region that contains transcription-factor binding sites37,38 (FIG. 1c). Similar to eRNAs, these antisense transcripts are typically unstable, though some promoters seem to produce long and polyadenylated divergent transcripts39,40.

The observed divergent transcription at promoter and enhancer regions, together with other similarities, prompted the proposal of a unified architecture of transcription initiation at those elements15,41,42. According to this model, promoters and enhancers both initiate transcription similarly, but only at gene promoters are transcripts stabilized post-initiation by the presence of 5’ splice sites and by the absence of premature polyadenylation signals15,43,44.

In this Review, we first summarize the insights obtained from studying core promoters of annotated genes and then discuss to what extent the properties of these bona fide core promoters can be found at TSSs within other genomic regulatory elements, including enhancers. This order of discussion reflects the notion that gene core promoters have specifically evolved to initiate stable transcripts in a highly regulated manner, whereas the cause and the role of transcription initiation outside gene starts have remained unclear. We further discuss the assembly and activation of the transcription machinery at core promoters and how this machinery is regulated by distal enhancers via transcription factors and cofactors. Finally, we integrate these established promoter properties with recent results from dedicated functional assays to propose a functional model of transcription initiation that can account for transcription from promoters and from enhancers based on these elements’ sequence-encoded activities.

Properties of gene core promoters

Mapping endogenous transcription initiation sites14–16,19–22,45 has characterized different features of core promoters, including their diverse sequence and chromatin properties and the (focused or dispersed) distribution of transcription initiation sites, which together define three different types of core promoters46 (BOX 1).

Box 1. Transcription initiation patterns and core-promoter types.

The comprehensive mapping of gene core promoters has revealed several transcription initiation patterns and sequence and chromatin properties.

Dichotomy of the promoter shape

Mapping endogenous transcription initiation at single nucleotide resolution revealed striking differences between core promoters45,58, leading to the classification of ‘focused’ or ‘sharp’ core promoters, which have a single, well-defined transcription start site (TSS; see figure, part a) and ‘dispersed’ or ‘broad’ promoters45, which have multiple closely-spaced TSSs that are used with similar frequency (see figure, part b). These transcription initiation patterns (or promoter shapes) are found across species, including in fish21 and fly12,19,68, and are associated with distinct gene categories: focused initiation preferentially occurs in core promoters of highly cell-type specific genes with restricted expression patterns, whereas dispersed initiation is mainly associated with housekeeping genes expressed in many cell types19,22,45,68 and, in mammals, with CpG-island (CGI)-overlapping promoters of regulators of development.
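
As a rough illustration of the focused versus dispersed distinction, the following sketch classifies a promoter window from per-position counts of captured 5’ ends (for example from CAGE). The function name, the choice of the dominant-position fraction as the statistic, and the 0.5 cutoff are illustrative assumptions and are not taken from the cited studies, which use their own shape measures.

```python
from typing import Sequence


def promoter_shape(tag_counts: Sequence[int], focus_cutoff: float = 0.5) -> str:
    """Classify an initiation pattern as 'focused' or 'dispersed'.

    tag_counts   -- number of captured 5' ends at each consecutive
                    position of one promoter window
    focus_cutoff -- hypothetical threshold on the fraction of all tags
                    that fall on the single most-used position
    """
    total = sum(tag_counts)
    if total == 0:
        raise ValueError("no initiation events in this window")
    # Fraction of initiation events that use the dominant TSS.
    dominant_fraction = max(tag_counts) / total
    return "focused" if dominant_fraction >= focus_cutoff else "dispersed"


if __name__ == "__main__":
    print(promoter_shape([0, 1, 95, 2, 2]))          # one dominant position
    print(promoter_shape([10, 12, 9, 11, 10, 8]))    # many similar positions
```

A single-number statistic like this is only a starting point; a width-based measure over the window would capture the same dichotomy in a different way.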

Three types of core promoters

Based on different properties, including initiation pattern, sequence composition and motifs, chromatin configuration and gene function, three main types of core promoters in metazoa have been proposed46: (1) Core promoters with sharp initiation patterns, imprecisely positioned nucleosomes89 and TATA-box and Inr motifs (see figure, part a). These promoters tend to have key regulatory elements near their TSSs235 and are active in terminally differentiated cells in adult tissues, in which case they acquire histone H3 Lys 4 trimethylation (H3K4me3) and H3 Lys 27 acetylation (H3K27ac), which are associated with active transcription. (2) Core promoters of broadly expressed housekeeping genes, which are associated with dispersed transcription initiation19,45 and a well-defined nucleosome-depleted region (NDR) flanked by precisely positioned nucleosomes89 marked by H3K4me3 and H3K27ac (see figure, part b). In mammals, these core promoters overlap individual CGIs45; in flies they are enriched in a specific set of variably-positioned motifs including Ohler1, Ohler6 and DNA replication-related element (DRE)68. (3) Core promoters of key developmental transcription factors involved in patterning and morphogenesis. In mammals they resemble housekeeping-gene core promoters, but in embryonic stem cells they are distinctly bivalently marked with both H3K4me3 and the repressive modification H3K27me3 (REF. 236; see figure, part c). This presumably primes them for activation in the correct cell lineage and for silencing in all other cells. In mammals such ‘poised’ promoters are associated with long individual CGIs or multiple CGIs75 and often produce long non-coding divergent transcripts39,40. In flies promoters of this class tend to contain a downstream promoter element (DPE) and have focused initiation62. Both in mammals and flies, they are often surrounded by arrays of highly conserved non-coding elements, which might act as distal enhancers62,75.

Box 1 figure.

Sequence properties

By definition, the main task of core promoters is to support the assembly of the pre-initiation complex (PIC), which consists of Pol II and GTFs, and to guide transcription initiation from precise positions at defined levels6. The important role of the core promoter sequence in conferring these functions was recently corroborated by analyzing single nucleotide polymorphisms and other genetic variants, which across different fruit fly strains affected both transcription levels and TSS choice within core promoters47. These variations were found to often disrupt crucial sequence features known as core-promoter motifs, many of which are known to recruit GTFs and mediate PIC assembly (Table 1).

Table 1. Known core-promoter motifs and the (general) transcription factors that bind to them.

Core-promoter motifs

Several core-promoter motifs have fixed positioning relative to a single, well-defined TSS. For example, the well-known TATA-box motif48,49 is located ~30 bp upstream of a single dominant TSS50 in ‘focused’ core promoters (BOX 1). Although the TATA-box is conserved from yeast to human, it is found only in a minority of core promoters, for instance ~5% in fly51,52. The TATA-box is recognized and bound by the TATA-box binding protein53 (TBP; Table 1), one of the components of the Transcription Factor IID (TFIID) complex, a GTF that mediates Pol II recruitment and PIC assembly54,55 and thereby might determine TSS choice at a fixed downstream position.

Another core promoter motif with a fixed position relative to transcription initiation is the Initiator (Inr) motif, which directly overlaps the TSS56. The Inr is more abundant than the TATA-box52 but is not universal, and its consensus sequence differs between fly and human. The fly Inr motif is longer, more information-rich and encompasses several nucleotides that are adjacent to the TSS and were shown to serve as a binding site for additional components of TFIID57 (Table 1). By contrast, human Inr was initially defined as pyrimidine (C or T) followed by a purine (A or G), positioned such that the purine is the first transcribed nucleotide45. However, more recently a human Inr motif with higher information content was found in focused core promoters, and several nucleotides outside the dinucleotide core motif were suggested to be important for transcription initiation in vitro58 (Table 1).
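
The positional logic of these two motifs can be sketched in a few lines. The example below checks for the minimal human Inr as described above (a pyrimidine immediately followed by a purine that is the first transcribed nucleotide) and looks for a TATA-like sequence roughly 30 bp upstream. The function names, the use of the literal string "TATA" as a stand-in for the TATA-box, and the −35 to −25 search window are simplifying assumptions for illustration only; real motif scans use position weight matrices.

```python
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def has_minimal_inr(seq: str, tss: int) -> bool:
    """Return True if a pyrimidine at -1 is followed by a purine at +1.

    seq -- DNA sequence of the coding strand
    tss -- 0-based index of the first transcribed nucleotide (+1)
    """
    if tss < 1 or tss >= len(seq):
        return False
    s = seq.upper()
    return s[tss - 1] in PYRIMIDINES and s[tss] in PURINES


def has_upstream_tata(seq: str, tss: int, window=(-35, -25)) -> bool:
    """Return True if 'TATA' starts inside the upstream window.

    The literal 'TATA' match and the default window are assumptions,
    chosen only to reflect the ~30 bp upstream position.
    """
    s = seq.upper()
    last_start = tss + window[1]
    if last_start < 0:
        return False  # window lies entirely outside the sequence
    first_start = max(0, tss + window[0])
    return any(s.startswith("TATA", i) for i in range(first_start, last_start + 1))


if __name__ == "__main__":
    # Toy sequence: TATA at -30, 'CA' spanning -1/+1, TSS at index 40.
    toy = "G" * 10 + "TATAAAAG" + "G" * 21 + "CA" + "GTTCG"
    print(has_minimal_inr(toy, 40), has_upstream_tata(toy, 40))
```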

In promoters that lack a TATA-box, the Inr is often accompanied by another motif, the downstream promoter element (DPE), which is positioned downstream of the TSS59 (Table 1). The DPE motif was initially discovered in fly and, based on the investigation of individual promoters, was suggested to also be present in human60, even though it was never found over-represented in human promoters45,52. Several subunits of TFIID are suggested to bind DPE, and a strict requirement for Inr–DPE spacing is thought to be essential for cooperative binding of TFIID55,60. Since in fly TATA-box and DPE rarely co-occur, they were suggested to be associated with functionally distinct groups of genes51,52,61,62 (BOX 1).

In addition to these three most abundant core-promoter motifs, other motifs with defined positions relative to the TSS include the motif ten element (MTE)63 in fly, TFIIB recognition elements (BREs)64,65 and downstream core elements (DCE)66 in human. These motifs are bound by specific GTFs in vitro64,67 (Table 1), thus potentially mediating PIC recruitment and assembly. Furthermore, analysis of large collections of core promoters allowed the computational definition of over-represented sequences, leading to the discovery of other motifs without apparent spacing requirements relative to the TSS51,52. In flies, these include Ohler motifs 1, 6 & 7, and DNA replication-related element (DRE), which were found mainly in promoters with dispersed initiation patterns associated with housekeeping genes51,68 (BOX 1).

The described core-promoter motifs are over-represented in gene core promoters and are more rarely associated with non-genic initiation sites. Some enhancer TSSs and promoter antisense TSSs contain weak or degenerate forms of TATA-box or Inr motifs15,26,38, and the closer such motifs are to the consensus, the more promoter-like the enhancers are69 (see below).

The discovery of core-promoter motifs and their importance for transcription initiation has motivated the design of synthetic core promoters that efficiently assemble the PIC and support high levels of transcription initiation for transgene expression in both fly and human systems70–72. Such promoters are also often used for biochemical and structural characterization of the PIC.

Characteristic (di)nucleotide composition

Apart from defined sequence motifs, gene core promoters often have distinct nucleotide compositions. For example in vertebrates many core promoters overlap with CpG islands (CGI), which are regions with elevated GC content and high density of CpG dinucleotides73. CGI promoters typically lack defined motifs and are mainly associated with housekeeping genes45,74 or key developmental regulators involved in embryo patterning and morphogenesis75 (BOX 1). The mechanisms by which CGIs confer core promoter function are still unknown.
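
The two quantities that characterize a CGI, GC content and CpG density, are straightforward to compute. The sketch below reports the GC fraction and the ratio of observed to expected CpG dinucleotides, where the expected count assumes that C and G occur independently. The function names are hypothetical, and no cutoffs are applied here because the thresholds used to call a CGI differ between definitions.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, since no CpG
    dinucleotide can occur in that case.
    """
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return s.count("CG") * len(s) / (n_c * n_g)


if __name__ == "__main__":
    window = "CGCGATAT"
    print(gc_content(window), cpg_obs_exp(window))
```

In practice these statistics would be evaluated in sliding windows around the TSS rather than on a single short string.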

Characteristic patterns of dinucleotide composition have also been found downstream of the TSS, where A- or T-containing dinucleotides occur in periodic patterns21,76. The similarity between such patterns and the preferential sequence composition reported to underlie nucleosomal DNA77–79 suggests a close connection between nucleosome positioning and TSS positions, especially at core promoters that lack motifs and have broad initiation patterns21,22,76.

Chromatin configuration

While most genomic DNA shows limited accessibility as it is wrapped around histone octamers to form nucleosomes, active core promoters are devoid of nucleosomes, which makes them accessible and allows PIC assembly and Pol II recruitment. Indeed, NDRs flanked by precisely positioned and phased downstream nucleosomes are hallmarks of active core promoters in all eukaryotic cells80–82. However, recent studies suggested that such NDRs might not be depleted of nucleosomes but rather occupied by highly dynamic nucleosomes containing the histone variants H3.3 and H2A.Z83, and other non-canonical or partial nucleosomal particles84–86. These features were proposed to ensure accessibility of the transcription machinery and associated factors to DNA, suggesting that nucleosome occupancy and accessibility to DNA at core promoters are not necessarily mutually exclusive87,88.

Promoters with different initiation patterns differ in chromatin architecture and nucleosome positioning: dispersed promoters have more clearly defined NDRs and are associated with well-positioned nucleosomes downstream of the TSS89 (BOX 1). Similarly, in yeast two distinct types of promoters can be distinguished by the presence of either fragile nucleosomes or stably positioned nucleosomes, which correlates with distinct underlying sequences90.

Despite the obvious correlation between open, accessible chromatin and active transcription from promoters, the causal relationship between the two is still not clear. There is evidence that some transcription factors, sometimes called pioneer factors91, can bind to closed chromatin and recruit chromatin remodelling factors to open the chromatin, thereby allowing Pol II binding and transcription initiation92,93 (reviewed in REF. 9). Similarly, the presence of H2A.Z in the first downstream (+1) nucleosome is believed to decrease the barrier this nucleosome imposes on transcribing Pol II94. A complementary possibility is that low level of transcription by Pol II is required to keep the chromatin open and allow transcription factors to bind38,95,96. These mechanisms are not mutually exclusive and they are likely combined, presumably with different contributions at different types of core promoters96. H3.3 for example appears to be both downstream and upstream of transcription: it is deposited into nucleosomes independently of DNA replication97 preferentially at promoters and enhancers98 where it replaces the canonical H3 histone that is ejected during transcription. Once it accumulates at promoters, it could facilitate subsequent rounds of transcription98.

Post-translational histone modifications

Another prominent feature of promoter-associated chromatin is the presence of specific post-translational modifications of histones99,100. Nucleosomes downstream of active promoters bear tri-methylation of histone H3 Lys 4 (H3K4me3) and acetylation of H3 Lys 27 (H3K27ac)100 (BOX 1). Whether and how these modifications contribute to promoter function is unclear. In budding yeast, for example, H3K4 methylation occurs downstream of transcription and is mediated by the recruitment of histone-lysine N-methyltransferase, H3 lysine-4 specific (SET1) by the transcribing Pol II (REF. 101). H3K4me3 was suggested to provide a memory (‘bookmark’) of recent transcriptional activity, thereby facilitating new rounds of transcription101. However, the rapid and complete loss of H3K4me3 and transcription in the absence of transcription activators suggests that H3K4me3 alone is not sufficient to maintain active transcription102. A bookmarking function was also proposed for H4K5ac, which can recruit the transcriptional cofactor bromodomain-containing protein 4 (BRD4) and facilitate post-mitotic re-activation of a previously active genomic locus103. Histone acetylation might work through decreasing the affinity of DNA to nucleosomes and promoting open chromatin, similar to acetylation of the histone core104–106, or by directly providing binding sites for cofactors that bind acetylated lysine residues, such as BRD4107.

Although H3K4me3 and H3K27ac correlate strongly with transcriptional activity, whether they are causally involved in transcription is not clear. H3K4me3 seems dispensable for transcription in flies, since cells containing non-methylatable forms of both canonical and variant H3 histones show regulated transcription108,109. Similarly, cells with a Lys-to-Arg mutation at position 27 on canonical histone H3 exhibit de-repression of Polycomb silenced genes, implying that transcription does not require Lys 27 acetylation at canonical H3 (REF. 110). This suggests that Lys 27 acetylation of the histone variant H3.3 is important or that histone acetylation is only a by-product of the acetyltransferases P300/CBP, whose relevant targets could include transcription factors111–113 and the Pol II complex itself114. Such data, together with recent studies that found the pervasive enhancer mark H3K4me1 to be dispensable for enhancer activity115,116, caution against attributing functions to histone modifications based purely on correlation and emphasize the need for functional studies to discern causation from correlation117.

A striking example of histone modifications that causally direct transcription was recently found at Piwi-interacting RNA (piRNA) source loci in fly heterochromatin. Transcription of these loci is carried out by an alternative transcription machinery that is specifically recruited to the heterochromatin mark H3K9me3 through the H3K9me3 reader heterochromatin protein 1 (HP1; REF. 118). Although this shows that histone modifications associated with bona fide core promoters are not necessarily required for transcription, it also demonstrates that in principle modified histones are able to modulate transcription.

Transcription initiation at promoters

Transcription from gene core promoters is a step-wise process that results in a defined transcriptional output. Understanding the molecular mechanisms underlying each of the individual steps is essential for understanding their activation by distal cues.

Role of the pre-initiation complex

Assembly of the PIC at core promoters and initiation of transcription involves six GTFs, which recognize and bind core promoter elements, recruit Pol II and activate it for productive transcription119 (FIG. 2a). A sequential model of PIC assembly, proposed based on biochemical and structural studies, includes the recognition of core-promoter elements by TFIID, binding of TFIIA and TFIIB, recruitment of the Pol II–TFIIF complex, and finally the binding of TFIIE followed by TFIIH (reviewed in REFS 120,121). This model was further supported by a recent single-molecule imaging study that provided additional insight into the dynamics of GTF binding122. PIC assembly is followed by DNA-duplex melting and the formation of an open PIC, which supports the synthesis of the first nucleotides of the nascent transcript, after which Pol II is released from the core promoter and the GTFs that bind it (‘promoter escape’; FIG. 2b). High-resolution structures of both closed and open PICs, including double-stranded and melted DNA, respectively, revealed contacts between individual GTFs and core promoter DNA and shed light on the molecular events leading to PIC assembly, promoter opening and transcription initiation at core promoters55,123,124.

Figure 2. Regulation of different steps of transcription from core promoters.

a) Pre-initiation complex (PIC) assembly and RNA polymerase II (Pol II) recruitment. The first step of transcription initiation is the assembly of the PIC consisting of Pol II and six general transcription factors (GTFs): transcription factor IIA (TFIIA), TFIIB, TFIID, TFIIE, TFIIF and TFIIH (left). Enhancers can promote PIC assembly by recruiting transcription factors (TFs) and cofactors (COFs) that directly interact with GTFs or Pol II (right). b) Initiation by Pol II and ‘promoter escape’. After PIC assembly, the DNA duplex at core promoters melts (not shown) and allows Pol II to initiate transcription at the transcription start site (TSS). To continue transcribing, Pol II has to dissociate (escape) from the TSS-binding GTFs, which is mediated by phosphorylation of Ser 5 and Ser 7 of the Pol II carboxy-terminal domain (CTD) by TFIIH. Enhancers can aid this process by recruiting cofactors such as the Mediator complex (MED) or the acetyltransferase CBP/P300 (see main text for these and other cofactors’ functions). c) Pol II promoter-proximal pausing. After escaping from the TSS, Pol II synthesizes a short stretch of nascent RNA (30-50 nucleotides) and then pauses downstream of the TSS. DRB sensitivity inducing factor (DSIF) and negative elongation factor (NELF) bind to Pol II and the nascent RNA and promote Pol II pausing. Pause-release is mediated by cyclin-dependent kinase 9 (CDK9), which is a subunit of the positive transcription elongation factor b (P-TEFb) that phosphorylates DSIF, NELF and Ser 2 of the Pol II CTD. This leads to dissociation of NELF and entry of Pol II into productive elongation. Enhancers promote this process by recruiting cofactors that either recruit and stimulate CDK9 or directly affect pause-release, such as BRD4 and P300. d) Regulation of transcription bursting. Transcription occurs in short ‘bursts’, which comprise groups of initiation events separated by periods of inactivity. The core promoter sequence determines burst size, that is, the number of transcribing Pol II molecules per burst (left), while enhancers increase bursting frequency from their target core promoter (right). ‘+’ denotes target activation and ‘-’ denotes target inhibition.

Both biochemical and structural studies agree that TFIID has a central role in recognizing and binding core-promoter elements and nucleating PIC assembly. In addition, TFIID selectively binds H3K4me3, thereby enabling cross-talk between chromatin and PIC assembly125. Apart from regulating accessibility to DNA (reviewed in REF. 9), TFIID recruitment is therefore the first step at which transcription can be regulated and indeed, some transcription factors can bind and potentially recruit TFIID to core promoters126–128. In addition, TFIID composition might also influence transcription. Canonical TFIID consists of TBP and TBP-associated factors54, which can be replaced by different paralogs to form alternative TFIID complexes (reviewed in REFS 129–132). For example, TBP-related factor 2 (TRF2) substitutes TBP at promoters of many housekeeping genes and is essential for their activation133–135.

As biochemical and structural studies of PIC assembly and function typically consider only a few well-defined or synthetic core promoters that contain canonical core promoter motifs55,70, the mechanism of GTF recruitment and regulation at other types of core promoters is unclear and might differ. Indeed, mapping the binding sites of various PIC components genome-wide in yeast revealed a distinct interplay between the PIC and nucleosomes at promoters containing strong TATA-box motifs versus those with only weak or no TATA-box motifs136. In yeast, the presence of a strong TATA-box has been used to distinguish between SAGA complex-dominated and TFIID-dominated promoters137,138. SAGA-dominated promoters more often contain strong TATA-box motifs and are associated with genes responsive to stress, whereas TFIID-dominated promoters are depleted of such strong TATA-box motifs137,138. However, the two complexes might not be mutually exclusively employed at distinct types of promoters, but regulate different steps that are more or less rate-limiting at the different promoter types138,139. This is consistent with recent observations that the transcription of nearly all yeast genes depends to some extent on TFIID140 and that SAGA is involved in regulating both TATA-containing and TATA-less promoters139.

RNA Polymerase II pausing

At many genes, once Pol II has cleared from the TSS, it transcribes only 30-50 nucleotides downstream of the TSS and then undergoes promoter-proximal pausing141–143 (FIG. 2c). Paused Pol II was initially detected at heat-shock-responsive genes in their inactive state144 and shown to be rapidly released into productive elongation upon heat-shock145, thereby enabling strong and rapid gene activation. Release from promoter-proximal pausing involves phosphorylation by the cyclin-dependent kinase 9 (CDK9) subunit of the positive transcription elongation factor b (P-TEFb) of several components of the paused transcription elongation complex, including negative elongation factor (NELF), DRB sensitivity inducing factor (DSIF) and Pol II itself145,146 (FIG. 2c).

The prevalence and tight regulation of Pol II promoter-proximal pausing demonstrates that PIC recruitment and transcription initiation are not necessarily the rate-limiting steps of transcription at all promoters. Rather, promoter-proximal pausing provides an additional opportunity to regulate transcription by allowing rapid release of already engaged Pol II into productive elongation146, thereby eliminating dependencies on the slower steps of recruitment and initiation. This might be beneficial when rapid or synchronous changes in gene expression are required. For example, in early fly embryos promoters with paused Pol II are activated synchronously across all cells147, which is important for coordinating tissue morphogenesis148. Similarly, genes with paused Pol II in fly embryos were enriched for developmental regulators and it is likely that pausing facilitates rapid changes in spatial and temporal activity of these genes during development141. By contrast, in mouse embryonic stem cells paused Pol II is enriched at genes regulating cell cycle and signal transduction, and is suggested to regulate development through the control of signaling pathways149.

Different genes might, however, differ in their rate-limiting step for productive transcription. Some genes could predominantly be regulated by releasing stably paused Pol II, whereas for other genes regulation might occur mainly at the initiation step. In addition, the stability of paused Pol II differs greatly between promoters: half-lives of paused Pol II, measured by inhibiting both pause-release and de novo initiation, range from several minutes to an hour or more150–152. At promoters that support stable Pol II pausing with low turn-over rates (half-life >30 min), stalled Pol II seems to block new transcription initiation151,153, presumably by steric hindrance as previously predicted154. By contrast, at promoters with high turn-over of paused Pol II (half-life of only minutes) there may be no interference with transcription initiation152, potentially allowing tight regulation at the initiation step followed by non-limiting pause-release. Such an antagonistic relationship between pausing duration and transcription initiation frequency might create a pause–initiation balance153, which could allow one step to be influenced by regulating another, for example increasing initiation frequency by stimulating CDK9-mediated release of paused Pol II (REFS 153,154).
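The pause–initiation balance described above can be illustrated with a little arithmetic. The sketch below is a toy calculation, not a model from the cited studies: it assumes that paused Pol II leaves the pause site with first-order kinetics (so that a half-life maps onto a rate constant) and that a pause site holds at most one Pol II, which blocks re-initiation until it leaves. The function names and the example half-lives of 2 and 30 minutes are illustrative choices.

```python
import math


def pause_exit_rate(half_life_min: float) -> float:
    """First-order rate constant (per minute) at which paused Pol II leaves the pause site."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min


def fraction_still_paused(half_life_min: float, elapsed_min: float) -> float:
    """Fraction of initially paused Pol II remaining after elapsed_min,
    assuming simple exponential decay and no new initiation."""
    return math.exp(-pause_exit_rate(half_life_min) * elapsed_min)


def max_initiations_per_hour(half_life_min: float) -> float:
    """Upper bound on initiation frequency if each paused Pol II must leave
    before the next one can initiate (assumed single-occupancy pause site)."""
    return 60.0 * pause_exit_rate(half_life_min)


if __name__ == "__main__":
    for half_life in (2.0, 30.0):  # illustrative half-lives in minutes
        print(
            f"half-life {half_life:>4.0f} min: "
            f"{fraction_still_paused(half_life, 10.0):.2f} still paused after 10 min, "
            f"at most {max_initiations_per_hour(half_life):.1f} initiations per hour"
        )
```

Under these assumptions, shortening the residence time of paused Pol II raises the ceiling on initiation frequency, which is the sense in which stimulating pause-release could indirectly increase initiation.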

The nature of the trigger of Pol II pausing is not known and it was suggested that the sequence downstream of the TSS might play an important role. Core promoters of the most strongly paused genes often have elevated GC content downstream of the TSS, including the GC-rich DPE or Pause button (PB) motifs155 (Table 1). While these motifs might recruit specific proteins, GC-rich sequences might also simply slow down the Pol II (REF. 156). Similarly, transcription might also be hindered by the topological stress due to supercoiling of DNA downstream of the transcribing Pol II (REFS 157,158). In addition, chromatin has been implicated in Pol II pausing, since the +1 nucleosome could represent a barrier to Pol II at essentially all genes resulting in downstream or distal pausing94. However, the causal relationship between nucleosome positioning and Pol II transcription is not clear and it was also suggested that the paused Pol II is required to keep the promoter region clear of nucleosomes96, rather than the other way around.

Interestingly, most or all genes seem to require CDK9 for productive elongation, including those without GC-rich sequences downstream of the TSSs and those for which no accumulation of paused Pol II is detected146,150,153. The global down-regulation of transcription upon CDK9 inhibition, even at enhancers159,160, indicates that Pol II pausing or a pausing-like checkpoint between initiation and elongation occurs for essentially all Pol II-mediated transcription, irrespective of whether paused Pol II accumulates to detectable levels. Such a checkpoint might be important to ensure RNA 5’ capping, the assembly of a functional elongation complex, including Topoisomerase I recruitment and activation161, and the recruitment of other proteins required for elongation and co-transcriptional processes. This suggests that promoter-proximal pausing is an inherent property of transcription by Pol II and is triggered independently of the core-promoter sequence, potentially via the 5’ end of the nascent RNA, which after transcription of about 18 nucleotides starts protruding from Pol II. Indeed, the two pausing-establishing factors DSIF and NELF require a nascent transcript longer than 18 nucleotides to stably associate with the Pol II elongation complex162 (reviewed in REF. 163). Furthermore, recent biochemical and structural studies of a complex containing Pol II and DSIF revealed that DSIF contacts nascent RNA exiting from Pol II, suggesting a role of this interaction in establishing Pol II pausing164–166. According to this model, pausing is triggered independently of the sequence and chromatin properties at the pause-site, which nevertheless might influence the stability of the interactions between DNA, nascent RNA and paused Pol II. Strengthening these interactions could increase the duration of pausing and potentially explain the elevated GC content at sites that accumulate high levels of paused Pol II: more stable RNA–DNA hybrids, owing to higher GC content at the pause site, might prolong pausing.

Regulation by enhancers and cofactors

Active promoters are often in spatial proximity to enhancers167–170 and the establishment of such contacts between promoters and distal enhancers is related to the three-dimensional organisation of chromatin in the nucleus9,171–173. Promoter activation might occur upon establishing contacts with an enhancer, or by the recruitment of transcription factors to pre-formed enhancer–core promoter interactions. The latter was found to be prevalent in fly development, where enhancer–core promoter interactions are established prior to gene activation and appear stable during development174. In either case, promoters need to be sufficiently close to their enhancers to be activated.

Modes of core-promoter activation

The different steps required for productive transcription by Pol II all provide opportunity for regulation: PIC assembly, Pol II activation and transcription initiation, Pol II pausing and release into productive elongation (see above and REF. 175). Core promoters receive regulatory input from enhancers and this is mediated by transcription factors that directly bind short transcription-factor binding sites within enhancers, and by transcriptional cofactors, which are recruited by transcription factors through protein–protein interactions. Cofactors often have enzymatic activities and can post-translationally modify components of the transcription machinery and the surrounding nucleosomes, thereby affecting the different processes taking place at target core promoters.

Promoting pre-initiation complex assembly and RNA polymerase II activation

The most straightforward way to increase transcription from a core promoter is to increase the rate of transcription initiation by promoting PIC assembly and Pol II recruitment and activation. Several transcription factors or cofactors recruited by enhancers directly interact with components of the transcription machinery leading to stabilization of PIC at core promoters and increased initiation (FIG. 2a). For example, the Mediator complex is recruited to enhancers, interacts with the PIC at core promoters and transduces activating cues to increase Pol II recruitment and PIC assembly176. In yeast, Mediator seems to directly contact TFIIH and stimulate phosphorylation of the Ser 5 residues in the carboxy-terminal domain (CTD) of Pol II by the TFIIH subunit cyclin-dependent kinase 7 (CDK7) (REF. 177; Supplementary information S1 (box)). Ser 5 phosphorylation is considered important for Pol II to escape from the core promoter-bound GTFs and to initiate transcription (FIG. 2b). Similarly, the acetyltransferase p300, which is a cofactor widely associated with many active enhancers178, can acetylate GTFs or Pol II at target core promoters112,179 and this is required for the induction of growth-factor response genes114.

Promoting Pol II pause–release

Many core promoters support the recruitment of high levels of Pol II and are instead regulated at the level of pause-release142,145,180. Transition into productive elongation is coupled to phosphorylation of the Pol II CTD at Ser 2 residues (Supplementary information S1 (box)) and of DSIF and NELF by CDK9, which is the kinase subunit of P-TEFb (FIG. 2c). P-TEFb can be recruited to core promoters by the transcriptional cofactor BRD4181,182, which is bound to many enhancers and is involved in regulating a specific subset of genes183,184. Thus, enhancers that recruit high levels of BRD4, such as those involved in oncogene activation185,186, may preferentially function by releasing paused Pol II through CDK9. However, BRD proteins also regulate the transition to productive transcription elongation independently of CDK9 recruitment, since BRD protein degradation globally impairs transcription elongation but does not impact CDK9 recruitment to target genes187,188. p300 and Pol II-associated factor 1 (PAF1) have also been reported to be involved in pause release179,189. PAF1 seems to be required for pausing at enhancers and promoters and the loss of PAF1 leads to increased promoter activity, potentially through enhancer activation160.

Modulating transcription bursts

Transcription occurs in short but intense ‘bursts’, which comprise groups of initiation events separated by periods of inactivity190,191, as if promoters stochastically transition between inactive and active or permissive states192,193. This stochastic nature of transcription means that transcription activation could be achieved in one of two ways: by increasing the amplitude (size) of bursts, that is, the number of transcribing Pol II molecules per burst, or by increasing the frequency of bursts. The latter was shown to be the case both in regulation of developmental genes in fly embryos193 and in activation of the β-globin promoter by its locus-control region194. In contrast, burst size is a fixed property of the core promoter that is determined by the core promoter sequence, which mediates GTF binding192,195,196 (FIG. 2d). Indeed, the presence of the TATA-box motif supports larger burst size in yeast195, which might enable rapid transcriptional responses to stress196, yet appears to disproportionally contribute to transcriptional noise and increased cell-to-cell transcript variability197. Activation of core promoters that support large burst size by an enhancer that increases the frequency of bursting will lead to high transcriptional output. This might explain the observation made in reporter assays that enhancers most highly activate TATA-box-containing core promoters198.
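The two routes to activation can be made concrete with a toy simulation. The sketch below assumes, purely for illustration, that bursts arrive as a Poisson process and that the number of initiation events per burst is geometrically distributed; neither assumption is taken from the cited studies, and the function names and parameter values are hypothetical. Under these assumptions the mean output is burst frequency multiplied by mean burst size, so an enhancer that raises burst frequency has the largest absolute effect on a core promoter that supports large bursts.

```python
import math
import random


def expected_output(burst_freq: float, mean_burst_size: float, duration: float) -> float:
    """Mean number of initiation events: bursts per unit time x events per burst x time."""
    return burst_freq * mean_burst_size * duration


def simulate_output(burst_freq, mean_burst_size, duration, rng):
    """Simulate initiation events over `duration` (arbitrary time units).

    Assumptions (illustrative only): exponential waiting times between bursts and
    a geometric number of events per burst with mean `mean_burst_size` (>= 1).
    """
    p = 1.0 / mean_burst_size  # a geometric variable on {1, 2, ...} has mean 1/p
    total = 0
    t = rng.expovariate(burst_freq)  # time of the first burst
    while t < duration:
        if p >= 1.0:
            size = 1
        else:
            # inverse-transform sampling of a geometric random variable
            size = int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1
        total += size
        t += rng.expovariate(burst_freq)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical promoters: (label, burst frequency, mean burst size)
    for label, freq, size in [
        ("small bursts, basal", 0.1, 2.0),
        ("small bursts, enhancer raises frequency", 0.5, 2.0),
        ("large bursts, basal", 0.1, 10.0),
        ("large bursts, enhancer raises frequency", 0.5, 10.0),
    ]:
        sim = simulate_output(freq, size, 10_000.0, rng)
        print(f"{label}: simulated {sim}, expected {expected_output(freq, size, 10_000.0):.0f}")
```

The same fold-increase in burst frequency yields a larger absolute gain in output for the promoter with larger bursts, which is one way to read the reporter-assay observation above.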

Specificity and responsiveness

Although forced interaction of an enhancer with a core promoter can be sufficient to activate transcription199, this is not the case for all promoters, suggesting that enhancers have preferences or specificities towards some promoters and, vice versa, that promoters can be activated only by certain enhancers.

Sequence-encoded enhancer–core-promoter specificity

For example, reporter genes with TATA-box-containing or with DPE-containing promoters integrated at identical genomic positions were differentially expressed in fly embryos200, suggesting that they differentially responded to genomic enhancers. Similarly, core promoters derived from fly housekeeping genes or from developmental genes were differentially activated by distinct sets of enhancers in an otherwise constant plasmid environment201. This is indicative of a sequence-encoded enhancer–core-promoter specificity that separates developmental and housekeeping transcription programs201, a notion that was corroborated by a complementary approach that showed that different promoters respond specifically either to developmental enhancers or to housekeeping enhancers198.

The specificity of core promoters towards regulatory input is not necessarily confined to different sets of genes. For example, in zebrafish a global switch in initiation pattern from focused to dispersed occurs at many genes during embryonic development21, suggesting that they use two different, overlapping core promoter sequences that respond differentially to enhancers active during either maternal or zygotic transcription.

Enhancer-binding regulatory proteins mediate core-promoter specificities

Activation of core promoters by enhancers is mediated by transcription factors and cofactors that have a central role in conveying regulatory cues from enhancers to core promoters and presumably mediate the enhancers’ specificities. Some transcription factors and cofactors can activate transcription on their own when tethered to core promoters202–206. Furthermore, when tested with different core promoters in a constant reporter setup, some factors displayed preferences towards certain core promoters206,207. An intriguing hypothesis that could explain such observations is that different types of core promoter support the assembly of structurally or compositionally distinct PICs that are biochemically compatible with different types of transcription factors and cofactors. One such example is TRF2 replacing TBP in PICs assembled at housekeeping gene promoters133–135 (FIG. 3; reviewed in REFS 9,208).

Figure 3. Sequence-encoded specificity of core promoters towards enhancers and activation by specific transcription (co)factors.

Different types of core promoters respond differentially to distal enhancers, that is an enhancer can activate them (solid arrows) or not (dashed arrows). This selectivity or specificity is mediated by different transcription factors (TF) and cofactors (COF), which display core promoter preferences likely based on biochemical compatibilities between the cofactors and core promoter-bound general transcription factors (GTFs). Mapping and understanding preferences and compatibilities between cofactors and core promoters is an important goal for future research. Pol II, RNA polymerase II; TBP, TATA-box binding protein; TRF2, TBP-related factor 2.

The suggested specificity between core promoters and activating factors was further corroborated by loss-of-function studies that either specifically inhibited cofactor function179,183,184 or depleted cofactors139,140,209 and showed preferential downregulation of certain genes but not others. For example, in yeast, the depletion of different Mediator subunits leads to differential gene downregulation and seems to preferentially affect SAGA-regulated genes209. In mammals, inhibition of BRD4 leads to preferential downregulation of Myc183,185 — a property that is exploited for therapeutic purposes. Similarly, inhibition of p300 seems to most strongly affect core promoters of highly paused genes characterized by distinct chromatin configuration and binding of specific factors, and appears to differentially affect Pol II recruitment and initiation versus Pol II pause-release, depending on the core promoter type179. These observations suggest that transcription of different genes might depend on different cofactors.

A functional model of transcription

The properties of core promoters establish them as specialized sequences that support transcription initiation and Pol II pause-release in response to activating cues from distal enhancers. Enhancers have been regarded as amplifiers of transcription from proximal or distal core promoters7,8, a function mediated by transcription factors and cofactors8. The term ‘promoters’ refers to sequences at gene starts, which can autonomously drive high levels of productive transcription. Promoters comprise in close proximity core promoters and supporting activating sequences, which are called proximal promoters or proximal enhancers (discussed in REFS 198,210). Enhancers therefore share several characteristics with promoters, such as the binding of transcription factors and cofactors26, but also – more unexpectedly – the binding of GTFs and Pol II (REFS 28,211,212) and the ability to initiate transcription15,26,29,42,213 (FIG. 1b).

To understand the similarities and differences between core promoters, enhancers and promoters, it is instructive to establish activity-based definitions of these elements using dedicated assays designed to specifically probe the defining function of each of these elements (BOX 2). Such assays specifically assess enhancer activity as the ability to activate transcription at a distal core promoter214,215; core promoter function as the ability to initiate transcription in response to distal regulatory cues198; and promoter activity as the ability to autonomously drive transcription198,216. One recently-developed assay simultaneously measures both enhancer and promoter activity69.

Box 2. Measuring core promoter and enhancer activities.

Dedicated activity-based assays that specifically measure enhancer, core promoter or promoter activity allow function-based definition of regulatory elements (see fig.).

Enhancers activate transcription at distal core promoters

Enhancer activity is measured in reporter assays that test the ability to activate transcription at a distal core promoter and drive the expression of a reporter gene (fig. a). Enhancer activity has been reported for intergenic and intronic sequences, but also for some sequences that overlap gene promoters201,214,215,217,218,237,238. Such promoters support both core-promoter activity and enhancer activity through the respective sequence elements (fig. d). The many promoter regions that do not show enhancer activity likely support only core promoter functionality and therefore cannot activate transcription at a distal core promoter.

Core promoters initiate transcription in response to regulatory input

Analogously to directly measuring distal enhancer activity, core promoter activity can be specifically assessed in dedicated reporter assays that measure the ability to initiate transcription in response to activating input from an enhancer, that is, measure enhancer responsiveness (fig. b). Candidates with high enhancer responsiveness mainly coincide with gene transcription start sites (TSSs) and contain core-promoter motifs such as TATA-boxes and Inr motifs198,239,240. Unlike gene core promoters, TSSs within enhancers show very low or no responsiveness198, suggesting that enhancers in general have a very weak or no sequence-based propensity to respond to distal activating cues and act as core promoters (fig. d).

Autonomous promoter function is conferred by sequences supporting both core promoter and enhancer activities

Although the above methods assess core-promoter activity as the responsiveness to a defined regulatory input, promoter activity is typically defined as the ability to drive transcription autonomously69,216 (fig. c). Such autonomously functioning promoters typically contain both core promoter activity and enhancer activity; an enhancer in this context is also called a proximal promoter or an upstream-activating sequence (fig. d).

Box 2 figure.

Such dedicated functional assays demonstrated, for example, that promoter regions can activate transcription from distal core promoters, meaning they can function as enhancers198,201,214,217,218, and that enhancer regions can autonomously give rise to productive transcription and function as promoters69,216,217. However, these approaches also found that enhancer function and core-promoter function frequently do not co-occur69,198,201,214,216–218, indicating that the two functions can be carried out by the same genomic region, but are not strictly coupled or interdependent219.

Fortuitous initiation at enhancers

Enhancer activity is mediated through the binding of transcription factors and the recruitment of cofactors, which not only mediate activation of target core promoters but create high transcription activation potential at the enhancers themselves. Enhancers should therefore naturally have the tendency to activate transcription close to or within the enhancer, presumably at sites that most closely resemble bona fide core promoters. Given the low sequence stringency (that is ‘information content’) of many core promoter motifs (Table 1), many sequences at either side of an enhancer resemble degenerate core-promoter motifs. Transcription initiation is in fact expected at any (random) sequence that is in the vicinity of strongly activating factors, because achieving perfect activation specificity towards core promoters or entirely preventing background initiation at accessible DNA would be energetically costly and could only evolve under strong selective pressure.
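The expectation that degenerate motifs match by chance can be checked with a back-of-the-envelope calculation. The sketch below assumes a random sequence with equal and independent base frequencies, which real genomes do not have, and uses two commonly cited consensus strings in IUPAC notation (YYANWYY for an Inr-like motif and TATAWAAR for a TATA-box-like motif) purely as illustrations; they are not copied from Table 1, and the 200-bp window is an arbitrary choice.

```python
# Number of bases (out of A, C, G, T) allowed by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def match_probability(motif: str) -> float:
    """Probability that a random position matches `motif` on one strand,
    assuming independent bases with frequency 1/4 each."""
    prob = 1.0
    for code in motif.upper():
        prob *= IUPAC_DEGENERACY[code] / 4.0
    return prob


def expected_matches(motif: str, window_bp: int, both_strands: bool = True) -> float:
    """Expected number of chance matches in a window of `window_bp` base pairs."""
    start_positions = max(window_bp - len(motif) + 1, 0)
    strands = 2 if both_strands else 1
    return start_positions * strands * match_probability(motif)


if __name__ == "__main__":
    for motif in ("YYANWYY", "TATAWAAR"):  # illustrative consensus strings
        print(
            f"{motif}: p = {match_probability(motif):.2e}, "
            f"expected chance matches in 200 bp = {expected_matches(motif, 200):.3f}"
        )
```

Under these simplifying assumptions, the seven-position Inr-like consensus is expected to match about three times by chance in a 200-bp window (counting both strands), whereas the more specific TATA-like string matches far less often.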

Fortuitous transcription initiation resulting from high activator concentrations can explain several observations related to transcription initiation at enhancers, including the presence of degenerate Inr and TATA-box motifs at TSSs within enhancers15,26,38, the bidirectional initiation pattern at enhancers15,32 or at open chromatin in general220 (FIG. 4a), and the observations that eRNAs are inducible25 and cell-type specific26 — in both cases, eRNA transcription follows the activity of the enhancer, that is, the recruitment of strong transcription activators. It is also consistent with TSSs within enhancers generally showing very low enhancer responsiveness and thus having little or no capacity to support distally regulated transcription initiation as bona fide core promoters do198 (BOX 2). Moreover, the more similar the TSSs within enhancers are to bona fide core promoters, the higher the level of productive transcription from the enhancer69. Therefore, although some enhancers can function as promoters, enhancers generally do not do so and the difference stems from the presence or absence of sequence-encoded core-promoter functionality.

Figure 4. Functional model of transcription initiation at genomic promoters and enhancers.

a) Model of transcription initiation at enhancers (left) and promoters (right) arising from their distinct sequence-encoded activities. Enhancers bind transcription factors (TF) and recruit cofactors (COF), thereby creating a high local concentration of transcription activators. This should lead to fortuitous transcription initiation at proximal sites that resemble bona fide core promoters (“best-of-random sites”), resulting in divergent transcription of short unstable enhancer RNAs (eRNAs). Promoters transcribe stable mRNAs from a dedicated gene core promoter and – due to high activator concentration – will also show fortuitous transcription initiation in the antisense direction. b) Model of evolution of a functional core promoter or an enhancer. Newly emerging transcription-factor binding sites (blue) create enhancer-like activity and exhibit low levels of bidirectional transcription at best-of-random sites. If such transcription is harmful, it might be actively suppressed by DNA methylation (pins)222, repressive factors223 or repressive chromatin and the transcription factor binding sites will degenerate over time. If by contrast the transcription in one or both directions is beneficial, the respective transcription start site will be positively selected and evolve to a fully functional core promoter (red) with strong core promoter motifs, able to support high levels of regulated and productive transcription. Transcription in the non-beneficial direction will remain low and yield non-stable upstream-antisense RNAs (uaRNAs). The activator binding sites near core promoters are often referred to as ‘proximal promoter’. Finally, if the transcription from a putative regulatory sequence is neutral and its enhancer activity is beneficial, the enhancer function should be strengthened and the enhancer will transcribe low levels of bidirectional eRNAs from best-of-random sites.

Evolution of enhancers and promoters

The model of fortuitous initiation at enhancers is consistent with the finding that bidirectional transcription is the ground state of evolutionarily new promoter regions and that uni-directionality is an acquired trait of gene core promoters221. Newly emerged transcription-factor binding sites confer enhancer-like activity, which initially leads to low levels of bidirectional transcription initiation221 (FIG. 4b). If this transcription is harmful, it might become silenced, for example by repressive chromatin222,223, and the binding site might eventually decay. If by contrast transcription in one or both directions is beneficial, the respective TSS sequence could be positively selected and evolve into a fully functional core promoter with strong core promoter motifs, able to support regulated and productive transcription. Similarly, a core promoter that is regulated exclusively by distal enhancers could acquire proximal activator binding sites and thus promoter activity.

The functions of enhancer RNAs

In the model of fortuitous initiation at enhancers, eRNAs are unavoidable by-products of transcription activators, yet this does not exclude the possibility that eRNA transcription or eRNAs themselves are functional. It is possible that evolution took advantage of their correlation with transcriptional activity to modulate enhancer activity (reviewed in REF. 224). For example, eRNA transcription might ensure accessibility to DNA95,96 and eRNAs might be involved in the formation of activating micro-environments in the form of non-membrane bound compartments with high concentrations of transcription activators225–227, which is similar to what has been reported for germline P granules228, RNA granules229,230 and the formation of condensed heterochromatin231,232. Such hypotheses that consider conceptually novel ways to understand the regulatory environment at enhancers should motivate future studies of eRNA function.

Perspective and future directions

Pol II core promoters are genomic elements that support PIC assembly and transcription initiation, and function as specialized sequences that have evolved to enable highly regulated gene transcription. We propose a functional model that defines regulatory elements by their function rather than by their genomic position; we argue that core promoters and enhancers are the two principal gene regulatory elements, that they have distinct functionalities and that they have evolved for distinct purposes: initiating productive transcription locally (core promoters) versus boosting transcription locally or distally (enhancers).

We are intrigued by the widespread occurrence of Pol II pausing at most promoters and enhancers153,159,160, which might indicate that a pausing-like checkpoint between transcription initiation and elongation is an intrinsic property of all Pol II-mediated transcription. As such, it might be triggered not by the DNA sequence at the down-stream pause-site but, for example, by the 5’ end of the nascent RNA as it protrudes from Pol II. It will be interesting to see if the successful resolution of this checkpoint is necessary for productive elongation and whether this could be the main difference between transcription at promoters and enhancers.

We would like to highlight the existence of different types of core promoters with distinct properties, especially preferences towards different enhancers and cofactors that are presumably based on biochemical compatibilities (FIG. 3). Elucidating such preferences and compatibilities and determining the differences between various core-promoter types is crucial at a time when we have an increasingly complete understanding of the mechanisms that determine genome structure and spatial contacts of enhancers and their target core promoters (reviewed in REFS 233,234), and when transcription regulation is becoming the focus of targeted intervention and novel therapeutic strategies.

Supplementary Material

Supplementary information S1

Glossary.

Core promoter

Short sequence flanking the transcription start site (typically ~50 base pairs upstream and ~50 base pairs downstream) that is sufficient to assemble the RNA polymerase II transcription machinery and initiate transcription.

General transcription factors

(GTF) Proteins that together with RNA polymerase II constitute the transcription machinery at the core promoter.

Transcription factors

Proteins that directly bind a specific DNA sequence through their DNA-binding domain and regulate the level of transcription by recruiting Pol II or transcriptional cofactors through their trans-activation domain.

Transcriptional cofactors

Proteins that do not directly bind DNA, but are recruited by DNA-binding transcription factors to regulate transcription of target genes.

Enhancer RNAs

(eRNAs) Short unstable non-coding RNAs (<2kb), usually not spliced or polyadenylated, which are transcribed from enhancers and rapidly degraded by the exosome.

Nucleosome-depleted region

(NDR) Genomic region depleted of canonical nucleosomes; usually associated with active regulatory elements such as promoters and enhancers.

Promoters

Genomic regions encompassing a gene core promoter and an upstream proximal promoter, which together autonomously drive transcription.

Proximal promoter

Transcription-activating sequence immediately upstream of the core promoter (typically up to 250bp upstream of the transcription start site), which contains binding sites for sequence-specific transcription factors and functions like an enhancer.

Pre-initiation complex

(PIC) A large complex of proteins, including RNA polymerase II and its general transcription factors, that assembles at core promoters and is required for transcription initiation.

CpG islands

(CGI) GC-rich genomic sequences with the frequency of CpG dinucleotides higher than in the rest of the genome (which is generally depleted of CpG dinucleotides in mammals).

Piwi-interacting RNA

(piRNA) Small non-coding RNA (26-31 nucleotides) that interacts with Argonaute proteins from the Piwi family and mediates transcriptional and post-transcriptional gene silencing of transposable elements.

Promoter-proximal pausing

Pausing of RNA polymerase II downstream of the transcription start site; controls the transition into productive transcription elongation.

Enhancer responsiveness

The extent to which transcription from a core promoter is induced by a distal enhancer.

SAGA complex

Spt–Ada–Gcn5-acetyltransferase (SAGA) is a coactivator complex with different chromatin-modifying modules, including for example the Gcn5 histone acetyltransferase.

Acknowledgements

The authors thank F. Mürdter, M.A. Zabidi, P.R. Andersen, C. Plaschka and C. Bernecky for helpful comments on the manuscript. V.H. is supported by a long-term postdoctoral fellowship from the Human Frontier Science Program (HFSP, grant number LT000324/2016-L). Research in the Stark group is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 647320) and by the Austrian Science Fund (FWF, F4303-B09). Basic research at the IMP is supported by Boehringer Ingelheim GmbH and the Austrian Research Promotion Agency (FFG).

Footnotes

Competing interests

The authors declare no competing interests.

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author contributions

All authors contributed equally to all aspects of the article.

References
