Boris Lenhard | Imperial College London
Papers by Boris Lenhard
Core promoters integrate regulatory inputs of genes [1–3]. Global dynamics of promoter usage can reveal systemic changes in how genomic sequence is interpreted by the cell [4]. Here we report the first analysis of promoter dynamics and code switching in the mammalian germ line, characterising the full cycle of transitions from embryonic stem cells through the germline, oogenesis, and zygotic genome activation. Using Super Low Input Carrier-CAGE (SLIC-CAGE) [5,6], we show that mouse germline development starts with the somatic promoter code, followed by a prominent switch to the maternal code during follicular oogenesis. The sequence features underlying the shift from the somatic to the maternal code are conserved across vertebrates, despite large differences in promoter nucleotide composition. In addition, we show that, prior to this major shift, the promoters of gonadal germ cells diverge from canonical somatic transcription initiation. This divergence is distinct from the promoter code used later b...
Nucleic Acids Research
The core promoter, a stretch of DNA surrounding the transcription start site (TSS), is a major integration point for regulatory signals controlling gene transcription. Cellular differentiation is marked by divergence in transcriptional repertoire and cell-cycling behaviour between cells of different fates. The role that promoter-associated gene regulatory networks play in development-associated transitions in cell-cycle dynamics is poorly understood. This study demonstrates, in a vertebrate embryo, how core-promoter variations define transcriptional output in cells transitioning from a proliferative to a cell-lineage-specifying phenotype. Assessment of cell proliferation across zebrafish embryo segmentation, using the FUCCI transgenic cell-cycle phase marker, revealed a spatial and lineage-specific separation in cell-cycling behaviour. To investigate the role differential promoter usage plays in this process, cap analysis of gene expression (CAGE) was performed on cells segregated by cyclin...
The core promoter, a stretch of DNA surrounding the transcription start site (TSS), is a major integration point for regulatory signals controlling gene transcription. Cell differentiation is accompanied by a marked divergence in transcriptional repertoire between cells of different fates and by changes in cellular behaviour, in particular proliferative activity. Investigation of divergent core promoter architectures suggests that distinct regulatory networks act on the core promoter, modulating cell behaviour through changes in transcriptional profile, which ultimately drive key transitions in cellular behaviour during embryonic development. The role that promoter-associated gene regulatory networks play in development-associated transitions in cell cycle dynamics (e.g. during differentiation), however, is poorly understood. In this study, we demonstrate in a developing in vivo model how core promoter variations play a key role in defining transcriptional output in cells transitioning from a proliferative to a cell-lineage-specifying phenotype. The FUCCI transgenic system differentially marks cells in G1 and S/G2/M phases of the cell cycle and can therefore be used to separate rapidly and slowly cycling cells in vivo, by virtue of the cell cycle stage they primarily inhabit. Longitudinal assessment of cell proliferation rate during zebrafish embryo development using this system revealed a spatial and lineage-specific separation in cell cycling behaviour across post-gastrulation embryos. To investigate the role differential promoter usage plays in this process, cap analysis of gene expression (CAGE) was performed on cells from FUCCI zebrafish embryos undergoing somitogenesis, separated by fluorescence-activated cell sorting (FACS) according to their rate of cell cycling.
This analysis revealed a dramatic increase in lineage- and tissue-specific gene expression, concurrent with a slowing of the cell cycle. Core promoters associated with rapidly cycling cells showed a broad distribution of transcription start site usage, featuring a positionally constrained CCAAT box, while slowly cycling cells favoured sharp TSS usage coupled with canonical TATA-box utilisation and enrichment of Sp1 binding sites. These results demonstrate the regulatory role of core promoters in cell cycle-dependent transcription regulation during the somitogenesis stages of embryo development.
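The contrast between broad and sharp TSS usage described above is often summarised per TSS cluster by an interquantile width: the distance between the positions where the cumulative CAGE tag count first reaches, say, 10% and 90% of the cluster total. The sketch below assumes that convention; the quantile bounds, the 10 bp cutoff, the toy data and the function names are illustrative assumptions, not necessarily what this study used.

```python
from typing import List, Tuple


def interquantile_width(tss_counts: List[Tuple[int, int]],
                        lower: float = 0.1, upper: float = 0.9) -> int:
    """Width (bp) spanning the lower..upper quantiles of CAGE tags in one TSS cluster.

    tss_counts: (genomic position, tag count) pairs for a single cluster.
    """
    ordered = sorted(tss_counts)
    total = sum(count for _, count in ordered)
    if total == 0:
        raise ValueError("cluster has no tags")
    cumulative = 0
    q_low = q_high = None
    for position, count in ordered:
        cumulative += count
        # First position where the cumulative tag count reaches each quantile.
        if q_low is None and cumulative >= lower * total:
            q_low = position
        if cumulative >= upper * total:
            q_high = position
            break
    return q_high - q_low + 1


def classify_cluster(tss_counts: List[Tuple[int, int]], sharp_cutoff: int = 10) -> str:
    """Hypothetical rule: clusters narrower than sharp_cutoff bp are called 'sharp'."""
    return "sharp" if interquantile_width(tss_counts) < sharp_cutoff else "broad"


# Toy data, not from the study: one dominant TSS vs. tags spread evenly over 60 bp.
sharp_cluster = [(100, 2), (101, 50), (102, 40), (103, 3)]
broad_cluster = [(pos, 5) for pos in range(100, 160)]
print(classify_cluster(sharp_cluster), classify_cluster(broad_cluster))  # prints: sharp broad
```

Using quantiles rather than the full span of tagged positions keeps a few stray low-count tags from making an otherwise sharp promoter look broad.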
SUMMARY: In many animal models, primordial germ cell (PGC) development depends on maternally deposited germ plasm to avoid somatic cell fate. Here, we show that in zebrafish, PGCs respond to regulatory information from the germ plasm in two distinct phases, through distinct mechanisms. We show that PGCs commence zygotic genome activation together with the rest of the embryo, with no demonstrable differences in transcriptional and chromatin accessibility levels. Thus, cytoplasmic germ plasm determinants only affect post-transcriptional stabilisation of RNAs, making the PGC transcriptome diverge from that of somatic cells, which, unexpectedly, also activate germ cell-specific genes. Perinuclear relocalisation of germ plasm is coupled to a dramatic divergence from somatic cells in chromatin opening and transcriptome, characterised by PGC-specific chromatin topology. Furthermore, we reveal Tdrd7, a regulator of germ plasm localisation, as a crucial determinant of germ fate acquisition.
Journal of Visualized Experiments
Cap analysis of gene expression (CAGE) is a method used for single-nucleotide resolution detection of RNA polymerase II transcription start sites (TSSs). Accurate detection of TSSs enhances identification and discovery of core promoters. In addition, active enhancers can be detected through signatures of bidirectional transcription initiation. Described here is a protocol for performing super-low input carrier-CAGE (SLIC-CAGE). This SLIC adaptation of the CAGE protocol minimizes RNA losses by artificially increasing the RNA amount through use of an in vitro transcribed RNA carrier mix that is added to the sample of interest, thus enabling library preparation from nanogram amounts of total RNA (i.e., thousands of cells). The carrier mimics the expected DNA library fragment length distribution, thereby eliminating biases that could be caused by the abundance of a homogeneous carrier. In the last stages of the protocol, the carrier is removed through degradation with homing endonucleases and the target library is amplified. The target sample library is protected from degradation because the homing endonuclease recognition sites are long (between 18 and 27 bp), making them very unlikely to occur in eukaryotic genomes. The end result is a DNA library ready for next-generation sequencing. All steps in the protocol, up to sequencing, can be completed within 6 days. The carrier preparation requires a full working day; however, the carrier can be prepared in large quantities and kept frozen at −80 °C. Once sequenced, the reads can be processed to obtain genome-wide single-nucleotide resolution TSSs. TSSs can be used for core promoter or enhancer discovery, providing insight into gene regulation. Once aggregated to promoters, the data can also be used for 5'-centric expression profiling.
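The claim that 18 to 27 bp recognition sites are very unlikely to occur in a eukaryotic genome can be checked with a back-of-the-envelope estimate, E = 2(G − k + 1) / 4^k expected exact matches of a fixed k-bp site in a genome of size G. The sketch below assumes a uniform random sequence (each base with probability 1/4), exact matching on either strand, and a genome size of about 3.1 Gb chosen only for illustration. Real genomes are not random and homing endonucleases tolerate some mismatches, so treat this as an order-of-magnitude estimate.

```python
def expected_site_count(site_length: int, genome_size: int, both_strands: bool = True) -> float:
    """Expected exact occurrences of one fixed site in a uniform random sequence.

    Each start position matches with probability (1/4) ** site_length, and there
    are genome_size - site_length + 1 start positions per strand.
    """
    positions = genome_size - site_length + 1
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** site_length


GENOME_SIZE = 3_100_000_000  # assumed, illustrative haploid genome size (~3.1 Gb)

for length in (18, 27):  # bounds of the recognition-site lengths quoted in the protocol
    print(f"{length} bp site: ~{expected_site_count(length, GENOME_SIZE):.1e} expected matches")
```

Under these assumptions an 18 bp site is expected about 0.09 times per genome and a 27 bp site on the order of 10⁻⁷ times, which is why the carrier can be digested without touching the sample library.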
Conserved noncoding elements (CNEs) are elements exhibiting extreme noncoding conservation in metazoan genomes. They cluster around developmental genes and act as long-range enhancers, yet nothing that we know about their function explains the observed conservation levels. Clusters of CNEs coincide with topologically associating domains (TADs), indicating ancient origins and stability of TAD locations. This has suggested further hypotheses about the still elusive origin of CNEs and has provided a comparative genomics-based method of estimating the position of TADs around developmentally regulated genes in genomes where chromatin conformation capture data are missing. To enable researchers in gene regulation and chromatin biology to start deciphering this phenomenon, we developed CNEr, an R/Bioconductor toolkit for large-scale identification of CNEs and for studying their genomic properties. We apply CNEr to two novel genome comparisons: fruit fly vs tsetse fly, and two sea urchin ge...
Bioinformatics
Motivation: Clusters of extremely conserved non-coding elements (CNEs) mark genomic regions devoted to cis-regulation of key developmental genes in Metazoa. We have recently shown that their span coincides with that of topologically associating domains (TADs), making them useful for estimating conserved TAD boundaries in the absence of Hi-C data. The standard approach, detecting CNEs in genome alignments and then establishing the boundaries of their clusters, requires tuning of several parameters and breaks down when comparing closely related genomes. Results: We present a novel, kurtosis-based measure of pairwise non-coding conservation that requires no pre-set thresholds for conservation level and length of CNEs. We show that it performs robustly across a large span of evolutionary distances, including across the closely related genomes of primates for which standard approaches fail. The method is straightforward to implement and enables detection and comparison of clusters of CNEs an...
Genome Research
Cap analysis of gene expression (CAGE) is a methodology for genome-wide quantitative mapping of mRNA 5′ ends to precisely capture transcription start sites at single-nucleotide resolution. In combination with high-throughput sequencing, CAGE has revolutionized our understanding of the rules of transcription initiation, led to the discovery of new core promoter sequence features, and revealed transcription initiation at enhancers genome-wide. The biggest limitation of CAGE is that even the most recently improved version (nAnT-iCAGE) still requires large amounts of total cellular RNA (5 µg), preventing its application to scarce biological samples such as those from early embryonic development or rare cell types. Here, we present SLIC-CAGE, a Super-Low Input Carrier-CAGE approach to capture 5′ ends of RNA polymerase II transcripts from as little as 5–10 ng of total RNA. This dramatic increase in sensitivity is achieved by specially designed, selectively degradable carrier RNA. We demon...
Brain: a journal of neurology, Jan 9, 2018
The transcription factor BCL11B is essential for development of the nervous and immune systems, and Bcl11b deficiency results in structural brain defects, reduced learning capacity, and impaired immune cell development in mice. However, the precise role of BCL11B in humans is largely unexplored, except for a single patient with a BCL11B missense mutation who was affected by multisystem anomalies and profound immune deficiency. Using massively parallel sequencing, we identified 13 patients bearing heterozygous germline alterations in BCL11B. Notably, all of them are affected by global developmental delay with speech impairment and intellectual disability; however, none displayed overt clinical signs of immune deficiency. Six frameshift mutations, two nonsense mutations, one missense mutation, and two chromosomal rearrangements resulting in diminished BCL11B expression arose de novo. A further frameshift mutation was transmitted from a similarly affected mother. Interestingly, the most se...
Nature, Mar 15, 2018
Gametes are highly specialized cells that can give rise to the next generation through their ability to generate a totipotent zygote. In mice, germ cells are first specified in the developing embryo around embryonic day (E) 6.25 as primordial germ cells (PGCs). Following subsequent migration into the developing gonad, PGCs undergo a wave of extensive epigenetic reprogramming around E10.5–E11.5, including genome-wide loss of 5-methylcytosine. The underlying molecular mechanisms of this process have remained unclear, which has prevented this step of germline development from being recapitulated in vitro. Here we show, using an integrative approach, that this complex reprogramming process involves coordinated interplay among promoter sequence characteristics, DNA (de)methylation, the polycomb (PRC1) complex and both DNA demethylation-dependent and -independent functions of TET1 to enable the activation of a critical set of germline reprogramming-responsive genes involved in gamete generati...
Nucleic acids research, Jan 17, 2017
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
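JASPAR profiles are stored as position frequency matrices (PFMs). One common way to use a PFM for binding-site prediction is to convert it into a log-odds position weight matrix and slide it along a sequence, keeping the best-scoring window. The sketch below assumes a uniform background, a pseudocount of 0.8 and a made-up 4-position matrix; it is not a real JASPAR profile, does not use the JASPAR REST API or the R/Bioconductor package, scores the forward strand only, and expects uppercase ACGT input.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"

# Hypothetical PFM: counts of each base at four motif positions (10 sites per column).
TOY_PFM: Dict[str, List[int]] = {
    "A": [10, 0, 0, 2],
    "C": [0, 0, 10, 2],
    "G": [0, 10, 0, 4],
    "T": [0, 0, 0, 2],
}


def pfm_to_pwm(pfm: Dict[str, List[int]],
               background: Optional[Dict[str, float]] = None,
               pseudocount: float = 0.8) -> Dict[str, List[float]]:
    """Convert counts to log2(p / background) scores, one list per base."""
    background = background or {b: 0.25 for b in BASES}
    width = len(pfm["A"])
    pwm: Dict[str, List[float]] = {b: [] for b in BASES}
    for i in range(width):
        column_total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # The pseudocount is split across bases in proportion to the background,
            # so unseen bases get a small non-zero probability instead of log(0).
            p = (pfm[b][i] + pseudocount * background[b]) / (column_total + pseudocount)
            pwm[b].append(math.log2(p / background[b]))
    return pwm


def score_window(pwm: Dict[str, List[float]], window: str) -> float:
    """Sum of per-position log-odds scores for one window."""
    return sum(pwm[base][i] for i, base in enumerate(window))


def best_hit(pwm: Dict[str, List[float]], sequence: str) -> Tuple[float, int]:
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm["A"])
    return max((score_window(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


pwm = pfm_to_pwm(TOY_PFM)
score, start = best_hit(pwm, "TTAGCGTT")
print(start, round(score, 2))
```

Here the consensus-like window AGCG, starting at index 2, scores highest. In practice a hit is usually reported only if its score exceeds a chosen threshold, often expressed relative to the matrix's minimum and maximum attainable scores.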
Nucleic acids research, Jan 7, 2017
Comparative genomics has revealed a class of non-protein-coding genomic sequences that display an extraordinary degree of conservation between two or more organisms, regularly exceeding that found within protein-coding exons. These elements, collectively referred to as conserved non-coding elements (CNEs), are non-randomly distributed across chromosomes and tend to cluster in the vicinity of genes with regulatory roles in multicellular development and differentiation. CNEs are organized into functional ensembles called genomic regulatory blocks: dense clusters of elements that collectively coordinate the expression of shared target genes, and whose span in many cases coincides with topologically associated domains. CNEs display sequence properties that set them apart from other sequences under constraint, and have recently been proposed as useful markers for the reconstruction of the evolutionary history of organisms. Disruption of several of these elements is known to contribute to ...
The EMBO journal, Nov 4, 2017
While β-catenin has been shown to be an essential molecule and therapeutic target for various cancer stem cells (CSCs), including those driven by MLL fusions, here we show that transcriptional memory from cells of origin predicts AML patient survival and allows β-catenin-independent transformation in MLL-CSCs derived from the hematopoietic stem cell (HSC)-enriched LSK population but not from myeloid-granulocyte progenitors. Mechanistically, β-catenin regulates expression of downstream targets of a key transcriptional memory gene, Hoxa9, which is highly enriched in LSK-derived MLL-CSCs and helps sustain leukemic self-renewal. Suppression of Hoxa9 sensitizes LSK-derived MLL-CSCs to β-catenin inhibition, resulting in abolition of the CSC transcriptional program and transformation ability. In addition, further molecular and functional analyses identified Prmt1 as a key common downstream mediator of β-catenin/Hoxa9 functions in LSK-derived MLL-CSCs. Together, these findings not only uncover an une...
Genome Biology
Background: Inactivation of one X chromosome is established early in female mammalian development and can be reversed in vivo and in vitro when pluripotency factors are re-expressed. The extent of reactivation along the inactive X chromosome (Xi) and the determinants of locus susceptibility are, however, poorly understood. Here we use cell fusion-mediated pluripotent reprogramming to study human Xi reactivation and allele-specific single nucleotide polymorphisms (SNPs) to identify reactivated loci. Results: We show that a subset of human Xi genes is rapidly reactivated upon re-expression of the pluripotency network. These genes lie within the most evolutionarily recent segments of the human X chromosome, which are depleted of LINE1 and enriched for SINE elements, predicted to impair XIST spreading. Interestingly, this cadre of genes displays stochastic Xi expression in human fibroblasts ahead of reprogramming. This stochastic variability is evident between clones, by RNA-sequencing, and at the single-cell level, by RNA-FISH, and is not attributable to differences in repressive histone H3K9me3 or H3K27me3 levels. Treatment with the DNA demethylating agent 5-deoxy-azacytidine does not increase Xi expression ahead of reprogramming, but instead reveals a second cadre of genes that only become susceptible to reactivation upon induction of pluripotency. Conclusions: Collectively, these data not only underscore the multiple pathways that contribute to maintaining silencing along the human Xi but also suggest that transcriptional stochasticity among human cells could be useful for predicting and engineering epigenetic strategies to achieve locus-specific or domain-specific human Xi gene reactivation.
Nature Communications
Developmental genes in metazoan genomes are surrounded by dense clusters of conserved noncoding elements (CNEs). CNEs exhibit unexplained extreme levels of sequence conservation, with many acting as developmental long-range enhancers. Clusters of CNEs define the span of regulatory inputs for many important developmental regulators and have been described previously as genomic regulatory blocks (GRBs). Their function and distribution around important regulatory genes raise the question of how they relate to the 3D conformation of these loci. Here, we show that clusters of CNEs strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human and Drosophila. The set of TADs associated with high levels of noncoding conservation exhibits distinct properties compared to TADs devoid of extreme noncoding conservation. The close correspondence between extreme noncoding conservation and TADs suggests that these TADs are ancient, revealing a regulatory architecture conserved over hundreds of millions of years.
Seminars in cell & developmental biology, Jan 16, 2016
Core promoters are minimal regions sufficient to direct accurate initiation of transcription and are crucial for regulation of gene expression. They are highly diverse in terms of associated core promoter motifs, underlying sequence composition and patterns of transcription initiation. Distinctive features of promoters are also seen at the chromatin level, including nucleosome positioning patterns and presence of specific histone modifications. Recent advances in identifying and characterizing promoters using next-generation sequencing-based technologies have provided the basis for their classification into functional groups and have shed light on their modes of regulation, with important implications for transcriptional regulation in development. This review discusses the methodology and the results of genome-wide studies that provided insight into the diversity of RNA polymerase II promoter architectures in vertebrates and other Metazoa, and the association of these architectures ...
Journal of Biological Chemistry, 2016
Isoleucyl-tRNA synthetase (IleRS) is unusual among aminoacyl-tRNA synthetases in having a tRNA-dependent pre-transfer editing activity. Alongside the typical bacterial IleRS (such as Escherichia coli IleRS), some bacteria also have eukaryote-like enzymes that cluster with eukaryotic IleRSs and exhibit low sensitivity to the antibiotic mupirocin. Our phylogenetic analysis suggests that the ileS1 and ileS2 genes of contemporary bacteria are descendants of genes that might have arisen by an ancient duplication event before the separation of bacteria and archaea. We present an analysis of evolutionary constraints on the synthetic and editing reactions in eukaryotic/eukaryote-like IleRSs, which share a common origin but diverged through adaptation to different cell environments. The enzyme from the yeast cytosol exhibits tRNA-dependent pre-transfer editing analogous to E. coli IleRS. This argues for the presence of this proofreading in the common ancestor of both IleRS types and an ancient origin of the synthetic site-based quality control step. Yet, surprisingly, the eukaryote-like IleRS from Streptomyces griseus lacks this capacity; at the same time, its synthetic site displays a 10³-fold drop in sensitivity to the antibiotic mupirocin relative to the yeast enzyme. The discovery that pre-transfer editing is optional in IleRSs lends support to the notion that the conserved post-transfer editing domain is the main checkpoint in these enzymes. We substantiated this by showing that, under error-prone conditions, S. griseus IleRS is able to rescue the growth of an E. coli strain lacking functional IleRS, providing the first evidence that tRNA-dependent pre-transfer editing in IleRS is not essential for cell viability.
Aminoacyl-tRNA synthetases (aaRS) establish the genetic code through the specific attachment of amino acids to their ...
This work was supported by the Unity through Knowledge Fund Grant 8/13.
Cancer Cell, 2016
Nucleic acids research, Jan 15, 2015
MicroRNAs (miRNAs) play a major role in the post-transcriptional regulation of target genes, especially in development and differentiation. Our understanding of the transcriptional regulation of miRNA genes is limited by inadequate annotation of primary miRNA (pri-miRNA) transcripts. Here, we used CAGE-seq and RNA-seq to provide genome-wide identification of the pri-miRNA core promoter repertoire and its dynamic usage during zebrafish embryogenesis. We assigned pri-miRNA promoters to 152 precursor miRNAs (pre-miRNAs), the majority of which were supported by promoter-associated post-translational histone modifications (H3K4me3, H2A.Z) and RNA polymerase II (RNAPII) occupancy. We validated seven miR-9 pri-miRNAs by in situ hybridization and showed that their expression patterns are similar to that of mature miR-9. In addition, processing of an alternative intronic promoter of miR-9-5 was validated by 5' RACE PCR. Developmental profiling revealed a subset of pri-miRNAs that are maternally inherited. M...
Core promoters integrate regulatory inputs of genes1–3. Global dynamics of promoter usage can rev... more Core promoters integrate regulatory inputs of genes1–3. Global dynamics of promoter usage can reveal systemic changes in how genomic sequence is interpreted by the cell4 Here we report the first analysis of promoter dynamics and code switching in the mammalian germ line, characterising the full cycle of transitions from embryonic stem cells through germline, oogenesis, and zygotic genome activation. Using Super Low Input Carrier-CAGE5,6 (SLIC-CAGE) we show that mouse germline development starts with the somatic promoter code, followed by a prominent switch to the maternal code during follicular oogenesis. The sequence features underlying the shift from somatic to maternal code are conserved across vertebrates, despite large differences in promoter nucleotide compositions. In addition, we show that, prior to this major shift, the promoters of gonadal germ cells diverge from the canonical somatic transcription initiation. This divergence is distinct from the promoter code used later b...
Nucleic Acids Research
The core-promoter, a stretch of DNA surrounding the transcription start site (TSS), is a major in... more The core-promoter, a stretch of DNA surrounding the transcription start site (TSS), is a major integration-point for regulatory-signals controlling gene-transcription. Cellular differentiation is marked by divergence in transcriptional repertoire and cell-cycling behaviour between cells of different fates. The role promoter-associated gene-regulatory-networks play in development-associated transitions in cell-cycle-dynamics is poorly understood. This study demonstrates in a vertebrate embryo, how core-promoter variations define transcriptional output in cells transitioning from a proliferative to cell-lineage specifying phenotype. Assessment of cell proliferation across zebrafish embryo segmentation, using the FUCCI transgenic cell-cycle-phase marker, revealed a spatial and lineage-specific separation in cell-cycling behaviour. To investigate the role differential promoter usage plays in this process, cap-analysis-of-gene-expression (CAGE) was performed on cells segregated by cyclin...
The core promoter, a stretch of DNA surrounding the transcription start site (TSS) is a major int... more The core promoter, a stretch of DNA surrounding the transcription start site (TSS) is a major integration point for regulatory signals controlling gene transcription. The process of cell differentiation is accompanied by a marked divergence in transcriptional repertoire between cells of different fates, accompanied by changes in cellular behaviour, in particular their proliferative activity. Investigation of divergent core promoter architectures suggest distinct regulatory networks act on the core promoter, modulating cell behavior through transcriptional profile changes, which ultimately drives key transitions in cellular behaviour during embryonic development. The role that promoter-associated gene regulatory networks play in development associated transitions in cell cycle dynamics (e.g. during differentiation) however, is poorly understood. In this study we demonstrate in a developing in vivo model, how core promoter variations play a key role in defining transcriptional output in cells transitioning from a proliferative to cell-lineage specifying phenotype. The FUCCI transgenic system, differentially marks cells in G1 and S/G2/M phases of the cell cycle and can therefore be used to separate rapidly and slowly cycling cells in vivo, by virtue of the cell cycle stage they primarily inhabit. Longitudinal assessment of cell proliferation rate during zebrafish embryo development, using this system, revealed a spatial and lineage-specific separation in cell cycling behaviour across post-gastrulation embryos. In order to investigate the role differential promoter usage plays in this process, cap analysis of gene expression (CAGE) was performed on fluorescent associated cell sorted (FACS) FUCCI zebrafish embryos going through somitogenesis, separating cells in accordance with the rate of their cell cycling. 
This analysis revealed a dramatic increase in lineage and tissue-specific gene expression, concurrent with a slowing of their cell cycling. Core promoters associated with rapidly cycling cells, showed broad distribution of transcription start site usage, featuring positionally constrained CCAAT-box, while slowly cycling cells favoured sharp TSS usage coupled with canonical TATA-box utilisation and enrichment of Sp1 binding sites. These results demonstrate the regulatory role of core promoters in cell cycle-dependent transcription regulation, during somitogenesis stages of embryo development.
SUMMARYIn many animal models, primordial germ cell (PGC) development depends on maternally-deposi... more SUMMARYIn many animal models, primordial germ cell (PGC) development depends on maternally-deposited germ plasm to avoid somatic cell fate. Here, we show that PGCs respond to regulatory information from the germ plasm in two distinct phases and mechanisms in zebrafish. We show that PGCs commence zygotic genome activation together with the rest of the embryo with no demonstrable differences in transcriptional and chromatin accessibility levels. Thus, cytoplasmic germ plasm determinants only affect post-transcriptional stabilisation of RNAs to diverge transcriptome from somatic cells, which, unexpectedly, also activate germ cell-specific genes. Perinuclear relocalisation of germ plasm is coupled to dramatic divergence in chromatin opening and transcriptome from somatic cells characterised by PGC-specific chromatin topology. Furthermore, we reveal Tdrd7, regulator of germ plasm localisation, as crucial determinant of germ fate acquisition.
Journal of Visualized Experiments
Cap analysis of gene expression (CAGE) is a method for single-nucleotide-resolution detection of RNA polymerase II transcription start sites (TSSs). Accurate detection of TSSs enhances the identification and discovery of core promoters. In addition, active enhancers can be detected through signatures of bidirectional transcription initiation. Described here is a protocol for performing super-low input carrier-CAGE (SLIC-CAGE). This SLIC adaptation of the CAGE protocol minimizes RNA losses by artificially increasing the RNA amount through an in vitro transcribed RNA carrier mix that is added to the sample of interest, thus enabling library preparation from nanogram amounts of total RNA (i.e., thousands of cells). The carrier mimics the expected DNA library fragment length distribution, thereby eliminating biases that could be caused by an abundant, homogeneous carrier. In the last stages of the protocol, the carrier is removed through degradation with homing endonucleases and the target library is amplified. The target sample library is protected from degradation because the homing endonuclease recognition sites are long (between 18 and 27 bp), making their occurrence in eukaryotic genomes very unlikely. The end result is a DNA library ready for next-generation sequencing. All steps in the protocol, up to sequencing, can be completed within 6 days. The carrier preparation requires a full working day; however, the carrier can be prepared in large quantities and kept frozen at -80 °C. Once sequenced, the reads can be processed to obtain genome-wide, single-nucleotide-resolution TSSs. TSSs can be used for core promoter or enhancer discovery, providing insight into gene regulation. Once aggregated to promoters, the data can also be used for 5'-centric expression profiling.
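The claim that 18 to 27 bp recognition sites are unlikely to occur in a genome by chance can be checked with a back-of-envelope calculation. The sketch below assumes a random genome with independent, equiprobable bases and counts matches on both strands; the ~3.1 Gb genome size is an illustrative human-scale figure, not a value taken from the protocol. Real genomes have composition biases and repeats, and a palindromic site would be counted twice per match, so this is only an order-of-magnitude estimate.

```python
def expected_site_count(genome_size: int, site_length: int) -> float:
    """Expected occurrences of one fixed k-bp site in a random genome.

    Assumes independent bases with A/C/G/T each at probability 0.25 and
    counts both strands, giving 2 * (G - k + 1) candidate offsets, each
    matching with probability 4**-k.
    """
    offsets = 2 * (genome_size - site_length + 1)
    return offsets / 4 ** site_length


if __name__ == "__main__":
    genome = 3_100_000_000  # illustrative human-scale haploid genome size
    for k in (18, 27):
        print(k, expected_site_count(genome, k))
```

Because every extra base divides the expectation by four, going from an 18 bp to a 27 bp site lowers the expected number of chance matches by a factor of 4^9 (about 2.6 × 10^5).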
Conserved noncoding elements (CNEs) are elements exhibiting extreme noncoding conservation in metazoan genomes. They cluster around developmental genes and act as long-range enhancers, yet nothing that we know about their function explains the observed conservation levels. Clusters of CNEs coincide with topologically associating domains (TADs), indicating ancient origins and stability of TAD locations. This has suggested further hypotheses about the still elusive origin of CNEs, and has provided a comparative genomics-based method of estimating the position of TADs around developmentally regulated genes in genomes where chromatin conformation capture data are missing. To enable researchers in gene regulation and chromatin biology to start deciphering this phenomenon, we developed CNEr, an R/Bioconductor toolkit for large-scale identification of CNEs and for studying their genomic properties. We apply CNEr to two novel genome comparisons: fruit fly vs tsetse fly, and two sea urchin ge...
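A common way to define CNEs in a pairwise alignment is a sliding-window threshold: a region qualifies if some window of C alignment columns contains at least I identical bases. The sketch below implements that idea on two gapped, aligned strings and merges overlapping qualifying windows into elements. The thresholds used in the tests are hypothetical, and this simplified illustration is neither CNEr's API nor its implementation.

```python
from typing import List, Tuple


def scan_cnes(seq_a: str, seq_b: str, identity: int,
              window: int) -> List[Tuple[int, int]]:
    """Return merged [start, end) alignment-column intervals in which some
    window of `window` columns has at least `identity` identical, ungapped
    columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(seq_a)
    # 1 where both sequences carry the same base (gap columns never count)
    match = [int(a == b and a != "-")
             for a, b in zip(seq_a.upper(), seq_b.upper())]
    intervals: List[Tuple[int, int]] = []
    if n < window:
        return intervals
    score = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # slide by one column: drop the old left column, add the new right one
            score += match[start + window - 1] - match[start - 1]
        if score >= identity:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)  # extend the previous element
            else:
                intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    print(scan_cnes("AAAAAAAAAACCCCCCCCCC", "AAAAAAAAAAGGGGGGGGGG", 10, 10))
```

Updating the window score incrementally keeps the scan linear in alignment length rather than recounting every window from scratch.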
Bioinformatics
Motivation: Clusters of extremely conserved non-coding elements (CNEs) mark genomic regions devoted to cis-regulation of key developmental genes in Metazoa. We have recently shown that their span coincides with that of topologically associating domains (TADs), making them useful for estimating conserved TAD boundaries in the absence of Hi-C data. The standard approach (detecting CNEs in genome alignments and then establishing the boundaries of their clusters) requires tuning of several parameters and breaks down when comparing closely related genomes. Results: We present a novel, kurtosis-based measure of pairwise non-coding conservation that requires no pre-set thresholds for conservation level and length of CNEs. We show that it performs robustly across a large span of evolutionary distances, including across the closely related genomes of primates, for which standard approaches fail. The method is straightforward to implement and enables detection and comparison of clusters of CNEs an...
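Kurtosis measures how heavy the tails of a distribution are, so a sample dominated by a few extreme values (for instance, a handful of very highly conserved windows among many weakly conserved ones) has high kurtosis. The abstract does not state which distribution the published measure is computed over, so the sketch below shows only the statistic itself, population excess kurtosis, applied to hypothetical per-window conservation scores.

```python
from statistics import fmean
from typing import Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)  # second central moment
    m4 = fmean((v - mu) ** 4 for v in values)  # fourth central moment
    if m2 == 0:
        raise ValueError("values have zero variance")
    return m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    uniform_like = [0.40, 0.45, 0.50, 0.55, 0.60]  # hypothetical, evenly spread scores
    spiky = [0.30] * 19 + [0.99]                  # hypothetical, one highly conserved window
    print(excess_kurtosis(uniform_like), excess_kurtosis(spiky))
```

Because kurtosis is scale-free, it can contrast "a few extreme outliers" against "uniformly moderate" conservation without a fixed identity cut-off, which fits the threshold-free goal stated in the abstract.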
Genome Research
Cap analysis of gene expression (CAGE) is a methodology for genome-wide quantitative mapping of mRNA 5′ ends to precisely capture transcription start sites at single-nucleotide resolution. In combination with high-throughput sequencing, CAGE has revolutionized our understanding of the rules of transcription initiation, led to the discovery of new core promoter sequence features, and revealed transcription initiation at enhancers genome-wide. The biggest limitation of CAGE is that even the most recently improved version (nAnT-iCAGE) still requires large amounts of total cellular RNA (5 µg), preventing its application to scarce biological samples such as those from early embryonic development or rare cell types. Here, we present SLIC-CAGE, a Super-Low Input Carrier-CAGE approach to capture 5′ ends of RNA polymerase II transcripts from as little as 5–10 ng of total RNA. This dramatic increase in sensitivity is achieved by a specially designed, selectively degradable carrier RNA. We demon...
Brain : a journal of neurology, Jan 9, 2018
The transcription factor BCL11B is essential for development of the nervous and immune systems, and Bcl11b deficiency results in structural brain defects, reduced learning capacity, and impaired immune cell development in mice. However, the precise role of BCL11B in humans is largely unexplored, except for a single patient with a BCL11B missense mutation, affected by multisystem anomalies and profound immune deficiency. Using massively parallel sequencing, we identified 13 patients bearing heterozygous germline alterations in BCL11B. Notably, all of them are affected by global developmental delay with speech impairment and intellectual disability; however, none displayed overt clinical signs of immune deficiency. Six frameshift mutations, two nonsense mutations, one missense mutation, and two chromosomal rearrangements resulting in diminished BCL11B expression arose de novo. A further frameshift mutation was transmitted from a similarly affected mother. Interestingly, the most se...
Nature, Mar 15, 2018
Gametes are highly specialized cells that can give rise to the next generation through their ability to generate a totipotent zygote. In mice, germ cells are first specified in the developing embryo around embryonic day (E) 6.25 as primordial germ cells (PGCs). Following subsequent migration into the developing gonad, PGCs undergo a wave of extensive epigenetic reprogramming around E10.5-E11.5, including genome-wide loss of 5-methylcytosine. The underlying molecular mechanisms of this process have remained unclear, which has prevented this step of germline development from being recapitulated in vitro. Here we show, using an integrative approach, that this complex reprogramming process involves coordinated interplay among promoter sequence characteristics, DNA (de)methylation, the polycomb (PRC1) complex and both DNA demethylation-dependent and -independent functions of TET1 to enable the activation of a critical set of germline reprogramming-responsive genes involved in gamete generati...
Nucleic acids research, Jan 17, 2017
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
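JASPAR profiles are stored as position frequency matrices; to scan a sequence, a PFM is commonly converted into a log-odds position weight matrix (PWM). The sketch below does this with a pseudocount and a uniform background, then scores every window on the forward strand. The pseudocount, the score threshold and the toy TATA-like matrix are hypothetical, and this is not the pipeline used for the genome-wide JASPAR predictions; a full scanner would also score the reverse complement.

```python
import math
from typing import Dict, List, Optional, Tuple

Matrix = Dict[str, List[float]]
BASES = "ACGT"


def pfm_to_pwm(pfm: Matrix, pseudocount: float = 0.8,
               background: Optional[Dict[str, float]] = None) -> Matrix:
    """Convert a PFM (counts per base per column) into a log2-odds PWM."""
    bg = background or {b: 0.25 for b in BASES}
    width = len(pfm["A"])
    pwm: Matrix = {b: [] for b in BASES}
    for i in range(width):
        column_total = sum(pfm[b][i] for b in BASES)
        for b in BASES:
            # the pseudocount is spread over bases in proportion to the background
            prob = (pfm[b][i] + pseudocount * bg[b]) / (column_total + pseudocount)
            pwm[b].append(math.log2(prob / bg[b]))
    return pwm


def scan_forward(seq: str, pwm: Matrix,
                 threshold: float) -> List[Tuple[int, str, float]]:
    """Score every forward-strand window; return (start, site, score) hits."""
    width = len(pwm["A"])
    seq = seq.upper()
    hits: List[Tuple[int, str, float]] = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        if any(base not in BASES for base in site):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[base][i] for i, base in enumerate(site))
        if score >= threshold:
            hits.append((start, site, score))
    return hits


if __name__ == "__main__":
    toy_pfm = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
               "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}  # hypothetical TATA-like profile
    for hit in scan_forward("GGTATAGG", pfm_to_pwm(toy_pfm), threshold=4.0):
        print(hit)
```

The pseudocount keeps unobserved bases from receiving a log-odds score of minus infinity, so a single mismatch lowers a site's score instead of excluding it outright.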
Nucleic acids research, Jan 7, 2017
Comparative genomics has revealed a class of non-protein-coding genomic sequences that display an extraordinary degree of conservation between two or more organisms, regularly exceeding that found within protein-coding exons. These elements, collectively referred to as conserved non-coding elements (CNEs), are non-randomly distributed across chromosomes and tend to cluster in the vicinity of genes with regulatory roles in multicellular development and differentiation. CNEs are organized into functional ensembles called genomic regulatory blocks: dense clusters of elements that collectively coordinate the expression of shared target genes, and whose span in many cases coincides with topologically associating domains. CNEs display sequence properties that set them apart from other sequences under constraint, and have recently been proposed as useful markers for the reconstruction of the evolutionary history of organisms. Disruption of several of these elements is known to contribute to ...
The EMBO journal, Nov 4, 2017
While β-catenin has been demonstrated to be an essential molecule and therapeutic target for various cancer stem cells (CSCs), including those driven by MLL fusions, here we show that transcriptional memory from cells of origin predicts AML patient survival and allows β-catenin-independent transformation in MLL-CSCs derived from the hematopoietic stem cell (HSC)-enriched LSK population, but not in those derived from myeloid-granulocyte progenitors. Mechanistically, β-catenin regulates expression of downstream targets of a key transcriptional memory gene, Hoxa9, which is highly enriched in LSK-derived MLL-CSCs and helps sustain leukemic self-renewal. Suppression of Hoxa9 sensitizes LSK-derived MLL-CSCs to β-catenin inhibition, resulting in abolition of the CSC transcriptional program and transformation ability. In addition, further molecular and functional analyses identified Prmt1 as a key common downstream mediator of β-catenin/Hoxa9 functions in LSK-derived MLL-CSCs. Together, these findings not only uncover an une...
Genome Biology
Background: Inactivation of one X chromosome is established early in female mammalian development and can be reversed in vivo and in vitro when pluripotency factors are re-expressed. The extent of reactivation along the inactive X chromosome (Xi) and the determinants of locus susceptibility are, however, poorly understood. Here we use cell fusion-mediated pluripotent reprogramming to study human Xi reactivation, and allele-specific single nucleotide polymorphisms (SNPs) to identify reactivated loci. Results: We show that a subset of human Xi genes is rapidly reactivated upon re-expression of the pluripotency network. These genes lie within the most evolutionarily recent segments of the human X chromosome, which are depleted of LINE1 and enriched for SINE elements, predicted to impair XIST spreading. Interestingly, this cadre of genes displays stochastic Xi expression in human fibroblasts ahead of reprogramming. This stochastic variability is evident between clones, by RNA-sequencing, and at the single-cell level, by RNA-FISH, and is not attributable to differences in levels of the repressive histone marks H3K9me3 or H3K27me3. Treatment with the DNA demethylating agent 5-aza-2′-deoxycytidine does not increase Xi expression ahead of reprogramming, but instead reveals a second cadre of genes that only become susceptible to reactivation upon induction of pluripotency. Conclusions: Collectively, these data not only underscore the multiple pathways that contribute to maintaining silencing along the human Xi but also suggest that transcriptional stochasticity among human cells could be useful for predicting and engineering epigenetic strategies to achieve locus-specific or domain-specific human Xi gene reactivation.
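Allele-specific expression at heterozygous SNPs is what lets reads be assigned to the active (Xa) or inactive (Xi) X chromosome. The sketch below pools per-SNP read counts by gene and reports the fraction of reads coming from the Xi allele. It assumes each SNP's alleles have already been phased to Xa or Xi; the depth filter, the 10% reactivation cut-off and the gene names are hypothetical, not values from the study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class SnpCounts:
    gene: str       # gene the SNP falls in
    xa_reads: int   # reads carrying the allele phased to the active X
    xi_reads: int   # reads carrying the allele phased to the inactive X


def xi_fraction_by_gene(snps: Iterable[SnpCounts],
                        min_depth: int = 10) -> Dict[str, float]:
    """Pool allele counts per gene and return the Xi share of informative reads."""
    totals: Dict[str, List[int]] = {}
    for snp in snps:
        depth = snp.xa_reads + snp.xi_reads
        if depth == 0 or depth < min_depth:
            continue  # skip SNPs with too few reads to be informative
        acc = totals.setdefault(snp.gene, [0, 0])
        acc[0] += snp.xa_reads
        acc[1] += snp.xi_reads
    return {gene: xi / (xa + xi) for gene, (xa, xi) in totals.items()}


def reactivated_genes(fractions: Dict[str, float],
                      min_xi_fraction: float = 0.1) -> List[str]:
    """Genes whose Xi share meets a hypothetical reactivation cut-off."""
    return sorted(g for g, f in fractions.items() if f >= min_xi_fraction)


if __name__ == "__main__":
    counts = [SnpCounts("GENE_A", 90, 30), SnpCounts("GENE_A", 50, 10),
              SnpCounts("GENE_B", 100, 2), SnpCounts("GENE_C", 3, 3)]
    fractions = xi_fraction_by_gene(counts)
    print(fractions)
    print(reactivated_genes(fractions))
```

Pooling counts across all informative SNPs in a gene, rather than averaging per-SNP ratios, gives well-covered SNPs proportionally more weight than sparsely covered ones.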
Nature Communications
Developmental genes in metazoan genomes are surrounded by dense clusters of conserved noncoding elements (CNEs). CNEs exhibit unexplained extreme levels of sequence conservation, with many acting as developmental long-range enhancers. Clusters of CNEs define the span of regulatory inputs for many important developmental regulators and have been described previously as genomic regulatory blocks (GRBs). Their function and distribution around important regulatory genes raise the question of how they relate to the 3D conformation of these loci. Here, we show that clusters of CNEs strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human and Drosophila. The set of TADs associated with high levels of noncoding conservation exhibits distinct properties compared to TADs devoid of extreme noncoding conservation. The close correspondence between extreme noncoding conservation and TADs suggests that these TADs are ancient, revealing a regulatory architecture conserved over hundreds of millions of years.
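Estimating the span of a CNE cluster (and hence a candidate TAD) from CNE coordinates alone can be illustrated with simple gap-based clustering on one chromosome: neighbouring CNEs closer than a maximum gap join the same cluster, and clusters with too few elements are discarded. The `max_gap` and `min_cnes` values are hypothetical, and this single-linkage sketch is not the method used in the paper.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[int, int]


def cluster_cnes(cnes: Sequence[Interval], max_gap: int,
                 min_cnes: int) -> List[Interval]:
    """Single-linkage clustering of CNE intervals on one chromosome.

    A CNE whose start lies within `max_gap` bp of the current cluster's end
    is merged into it; clusters with fewer than `min_cnes` CNEs are dropped.
    """
    clusters: List[Interval] = []
    start: Optional[int] = None
    end: Optional[int] = None
    count = 0
    for cne_start, cne_end in sorted(cnes):
        if end is not None and cne_start - end <= max_gap:
            end = max(end, cne_end)  # extend the current cluster
            count += 1
            continue
        if start is not None and end is not None and count >= min_cnes:
            clusters.append((start, end))  # close a sufficiently dense cluster
        start, end, count = cne_start, cne_end, 1
    if start is not None and end is not None and count >= min_cnes:
        clusters.append((start, end))
    return clusters


if __name__ == "__main__":
    cnes = [(100, 150), (300, 350), (600, 650), (100_000, 100_050),
            (500_000, 500_100), (510_000, 510_050), (515_000, 515_080)]
    print(cluster_cnes(cnes, max_gap=50_000, min_cnes=3))
```

The isolated CNE at 100,000 is discarded because it forms a cluster of one, which is how a minimum-size filter separates dense regulatory blocks from scattered elements.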
Seminars in cell & developmental biology, Jan 16, 2016
Core promoters are minimal regions sufficient to direct accurate initiation of transcription and are crucial for regulation of gene expression. They are highly diverse in terms of associated core promoter motifs, underlying sequence composition and patterns of transcription initiation. Distinctive features of promoters are also seen at the chromatin level, including nucleosome positioning patterns and presence of specific histone modifications. Recent advances in identifying and characterizing promoters using next-generation sequencing-based technologies have provided the basis for their classification into functional groups and have shed light on their modes of regulation, with important implications for transcriptional regulation in development. This review discusses the methodology and the results of genome-wide studies that provided insight into the diversity of RNA polymerase II promoter architectures in vertebrates and other Metazoa, and the association of these architectures ...
Journal of Biological Chemistry, 2016
Isoleucyl-tRNA synthetase (IleRS) is unusual among aminoacyl-tRNA synthetases in having a tRNA-dependent pre-transfer editing activity. Alongside the typical bacterial IleRS (such as Escherichia coli IleRS), some bacteria also have eukaryote-like enzymes that cluster with eukaryotic IleRSs and exhibit low sensitivity to the antibiotic mupirocin. Our phylogenetic analysis suggests that the ileS1 and ileS2 genes of contemporary bacteria are descendants of genes that might have arisen by an ancient duplication event before the separation of bacteria and archaea. We present an analysis of evolutionary constraints on the synthetic and editing reactions in eukaryotic/eukaryote-like IleRSs, which share a common origin but diverged through adaptation to different cell environments. The enzyme from the yeast cytosol exhibits tRNA-dependent pre-transfer editing analogous to E. coli IleRS. This argues for the presence of this proofreading in the common ancestor of both IleRS types and for an ancient origin of the synthetic site-based quality control step. Surprisingly, however, the eukaryote-like IleRS from Streptomyces griseus lacks this capacity; at the same time, its synthetic site displays a 10³-fold drop in sensitivity to mupirocin relative to the yeast enzyme. The discovery that pre-transfer editing is optional in IleRSs supports the notion that the conserved post-transfer editing domain is the main checkpoint in these enzymes. We substantiated this by showing that, under error-prone conditions, S. griseus IleRS is able to rescue the growth of an E. coli strain lacking functional IleRS, providing the first evidence that tRNA-dependent pre-transfer editing in IleRS is not essential for cell viability.
Cancer Cell, 2016
Nucleic acids research, Jan 15, 2015
MicroRNAs (miRNAs) play a major role in the post-transcriptional regulation of target genes, especially in development and differentiation. Our understanding of the transcriptional regulation of miRNA genes is limited by inadequate annotation of primary miRNA (pri-miRNA) transcripts. Here, we used CAGE-seq and RNA-seq to provide genome-wide identification of the pri-miRNA core promoter repertoire and its dynamic usage during zebrafish embryogenesis. We assigned pri-miRNA promoters to 152 precursor miRNAs (pre-miRNAs), the majority of which were supported by promoter-associated histone marks (H3K4me3 and the histone variant H2A.Z) and RNA polymerase II (RNAPII) occupancy. We validated seven miR-9 pri-miRNAs by in situ hybridization and showed that their expression patterns are similar to that of mature miR-9. In addition, processing of an alternative intronic promoter of miR-9-5 was validated by 5' RACE PCR. Developmental profiling revealed a subset of pri-miRNAs that are maternally inherited. M...