Georgi Marinov - Academia.edu
Papers by Georgi Marinov
Journal of Experimental Medicine, 2015
MicroRNAs have emerged as key regulators of B cell fate decisions and immune function. Deregulation of several microRNAs in B cells leads to the development of autoimmune disease and cancer in mice. We demonstrate that the microRNA-212/132 cluster (miR-212/132) is induced in B cells in response to B cell receptor signaling. Enforced expression of miR-132 results in a block in early B cell development at the prepro-B cell to pro-B cell transition and induces apoptosis in primary bone marrow B cells. Importantly, loss of miR-212/132 results in accelerated B cell recovery after antibody-mediated B cell depletion. We find that Sox4 is a target of miR-132 in B cells. Co-expression of SOX4 with miR-132 rescues the defect in B cell development from overexpression of miR-132 alone, thus suggesting that miR-132 may regulate B lymphopoiesis through Sox4. In addition, we show that the expression of miR-132 can inhibit cancer development in cells that are prone to B cell cancers, such as B cells expressing the c-Myc oncogene. We have thus uncovered miR-132 as a novel contributor to B cell development.
Cell Reports, 2015
In developing male germ cells, prospermatogonia, two Piwi proteins, MILI and MIWI2, use Piwi-interacting RNA (piRNA) guides to repress transposable element (TE) expression and ensure genome stability and proper gametogenesis. In addition to their roles in post-transcriptional TE repression, both proteins are required for DNA methylation of TE sequences. Here, we analyzed the effect of Miwi2 deficiency on piRNA biogenesis and transposon repression. Miwi2 deficiency had only a minor impact on piRNA biogenesis; however, the piRNA profile of Miwi2-knockout mice indicated overexpression of several LINE1 TE families that led to activation of the ping-pong piRNA cycle. Furthermore, we found that MILI and MIWI2 have distinct functions in TE repression in the nucleus. MILI is responsible for DNA methylation of a larger subset of TE families than MIWI2 is, suggesting that the proteins have independent roles in establishing DNA methylation patterns.
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health.
Genome Announcements, 2015
Here, we report the genome sequence of Magnetospirillum magnetotacticum strain MS-1, which consists of 36 contigs and 4,136 protein-coding genes.
Developmental Cell, Jan 23, 2015
Huang et al. (2013) recently reported that chromatin immunoprecipitation sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi, a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns, as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their data and report that the underlying deep-sequencing dataset does not support the authors' genome-wide conclusions.
G3 (Bethesda, Md.), 2014
ChIP-seq has become the primary method for identifying in vivo protein-DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique appearing in PubMed as of December 2012. Individually and in aggregate, these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and for driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here, we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of d...
Cell Stem Cell, Jan 8, 2015
Cellular reprogramming highlights the epigenetic plasticity of the somatic cell state. Long noncoding RNAs (lncRNAs) have emerging roles in epigenetic regulation, but their potential functions in reprogramming cell fate have been largely unexplored. We used single-cell RNA sequencing to characterize the expression patterns of over 16,000 genes, including 437 lncRNAs, during defined stages of reprogramming to pluripotency. Self-organizing maps (SOMs) were used as an intuitive way to structure and interrogate transcriptome data at the single-cell level. Early molecular events during reprogramming involved the activation of Ras signaling pathways, along with hundreds of lncRNAs. Loss-of-function studies showed that activated lncRNAs can repress lineage-specific genes, while lncRNAs activated in multiple reprogramming cell types can regulate metabolic gene expression. Our findings demonstrate that reprogramming cells activate defined sets of functionally relevant lncRNAs and provide a r...
Nature, 2014
To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.
Cell Reports, 2014
Piwi-interacting (pi)RNAs repress diverse transposable elements in germ cells of Metazoa and are essential for fertility in both invertebrates and vertebrates. The precursors of piRNAs are transcribed from distinct genomic regions, the so-called piRNA clusters; however, how piRNA clusters are differentiated from the rest of the genome is not known. To address this question, we studied piRNA biogenesis in two D. virilis strains that show differential ability to generate piRNAs from several genomic regions. We found that active piRNA biogenesis correlates with high levels of histone 3 lysine 9 trimethylation (H3K9me3) over genomic regions that give rise to piRNAs. Furthermore, piRNA biogenesis in the progeny requires the transgenerational inheritance of an epigenetic signal, presumably in the form of homologous piRNAs that are generated in the maternal germline and deposited into the oocyte. The inherited piRNAs enhance piRNA biogenesis through the installment of H3K9me3 on piRNA clusters.
Genes & Development, 2014
Small noncoding RNAs that associate with Piwi proteins, called piRNAs, serve as guides for repression of diverse transposable elements in germ cells of metazoa. In Drosophila, the genomic regions that give rise to piRNAs, the so-called piRNA clusters, are transcribed to generate long precursor molecules that are processed into mature piRNAs. How genomic regions that give rise to piRNA precursor transcripts are differentiated from the rest of the genome and how these transcripts are specifically channeled into the piRNA biogenesis pathway are not known. We found that transgenerationally inherited piRNAs provide the critical trigger for piRNA production from homologous genomic regions in the next generation by two different mechanisms. First, inherited piRNAs enhance processing of homologous transcripts into mature piRNAs by initiating the ping-pong cycle in the cytoplasm. Second, inherited piRNAs induce installment of the histone 3 Lys9 trimethylation (H3K9me3) mark on genomic piRNA cluster sequences. The heterochromatin protein 1 (HP1) homolog Rhino binds to the H3K9me3 mark through its chromodomain and is enriched over piRNA clusters. Rhino recruits the piRNA biogenesis factor Cutoff to piRNA clusters and is required for efficient transcription of piRNA precursors. We propose that transgenerationally inherited piRNAs act as an epigenetic memory for identification of substrates for piRNA biogenesis on two levels: by inducing a permissive chromatin environment for piRNA precursor synthesis and by enhancing processing of these precursors.
PLoS Biology, 2011
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health.
Scientific Reports, 2014
Chromatin immunoprecipitation coupled with DNA sequencing (ChIP-seq) is the major contemporary method for mapping in vivo protein-DNA interactions in the genome. It identifies sites of transcription factor, cofactor and RNA polymerase occupancy, as well as the distribution of histone marks. Consortia such as the ENCyclopedia Of DNA Elements (ENCODE) have produced large datasets using manual protocols. However, future measurements of hundreds of additional factors in many cell types and physiological states call for higher throughput and consistency afforded by automation. Such automation advances, when provided by multiuser facilities, could also improve the quality and efficiency of individual small-scale projects. The immunoprecipitation process has become rate-limiting, and is a source of substantial variability when performed manually. Here we report a fully automated robotic ChIP (R-ChIP) pipeline that allows up to 96 reactions. A second bottleneck is the dearth of renewable ChIP-validated immune reagents, which do not yet exist for most mammalian transcription factors. We used R-ChIP to screen new mouse monoclonal antibodies raised against p300, a histone acetylase, well-known as a marker of active enhancers, for which ChIP-competent monoclonal reagents have been lacking. We identified, validated for ChIP-seq, and made publicly available a monoclonal reagent called ENCITp300-1.
Proceedings of the National Academy of Sciences of the United States of America, Jan 29, 2014
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the o...
Proceedings of the National Academy of Sciences of the United States of America, Jan 19, 2014
Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease
Nature, Jan 20, 2014
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biolo...
Nature, 2012
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
Mechanisms of Development, 2009
To ensure reproducible and exhaustive Y2H results, these libraries are screened to saturation using an optimized mating procedure. This makes it possible to test on average 100 million interactions per screen, corresponding to a 10-fold coverage of the library. As a consequence, multiple, independent fragments are isolated for each interactant, enabling the immediate delineation of a minimal interacting domain and the computation of a confidence score.
Genome Research, 2013
We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity.