Laurence Ettwiller - Academia.edu
Papers by Laurence Ettwiller
The filarial parasites Mansonella ozzardi and Mansonella perstans, causative agents of mansonellosis, infect hundreds of millions of people worldwide, yet remain among the most understudied of the human filarial pathogens. M. ozzardi is highly prevalent in Latin American countries and Caribbean Islands, while M. perstans is predominantly found in sub-Saharan Africa as well as in a few areas in South America. In addition to the differences in their geographical distribution, the two parasites are transmitted by different insect vectors and exhibit differences in their responses to commonly used anthelminthic drugs. The lack of genome information has hindered investigations into the biology and evolution of Mansonella parasites and understanding the molecular basis of the clinical differences between species. In the current study, high quality genomes of two independent clinical isolates of M. perstans from Cameroon and two M. ozzardi isolates, one from Brazil and one from Venezuela, are rep...
PLOS Genetics
Phosphorothioation (PT), in which a non-bridging oxygen is replaced by a sulfur, is one of the rare modifications discovered in bacteria and archaea that occurs on the sugar-phosphate backbone as opposed to the nucleobase moiety of DNA. While PT modification is widespread in the prokaryotic kingdom, how PT modifications are distributed in the genomes and their exact roles in the cell remain to be defined. In this study, we developed a simple and convenient technique called EcoWI-seq, based on a modification-dependent restriction endonuclease, to identify genomic positions of PT modifications. EcoWI-seq shows performance similar to that of other PT modification detection techniques and, additionally, is easily scalable while requiring little starting material. As a proof of principle, we applied EcoWI-seq to map the PT modifications at base resolution in the genomes of both the Salmonella enterica cerro 87 and E. coli expressing the dnd+ gene cluster. Specifically, we address whether the part...
Journal of biomolecular techniques : JBT, 2020
DNA methylation is important for gene regulation. The ability to accurately identify 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) gives us greater insight into potential gene regulatory mechanisms. Bisulfite sequencing (BS) is traditionally used to detect methylated Cs; however, BS does have its drawbacks. DNA is commonly damaged and degraded by the chemical bisulfite reaction, resulting in libraries that demonstrate high GC bias and are enriched for methylated regions. To overcome these limitations, we developed an enzymatic approach, NEBNext Enzymatic Methyl-seq (EM-seq), for methylation detection that minimizes DNA damage, resulting in longer fragments and minimal GC bias. Illumina libraries were prepared using bisulfite and EM-seq methods. Libraries generated with NA12878 DNA inputs ranging from 10 ng to 200 ng were sequenced using Illumina's NovaSeq 6000. Reads were adapter trimmed (trimadap) and aligned to GRCh38 using BWAMeth. Aggregate metrics like GC bias an...
The intracellular endosymbiotic proteobacteria Wolbachia have evolved across the phyla Nematoda and Arthropoda. In Wolbachia phylogeny, supergroup F is the only clade with members from both arthropod and filarial nematode hosts and therefore can provide unique insights into their evolution and biology. In this study, 4 new supergroup F Wolbachia genomes have been assembled using a metagenomic assembly and binning approach: wMoz and wMpe from the human filarial parasites Mansonella ozzardi and Mansonella perstans, and wOcae and wMoviF from the blue mason bee Osmia caerulescens and the sheep ked Melophagus ovinus, respectively. A comprehensive phylogenomic analysis revealed two independent origins of filarial Wolbachia in supergroup F from ancestral arthropod hosts. The analysis also reveals that the switch from arthropod to filarial host is accompanied by a convergent pseudogenization and loss of the bacterioferritin gene, a phenomenon found to be shared by all filarial Wolbachia, eve...
Science, 2017
When is a mutation a true genetic variant? Large-scale sequencing studies have set out to determine the low-frequency pathogenic genetic variants in individuals and populations. However, Chen et al. demonstrate that many so-called low-frequency genetic variants in large public databases may be due to DNA damage. They scored libraries sequenced with and without a DNA damage–repairing enzymatic mix to assess the proportion of true rare variants. It remains to be seen how best to repair DNA before sequencing to provide more accurate assessments of mutation. Science, this issue p. 752
Genome Research
Covalent modifications of genomic DNA are crucial for most organisms to survive. Amplicon-based high-throughput sequencing technologies erase all DNA modifications to retain only sequence information for the four canonical nucleobases, necessitating specialized technologies for ascertaining epigenetic information. To also capture base modification information, we developed Methyl-SNP-seq, a technology that takes advantage of the complementarity of the double helix to extract the methylation and original sequence information from a single DNA molecule. More specifically, Methyl-SNP-seq uses bisulfite conversion of one of the strands to identify cytosine methylation while retaining the original four-base sequence information on the other strand. As both strands are locked together to link the dual readouts on a single paired-end read, Methyl-SNP-seq allows detecting the methylation status of any DNA even without a reference genome. Because one of the strands retains the original four...
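The dual-readout idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `call_methylation` is hypothetical, and it assumes the unconverted and bisulfite-converted readouts have already been oriented to the same strand and aligned base for base. It relies only on the standard bisulfite principle that unmethylated cytosines read as T after conversion while methylated cytosines still read as C.

```python
from typing import List, Tuple


def call_methylation(original: str, converted: str) -> List[Tuple[int, str]]:
    """Call methylation at each cytosine of an unconverted read.

    `original` is the four-base readout; `converted` is the bisulfite-converted
    readout of the same molecule (assumed pre-aligned, same orientation).
    """
    if len(original) != len(converted):
        raise ValueError("reads must have equal length")
    calls = []
    for pos, (orig_base, conv_base) in enumerate(zip(original, converted)):
        if orig_base != "C":
            continue  # only cytosines receive a methylation call
        if conv_base == "C":
            calls.append((pos, "methylated"))    # protected from conversion
        elif conv_base == "T":
            calls.append((pos, "unmethylated"))  # C converted to U, read as T
        else:
            calls.append((pos, "ambiguous"))     # mismatch, e.g. sequencing error
    return calls


# Toy reads (made up for illustration): C at positions 1, 4 and 5.
print(call_methylation("ACGTCCGA", "ACGTTTGA"))
# → [(1, 'methylated'), (4, 'unmethylated'), (5, 'unmethylated')]
```

Because the comparison uses only the two readouts of one molecule, no reference genome enters the calculation, which mirrors the reference-free property stated in the abstract.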
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs (sgRNAs) used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5ʹ cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.
Genome Research, 2021
Determination of eukaryotic Transcription Start Sites (TSS) has been based on methods that require the cap structure at the 5-prime end of transcripts derived from Pol-II RNA polymerase. Consequently, these methods do not reveal TSS derived from the other RNA polymerases, which also play critical roles in various cell functions. To address this limitation, we developed ReCappable-seq, which comprehensively identifies TSS for both Pol-II and non-Pol-II transcripts at single-nucleotide resolution. The method relies on specific enzymatic exchange of 5-prime m7G caps and 5-prime triphosphates with a selectable tag. When applied to human transcriptomes, ReCappable-seq identifies Pol-II TSS that are in agreement with orthogonal methods such as CAGE. Additionally, ReCappable-seq reveals a rich landscape of TSS associated with Pol-III transcripts which have not previously been amenable to study at genome-wide scale. Novel TSS from non-Pol-II transcription can be located in the nuclear and mit...
Molecular and Cellular Biology / Genetics, 2019
DNA isolated from blood draws (cell-free DNA (cfDNA)) or from archival material like formalin fixed paraffin embedded (FFPE) tissues has advanced the field of cancer genetics. DNA methylation (5-methylcytosines (5mC) and 5-hydroxymethylcytosines (5hmC)) is a key epigenetic factor that plays an important role in cellular processes, and its misregulation results in diseased states like cancer. Advances in the field of sample preparation from biological matrices and genomics have enabled cancer biomarker identification based on methylation profiling. Bisulfite sequencing is the standard method to detect methylation and has been employed for both targeted and whole genome methylation analysis. However, the chemical based bisulfite conversion of cytosines to uracils also results in DNA damage, which subsequently results in shorter DNA insert sizes and introduces bias into the data. Robust biomarker detection relies primarily on the ability to profile methylation accurately. Analysis of DNA methylation from cfDNA and FFPE DNA is challenging as the DNA is typically of low quality and quantity. To overcome the drawbacks of bisulfite sequencing, we developed an enzyme based methylation detection technology, called NEBNext Enzymatic Methyl-Seq (EM-Seq). DNA damage is minimized, enabling longer insert sizes, lower duplication rates and minimal GC bias, resulting in more accurate quantification of methylation in the sample DNA. Using EM-Seq, we profiled cfDNA and FFPE DNA from multiple tissue types. Results for these challenging DNA types showed that the EM-Seq libraries had longer inserts, lower duplication rates, higher percentages of mapped reads and less GC bias compared to WGBS libraries. These libraries also identified a higher number of CpGs, and the estimated global methylation levels were in good agreement with the absolute levels quantified using LC/MS.
In conclusion, EM-Seq libraries have superior sequencing metrics, resulting in robust methylation profiling for these types of challenging DNA samples. Citation Format: Louise Williams, V K Chaithanya Ponnaluri, Brittany S. Sexton, Lana Saleh, Katherine Marks, Mala Samaranayake, Laurence Ettwiller, Shengxi Guan, Heidi E. Church, Nan Dai, Esta Tamanaha, Erbay Yigit, Bradley Langhorst, Zhiyi Sun, Thomas C. Evans, Romualdas Vaisvila, Eileen Dimalanta, Theodore B. Davis. Enzymatic Methyl-Seq: methylome analysis of challenging DNA samples [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 820.
Genome research, Nov 1, 2016
The processed high-throughput sequencing data used in this article were hosted initially at the National Institute for Medical Research (UK). Due to its migration to the new Francis Crick Institute, the URL to visualize the data has changed. The updated URL is
RNA, 2021
Nanopore sequencing devices read individual RNA strands directly. This facilitates identification of exon linkages and nucleotide modifications; however, using conventional methods the 5′ and 3′ ends of poly(A) RNA cannot be identified unambiguously. This is due in part to the architecture of the nanopore/enzyme-motor complex, and in part to RNA degradation in vivo and in vitro that can obscure transcription start and end sites. In this study, we aimed to identify individual full-length human RNA isoform scaffolds among ~4 million nanopore poly(A)-selected RNA reads. First, to identify RNA strands bearing 5′ m7G caps, we exchanged the biological cap for a modified cap attached to a 45-nucleotide oligomer. This oligomer adaptation method improved 5′ end sequencing and ensured correct identification of the 5′ m7G capped ends. Second, among these 5′-capped nanopore reads, we screened for ionic current signatures consistent with a 3′ polyadenylation site. Combining these two steps, we i...
Briefings in Bioinformatics
Alternative transcription units (ATUs) are dynamically encoded under different conditions and display overlapping patterns (sharing one or more genes) under a specific condition in bacterial genomes. Genome-scale identification of ATUs is essential for studying the emergence of human diseases caused by bacterial organisms. However, it is unrealistic to identify all ATUs using experimental techniques because of the complexity and dynamic nature of ATUs. Here, we present the first-of-its-kind computational framework, named SeqATU, for genome-scale ATU prediction based on next-generation RNA-Seq data. The framework utilizes a convex quadratic programming model to seek an optimum expression combination of all of the to-be-identified ATUs. The predicted ATUs in Escherichia coli reached a precision of 0.77/0.74 and a recall of 0.75/0.76 in the two RNA-Sequencing datasets compared with the benchmarked ATUs from third-generation RNA-Seq data. In addition, the proportion of 5′- or 3′-end gen...
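For readers unfamiliar with the precision and recall figures quoted above, the following Python sketch shows how such metrics are computed for a set of predicted ATUs against a benchmark set. It is an illustration only: the function name, the gene names and the exact-match criterion (an ATU counts as correct only if its gene tuple matches a benchmark ATU exactly) are assumptions, and the abstract does not state which matching rule the SeqATU evaluation uses.

```python
from typing import Set, Tuple

ATU = Tuple[str, ...]  # an ATU represented as an ordered tuple of gene names


def precision_recall(predicted: Set[ATU], benchmark: Set[ATU]) -> Tuple[float, float]:
    """Return (precision, recall) under an exact-match criterion (assumed)."""
    true_positives = len(predicted & benchmark)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    return precision, recall


# Hypothetical toy data: three of the four predictions appear in the benchmark.
predicted = {("geneA", "geneB"), ("geneB", "geneC"), ("geneD",), ("geneE", "geneF")}
benchmark = {("geneA", "geneB"), ("geneD",), ("geneE", "geneF"), ("geneG",)}

precision, recall = precision_recall(predicted, benchmark)
print(precision, recall)  # → 0.75 0.75
```

Precision is the fraction of predicted ATUs that are in the benchmark, and recall is the fraction of benchmark ATUs that were predicted; overlapping ATUs such as `("geneA", "geneB")` and `("geneB", "geneC")` are treated as distinct units.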
Molecular and Cellular Biology / Genetics
Pervasive mutations in somatic cells generate a heterogeneous genomic population within an organism and may result in serious medical conditions. While cancer is the most studied disease associated with somatic variations, recent advances in single-cell and ultra-deep sequencing indicate that a number of phenotypes and pathologies are impacted by cell-specific variants. Currently, the accurate identification of low allelic frequency somatic variants relies on a combination of deep sequencing coverage and multiple lines of evidence for the presence of variants. However, in this study we show that false positive variants can account for more than 70% of identified somatic variations, rendering conventional detection methods inadequate for accurate determination of low allelic variants. Interestingly, these false positive variants primarily originate from mutagenic DNA damage, which directly confounds determination of genuine somatic mutations. Furthermore, we developed and validated a simple me...
We have developed Cappable-seq, which specifically captures primary RNA transcripts by enzymatically modifying the 5' triphosphorylated end of RNA with a selectable tag. We first applied Cappable-seq to E. coli, achieving up to 50-fold enrichment of primary transcripts and identifying an unprecedented 16539 transcription start sites (TSS) genome-wide at single base resolution. We also applied Cappable-seq to a mouse cecum sample and for the first time identified TSS in a microbiome. Furthermore, Cappable-seq universally depletes ribosomal RNA and reduces the complexity of the transcriptome to a single quantifiable tag per TSS, enabling digital profiling of gene expression in any microbiome.
transcription-associated motifs in vertebrates