Zsolt Boldogkoi | University of Szeged
Papers by Zsolt Boldogkoi
Biochimica et biophysica acta (N), Sep 13, 1994
We determined the entire DNA sequence of two adjacent open reading frames (ORFs) of Aujeszky's disease virus (ADV) that encode ribonucleotide reductase (RR) and are separated by a 9-bp intergenic sequence. From the sequence analysis, we deduce that the ORFs encode the large and small subunits, of 835 and 303 amino acids, respectively. Comparison of the ADV RR2 amino acid sequence with those of equine herpesvirus type 1, bovine herpesvirus type 1, HSV-1 and varicella zoster virus revealed that 48% of the amino acids form clusters of residues conserved in all compared sequences. In its N-terminal part, ADV RR1 shows low homology to the RR1 of other herpesviruses; the rest of the RR1 protein contains highly conserved amino acid sequences separated by blocks of low homology.
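The abstract reports that 48% of RR2 residues fall into clusters conserved in all compared sequences but does not state the exact counting criterion. As a rough illustration only, the sketch below assumes a simple criterion in which an alignment column counts as conserved when every aligned sequence carries the same residue (gap columns never count); the function name and the toy alignment are hypothetical, not the authors' data or method.

```python
def fraction_fully_conserved(alignment: list[str]) -> float:
    """Return the fraction of alignment columns in which every sequence
    carries the same residue. Columns containing a gap ('-') in the first
    sequence, or any mismatch, are not counted as conserved."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if length == 0:
        return 0.0
    conserved = 0
    for column in zip(*alignment):
        first = column[0]
        if first != "-" and all(residue == first for residue in column):
            conserved += 1
    return conserved / length


# Toy alignment (hypothetical sequences, not real RR2 data).
toy = [
    "MKTAYI-AK",
    "MKSAYIQAK",
    "MKTAYIEAR",
]
print(fraction_fully_conserved(toy))  # 6 of 9 columns are conserved
```

A real comparison would start from a proper multiple alignment and might score runs ("clusters") of conserved columns rather than single columns.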
The Journal of Neuroscience, Jun 9, 2004
Sympathetic premotor neurons directly control sympathetic preganglionic neurons (SPNs) in the intermediolateral cell column (IML) of the thoracic spinal cord, and many of these premotor neurons are localized in the medulla oblongata. The rostral ventrolateral medulla contains premotor neurons that control cardiovascular function, whereas the rostral medullary raphe regions are a candidate source of sympathetic premotor neurons for thermoregulatory functions. Here, we show that these medullary raphe regions contain putative glutamatergic neurons and that these neurons directly control thermoregulatory SPNs. Neurons expressing vesicular glutamate transporter 3 (VGLUT3) were distributed in the rat medullary raphe regions, including the raphe magnus and rostral raphe pallidus nuclei, and mostly lacked serotonin immunoreactivity. These VGLUT3-positive neurons expressed Fos in response to cold exposure or to central administration of prostaglandin E2, a pyrogenic mediator. Transneuronal retrograde labeling after inoculation of pseudorabies virus into the interscapular brown adipose tissue (BAT) or the tail indicated that these VGLUT3-expressing medullary raphe neurons innervated these thermoregulatory effector organs multisynaptically through SPNs of specific thoracic segments, and microinjection of glutamate into the IML of the BAT-controlling segments produced BAT thermogenesis. An anterograde tracing study further showed a direct projection of these VGLUT3-expressing medullary raphe neurons to the dendrites of SPNs. Furthermore, intra-IML application of glutamate receptor antagonists blocked BAT thermogenesis triggered by disinhibition of the medullary raphe regions.
The present results suggest that VGLUT3-expressing neurons in the medullary raphe regions constitute excitatory neurons that could be categorized as a novel group of sympathetic premotor neurons for thermoregulatory functions, including fever.
Background: Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming. Results: Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data of two organisms, generated by different sequencing platforms. We developed a filtering algorithm which takes into consideration that template-switching can be a source of artifactual polyadenylation when filtering out spurious polyadenylation sites. The algorithm outperformed the conventional internal priming filters based on comparison to direct RNA sequencing data. We also showed that the polyadenylation artifacts arise in cDNA sequencing at consecutive stretches of as few as three adenines. There was no substantial difference between the lengths of poly(A) tails at the artifactual and the true transcriptional end sites even though it is expected that internal priming artifacts have shorter poly(A) tails than genuine polyadenylated reads. Conclusions: Our findings suggest that template switching plays an important role in the generation of spurious polyadenylation and support the need for more rigorous filtering of artifactual polyadenylation sites in cDNA data, or that alternative polyadenylation should be annotated using native RNA sequencing.
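Conventional internal-priming filters typically inspect the genomic sequence immediately downstream of an apparent transcriptional end site (TES) for adenine-rich stretches, because oligo(dT) primers can anneal to such stretches within an RNA. The sketch below is a minimal version of that kind of filter, not the authors' template-switching-aware algorithm; the 20-nt window is an assumed default, and `min_run=3` simply mirrors the reported observation that artifacts arise at stretches of as few as three adenines. All function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_a_run(seq: str) -> int:
    """Length of the longest stretch of consecutive adenines in seq."""
    return max((len(run) for run in re.findall(r"A+", seq.upper())), default=0)


def flag_possible_internal_priming(genome: str, tes: int, strand: str = "+",
                                   window: int = 20, min_run: int = 3) -> bool:
    """Flag a TES whose downstream genomic sequence contains an A-run.

    genome: reference sequence (forward strand).
    tes: 0-based genomic coordinate of the last transcribed nucleotide.
    strand: "+" or "-"; the downstream region is read in transcript orientation.
    window, min_run: illustrative thresholds (assumptions, not published values).
    """
    if strand == "+":
        downstream = genome[tes + 1:tes + 1 + window]
    elif strand == "-":
        # On the minus strand, "downstream" lies to the left of the TES.
        downstream = reverse_complement(genome[max(0, tes - window):tes])
    else:
        raise ValueError("strand must be '+' or '-'")
    return longest_a_run(downstream) >= min_run


genome = "CCCGGGTTTAAAACCC"
print(flag_possible_internal_priming(genome, tes=8))                         # True: AAAA follows the TES
print(flag_possible_internal_priming(genome, tes=13, strand="-", window=5))  # False
```

Such a filter only checks the reference; the abstract's point is that template switching can create spurious poly(A) sites that this kind of check misses.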
Characterization of global transcriptomes using conventional short-read sequencing is challenging because these platforms are insensitive to transcript isoforms, multigenic RNA molecules, transcriptional overlaps and similar features. Long-read sequencing (LRS) can overcome these limitations by reading full-length transcripts. The use of these technologies has led to a redefinition of transcriptional complexity in the organisms studied. In this study, we applied LRS platforms from Pacific Biosciences and Oxford Nanopore Technologies to profile the dynamic vaccinia virus (VACV) transcriptome and ...
BMC Genomics, Oct 29, 2018
Background: Understanding the genetic structure of human populations is of fundamental interest to both the biological and the social sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation. The most widely used methods for collecting variant information at the DNA level are whole-genome sequencing, which remains costly, and the more economical array-based techniques, which simultaneously genotype a preselected set of variable sites in the human genome. The largest publicly accessible set of human genomic sequence data available today originates from exome sequencing, which covers around 1.2% of the whole genome (approximately 30 million base pairs). Results: To compare the effect of SNP selection strategies on population genetic analysis without bias, we subsampled the variants of the same highly curated 1000 Genomes dataset to mimic whole-genome sequencing, exome sequencing and array data, thereby eliminating the effect of the different chemistries and error profiles of these approaches. We then compared the exome dataset with the array-based dataset and with the gold-standard whole-genome dataset, using the same population genetic analysis methods. Conclusions: Our results draw attention to some of the inherent problems that arise from using preselected SNP sets for population genetic analysis. Additionally, we demonstrate that exome sequencing is a better alternative to array-based methods for population genetic analysis. We also propose a strategy for unbiased variant collection from exome data and offer a bioinformatics protocol for proper data processing.
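The subsampling design described in this abstract can be sketched as three filters over one shared call set: every variant for the genome-like set, variants inside exome capture targets for the exome-like set, and variants at array probe positions for the array-like set. The data structures below (chromosome/position tuples, half-open target intervals, a set of array sites) and the function name are assumptions for illustration and are not the authors' protocol.

```python
from bisect import bisect_right

Variant = tuple[str, int]  # (chromosome, 0-based position); an assumed representation


def subsample_by_design(variants: list[Variant],
                        exome_targets: dict[str, list[tuple[int, int]]],
                        array_sites: set[Variant]) -> dict[str, list[Variant]]:
    """Split one call set into genome-, exome- and array-like subsets.

    exome_targets maps a chromosome to sorted, non-overlapping, half-open
    (start, end) capture intervals.
    """
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in exome_targets.items()}

    def in_exome(chrom: str, pos: int) -> bool:
        ivs = exome_targets.get(chrom, [])
        # Index of the last interval starting at or before pos.
        i = bisect_right(starts.get(chrom, []), pos) - 1
        return i >= 0 and pos < ivs[i][1]

    return {
        "genome": list(variants),
        "exome": [v for v in variants if in_exome(*v)],
        "array": [v for v in variants if v in array_sites],
    }


calls = [("1", 100), ("1", 150), ("1", 500), ("2", 10), ("2", 999)]
targets = {"1": [(90, 160), (400, 450)], "2": [(0, 20)]}
chip = {("1", 500), ("2", 999)}
subsets = subsample_by_design(calls, targets, chip)
print(subsets["exome"])  # [('1', 100), ('1', 150), ('2', 10)]
```

Because all three subsets come from the same calls, any downstream difference reflects site selection rather than sequencing chemistry, which is the rationale the abstract gives.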
Scientific Reports, Jun 5, 2018
The Autographa californica multiple nucleopolyhedrovirus (AcMNPV) is an insect-pathogenic baculovirus. In this study, we applied the Oxford Nanopore Technologies platform to analyze the polyadenylated fraction of the viral transcriptome, using both cDNA and direct RNA sequencing. We identified and annotated a total of 132 novel transcripts and transcript isoforms, including 4 coding and 4 non-coding RNA molecules, 47 length variants, 5 splice isoforms, as well as 23 polycistronic and 49 complex transcripts. All of the identified novel protein-coding genes were 5′-truncated forms of longer host genes. We also demonstrated that, for transcription start site isoforms, the promoters and initiator sequences of the longer and shorter variants belong to the same kinetic class. Long-read sequencing also revealed a complex meshwork of transcriptional overlaps, whose function remains to be clarified. Additionally, we developed bioinformatics methods to improve transcript annotation and to eliminate non-specific transcription reads generated by template switching and false priming.
Scientific Reports, Aug 2, 2017
We carried out whole-exome ultra-high-throughput sequencing of brain samples from suicide victims who had suffered from major depressive disorder and from control subjects who had died of other causes. This study aimed to reveal the selective accumulation of rare variants in the coding and UTR sequences of genes in suicide victims. We also analysed the potential effect of STR and CNV variations, as well as the potential role of brain infection by neurovirulent viruses, in this behavioural disorder. As a result, we identified several candidate genes, including three calcium channel genes, that may contribute to completed suicide. We also explored the potential involvement of the TGF-β signalling pathway in the pathogenesis of suicidal behaviour. To the best of our knowledge, this is the first study to use whole-exome sequencing to investigate suicide. Close to 20 million suicides are attempted annually worldwide, of which more than one million are completed 1. Suicide is the 10th leading cause of mortality in the world, which underlines the importance of better defining the genetic causes and social basis of this disorder and of identifying individuals at risk. Suicide is a complex behaviour, determined by the interaction between proximal and distal risk factors. The proximal factors include recent life events, substance abuse and mental disorders, such as major depressive disorder (MDD), bipolar disorder and schizophrenia. The most important distal factors are genetic and epigenetic factors, family history, early-life adversity and personality 2,3. The most common underlying disorder is MDD, which is the leading cause of disability worldwide 4; more than 50% of suicide victims suffer from this disease, which increases the risk of suicide by up to twentyfold 5.
A number of studies have shown a familial accumulation of suicidal behaviour, including suicide completion and attempt 6. Twin and adoption studies have revealed that the heritability of suicide ranges between 30% and 55% 7. According to the current consensus, depression is an etiologically heterogeneous disease with overlapping causal pathways 8, but completed suicide with MDD may plausibly have a much less diverse genetic background. Until recently, the heritable components of suicidal behaviour have been investigated only by hypothesis-driven research focusing on preselected candidate genes 9-11, or by comparing the frequencies of common genetic variants 12,13. Neurobiological evidence implicates dysfunction of the HPA axis 14,15, as well as of the serotonergic 16-18, dopaminergic 19,20 and other systems, in suicidality. The candidate gene approach has to date yielded very few results with general consensus. Genome-wide association studies (GWASs), despite their large sample sizes, have not uncovered any association signals in depression 21,22, which may be related to the heterogeneous genetic background of MDD; it is also possible that the causative genetic factors of depression lie outside the scope of these studies. In contrast to candidate gene studies and GWASs, whole-exome studies (WES) or whole-genome studies (WGS) allow for the
Nature Methods, Jan 4, 2009
bioRxiv (Cold Spring Harbor Laboratory), Mar 27, 2023
In the last couple of years, rapid advances and decreasing costs of sequencing technologies have revolutionized transcriptomic research. Long-read sequencing (LRS) techniques can detect full-length RNA molecules in single reads, without the need for additional assembly steps. LRS studies have revealed an unexpected transcriptomic complexity in a variety of organisms, including viruses. A number of transcripts with proven or putative regulatory roles that map close to or overlap the replication origins (Oris) and the nearby transcription activator genes have been described in herpesviruses. In this study, we used both newly generated and previously published LRS and short-read sequencing datasets to discover additional Ori-proximal transcripts in nine herpesviruses belonging to all three subfamilies (alpha, beta and gamma). We identified novel long non-coding RNAs (lncRNAs), as well as splice and length isoforms of mRNAs and lncRNAs. Furthermore, our analysis disclosed an intricate meshwork of transcriptional overlaps in the examined genomic regions. Our results suggest the existence of a 'super regulatory center' that controls both replication and global transcription through multilevel interactions between the molecular machineries.
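Transcriptional overlaps such as those mentioned in this abstract are commonly grouped by the relative orientation of the two transcripts: parallel (same strand), convergent (tail-to-tail) and divergent (head-to-head). The sketch below assumes this common scheme and 0-based, half-open coordinates, and adds an "embedded" label for opposite-strand transcripts where one extends beyond both ends of the other; it is an illustration with a hypothetical function name, not the classification code used in the study.

```python
from typing import Optional

Transcript = tuple[int, int, str]  # (start, end, strand), 0-based half-open; assumed layout


def classify_overlap(a: Transcript, b: Transcript) -> Optional[str]:
    """Classify how two transcripts overlap, or return None if they do not."""
    (a_start, a_end, a_strand), (b_start, b_end, b_strand) = a, b
    if min(a_end, b_end) <= max(a_start, b_start):
        return None  # no shared nucleotide
    if a_strand == b_strand:
        return "parallel"
    plus, minus = (a, b) if a_strand == "+" else (b, a)
    if plus[0] <= minus[0] and plus[1] <= minus[1]:
        return "convergent"  # 3' ends overlap (tail-to-tail)
    if plus[0] >= minus[0] and plus[1] >= minus[1]:
        return "divergent"   # 5' ends overlap (head-to-head)
    return "embedded"        # one transcript extends beyond both ends of the other


print(classify_overlap((0, 100, "+"), (80, 200, "-")))  # convergent
print(classify_overlap((0, 100, "-"), (80, 200, "+")))  # divergent
```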
Research Square (Research Square), Oct 23, 2019
Background: Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming. Results: Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data of two organisms, generated by different sequencing platforms. We developed a filtering algorithm which takes into consideration that template-switching can be a source of artifactual polyadenylation when filtering out spurious polyadenylation sites. The algorithm outperformed the conventional internal priming filters based on comparison to direct RNA sequencing data. We also showed that the polyadenylation artifacts arise in cDNA sequencing at consecutive stretches of as few as three adenines. There was no substantial difference between the lengths of poly(A) tails at the artifactual and the true transcriptional end sites even though it is expected that internal priming artifacts have shorter poly(A) tails than genuine polyadenylated reads. Conclusions: Our findings suggest that template switching plays an important role in the generation of spurious polyadenylation and support the need for more rigorous filtering of artifactual polyadenylation sites in cDNA data, or that alternative polyadenylation should be annotated using native RNA sequencing.
In a recent article, Depledge and colleagues reported a study of the herpes simplex virus type 1 (HSV-1) transcriptome using direct RNA sequencing (dRNA-Seq) on nanopore arrays. The authors provided a useful dataset on full-length viral and host RNA molecules. In this study, we reanalyzed the published dataset and compared it with data generated by our group and others. Our comparative study clearly demonstrated the need for multiplatform and meta-analytic approaches to transcriptome profiling in order to obtain reliable results. Employing multiplatform approaches with distinct library preparation methods is especially important in transcriptome research because of the high error rate and the variation in results obtained with different library preparation, sequencing and annotation methods. Furthermore, meta-analyses can control for potential errors arising from different kits and protocols, as well as from dissimilar working styles and conditions in different laboratories.
Methods
Datasets: The datasets generated by Depledge et al. 7 and five other datasets (Tombácz et al. 11,13; Tang et al. 8; Rutkowski et al. 9; and Whisnant et al. 10) were reanalyzed in order to define the complete HSV transcriptome.
Data analysis: Adapter sequences were removed from the raw reads of each SRS run using Cutadapt v2.6, and the fastp tool was used for validation. The sequencing reads were then aligned to the HSV-1 reference genome (GenBank: X14112.1) using minimap2 for the LRS data and the STAR mapper for the SRS data. The LoRTIA tool was used to annotate introns, TSSs and TESs from the LRS data, whereas STAR was used to detect introns in the SRS samples. The previously published introns (Tang et al. 8, Whisnant et al. 10, and Tombácz et al. 11,13) were compared with each other, reanalyzed, and validated using the datasets from all of the aforementioned publications.
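The cross-dataset intron validation described in these Methods can be sketched as counting, for each intron, how many independent datasets report it. The text does not specify the validation criterion, so the minimum-support threshold, the tuple layout, the function name and the dataset labels below are all hypothetical; only the reference accession X14112.1 comes from the text, and the coordinates are invented for illustration.

```python
from collections import Counter

Intron = tuple[str, int, int, str]  # (reference, donor, acceptor, strand); assumed layout


def validated_introns(datasets: dict[str, set[Intron]], min_datasets: int = 2) -> set[Intron]:
    """Return introns reported by at least `min_datasets` distinct datasets."""
    support = Counter()
    for introns in datasets.values():
        support.update(introns)  # a set contributes at most one vote per intron
    return {intron for intron, n in support.items() if n >= min_datasets}


# Hypothetical coordinates on the HSV-1 reference named in the text.
calls = {
    "dataset_A": {("X14112.1", 1000, 1200, "+"), ("X14112.1", 3000, 3400, "-")},
    "dataset_B": {("X14112.1", 1000, 1200, "+")},
    "dataset_C": {("X14112.1", 5000, 5100, "+")},
}
print(validated_introns(calls))  # {('X14112.1', 1000, 1200, '+')}
```

Using sets per dataset means that deep coverage in one library cannot, on its own, validate an intron; support has to come from independent datasets.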
Pathogens
Viral transcriptomes determined using first- and second-generation sequencing techniques are incomplete. Because of their short read length, these methods are inefficient at, or incapable of, distinguishing between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive in identifying splice sites and transcriptional start sites (TSSs) and, in most cases, transcriptional end sites (TESs), especially in transcript isoforms with varying transcript ends and in multi-spliced transcripts. Long-read sequencing can read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs and a high degree of polycistronism, which leads to enormous complexity. We applied single-molecule real-time and nanopore-based sequencing methods to investigate the time-lapse tra...
GigaScience, Oct 17, 2022
Background: Recent studies have disclosed the genome, transcriptome and epigenetic composition of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the effect of viral infection on host cell gene expression. It has been demonstrated that, besides the major canonical transcripts, the viral genome also codes for noncanonical RNA molecules. While structural characterizations have revealed a detailed transcriptomic architecture of the virus, kinetic studies have provided poor and often misleading results on the dynamics of both viral and host transcripts, owing to the low temporal resolution of the infection event and the low virus/cell ratio (multiplicity of infection [MOI] = 0.1) applied for infection. It had never been tested whether the alteration in host gene expression is caused by the aging of the cells or by the viral infection. Findings: In this study, we used Oxford Nanopore's direct cDNA and direct RNA sequencing methods to generate a high-coverage, high-temporal-resolution transcriptomic dataset of SARS-CoV-2 and of the primate host cells, using a high infection titer (MOI = 5). Sixteen sampling time points, ranging from 1 to 96 hours with varying time resolution, and 3 biological replicates were used in the experiment. In addition, corresponding noninfected samples were used for each infected sample. The raw reads were mapped to the viral and host reference genomes, resulting in 49,661,499 mapped reads (54.62 Gb). The genome of the viral isolate was also sequenced and phylogenetically classified. Conclusions: This dataset can serve as a valuable resource for profiling SARS-CoV-2 transcriptome dynamics, virus-host interactions, and RNA base modifications.
Comparison of host gene expression profiles in virally infected and noninfected cells at different time points allows the effect of cell aging in culture to be distinguished from the effect of viral infection. These data can provide useful information for potential novel gene annotations and can also be used to evaluate currently available bioinformatics pipelines.
NeuroSci
In rats, some parvocellular paraventricular neurons project to spinal autonomic centers. Using the virus tracing technique, we have demonstrated that some magnocellular paraventricular neurons, but not supraoptic neurons, also project to the autonomic preganglionic centers of the mammary gland, gingiva, or lip. Some of these neurons showed oxytocin immunoreactivity. In the present experiment, we examined whether the same magnocellular neuron that sends fibers to the retina or to the autonomic preganglionic centers of the eye also projects to the posterior pituitary. Double neurotropic viral labeling and oxytocin immunohistochemistry were used. After inoculating the posterior pituitary and the eye with viruses that spread in a retrograde direction and express different fluorescent proteins, we looked for double-labeled neurons in the paraventricular and supraoptic nuclei. Double-labeled neurons were observed in both non-sympathectomized and cervical-sympathectomized animals. Some double-...
Viruses
In this work, a long-read sequencing (LRS) technique based on the Oxford Nanopore Technologies MinION platform was used to quantify and kinetically characterize the poly(A) fraction of the bovine alphaherpesvirus type 1 (BoHV-1) lytic transcriptome across a 12-h infection period. Amplification-based LRS techniques frequently generate artefactual transcription reads and are biased towards the production of shorter amplicons. To avoid these undesired effects, we applied direct cDNA sequencing, an amplification-free technique. Here, we show that a single promoter can produce multiple transcription start sites, whose distribution patterns differ among the viral genes but are similar for the same gene at different time points. Our investigations revealed that the circ gene is expressed with immediate–early (IE) kinetics through a special mechanism that uses the promoter of another IE gene (bicp4) for transcriptional control. Furthermore, we detected an overlap between th...
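The observation that TSS distribution patterns stay similar for a gene across time points could be quantified in several ways. The sketch below assumes one simple choice, not stated in the abstract: build the relative frequency of read 5' ends within a window downstream of a promoter, then compare two time points with the total variation distance (0 means identical profiles, 1 means no shared positions). The window, the read positions and the function names are hypothetical, and 5' ends are assumed to be already expressed as genomic coordinates.

```python
from collections import Counter


def tss_profile(read_starts: list[int], window_start: int, window_end: int) -> dict[int, float]:
    """Relative frequency of read 5' ends at each position in [window_start, window_end)."""
    counts = Counter(p for p in read_starts if window_start <= p < window_end)
    total = sum(counts.values())
    return {pos: n / total for pos, n in sorted(counts.items())} if total else {}


def total_variation(p: dict[int, float], q: dict[int, float]) -> float:
    """Total variation distance between two discrete TSS profiles."""
    positions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in positions)


early = tss_profile([100, 100, 101, 103, 250], 95, 110)  # 250 falls outside the window
late = tss_profile([100, 100, 100, 100, 101, 101, 103, 103], 95, 110)
print(early)                         # {100: 0.5, 101: 0.25, 103: 0.25}
print(total_variation(early, late))  # 0.0
```

Normalizing each profile to relative frequencies makes the comparison insensitive to the large changes in absolute read counts expected over an infection time course.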
Pathogens, 2021
Vesicular stomatitis Indiana virus (VSIV) of the genus Vesiculovirus, species Indiana vesiculovirus (formerly known as vesicular stomatitis virus, VSV), causes a disease in livestock that closely resembles foot-and-mouth disease; an outbreak may therefore lead to significant economic losses. Long-read sequencing (LRS)-based approaches have already revealed a hidden complexity in the transcriptomes of several viruses. This technique has been used to sequence the VSIV genome, but our study is the first to apply it to profiling the VSIV transcriptome. Since LRS can sequence full-length RNA molecules, it provides more accurate transcriptome annotation than traditional short-read sequencing methods. The objectives of this study were to assemble the complete transcriptome of VSIV using nanopore sequencing, to ascertain the cell-type specificity and dynamics of viral gene expression, and to evaluate host gene expression changes induced by th...
In the last couple of years, the use of long-read sequencing (LRS) technologies for transcriptome profiling has uncovered the extreme complexity of viral gene expression. In this study, we carried out a systematic analysis of the pseudorabies virus (PRV) transcriptome by combining our current data, obtained using Pacific Biosciences Sequel and Oxford Nanopore Technologies MinION sequencing, with our earlier data generated by other LRS and short-read sequencing techniques. As a result, we identified a number of novel genes, transcripts and transcript isoforms, including splice and length variants, and also confirmed previously annotated RNA molecules. One of the major findings of this study is the discovery of a large number of 5’-truncated putative mRNAs embedded in larger host mRNAs. A large fraction of these RNA molecules contain in-frame ORFs, which may encode N-terminally truncated polypeptides. This study demonstrates that the PRV transcriptome is much more complex than ...
SUMMARY: Long-read sequencing (LRS) has become a standard approach for transcriptome analysis in recent years. This technology is also used to identify and annotate the genes of various organisms, including viruses. Bovine herpesvirus type 1 (BoHV-1) is an important pathogen of cattle worldwide; however, the transcriptome of this virus is still largely unannotated. This study reports the profiling of the dynamic lytic transcriptome of BoHV-1 using two LRS techniques, the Oxford Nanopore Technologies (ONT) MinION and the Illumina LoopSeq synthetic LRS methods, with multiple library preparation protocols. In this work, we annotated viral mRNAs and non-coding transcripts and a large number of transcript isoforms, including their transcription start and end sites, as well as splice variants of BoHV-1. Very long polycistronic and complex viral transcripts were also detected. Our analysis demonstrated an extremely complex pattern of transcriptional overlaps for...
African swine fever virus (ASFV) is an important animal pathogen causing substantial economic losses in the swine industry globally. At present, little is known about the molecular biology of ASFV, including its transcriptome organization. In this study, we applied cutting-edge sequencing approaches, namely Illumina short-read sequencing (SRS) and Oxford Nanopore Technologies long-read sequencing (LRS), together with several library preparation chemistries, to analyze the dynamic ASFV transcriptome. SRS can generate a large number of high-precision reads, but it is inefficient at identifying long RNA molecules, transcript isoforms and overlapping transcripts. LRS can overcome these limitations, but it also has shortcomings, such as a high error rate and low coverage. Amplification-based LRS techniques produce relatively high read counts but also high levels of spurious transcripts, whereas the non-amplified cDNA and direct RNA sequencin...
Biochimica et biophysica acta (N), Sep 13, 1994
We determined the entire DNA sequence of two adjacent open reading frames of Aujeszkry's disease ... more We determined the entire DNA sequence of two adjacent open reading frames of Aujeszkry's disease virus encoding ribonucleotide reductase genes with the intergenic sequence of 9 bp. From the sequence analysis we deduce that ORFs encode large and small subunits, with sizes of 835 and 303 amino acids, respectively. Amino acid sequence comparison of ADV RR2 with that of equine herpesvirus type 1, bovine herpesvirus type 1, HSV-1 and varicella zoster virus revealed that 48% of amino acids represent clusters of residues conserved in all compared sequences. In the N-terminal part ADV RR1 shows low homology to the RR1 of other herpesviruses. Rest of the RR1 protein contains highly conserved amino acid sequences divided by blocks of low homology.
The Journal of Neuroscience, Jun 9, 2004
Sympathetic premotor neurons directly control sympathetic preganglionic neurons (SPNs) in the int... more Sympathetic premotor neurons directly control sympathetic preganglionic neurons (SPNs) in the intermediolateral cell column (IML) of the thoracic spinal cord, and many of these premotor neurons are localized in the medulla oblongata. The rostral ventrolateral medulla contains premotor neurons controlling the cardiovascular conditions, whereas rostral medullary raphe regions are a candidate source of sympathetic premotor neurons for thermoregulatory functions. Here, we show that these medullary raphe regions contain putative glutamatergic neurons and that these neurons directly control thermoregulatory SPNs. Neurons expressing vesicular glutamate transporter 3 (VGLUT3) were distributed in the rat medullary raphe regions, including the raphe magnus and rostral raphe pallidus nuclei, and mostly lacked serotonin immunoreactivity. These VGLUT3-positive neurons expressed Fos in response to cold exposure or to central administration of prostaglandin E 2 , a pyrogenic mediator. Transneuronal retrograde labeling after inoculation of pseudorabies virus into the interscapular brown adipose tissue (BAT) or the tail indicated that those VGLUT3-expressing medullary raphe neurons innervated these thermoregulatory effector organs multisynaptically through SPNs of specific thoracic segments, and microinjection of glutamate into the IML of the BAT-controlling segments produced BAT thermogenesis. An anterograde tracing study further showed a direct projection of those VGLUT3-expressing medullary raphe neurons to the dendrites of SPNs. Furthermore, intra-IML application of glutamate receptor antagonists blocked BAT thermogenesis triggered by disinhibition of the medullary raphe regions. 
The present results suggest that VGLUT3-expressing neurons in the medullary raphe regions constitute excitatory neurons that could be categorized as a novel group of sympathetic premotor neurons for thermoregulatory functions, including fever.
Background: Alternative polyadenylation is commonly examined using cDNA sequencing, which is know... more Background: Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such template-switching artifacts on alternative polyadenylation are generally disregarded, while alternative polyadenylation artifacts are attributed to internal priming. Results: Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data of two organisms, generated by different sequencing platforms. We developed a ltering algorithm which takes into consideration that template-switching can be a source of artifactual polyadenylation when ltering out spurious polyadenylation sites. The algorithm outperformed the conventional internal priming lters based on comparison to direct RNA sequencing data. We also showed that the polyadenylation artifacts arise in cDNA sequencing at consecutive stretches of as few as three adenines. There was no substantial difference between the lengths of poly(A) tails at the artifactual and the true transcriptional end sites even though it is expected that internal priming artifacts have shorter poly(A) tails than genuine polyadenylated reads. Conclusions: Our ndings suggest that template switching plays an important role in the generation of spurious polyadenylation and support the need for more rigorous ltering of artifactual polyadenylation sites in cDNA data, or that alternative polyadenylation should be annotated using native RNA sequencing.
Characterization of global transcriptomes using conventional short-read sequencing is challenging because these platforms are insensitive to transcript isoforms, multigenic RNA molecules, transcriptional overlaps and similar features. Long-read sequencing (LRS) can overcome these limitations by reading full-length transcripts. The use of these technologies has led to a redefinition of transcriptional complexity in the organisms studied so far. In this study, we applied LRS platforms from Pacific Biosciences and Oxford Nanopore Technologies to profile the dynamic vaccinia virus (VACV) transcriptome and ...
BMC Genomics, Oct 29, 2018
Background: Understanding the underlying genetic structure of human populations is of fundamental interest to both the biological and the social sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation. The most widely used methods for collecting DNA-level variant information are whole-genome sequencing, which remains costly, and the more economical array-based techniques, which simultaneously genotype a pre-selected set of variable sites in the human genome. The largest publicly accessible set of human genomic sequence data available today comes from exome sequencing, which covers around 1.2% of the whole genome (approximately 30 million base pairs). Results: To compare the effect of SNP selection strategies on population genetic analysis without bias, we subsampled the variants of the same highly curated 1000 Genomes dataset to mimic whole-genome, exome and array data, thereby eliminating the effect of the different chemistries and error profiles of these approaches. We then compared the exome dataset with the array-based dataset and with the gold-standard whole-genome dataset using the same population genetic analysis methods. Conclusions: Our results draw attention to some of the inherent problems that arise from using pre-selected SNP sets for population genetic analysis. Additionally, we demonstrate that exome sequencing is a better alternative to array-based methods for population genetic analysis. In this study, we propose a strategy for unbiased variant collection from exome data and offer a bioinformatics protocol for proper data processing.
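The subsampling design described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that variants are (chromosome, position) tuples, that exome capture targets are sorted, non-overlapping half-open [start, end) intervals in the same coordinate system, and that an array is simply a fixed set of genotyped sites. It is not the authors' pipeline, only an illustration of restricting one call set to mimic other assays.

```python
from bisect import bisect_right

# Hypothetical representation: a variant is (chromosome, position).
Variant = tuple[str, int]

def mimic_exome(variants: list[Variant],
                targets: dict[str, list[tuple[int, int]]]) -> list[Variant]:
    """Keep variants that fall inside exome target intervals.

    Intervals are assumed sorted, non-overlapping and half-open [start, end).
    """
    # Pre-compute interval start positions per chromosome for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in targets.items()}
    kept = []
    for chrom, pos in variants:
        ivs = targets.get(chrom, [])
        # Index of the last interval starting at or before pos (-1 if none).
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and ivs[i][0] <= pos < ivs[i][1]:
            kept.append((chrom, pos))
    return kept

def mimic_array(variants: list[Variant], array_sites: set[Variant]) -> list[Variant]:
    """Keep only variants at the pre-selected (hypothetical) array sites."""
    return [v for v in variants if v in array_sites]
```

Because both subsets are drawn from the same call set, any difference in downstream population genetic results reflects site selection rather than sequencing chemistry, which is the point the abstract makes.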
Scientific Reports, Jun 5, 2018
The Autographa californica multiple nucleopolyhedrovirus (AcMNPV) is an insect-pathogenic baculovirus. In this study, we applied the Oxford Nanopore Technologies platform to analyze the polyadenylated fraction of the viral transcriptome using both cDNA and direct RNA sequencing methods. We identified and annotated a total of 132 novel transcripts and transcript isoforms, including 4 coding and 4 non-coding RNA molecules, 47 length variants, 5 splice isoforms, and 23 polycistronic and 49 complex transcripts. All of the identified novel protein-coding genes were 5′-truncated forms of longer host genes. We also demonstrated that, for transcription start site isoforms, the promoters and initiator sequences of the longer and shorter variants belong to the same kinetic class. Long-read sequencing also revealed a complex meshwork of transcriptional overlaps, the function of which remains to be clarified. Additionally, we developed bioinformatics methods to improve transcript annotation and to eliminate non-specific reads generated by template switching and false priming.
Scientific Reports, Aug 2, 2017
We carried out whole-exome ultra-high-throughput sequencing on brain samples from suicide victims who had suffered from major depressive disorder and from control subjects who had died of other causes. This study aimed to reveal the selective accumulation of rare variants in the coding and UTR sequences of genes in suicide victims. We also analysed the potential effects of STR and CNV variations, as well as of brain infection with neurovirulent viruses, in this behavioural disorder. As a result, we identified several candidate genes, among them three calcium channel genes, that may contribute to completed suicide. We also explored the potential involvement of the TGF-β signalling pathway in the pathogenesis of suicidal behaviour. To the best of our knowledge, this is the first study to use whole-exome sequencing to investigate suicide. Close to 20 million suicides are attempted annually worldwide, of which more than one million are completed 1. Suicide is the 10th leading cause of mortality in the world, which underlines the importance of better defining the genetic causes and social basis of this disorder and of identifying individuals at risk. Suicide is a complex behaviour, determined by the interaction between proximal and distal risk factors. The proximal factors include recent life events, substance abuse and mental disorders, such as major depressive disorder (MDD), bipolar disorder and schizophrenia. The most important distal factors are genetic and epigenetic factors, family history, early-life adversity and personality 2,3. The most common underlying disorder is MDD, which is the leading cause of disability worldwide 4; more than 50% of suicide victims suffer from this disease, which increases the risk of suicide by up to twentyfold 5.
A number of studies have shown a familial accumulation of suicidal behaviour, including both completed and attempted suicide 6. Twin and adoption studies have revealed that the heritability of suicide ranges between 30 and 55% 7. According to the current consensus, depression is etiologically a heterogeneous disease with overlapping causal pathways 8, but, logically, completed suicide with MDD may have a much less diverse genetic background. Until recently, the heritable components of suicidal behaviour have been investigated only by hypothesis-driven research focusing on preselected candidate genes 9-11, or by comparing the frequencies of common genetic variants 12,13. Neurobiological evidence implicates dysfunction of the HPA axis 14,15, as well as of the serotonergic 16-18, dopaminergic 19,20 and other systems, in suicidality. The candidate gene approach has to date yielded very few results with general consensus. Genome-wide association studies (GWASs), in spite of their large sample sizes, have not identified any association signals in depression 21,22, which may reflect the heterogeneous genetic background of MDD, or the causative genetic factors of depression may lie outside the scope of these studies. In contrast to candidate gene studies and GWASs, whole-exome studies (WES) or whole-genome studies (WGS) allow for the
Nature Methods, Jan 4, 2009
bioRxiv (Cold Spring Harbor Laboratory), Mar 27, 2023
In the last couple of years, rapid advances and decreasing costs of sequencing technologies have revolutionized transcriptomic research. Long-read sequencing (LRS) techniques can detect full-length RNA molecules in a single run without the need for additional assembly steps. LRS studies have revealed unexpected transcriptomic complexity in a variety of organisms, including viruses. A number of transcripts with proven or putative regulatory roles, mapping close to or overlapping the replication origins (Oris) and the nearby transcription activator genes, have been described in herpesviruses. In this study, we used both newly generated and previously published LRS and short-read sequencing datasets to discover additional Ori-proximal transcripts in nine herpesviruses belonging to all three subfamilies (alpha, beta and gamma). We identified novel long non-coding RNAs (lncRNAs), as well as splice and length isoforms of mRNAs and lncRNAs. Furthermore, our analysis disclosed an intricate meshwork of transcriptional overlaps at the examined genomic regions. Our results suggest the existence of a 'super regulatory center' that controls both replication and global transcription through multilevel interactions between the molecular machineries.
Research Square, Oct 23, 2019
Background: Alternative polyadenylation is commonly examined using cDNA sequencing, which is known to be affected by template-switching artifacts. However, the effects of such artifacts on alternative polyadenylation are generally disregarded, and alternative polyadenylation artifacts are instead attributed to internal priming. Results: Here, we analyzed both long-read cDNA sequencing and direct RNA sequencing data from two organisms, generated on different sequencing platforms. We developed a filtering algorithm that takes into account that template switching can be a source of artifactual polyadenylation when filtering out spurious polyadenylation sites. Based on comparison with direct RNA sequencing data, the algorithm outperformed conventional internal-priming filters. We also showed that polyadenylation artifacts arise in cDNA sequencing at stretches of as few as three consecutive adenines. There was no substantial difference between the lengths of poly(A) tails at artifactual and true transcription end sites, even though internal-priming artifacts are expected to have shorter poly(A) tails than genuine polyadenylated reads. Conclusions: Our findings suggest that template switching plays an important role in the generation of spurious polyadenylation, and they support the need either for more rigorous filtering of artifactual polyadenylation sites in cDNA data or for annotating alternative polyadenylation using native RNA sequencing.
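As a rough illustration of the conventional internal-priming idea that the filtering algorithm above is benchmarked against, the following Python sketch flags a candidate transcript end site when the genomic sequence just downstream contains an adenine run. The function names, the 10-nt window and the plus-strand-only handling are hypothetical choices; the default run length of 3 only echoes the abstract's observation. The sketch does not model template switching and is not the authors' algorithm.

```python
def downstream_a_run(genome: str, tes: int, window: int = 10, min_run: int = 3) -> bool:
    """Return True if an adenine run of at least `min_run` nt occurs within
    `window` nt downstream of a plus-strand transcript end site.

    `tes` is the 0-based index of the last transcribed base in `genome`.
    A hit suggests the oligo(dT) primer may have annealed to genomic A's
    rather than to a genuine poly(A) tail.
    """
    downstream = genome[tes + 1 : tes + 1 + window].upper()
    return "A" * min_run in downstream

def filter_tes(genome: str, tes_sites: list[int], **kwargs) -> list[int]:
    """Keep only candidate end sites not flagged as possible priming artifacts."""
    return [t for t in tes_sites if not downstream_a_run(genome, t, **kwargs)]
```

For example, in the toy sequence `"CCGTAAAGCT"` an end site at index 3 is followed by `AAA` and is removed, while an end site at index 6 is kept.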
In a recent article, Depledge and colleagues reported a study of the herpes simplex virus type 1 (HSV-1) transcriptome using direct RNA sequencing (dRNA-Seq) on nanopore arrays. The authors provided a useful dataset on full-length viral and host RNA molecules. In this study, we reanalyzed the published dataset and compared it with data generated by our group and others. Our comparative study clearly demonstrated the need for multiplatform and meta-analytic approaches to transcriptome profiling to obtain reliable results. Employing multiplatform approaches with distinct library preparation methods is especially important in transcriptome research because of the high error rate and the variance in results obtained with different library preparation, sequencing and annotation methods. Furthermore, meta-analyses can control for potential errors arising from the use of different kits and protocols, as well as from dissimilar working styles and conditions in different laboratories.
Methods. Datasets: The datasets generated by Depledge et al. 7 and five other datasets (Tombácz et al. 11,13; Tang et al. 8; Rutkowski et al. 9; and Whisnant et al. 10) were reanalyzed in order to define the complete HSV-1 transcriptome. Data analysis: Adapter sequences were removed from the raw reads of each SRS run using Cutadapt v2.6, and the fastp tool was used for validation. We then aligned the sequencing reads to the HSV-1 reference genome (GenBank: X14112.1) using minimap2 for the LRS data and STAR for the SRS data. The LoRTIA tool was used to annotate introns, TSSs and TESs from the LRS data, whereas STAR was used to detect introns in the SRS samples. The previously published introns (Tang et al. 8, Whisnant et al. 10, and Tombácz et al. 11,13) were compared with each other, reanalyzed, and validated using the datasets from all of the aforementioned publications.
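Cross-validating introns across several datasets, as described in the Methods above, can be sketched as a simple support count. In this hypothetical Python example each dataset is a set of (start, end, strand) intron tuples, and an intron is kept if at least `min_support` datasets report it; the threshold, dataset names and data structures are assumptions for illustration, not the criteria used in the study.

```python
from collections import Counter

# Hypothetical intron representation: (start, end, strand).
Intron = tuple[int, int, str]

def consensus_introns(datasets: dict[str, set[Intron]],
                      min_support: int = 2) -> dict[Intron, int]:
    """Return introns reported by at least `min_support` datasets, with their support."""
    # Each dataset is a set, so it contributes at most one vote per intron.
    support = Counter(intron for introns in datasets.values() for intron in introns)
    return {intron: n for intron, n in support.items() if n >= min_support}
```

Requiring agreement between independent platforms and library preparations is one simple way to act on the abstract's argument that single-platform annotations are error-prone.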
Pathogens
Viral transcriptomes determined using first- and second-generation sequencing techniques are incomplete. Because of their short read length, these methods distinguish poorly, if at all, between transcript isoforms, polycistronic RNAs, and transcriptional overlaps and readthroughs. Additionally, these approaches are insensitive for identifying splice sites and transcription start sites (TSSs) and, in most cases, transcription end sites (TESs), especially in transcript isoforms with varying transcript ends and in multi-spliced transcripts. Long-read sequencing can read full-length nucleic acids and can therefore be used to assemble complete transcriptome atlases. Although vaccinia virus (VACV) does not produce spliced RNAs, its transcriptome has a high diversity of TSSs and TESs and a high degree of polycistronism, which together result in enormous complexity. We applied single-molecule real-time and nanopore-based sequencing methods to investigate the time-lapse tra...
GigaScience, Oct 17, 2022
Background: Recent studies have disclosed the genome, transcriptome, and epigenetic compositions of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the effect of viral infection on gene expression in host cells. It has been demonstrated that, besides the major canonical transcripts, the viral genome also codes for noncanonical RNA molecules. While structural characterizations have revealed a detailed transcriptomic architecture of the virus, kinetic studies have provided poor and often misleading results on the dynamics of both viral and host transcripts, owing to the low temporal resolution of the infection event and the low virus/cell ratio (multiplicity of infection [MOI] = 0.1) used for infection. Whether the alteration in host gene expression is caused by aging of the cells or by the viral infection has never been tested. Findings: In this study, we used Oxford Nanopore's direct cDNA and direct RNA sequencing methods to generate a high-coverage, high-temporal-resolution transcriptomic dataset of SARS-CoV-2 and of the primate host cells, using a high infection titer (MOI = 5). Sixteen sampling time points ranging from 1 to 96 hours with varying time resolution, and 3 biological replicates, were used in the experiment. In addition, corresponding noninfected samples were included for each infected sample. The raw reads were mapped to the viral and host reference genomes, resulting in 49,661,499 mapped reads (54.62 Gb). The genome of the viral isolate was also sequenced and phylogenetically classified. Conclusions: This dataset can serve as a valuable resource for profiling SARS-CoV-2 transcriptome dynamics, virus-host interactions, and RNA base modifications.
Comparing the expression profiles of host genes in virally infected and noninfected cells at different time points makes it possible to distinguish the effect of cell aging in culture from that of the viral infection. These data can provide useful information for potential novel gene annotations and can also be used to evaluate currently available bioinformatics pipelines.
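The logic of pairing each infected sample with a noninfected one at the same time point can be sketched in Python. Assuming hypothetical normalized expression values (for example, counts per million) for one host gene, keyed by hours post-infection, a per-time-point log2 ratio of infected to mock isolates the infection effect, while comparing each mock sample with the earliest mock sample tracks changes due to time in culture. The pseudocount and function names are illustrative assumptions, not part of the published dataset description.

```python
import math

def infection_effect(infected: dict[int, float], mock: dict[int, float],
                     pseudo: float = 1.0) -> dict[int, float]:
    """log2 ratio of infected to mock expression at each shared time point."""
    shared = sorted(infected.keys() & mock.keys())
    return {t: math.log2((infected[t] + pseudo) / (mock[t] + pseudo)) for t in shared}

def aging_effect(mock: dict[int, float], pseudo: float = 1.0) -> dict[int, float]:
    """log2 change of mock expression relative to the earliest mock time point."""
    t0 = min(mock)
    return {t: math.log2((mock[t] + pseudo) / (mock[t0] + pseudo)) for t in sorted(mock)}
```

A gene whose mock expression drifts over time but whose infected/mock ratio stays near 0 would be attributed to culture aging rather than to infection.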
NeuroSci
In rats, some parvocellular paraventricular neurons project to spinal autonomic centers. Using the virus tracing technique, we have demonstrated that some magnocellular paraventricular neurons, but not supraoptic neurons, also project to the autonomic preganglionic centers of the mammary gland, gingiva, or lip. Some of these neurons showed oxytocin immunoreactivity. In the present experiment, we examined whether the same magnocellular neurons that send fibers to the retina or to the autonomic preganglionic centers of the eye also project to the posterior pituitary. Double neurotropic viral labeling and oxytocin immunohistochemistry were used. After inoculating the posterior pituitary and the eye with viruses that spread in a retrograde direction and express different fluorescent proteins, we looked for double-labeled neurons in the paraventricular and supraoptic nuclei. Double-labeled neurons were observed in both non-sympathectomized and cervical-sympathectomized animals. Some double-...
Viruses
In this work, a long-read sequencing (LRS) technique based on the Oxford Nanopore Technologies MinION platform was used to quantify and kinetically characterize the poly(A) fraction of the bovine alphaherpesvirus type 1 (BoHV-1) lytic transcriptome across a 12-h infection period. Amplification-based LRS techniques frequently generate artefactual transcription reads and are biased towards shorter amplicons. To avoid these undesired effects, we applied direct cDNA sequencing, an amplification-free technique. Here, we show that a single promoter can produce multiple transcription start sites whose distribution patterns differ among viral genes but are similar for the same gene at different time points. Our investigations revealed that the circ gene is expressed with immediate–early (IE) kinetics through a special mechanism in which it uses the promoter of another IE gene (bicp4) for transcriptional control. Furthermore, we detected an overlap between th...
In a recent article, Depledge and colleagues reported a study of the herpes simplex virus type 1 (HSV-1) transcriptome using direct RNA sequencing (dRNA-Seq) on nanopore arrays. The authors provided a useful dataset on full-length viral and host RNA molecules. In this study, we reanalyzed the published dataset and compared it with data generated by our group and others. Our comparative study clearly demonstrated the need for multiplatform and meta-analytic approaches for transcriptome profiling to obtain reliable results.
Pathogens, 2021
Vesicular stomatitis Indiana virus (VSIV) of the genus Vesiculovirus, species Indiana vesiculovirus (formerly Vesicular stomatitis virus, VSV), causes a disease in livestock that is very similar to foot-and-mouth disease, so an outbreak may lead to significant economic loss. Long-read sequencing (LRS)-based approaches have already revealed a hidden complexity in the transcriptomes of several viruses. This technique has been used to sequence the VSIV genome, but ours is the first study to apply it to profiling the VSIV transcriptome. Since LRS can sequence full-length RNA molecules, it provides more accurate transcriptome annotation than traditional short-read sequencing methods. The objectives of this study were to assemble the complete VSIV transcriptome using nanopore sequencing, to ascertain the cell-type specificity and dynamics of viral gene expression, and to evaluate host gene expression changes induced by th...
In the last couple of years, the implementation of long-read sequencing (LRS) technologies for transcriptome profiling has uncovered the extreme complexity of viral gene expression. In this study, we carried out a systematic analysis of the pseudorabies virus (PRV) transcriptome by combining our current data, obtained using Pacific Biosciences Sequel and Oxford Nanopore Technologies MinION sequencing, with our earlier data generated by other LRS and short-read sequencing techniques. As a result, we identified a number of novel genes, transcripts, and transcript isoforms, including splice and length variants, and also confirmed previously annotated RNA molecules. One of the major findings of this study is the discovery of a large number of 5’-truncated putative mRNAs embedded in larger host mRNAs. A large fraction of these RNA molecules contain in-frame ORFs, which may encode N-terminally truncated polypeptides. This study demonstrates that the PRV transcriptome is much more complex than ...
SUMMARY Long-read sequencing (LRS) has become a standard approach for transcriptome analysis in recent years. This technology is also used to identify and annotate the genes of various organisms, including viruses. Bovine herpesvirus type 1 (BoHV-1) is an important pathogen of cattle worldwide; however, the transcriptome of this virus is still largely unannotated. This study reports the profiling of the dynamic lytic transcriptome of BoHV-1 using two LRS techniques, Oxford Nanopore Technologies (ONT) MinION sequencing and the Illumina LoopSeq synthetic LRS method, with multiple library preparation protocols. In this work, we annotated viral mRNAs and non-coding transcripts, as well as a large number of transcript isoforms, including transcription start and end sites and splice variants of BoHV-1. Very long polycistronic and complex viral transcripts were also detected. Our analysis demonstrated an extremely complex pattern of transcriptional overlaps for...
African swine fever virus (ASFV) is an important animal pathogen causing substantial economic losses in the swine industry globally. At present, little is known about the molecular biology of ASFV, including its transcriptome organization. In this study, we applied cutting-edge sequencing approaches, namely Illumina short-read sequencing (SRS) and Oxford Nanopore Technologies long-read sequencing (LRS), together with several library preparation chemistries, to analyze the dynamic ASFV transcriptome. SRS can generate a large amount of high-precision sequencing reads, but it is inefficient at identifying long RNA molecules, transcript isoforms and overlapping transcripts. LRS can overcome these limitations, but this approach also has shortcomings, such as a high error rate and low coverage. Amplification-based LRS techniques produce relatively high read counts but also high levels of spurious transcripts, whereas the non-amplified cDNA and direct RNA sequencin...