Paras Sehgal - Academia.edu
Papers by Paras Sehgal
approach to connect the long tail for zebrafish gene annotation
Briefings in functional genomics, 2021
The utility of model organisms for understanding the function of novel transcripts/genes has allowed us to delineate their molecular mechanisms in maintaining cellular homeostasis. Organisms such as zebrafish have contributed substantially to developmental and disease biology. Owing to advances in deep transcriptomics, many new transcript isoforms and non-coding RNAs, such as long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs), have been identified and cataloged in multiple databases, and many more are yet to be identified. Various methods and tools have been used to identify lncRNAs/circRNAs in zebrafish using deep sequencing of transcriptomes as templates. Functional analysis of a few candidates, such as tie1-AS, ECAL1 and CDR1as, in zebrafish provides a prospective outline for approaching other known or novel lncRNAs/circRNAs. New genetic alteration tools such as TALENs and CRISPR have helped in probing the molecular function of lncRNAs/circRNAs in zebrafish. Further l...
Background: Circular RNAs are a novel class of non-coding RNAs formed by back-splicing, in which a downstream 5’ donor site is joined to an upstream 3’ acceptor site to form a circular structure. A number of circRNAs have been discovered in model organisms including human, mouse and Drosophila, among others. There are a few candidate-based studies on circular RNAs in rat, a well-studied model organism. The availability of a recent dataset of transcriptomes encompassing 11 tissues, 4 developmental stages and 2 genders motivated us to explore the landscape of circular RNAs in this organism. Methodology: To understand the differences among pipelines, we used the same BodyMap RNA sequencing dataset throughout. A number of pipelines have been published to identify back-splice junctions for circRNA discovery, but studies comparing these tools have suggested that a combination of tools is a better approach to identifying high-confidence circular RNAs. We employed 5 different combinations of tools including toph...
The EMBO Journal, 2021
Long non‐coding RNAs (lncRNAs) are emerging as key regulators of endothelial cell function. Here, we investigated the role of a novel vascular endothelial‐associated lncRNA (VEAL2) in regulating endothelial permeability. Precise editing of the veal2 locus in zebrafish (veal2gib005Δ8/+) induced cranial hemorrhage. In vitro and in vivo studies revealed that veal2 competes with diacylglycerol for interaction with protein kinase C beta‐b (Prkcbb) and regulates its kinase activity. Using PRKCB2 as bait, we identified a functional ortholog of veal2 in humans from HUVECs and named it VEAL2. Overexpression and knockdown of VEAL2 affected tubulogenesis and permeability in HUVECs. VEAL2 was differentially expressed in choroid tissue of the eye and in blood from patients with diabetic retinopathy, a disease in which PRKCB2 is known to be hyperactivated. Further, VEAL2 could rescue the effects of PRKCB2‐mediated turnover of endothelial junctional proteins, thus reducing hyperpermeability in hyperglycemic HUVE...
Human Genomics
Home to a culturally heterogeneous population, India is also a melting pot of genetic diversity. The population architecture, characterized by multiple endogamous groups with specific marriage patterns including the widely prevalent practice of consanguinity, not only makes the Indian population distinct from the rest of the world but also provides a unique advantage and niche for understanding genetic diseases. Centuries of genetic isolation of population groups have amplified founder effects, contributing to a high prevalence of recessive alleles, which translates into genetic diseases, including rare genetic diseases, in India. Rare genetic diseases are becoming a public health concern in India because a population of close to a billion people translates into a huge disease burden for even the rarest of rare diseases. Genomics-based approaches have been shown to accelerate the diagnosis of rare genetic diseases and reduce the socio-economic burden. Th...
Long noncoding RNAs (lncRNAs) are RNA transcripts that lack the potential to code for proteins. LncRNAs were largely discovered in the transcriptomes of human and several model organisms using next-generation sequencing (NGS) approaches, which have enabled comprehensive genome-scale annotation of transcripts. LncRNAs show dynamic expression and have the potential to orchestrate gene regulation at the epigenetic, transcriptional, and posttranscriptional levels. Here we describe the experimental methods involved in discovering lncRNAs from the transcriptome of the popular model organism zebrafish (Danio rerio). A structured, well-designed computational analysis pipeline applied after RNA sequencing can be instrumental in revealing the diversity of lncRNA transcripts. We describe one such computational pipeline used to discover novel lncRNA transcripts in zebrafish. We also detail the validation of the putative novel...
Coronavirus disease (COVID-19) emerged from a city in China and has since spread as a global pandemic affecting millions of individuals. The causative agent, SARS-CoV-2, is being extensively studied in terms of its genetic epidemiology using genomic approaches. Andhra Pradesh, one of the major states of India, has the third-largest number of COVID-19 cases, yet little is known about its genetic epidemiology. In this study, we sequenced 293 SARS-CoV-2 genome isolates from Andhra Pradesh with a mean coverage of 13,324X. We identified 564 high-quality SARS-CoV-2 variants, of which 15 are novel. A total of 18 variants mapped to RT-PCR primer/probe sites, and 4 variants are known to be associated with increased infectivity. Phylogenetic analysis revealed that the SARS-CoV-2 genomes circulating in Andhra Pradesh mostly clustered under clade A2a (94%), while 6% fell under the I/A3i clade, a clade previously reported to be present in large numbers in India. To the best...
Many antibody and immune escape variants in SARS-CoV-2 are now documented in the literature. The availability of SARS-CoV-2 genome sequences enabled us to investigate the occurrence and genetic epidemiology of these variants globally. Our analysis suggests that a number of genetic variants associated with immune escape have emerged in global populations.
Coronavirus disease 2019 (COVID-19) rapidly spread from a city in China to almost every country in the world, affecting millions of individuals. Genomic approaches have been extensively used to understand the evolution and epidemiology of SARS-CoV-2 across the world. Kerala is a unique state in India, well connected with the rest of the world through a large number of expatriates, trade, and tourism. The first case of COVID-19 in India was reported in Kerala in January 2020, during the initial days of the pandemic. The rapid increase in COVID-19 cases in Kerala has made it necessary to understand the genetic epidemiology, evolution, and mutations of the circulating SARS-CoV-2. We sequenced a total of 200 samples from patients at a tertiary hospital in Kerala using the COVIDSeq protocol at a mean coverage of 7,755X. The analysis identified 166 unique high-quality variants, encompassing 4 novel variants and 89 new variants identified for the first time in SAR...
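A common first filter in lncRNA discovery keeps assembled transcripts that are at least 200 nt long and lack a substantial open reading frame. The sketch below illustrates only that idea; the 100-codon ORF cutoff, the forward-strand-only ORF scan and the function names are illustrative assumptions, not the pipeline described in the chapter, and real pipelines typically add dedicated coding-potential classifiers on top.

```python
# Minimal sketch of a length + ORF filter for lncRNA candidates.
# Assumptions (not from the source): 100-codon ORF cutoff, forward strand only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (ATG included, stop excluded) of the longest
    ATG-to-stop ORF on the forward strand, over all three frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None  # position of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None  # look for the next ATG downstream
    return best


def is_lncrna_candidate(seq: str, min_length: int = 200,
                        max_orf_codons: int = 100) -> bool:
    """Keep transcripts >= min_length nt whose longest ORF is short."""
    return len(seq) >= min_length and longest_orf_codons(seq) < max_orf_codons


if __name__ == "__main__":
    coding_like = "ATG" + "GCT" * 120 + "TAA"   # 121-codon ORF
    noncoding_like = "A" * 250                  # no ATG at all
    print(is_lncrna_candidate(coding_like))     # False
    print(is_lncrna_candidate(noncoding_like))  # True
```

ORFs that run off the end of the transcript without a stop codon are ignored here, which is one of several simplifications a production filter would revisit.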
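Checking whether variants fall in RT-PCR primer/probe sites is an interval-overlap test on reference coordinates. A minimal sketch, assuming variant positions and site coordinates are both 1-based on the same reference genome; the site names and coordinates are hypothetical placeholders, not real assay coordinates.

```python
# Sketch: which variant positions fall inside primer/probe binding sites?
# Site names and coordinates used below are hypothetical placeholders.
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Site:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def variants_in_sites(positions: Iterable[int],
                      sites: Iterable[Site]) -> Dict[str, List[int]]:
    """Map each site name to the sorted, de-duplicated variant positions it covers."""
    site_list = list(sites)
    hits: Dict[str, List[int]] = {}
    for pos in sorted(set(positions)):
        for site in site_list:
            if site.start <= pos <= site.end:
                hits.setdefault(site.name, []).append(pos)
    return hits


if __name__ == "__main__":
    sites = [Site("primer_A_hypothetical", 100, 120),
             Site("probe_B_hypothetical", 140, 165)]
    print(variants_in_sites([50, 110, 118, 150, 300], sites))
    # {'primer_A_hypothetical': [110, 118], 'probe_B_hypothetical': [150]}
```

A linear scan is adequate for the handful of sites in a typical RT-PCR panel; an interval tree would suit much larger site sets.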
Computational Biology of Non-Coding RNA
Reinfection with SARS-CoV-2 is apparently rare, and only a few cases have been reported worldwide in which genetic characterization of the virus differentiated reinfection from persistent viral shedding. These cases therefore provide unique insights into long-term protective immunity to SARS-CoV-2. Earlier reports suggest that patients were symptomatic in one or both episodes of infection. Here we report unique cases of asymptomatic SARS-CoV-2 reinfection in two healthcare workers from India, identified during routine surveillance. Genome sequencing suggests that genetically distinct SARS-CoV-2 caused the infections. Our analysis demonstrates that asymptomatic reinfection could potentially be under-reported, with implications for long-term surveillance of SARS-CoV-2 infections. This report also highlights the need for genomic surveillance of healthcare workers, who are potentially not only at higher risk for primary infection...
PLOS ONE
The rapid emergence of coronavirus disease 2019 (COVID-19) as a global pandemic affecting millions of individuals has necessitated sensitive, high-throughput approaches for the diagnosis, surveillance, and genetic epidemiology of SARS-CoV-2. In the present study, we used the COVIDSeq protocol, which involves multiplex PCR, barcoding, and sequencing of samples, for high-throughput detection and for deciphering the genetic epidemiology of SARS-CoV-2. We applied the approach to 752 clinical samples in duplicate, amounting to a total of 1536 samples, which could be sequenced on a single S4 flow cell on the NovaSeq 6000. Our analysis suggests high concordance between technical duplicates and high concordance in the detection of SARS-CoV-2 between the COVIDSeq and RT-PCR approaches. An in-depth analysis revealed six samples in which COVIDSeq detected SARS-CoV-2 with high confidence but which were negative by RT-PCR. Additionally, the assay could dete...
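Concordance between two detection methods run on the same samples can be quantified as raw percent agreement and, optionally, corrected for chance with Cohen's kappa. The abstract does not state which statistic was used, so this is a hedged illustration; the function name and the example calls are hypothetical.

```python
# Sketch: percent agreement and Cohen's kappa between two binary assays
# (e.g. COVIDSeq vs RT-PCR calls on the same samples). Data below is made up.
from typing import Sequence, Tuple


def agreement_stats(calls_a: Sequence[bool],
                    calls_b: Sequence[bool]) -> Tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for paired positive/negative calls."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("need two non-empty call lists of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    p_a = sum(calls_a) / n  # fraction positive in assay A
    p_b = sum(calls_b) / n  # fraction positive in assay B
    # Agreement expected by chance if the two assays were independent.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both assays gave one identical class for every sample; kappa is
        # undefined, so report perfect agreement by convention.
        return observed, 1.0
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    seq_calls = [True, True, False, False]   # hypothetical sequencing calls
    pcr_calls = [True, False, False, False]  # hypothetical RT-PCR calls
    print(agreement_stats(seq_calls, pcr_calls))  # (0.75, 0.5)
```

Kappa is reported alongside raw agreement because, when most samples are negative, two assays can agree on most samples largely by chance.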
Clinical Infectious Diseases
Rapid detection of pathogenic sequences or variants in DNA and RNA through a point-of-care diagnostic approach is valuable for accelerated clinical prognosis, as witnessed during the recent COVID-19 outbreak. Traditional methods relying on qPCR or sequencing are difficult to implement in resource-limited settings, necessitating accurate alternative testing strategies that perform robustly. Here, we present the FnCas9 Editor Linked Uniform Detection Assay (FELUDA), which employs a direct Cas9-based enzymatic readout for detecting nucleotide sequences and identifying nucleobase identity without requiring trans-cleavage activity on reporter molecules. We demonstrate that FELUDA is 100% accurate in detecting single nucleotide variants (SNVs), including heterozygous carriers of a mutation, and present a simple design strategy in the form of a web tool, JATAYU, for its implementation. FELUDA is semi-quantitative, can be adapted to multiple signal d...
Scientific Reports
Circular RNAs (circRNAs) are transcript isoforms generated by back-splicing of exons and circularisation of the transcript. Recent genome-wide maps of circular RNAs in humans and other model organisms motivated us to explore the repertoire of circular RNAs in zebrafish, a popular model organism. We generated RNA-seq data for five major zebrafish tissues: blood, brain, heart, gills and muscle. RNA-seq reads left unmapped after reference mapping to linear transcripts were used to identify unique back-spliced exons with a split-mapping algorithm. Our analysis revealed 3,428 novel circRNAs in zebrafish. Further in-depth analysis suggested that the majority of the circRNAs were derived from previously well-annotated protein-coding and long noncoding RNA gene loci. In addition, many of the circular RNAs showed extensive tissue specificity. We independently validated a subset of circRNAs using polymerase chain reaction (PCR) with divergent primers. Expression analysis using quantitative real-time PCR recapitulated the tissue specificity of the selected candidates studied. This study provides a comprehensive genome-wide map of circular RNAs in zebrafish tissues.
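The split-mapping idea can be illustrated by the order in which the two halves of a junction-spanning read align: for an ordinary linear splice the 5' half of the read maps upstream of the 3' half, whereas a back-splice reverses that order. This is a simplified sketch, not the algorithm used in the study; it assumes reads have already been split and aligned, ignores splice-site motifs and mapping quality, and all names and coordinates are hypothetical.

```python
# Sketch: classify a split read as back-spliced (circRNA evidence) or not.
# Coordinates are 0-based half-open; names and example reads are hypothetical.
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Junction = Tuple[str, str, int, int]  # (chrom, strand, circ_start, circ_end)


@dataclass(frozen=True)
class Segment:
    chrom: str
    strand: str  # "+" or "-", strand of the transcript
    start: int   # genomic start, 0-based inclusive
    end: int     # genomic end, 0-based exclusive


def backsplice_junction(first: Segment, second: Segment) -> Optional[Junction]:
    """`first` aligns the 5' part of the read, `second` the 3' part (both in
    transcript orientation). Return the circRNA span if the segments are in
    back-spliced order, else None."""
    if first.chrom != second.chrom or first.strand != second.strand:
        return None  # chimeras across chromosomes/strands are not handled here
    if first.strand == "+" and second.start < first.start:
        # Read runs from a downstream donor into an upstream acceptor.
        return (first.chrom, "+", second.start, first.end)
    if first.strand == "-" and second.end > first.end:
        # Mirrored case: on "-" the transcript runs from high to low coordinates.
        return (first.chrom, "-", first.start, second.end)
    return None  # collinear order, i.e. an ordinary linear splice


def count_junctions(pairs: Iterable[Tuple[Segment, Segment]]) -> Counter:
    """Collapse split reads into per-junction read support."""
    support: Counter = Counter()
    for first, second in pairs:
        junction = backsplice_junction(first, second)
        if junction is not None:
            support[junction] += 1
    return support
```

Counting reads per junction matters because single-read junctions are a common source of false positives; a minimum read-support threshold is usually applied afterwards.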
Database : the journal of biological databases and curation, 2018
South Asia is home to ~20% of the world population and is characterized by distinct ethnic, linguistic, cultural and genetic lineages. Only limited representative samples from the region have found their place in large population-scale international genome projects. The recent availability of genome-scale data from multiple populations, and of datasets from South Asian countries in the public domain, motivated us to integrate the data into a comprehensive resource. In the present study, we integrated a total of six datasets encompassing 1213 human exomes and genomes to create a compendium of 154 814 557 genetic variants, of which 69 059 255 are novel. The variants were systematically annotated using public resources and, along with their allele frequencies, are available as a browsable online resource, South Asian genomes and exomes. As a proof-of-principle application of the data and resource for genetic epidemiology, we analyzed the pathogenic genetic variants causin...
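Building a compendium from several datasets and counting novel variants amounts to taking the union of normalized variant keys and subtracting those already present in public catalogues. A minimal sketch under stated assumptions: variants are keyed by (chromosome, position, ref, alt) on a single reference build, the `known` set stands in for whichever public resources were actually used, and indel left-normalization, which real pipelines need before comparing keys, is omitted.

```python
# Sketch: merge variant calls from several datasets and flag those absent
# from a reference catalogue. Example variants are made up.
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def normalize(chrom: str, pos: int, ref: str, alt: str) -> Variant:
    """Harmonize chromosome naming ('chr1' vs '1') and allele case.
    Note: this does NOT left-align indels."""
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), ref.upper(), alt.upper())


def merge_and_flag_novel(datasets: Iterable[Iterable[Variant]],
                         known: Iterable[Variant]) -> Tuple[Set[Variant], Set[Variant]]:
    """Return (compendium of unique variants, subset not in `known`)."""
    compendium: Set[Variant] = set()
    for dataset in datasets:
        compendium.update(normalize(*v) for v in dataset)
    known_keys = {normalize(*v) for v in known}
    return compendium, compendium - known_keys
```

Normalizing keys before the union is what keeps the same variant, written as "chr1" in one dataset and "1" in another, from being counted twice or mislabeled as novel.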
Pharmacogenomics, Feb 14, 2017
Adverse drug reactions to 5-fluorouracil (5-FU) are frequent and largely attributable to genetic variations in the DPYD gene, which encodes the rate-limiting enzyme that clears 5-FU. The study aims to understand the pharmacogenetic landscape of DPYD variants in South Asian populations. A systematic analysis of population-scale genome-wide datasets of over 3000 South Asians was performed, with independent evaluation in a small cohort of patients. Our analysis revealed significant differences in the allelic distribution of variants across ethnicities. This is the first and largest genetic map of DPYD variants associated with adverse drug reactions to 5-FU in the South Asian population. Our study highlights ethnic differences in allelic frequencies.
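Comparing allele distributions across ethnic groups starts from per-population allele frequencies. A hedged sketch: it assumes biallelic variants with genotypes coded as alternate-allele dosages (0, 1 or 2), and uses a simple allelic chi-square test purely for illustration; the abstract does not state which statistical test the study applied, and for rare pharmacogenetic alleles with small counts Fisher's exact test is usually more appropriate.

```python
# Sketch: per-population alternate-allele frequency and a 2x2 allelic
# chi-square test (1 d.f., no continuity correction). Genotypes are made up.
import math
from typing import Sequence, Tuple


def allele_counts(dosages: Sequence[int]) -> Tuple[int, int]:
    """dosages: alt-allele count per diploid individual (0, 1 or 2).
    Returns (alt allele count, ref allele count)."""
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("dosages must be 0, 1 or 2")
    alt = sum(dosages)
    return alt, 2 * len(dosages) - alt


def alt_allele_frequency(dosages: Sequence[int]) -> float:
    if not dosages:
        raise ValueError("need at least one genotype")
    alt, ref = allele_counts(dosages)
    return alt / (alt + ref)


def allelic_chi_square(pop1: Sequence[int], pop2: Sequence[int]) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) comparing allele counts."""
    a, b = allele_counts(pop1)
    c, d = allele_counts(pop2)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a zero margin: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Working at the allele level (2 alleles per person) rather than the genotype level doubles the counts but assumes Hardy-Weinberg equilibrium within each population.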
Biology Methods and Protocols
Circular RNAs are a novel class of non-coding RNAs formed by back-splicing, in which a downstream 5' donor site is joined to an upstream 3' acceptor site to form a circular structure. A number of circRNAs have been discovered in model organisms including human, mouse and Drosophila, among others. There are also a few candidate-based studies on circular RNAs in rat, a well-studied model organism. A number of pipelines have been published to identify back-splice junctions for circRNA discovery, but studies comparing these tools have suggested that a combination of tools is a better approach to identifying high-confidence circular RNAs. The availability of a recent dataset of transcriptomes encompassing 11 tissues, 4 developmental stages and 2 genders motivated us to explore the landscape of circular RNAs in this organism. To understand the differences among pipelines, we employed 5 different combinations of tools to identify circular RNAs from the dataset. We compare...
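Combining pipelines is often done by intersecting their outputs and keeping junctions that several tools agree on. A minimal sketch of such a voting scheme, assuming all tools report junctions in the same coordinate convention (mixing 0- and 1-based outputs would silently break the intersection); the two-tool threshold and the tool names are hypothetical, not the specific combinations evaluated in the study.

```python
# Sketch: keep back-splice junctions reported by at least `min_tools` callers.
# Tool names and junctions below are hypothetical.
from collections import Counter
from typing import Dict, Set, Tuple

Junction = Tuple[str, int, int, str]  # (chrom, start, end, strand)


def consensus_junctions(calls_by_tool: Dict[str, Set[Junction]],
                        min_tools: int = 2) -> Set[Junction]:
    """Return junctions supported by >= min_tools distinct tools."""
    support: Counter = Counter()
    for junctions in calls_by_tool.values():
        support.update(set(junctions))  # each tool votes at most once per junction
    return {j for j, n in support.items() if n >= min_tools}


if __name__ == "__main__":
    j1 = ("chr1", 500, 1050, "+")
    j2 = ("chr3", 2000, 2600, "-")
    j3 = ("chr5", 10, 400, "+")
    calls = {"tool_a": {j1, j2}, "tool_b": {j1, j3}, "tool_c": {j1, j2}}
    print(sorted(consensus_junctions(calls)))               # j1 and j2
    print(consensus_junctions(calls, min_tools=3) == {j1})  # True
```

Raising `min_tools` trades sensitivity for precision, which is the balance that comparisons of tool combinations aim to tune.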
Chemical communications (Cambridge, England), Jan 16, 2018
Originating as a component of prokaryotic adaptive immunity, the type II CRISPR/Cas9 system has been repurposed for targeted genome editing in various organisms. Although Cas9 can bind and cleave DNA efficiently in vitro, its activity inside a cell can vary dramatically between targets owing to differences between genomic loci and the availability of sufficient Cas9/sgRNA (single guide RNA) complexes for cleavage. Most methods have so far relied on Cas9 protein engineering or base modifications in the sgRNA sequence to improve CRISPR/Cas9 activity. Here we demonstrate that structure-based rational design of sgRNAs can enhance the efficiency of Cas9 cleavage in vivo. By appending a naturally forming RNA G-quadruplex motif to the 3' end of sgRNAs, we can improve their stability and target cleavage efficiency in zebrafish embryos without inducing off-target activity, underscoring the value of this approach in designing better, optimized genome editing triggers.
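The abstract does not give the appended motif's sequence. As a hedged illustration, the sketch below shows the two sequence-level operations involved: screening a sequence for a canonical putative G-quadruplex pattern (four runs of three or more G separated by loops of 1-7 nt, a widely used heuristic) and fusing a motif to the sgRNA 3' end. Both sequences are hypothetical placeholders, and a regex match only flags potential G4 formation; it does not show that the structure folds or stabilizes the sgRNA.

```python
# Sketch: detect a canonical putative G-quadruplex motif and append a motif
# to the 3' end of an sgRNA. All sequences below are hypothetical.
import re

# Four runs of >=3 G separated by loops of 1-7 nt (a common heuristic).
G4_PATTERN = re.compile(r"G{3,}(?:[ACGTUN]{1,7}G{3,}){3}")


def has_putative_g4(seq: str) -> bool:
    """True if the sequence contains at least one putative G4 motif."""
    return G4_PATTERN.search(seq.upper()) is not None


def append_3prime_motif(sgrna: str, motif: str) -> str:
    """Return the sgRNA with `motif` fused to its 3' end (both written 5'->3')."""
    return sgrna + motif


if __name__ == "__main__":
    sgrna = "GACUACCAGUUCAGCAUCGA"   # hypothetical stand-in for a full sgRNA
    motif = "GGGUUAGGGUUAGGGUUAGGG"  # hypothetical G4-forming motif
    print(has_putative_g4(sgrna))                              # False
    print(has_putative_g4(append_3prime_motif(sgrna, motif)))  # True
```

Appending at the 3' end, as the abstract describes, leaves the 5' spacer that pairs with the genomic target untouched.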