Sarath Janga | Indiana University Indianapolis
Papers by Sarath Janga
PLoS biology, Jan 1, 2009
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, Jan 18, 2015
Although some methods have been proposed for automatic ontology generation, none of them addresses the issue of integrating large-scale heterogeneous biomedical ontologies. We propose a novel approach for integrating various types of ontologies efficiently and apply it to integrate the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9CM) and Gene Ontologies (GO). This approach is one of the early attempts to quantify the associations among clinical terms (e.g. ICD9 codes) based on their corresponding genomic relationships. We reconstructed a merged tree for a partial set of GO and ICD9 codes and measured the performance of this tree in terms of the associations' relevance by comparing them with two well-known disease-gene datasets (i.e. MalaCards and Disease Ontology). Furthermore, we compared the genomic-based ICD9 associations to temporal relationships between them from electronic health records. Our analysis shows promising associations supported by both c...
BioMed Research International, 2015
The integration of ontologies builds knowledge structures that bring new understanding of existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable to manual solutions in many applications. However, existing work on ontology integration is largely heuristic, without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identify optimal structures in this problem and propose optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extend the basic problem to address the integration of a large number of ontologies, and correspondingly propose an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between Gene Ontology and the National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.
BioMed Research International, 2015
Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons: codirectionally organized genes that share a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic, with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. The user interface is simple and easy to use for visualizing, downloading, and querying data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.
Database, 2015
RNA Binding Protein (RBP) Expression and Disease Dynamics database (READ DB) is a non-redundant, curated database of human RBPs. RBPs curated from different experimental studies are reported with their annotation, tissue-wide RNA and protein expression levels, evolutionary conservation, disease associations, protein-protein interactions, microRNA predictions, their known RNA recognition sequence motifs as well as predicted binding targets and associated functional themes, providing a one-stop portal for understanding the expression, evolutionary trajectories and disease dynamics of RBPs in the context of post-transcriptional regulatory networks.
Journal of Proteomics, 2015
Significance: RNA-binding proteins (RBPs) play a central role in mediating post-transcriptional regulation of genes.
Veterinary Research, 2015
Peste des petits ruminants (PPR) is an acute transboundary viral disease of economic importance, affecting goats and sheep. Mass vaccination programs around the world resulted in the decline of PPR outbreaks. Sungri 96 is a live attenuated vaccine, widely used in Northern India against PPR. This vaccine virus, isolated from goat, works efficiently in both sheep and goats. Global gene expression changes under PPR vaccine virus infection are not yet well defined. Therefore, in this study we investigated the host-vaccine virus interactions by infecting peripheral blood mononuclear cells isolated from goat with PPRV (Sungri 96 vaccine virus), to quantify the global changes in the transcriptomic signature by RNA-sequencing. The viral genome of the Sungri 96 vaccine virus was assembled from the PPRV-infected transcriptome, confirming the infection and demonstrating the feasibility of building a complete non-host genome from the blood transcriptome. Comparison of the infected transcriptome with the control transcriptome revealed 985 differentially expressed genes. Functional analysis showed enrichment of immune regulatory pathways under PPRV infection. Key genes involved in immune system regulation, spliceosomal and apoptotic pathways were found to be dysregulated. Network analysis revealed that the protein-protein interaction network among differentially expressed genes is significantly disrupted in the infected state. Several genes encoding transcription factors (TFs) that govern immune regulatory pathways were identified to co-regulate the differentially expressed genes. These data provide insights into the host-PPRV vaccine virus interactome for the first time. Our findings suggest dysregulation of immune regulatory pathways and of genes encoding the TFs that govern these pathways in response to viral infection.
Briefings in Functional Genomics, 2014
Most of the mammalian genome, including a large fraction of the non-protein coding transcripts, has been shown to be transcribed. Studies related to these non-coding RNA molecules have predominantly focused on smaller molecules like microRNAs. In contrast, long non-coding RNAs (lncRNAs) have long been considered to be transcriptional noise. Accumulating evidence suggests that lncRNAs are involved in key cellular and developmental processes. Several critical questions regarding functions and properties of lncRNAs and their circular forms remain to be answered. Increasing evidence from high-throughput sequencing screens also suggests the involvement of lncRNAs in diseases such as cancer, although the underlying mechanisms still need to be elucidated. Here, we discuss the current state of research in the field of lncRNAs, questions that need to be addressed in light of recent genome-wide studies documenting the landscape of lncRNAs, their functional roles and involvement in diseases. We posit that the availability of high-throughput data sets will not only make it possible to improve methods for predicting lncRNAs but also facilitate our ability to elucidate their functions and phenotypes by using integrative approaches.
Regulation of gene expression occurs at several levels in eukaryotic organisms and is a highly controlled process. Although RNAs have been traditionally viewed as passive molecules in the pathway from transcription to translation, there is mounting evidence that their metabolism is controlled by a class of proteins called RNA-binding proteins (RBPs), as well as a number of small RNAs. In this review, I provide an overview of the recent developments in our understanding of the repertoire of RBPs across diverse model systems, and discuss the computational and experimental approaches currently available for the construction of posttranscriptional networks governed by them. I also present an overview of the different roles played by RBPs in the cellular context, based on their cis-regulatory modules identified in the literature, and discuss how their interplay can result in the dynamic, spatial and tissue-specific expression maps of RNAs. I finally present the concept of a posttranscriptional network of RBPs and their cognate RNA targets and discuss their cross-talk with other important posttranscriptional regulatory molecules such as microRNAs, resulting in diverse functional network motifs. I argue that with rapid developments in the genome-wide elucidation of posttranscriptional networks it would not only be possible to gain a deeper understanding of regulation at a level that has been under-appreciated in the past, but would also allow us to use the newly developed high-throughput approaches to interrogate the prevalence of these phenomena in different states, and thereby study their relevance to physiology and disease across organisms.
Phosphonates, molecules containing direct carbon-phosphorus bonds, compose a structurally diverse class of natural products with interesting and useful biological properties. Although their synthesis in protozoa was discovered more than 50 years ago, the extent and diversity of phosphonate production in nature remain poorly characterized. The rearrangement of phosphoenolpyruvate (PEP) to phosphonopyruvate, catalyzed by the enzyme PEP mutase (PepM), is shared by the vast majority of known phosphonate biosynthetic pathways. Thus, the pepM gene can be used as a molecular marker to examine the occurrence and abundance of phosphonate-producing organisms. Based on the presence of this gene, phosphonate biosynthesis is common in microbes, with ∼5% of sequenced bacterial genomes and 7% of genome equivalents in metagenomic datasets carrying pepM homologs. Similarly, we detected the pepM gene in ∼5% of random actinomycete isolates. The pepM-containing gene neighborhoods from 25 of these isolates were cloned, sequenced, and compared with those found in sequenced genomes. PEP mutase sequence conservation is strongly correlated with conservation of other nearby genes, suggesting that the diversity of phosphonate biosynthetic pathways can be predicted by examining PEP mutase diversity. We used this approach to estimate the range of phosphonate biosynthetic pathways in nature, revealing dozens of discrete groups in pepM amplicons from local soils, whereas hundreds were observed in metagenomic datasets. Collectively, our analyses show that phosphonate biosynthesis is both diverse and relatively common in nature, suggesting that the role of phosphonate molecules in the biosphere may be more important than is often recognized.
Background: RNA-binding proteins (RBPs) play important roles in cellular homeostasis by controlling gene expression at the post-transcriptional level. Results: We explore the expression of more than 800 RBPs in sixteen healthy human tissues and their patterns of dysregulation in cancer genomes from The Cancer Genome Atlas project. We show that genes encoding RBPs are consistently and significantly highly expressed compared with other classes of genes, including those encoding regulatory components such as transcription factors, miRNAs and long non-coding RNAs. We also demonstrate that a set of RBPs, numbering approximately 30, are strongly upregulated (SUR) across at least two-thirds of the nine cancers profiled in this study. Analysis of the protein-protein interaction network properties for the SUR and non-SUR groups of RBPs suggests that path length distributions between SUR RBPs are significantly lower than those observed for non-SUR RBPs. We further find that the mean path lengths between SUR RBPs increase in proportion to their contribution to prognostic impact. We also note that RBPs exhibiting higher variability in the extent of dysregulation across breast cancer patients have a higher number of protein-protein interactions. We propose that fluctuating RBP levels might result in an increase in non-specific protein interactions, potentially leading to changes in the functional consequences of RBP binding. Finally, we show that the expression variation of a gene within a patient group is inversely correlated with prognostic impact. Conclusions: Overall, our results provide a roadmap for understanding the impact of RBPs on cancer pathogenesis.
PLoS ONE, 2013
Liver cirrhosis is associated with decreased hepatic cytochrome P450 3A (CYP3A) activity, but the pathogenesis of this phenomenon is not well elucidated. In this study, we examined whether certain microRNAs (miRNAs) are associated with decreased hepatic CYP3A activity in cirrhosis.
Current Synthetic and Systems Biology, 2014
Proteins: Structure, Function, and Bioinformatics, 2014
Detecting protein-RNA interactions is challenging both experimentally and computationally because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified. Here, SPOT-Seq, a template-based function-prediction technique for RBPs, is applied to the human proteome and its result is validated by a recent proteomic experimental discovery of 860 mRNA-binding proteins (mRBPs). The coverage (or sensitivity) is 42.6% for 1217 known RBPs annotated in the Gene Ontology and 43.6% for 860 newly discovered human mRBPs. Consistent sensitivity indicates the robust performance of SPOT-Seq for predicting RBPs. More importantly, SPOT-Seq detects 2418 novel RBPs in the human proteome, 291 of which were validated by the newly discovered mRBP set. Among the 291 validated novel RBPs, 61 are not homologous to any known RBPs. Successful validation of predicted novel RBPs permits further analysis of their phenotypic roles in disease pathways. The dataset of 2418 predicted novel RBPs along with confidence levels and complex structures is available at http://sparks-lab.org (in publications) for experimental confirmations and hypothesis generation.
PLoS ONE, 2010
Hundreds of RNA-binding proteins (RBPs) control diverse aspects of post-transcriptional gene regulation. To identify novel and unconventional RBPs, we probed high-density protein microarrays with fluorescently labeled RNA and selected 200 proteins that reproducibly interacted with different types of RNA from budding yeast Saccharomyces cerevisiae. Surprisingly, more than half of these proteins represent previously known enzymes, many of them acting in metabolism, providing opportunities to directly connect intermediary metabolism with posttranscriptional gene regulation. We mapped the RNA targets for 13 proteins identified in this screen and found that they were associated with distinct groups of mRNAs, some of them coding for functionally related proteins. We also found that overexpression of the enzyme Map1 negatively affects the expression of experimentally defined mRNA targets. Our results suggest that many proteins may associate with mRNAs and possibly control their fates, providing dense connections between different layers of cellular regulation.
Molecular Cell, 2012
Transcription factors (TFs) and histone octamers are two abundant classes of DNA binding proteins that coordinate the transcriptional program in cells. Detailed studies of individual TFs have shown that TFs bind to nucleosome-occluded DNA sequences and induce nucleosome disruption/repositioning, while recent global studies suggest this is not the only mechanism used by all TFs. We have analyzed to what extent the intrinsic DNA binding preferences of TFs and histones play a role in determining nucleosome occupancy, in addition to nonintrinsic factors such as the enzymatic activity of chromatin remodelers. The majority of TFs in budding yeast have an intrinsic sequence preference overlapping with nucleosomal histones. TFs with intrinsic DNA binding properties highly correlated with those of histones tend to be associated with gene activation and might compete with histones to bind to genomic DNA. Consistent with this, we show that activators induce more nucleosome disruption upon transcriptional activation than repressors.
Molecular BioSystems, 2011
Aging is a multi-factorial and complex phenomenon. Saccharomyces cerevisiae has been developed as a model of aging and has been widely studied in order to understand the mechanism of lifespan regulation. A large number of high-throughput studies have been conducted to identify the genes which modulate lifespan. These studies provide lists of genes that regulate lifespan in yeast; however, the regulation of these aging-associated genes is not fully understood. In this study, we show that genes whose deletion increases the replicative lifespan (RLS) of yeast show distinct expression patterns when compared with genes whose deletion decreases lifespan. Expression of long-lived (LL) genes decreases as the cell progresses from mid-log to stationary phase, whereas expression of short-lived (SL) genes remains unchanged. This distinct expression of the LL and SL gene sets suggests that they are regulated differently. Further analysis of transcriptional regulation by transcription factors and epigenetic regulators (acetylation and methylation) suggests that this differential expression of the two gene sets is due to differences in their epigenetic regulation, rather than regulation by transcription factors. These results accentuate the importance of epigenetic modifications in aging. We deduce that future focused studies on the regulation of epigenetic modifications will help lead to a better understanding of the aging process.
PLoS biology, Jan 1, 2009
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannota... more One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a ''systems-wide'' functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM, Jan 18, 2015
Although some methods are proposed for automatic ontology generation, none of them address the is... more Although some methods are proposed for automatic ontology generation, none of them address the issue of integrating large-scale heterogeneous biomedical ontologies. We propose a novel approach for integrating various types of ontologies efficiently and apply it to integrate International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9CM) and Gene Ontologies (GO). This approach is one of the early attempts to quantify the associations among clinical terms (e.g. ICD9 codes) based on their corresponding genomic relationships. We reconstructed a merged tree for a partial set of GO and ICD9 codes and measured the performance of this tree in terms of associations' relevance by comparing them with two well-known disease-gene datasets (i.e. MalaCards and Disease Ontology). Furthermore, we compared the genomic-based ICD9 associations to temporal relationships between them from electronic health records. Our analysis shows promising associations supported by both c...
BioMed Research International, 2015
The integration of ontologies builds knowledge structures which brings new understanding on exist... more The integration of ontologies builds knowledge structures which brings new understanding on existing terminologies and their associations. With the steady increase in the number of ontologies, automatic integration of ontologies is preferable over manual solutions in many applications. However, available works on ontology integration are largely heuristic without guarantees on the quality of the integration results. In this work, we focus on the integration of ontologies with hierarchical structures. We identified optimal structures in this problem and proposed optimal and efficient approximation algorithms for integrating a pair of ontologies. Furthermore, we extend the basic problem to address the integration of a large number of ontologies, and correspondingly we proposed an efficient approximation algorithm for integrating multiple ontologies. The empirical study on both real ontologies and synthetic data demonstrates the effectiveness of our proposed approaches. In addition, the results of integration between gene ontology and National Drug File Reference Terminology suggest that our method provides a novel way to perform association studies between biomedical terms.
BioMed Research International, 2015
Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into... more Background. In prokaryotic organisms, a substantial fraction of adjacent genes are organized into operons-codirectionally organized genes in prokaryotic genomes with the presence of a common promoter and terminator. Although several available operon databases provide information with varying levels of reliability, very few resources provide experimentally supported results. Therefore, we believe that the biological community could benefit from having a new operon prediction database with operons predicted using next-generation RNA-seq datasets. Description. We present operomeDB, a database which provides an ensemble of all the predicted operons for bacterial genomes using available RNA-sequencing datasets across a wide range of experimental conditions. Although several studies have recently confirmed that prokaryotic operon structure is dynamic with significant alterations across environmental and experimental conditions, there are no comprehensive databases for studying such variations across prokaryotic transcriptomes. Currently our database contains nine bacterial organisms and 168 transcriptomes for which we predicted operons. User interface is simple and easy to use, in terms of visualization, downloading, and querying of data. In addition, because of its ability to load custom datasets, users can also compare their datasets with publicly available transcriptomic data of an organism. Conclusion. OperomeDB as a database should not only aid experimental groups working on transcriptome analysis of specific organisms but also enable studies related to computational and comparative operomics.
Database, 2015
RNA Binding Protein (RBP) Expression and Disease Dynamics database (READ DB) is a non-redundant, ... more RNA Binding Protein (RBP) Expression and Disease Dynamics database (READ DB) is a non-redundant, curated database of human RBPs. RBPs curated from different experimental studies are reported with their annotation, tissue-wide RNA and protein expression levels, evolutionary conservation, disease associations, protein-protein interactions, microRNA predictions, their known RNA recognition sequence motifs as well as predicted binding targets and associated functional themes, providing a one stop portal for understanding the expression, evolutionary trajectories and disease dynamics of RBPs in the context of post-transcriptional regulatory networks.
Journal of Proteomics, 2015
Significance RNA Binding proteins (RBPs) play a central role in mediating post transcriptional re... more Significance RNA Binding proteins (RBPs) play a central role in mediating post transcriptional regulation of genes.
Veterinary Research, 2015
Peste des petits ruminants (PPR), is an acute transboundary viral disease of economic importance,... more Peste des petits ruminants (PPR), is an acute transboundary viral disease of economic importance, affecting goats and sheep. Mass vaccination programs around the world resulted in the decline of PPR outbreaks. Sungri 96 is a live attenuated vaccine, widely used in Northern India against PPR. This vaccine virus, isolated from goat works efficiently both in sheep and goat. Global gene expression changes under PPR vaccine virus infection are not yet well defined. Therefore, in this study we investigated the host-vaccine virus interactions by infecting the peripheral blood mononuclear cells isolated from goat with PPRV (Sungri 96 vaccine virus), to quantify the global changes in the transcriptomic signature by RNA-sequencing. Viral genome of Sungri 96 vaccine virus was assembled from the PPRV infected transcriptome confirming the infection and demonstrating the feasibility of building a complete non-host genome from the blood transcriptome. Comparison of infected transcriptome with control transcriptome revealed 985 differentially expressed genes. Functional analysis showed enrichment of immune regulatory pathways under PPRV infection. Key genes involved in immune system regulation, spliceosomal and apoptotic pathways were identified to be dysregulated. Network analysis revealed that the protein -protein interaction network among differentially expressed genes is significantly disrupted in infected state. Several genes encoding TFs that govern immune regulatory pathways were identified to co-regulate the differentially expressed genes. These data provide insights into the host -PPRV vaccine virus interactome for the first time. Our findings suggested dysregulation of immune regulatory pathways and genes encoding Transcription Factors (TFs) that govern these pathways in response to viral infection.
Briefings in Functional Genomics, 2014
Most of the mammalian genome including a large fraction of the non-protein coding transcripts has... more Most of the mammalian genome including a large fraction of the non-protein coding transcripts has been shown to be transcribed. Studies related to these non-coding RNA molecules have predominantly focused on smaller molecules like microRNAs. In contrast, long non-coding RNAs (lncRNAs) have long been considered to be transcriptional noise. Accumulating evidence suggests that lncRNAs are involved in key cellular and developmental processes. Several critical questions regarding functions and properties of lncRNAs and their circular forms remain to be answered. Increasing evidence from high-throughput sequencing screens also suggests the involvement of lncRNAs in diseases such as cancer, although the underlying mechanisms still need to be elucidated. Here, we discuss the current state of research in the field of lncRNAs, questions that need to be addressed in light of recent genome-wide studies documenting the landscape of lncRNAs, their functional roles and involvement in diseases. We posit that with the availability of high-throughput data sets it is not only possible to improve methods for predicting lncRNAs but will also facilitate our ability to elucidate their functions and phenotypes by using integrative approaches.
Regulation of gene expression occurs at several levels in eukaryotic organisms and is a highly co... more Regulation of gene expression occurs at several levels in eukaryotic organisms and is a highly controlled process. Although RNAs have been traditionally viewed as passive molecules in the pathway from transcription to translation, there is mounting evidence that their metabolism is controlled by a class of proteins called RNA-binding proteins (RBPs), as well as a number of small RNAs. In this review, I provide an overview of the recent developments in our understanding of the repertoire of RBPs across diverse model systems, and discuss the computational and experimental approaches currently available for the construction of posttranscriptional networks governed by them. I also present an overview of the different roles played by RBPs in the cellular context, based on their cis-regulatory modules identified in the literature and discuss how their interplay can result in the dynamic, spatial and tissue-specific expression maps of RNAs. I finally present the concept of posttranscriptional network of RBPs and their cognate RNA targets and discuss their cross-talk with other important posttranscriptional regulatory molecules such as microRNAs s, resulting in diverse functional network motifs. I argue that with rapid developments in the genome-wide elucidation of posttranscriptional networks it would not only be possible to gain a deeper understanding of regulation at a level that has been under-appreciated in the past, but would also allow us to use the newly developed high-throughput approaches to interrogate the prevalence of these phenomena in different states, and thereby study their relevance to physiology and disease across organisms.
Phosphonates, molecules containing direct carbon-phosphorus bonds, compose a structurally diverse class of natural products with interesting and useful biological properties. Although their synthesis in protozoa was discovered more than 50 y ago, the extent and diversity of phosphonate production in nature remains poorly characterized. The rearrangement of phosphoenolpyruvate (PEP) to phosphonopyruvate, catalyzed by the enzyme PEP mutase (PepM), is shared by the vast majority of known phosphonate biosynthetic pathways. Thus, the pepM gene can be used as a molecular marker to examine the occurrence and abundance of phosphonate-producing organisms. Based on the presence of this gene, phosphonate biosynthesis is common in microbes, with ∼5% of sequenced bacterial genomes and 7% of genome equivalents in metagenomic datasets carrying pepM homologs. Similarly, we detected the pepM gene in ∼5% of random actinomycete isolates. The pepM-containing gene neighborhoods from 25 of these isolates were cloned, sequenced, and compared with those found in sequenced genomes. PEP mutase sequence conservation is strongly correlated with conservation of other nearby genes, suggesting that the diversity of phosphonate biosynthetic pathways can be predicted by examining PEP mutase diversity. We used this approach to estimate the range of phosphonate biosynthetic pathways in nature, revealing dozens of discrete groups in pepM amplicons from local soils, whereas hundreds were observed in metagenomic datasets. Collectively, our analyses show that phosphonate biosynthesis is both diverse and relatively common in nature, suggesting that the role of phosphonate molecules in the biosphere may be more important than is often recognized.
Background: RNA-binding proteins (RBPs) play important roles in cellular homeostasis by controlling gene expression at the post-transcriptional level. Results: We explore the expression of more than 800 RBPs in sixteen healthy human tissues and their patterns of dysregulation in cancer genomes from The Cancer Genome Atlas project. We show that genes encoding RBPs are consistently and significantly highly expressed compared with other classes of genes, including those encoding regulatory components such as transcription factors, miRNAs and long non-coding RNAs. We also demonstrate that a set of RBPs, numbering approximately 30, are strongly upregulated (SUR) across at least two-thirds of the nine cancers profiled in this study. Analysis of the protein-protein interaction network properties for the SUR and non-SUR groups of RBPs suggests that path lengths between SUR RBPs are significantly lower than those observed for non-SUR RBPs. We further find that the mean path lengths between SUR RBPs increase in proportion to their contribution to prognostic impact. We also note that RBPs exhibiting higher variability in the extent of dysregulation across breast cancer patients have a higher number of protein-protein interactions. We propose that fluctuating RBP levels might result in an increase in non-specific protein interactions, potentially leading to changes in the functional consequences of RBP binding. Finally, we show that the expression variation of a gene within a patient group is inversely correlated with prognostic impact. Conclusions: Overall, our results provide a roadmap for understanding the impact of RBPs on cancer pathogenesis.
PLoS ONE, 2013
Liver cirrhosis is associated with decreased hepatic cytochrome P4503A (CYP3A) activity but the pathogenesis of this phenomenon is not well elucidated. In this study, we examined if certain microRNAs (miRNA) are associated with decreased hepatic CYP3A activity in cirrhosis.
Current Synthetic and Systems Biology, 2014
Proteins: Structure, Function, and Bioinformatics, 2014
Detecting protein-RNA interactions is challenging both experimentally and computationally because RNAs are large in number, diverse in cellular location and function, and flexible in structure. As a result, many RNA-binding proteins (RBPs) remain to be identified. Here, a template-based function-prediction technique for RBPs, SPOT-Seq, is applied to the human proteome, and its result is validated by a recent proteomic experimental discovery of 860 mRNA-binding proteins (mRBPs). The coverage (or sensitivity) is 42.6% for 1217 known RBPs annotated in the Gene Ontology and 43.6% for 860 newly discovered human mRBPs. Consistent sensitivity indicates the robust performance of SPOT-Seq for predicting RBPs. More importantly, SPOT-Seq detects 2418 novel RBPs in the human proteome, 291 of which were validated by the newly discovered mRBP set. Among the 291 validated novel RBPs, 61 are not homologous to any known RBPs. Successful validation of predicted novel RBPs permits further analysis of their phenotypic roles in disease pathways. The dataset of 2418 predicted novel RBPs along with confidence levels and complex structures is available at http://sparks-lab.org (in publications) for experimental confirmations and hypothesis generation.
PLoS ONE, 2010
Hundreds of RNA-binding proteins (RBPs) control diverse aspects of post-transcriptional gene regulation. To identify novel and unconventional RBPs, we probed high-density protein microarrays with fluorescently labeled RNA and selected 200 proteins that reproducibly interacted with different types of RNA from budding yeast Saccharomyces cerevisiae. Surprisingly, more than half of these proteins represent previously known enzymes, many of them acting in metabolism, providing opportunities to directly connect intermediary metabolism with posttranscriptional gene regulation. We mapped the RNA targets for 13 proteins identified in this screen and found that they were associated with distinct groups of mRNAs, some of them coding for functionally related proteins. We also found that overexpression of the enzyme Map1 negatively affects the expression of experimentally defined mRNA targets. Our results suggest that many proteins may associate with mRNAs and possibly control their fates, providing dense connections between different layers of cellular regulation.
Molecular Cell, 2012
Transcription factors (TFs) and histone octamers are two abundant classes of DNA binding proteins that coordinate the transcriptional program in cells. Detailed studies of individual TFs have shown that TFs bind to nucleosome-occluded DNA sequences and induce nucleosome disruption/repositioning, while recent global studies suggest this is not the only mechanism used by all TFs. We have analyzed to what extent the intrinsic DNA binding preferences of TFs and histones play a role in determining nucleosome occupancy, in addition to nonintrinsic factors such as the enzymatic activity of chromatin remodelers. The majority of TFs in budding yeast have an intrinsic sequence preference overlapping with nucleosomal histones. TFs with intrinsic DNA binding properties highly correlated with those of histones tend to be associated with gene activation and might compete with histones to bind to genomic DNA. Consistent with this, we show that activators induce more nucleosome disruption upon transcriptional activation than repressors.
Molecular BioSystems, 2011
Aging is a multi-factorial and complex phenomenon. Saccharomyces cerevisiae has been developed as a model of aging and has been widely studied in order to understand the mechanism of lifespan regulation. A large number of high-throughput studies have been conducted to identify the genes that modulate lifespan. These studies provide the list of genes that regulate lifespan in yeast; however, the regulation of these aging-associated genes has not been fully understood. In this study, we show that genes whose deletion increases the replicative lifespan (RLS) of yeast show discrete expression patterns when compared with genes whose deletion decreases lifespan. Expression of long-lived (LL) genes decreases as the cell progresses from mid-log to stationary phase, whereas expression of short-lived (SL) genes remains unchanged. This distinct expression of the LL and SL gene sets suggests differential gene regulation. Further analysis of transcriptional regulation by transcription factors and epigenetic regulators (acetylation and methylation) suggests that this differential expression of the two gene sets is due to differential epigenetic regulation, rather than regulation by transcription factors. These results accentuate the importance of epigenetic modifications in aging. We deduce that future focused studies on the regulation of epigenetic modifications will lead to a better understanding of the aging process.