Richa Mudgal | Indian Institute of Science
Papers by Richa Mudgal
Current Opinion in Structural Biology, Apr 1, 2016
Design of proteins has far-reaching potential in diverse areas, ranging from repurposing protein scaffolds for reactions and substrates they were not naturally meant for, to catching a glimpse of the ephemeral proteins that nature might have sampled during evolution. These non-natural proteins, whether synthesized or virtual, have opened the way to designing entities that not only rival their natural counterparts but also offer a chance to visualize the protein space continuum, which might help to relate proteins and understand their associations. Here, we review recent advances in protein engineering and design across multiple areas, with a view to drawing attention to their future potential.
homology detection methods
WORLD SCIENTIFIC eBooks, Dec 17, 2013
The data deluge from high-throughput sequencing techniques and structural genomics initiatives creates a need to leverage large-scale data. Consequently, computational methods that characterize genes and proteins solely from their sequence information are becoming increasingly important. Over the past decade, the development of sensitive profile-based sequence database search algorithms has improved the quality of structural and functional inferences from protein sequence. This chapter highlights the use of such sensitive approaches to recognize evolutionarily related proteins when amino acid sequence similarity is very low. We further demonstrate the use of remote homology detection methods based on sequence database mining to explore the repertoire of functions and three-dimensional structures of parasitic proteins in Trypanosoma brucei brucei, the causative agent of African sleeping sickness. With an emphasis on various metabolic pathways, the sequence-function and structure-function relationships are investigated. Attractive drug targets have been proposed by integrating information on parasitic proteins in metabolic pathways with their homology to targets of FDA-approved drugs.
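Profile-based search, as described in the abstract above, scores a query against the position-specific residue preferences of a whole family rather than against a single sequence. The sketch below illustrates that idea under simplifying assumptions that are ours, not the chapter's: a gapless toy alignment, a uniform 1/20 background frequency, and a flat pseudocount. Production tools such as PSI-BLAST or HMMER also handle gaps, sequence weighting and E-value statistics, so this is not the pipeline used in the chapter; the names `build_pssm`, `score_window` and `best_hit` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds position-specific scoring matrix (PSSM).

    Illustrative assumptions: a gapless alignment of equal-length sequences,
    a uniform background of 1/20 per residue, and a flat pseudocount added
    to every residue in every column.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, segment):
    """Sum the per-column log-odds scores of a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, target):
    """Slide the profile along the target without gaps; return (score, 0-based offset).

    Raises ValueError if the target is shorter than the profile and KeyError
    for residues outside AMINO_ACIDS.
    """
    width = len(pssm)
    return max(
        (score_window(pssm, target[i:i + width]), i)
        for i in range(len(target) - width + 1)
    )


# Toy usage: a three-sequence "family" searched against a short target.
pssm = build_pssm(["ACDE", "ACDF", "SCDE"])
print(best_hit(pssm, "GGACDEGG"))
```

Positions conserved across the family (the C and D columns here) contribute the largest positive scores, which is why a profile can still flag a weakly similar target that a single pairwise comparison might miss.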
Proteins, Apr 22, 2017
Functional annotation is seldom straightforward: functional divergence within protein families and functional convergence between non-homologous protein families add complexity and lead to mis-annotations. An enzyme may contain multiple domains, not all of which are involved in a given function, which further complicates annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help resolve fold-function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold-function-binding site relationships has been systematically generated. A network-based approach is employed to capture the complexity of these relationships, from which different types of associations are deciphered: versatile protein folds performing diverse functions, the same function associated with multiple folds, and one-to-one relationships. Binding site similarity networks integrated with fold, function and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, the network properties revealed versatile families with topologically different or dissimilar binding sites, as well as structural families that perform very similar functions. As a case study, subtle changes in the active sites of a set of evolutionarily related superfamilies are studied using these networks. Tracing such similarities in evolutionarily related proteins provides clues to the transition and evolution of protein functions. Insights from this study will be helpful for accurate and reliable functional annotation of uncharacterized proteins, polypharmacology, and designing enzymes with new functional capabilities.
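The three association types named in this abstract (versatile folds, one function on multiple folds, and one-to-one relationships) can be read directly off a bipartite fold-function graph by counting each node's partners. The sketch below covers only that fold-function layer; it ignores the binding-site and ligand similarity networks the study also builds, and the function name and the fold/function labels are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def classify_fold_function(pairs):
    """Split (fold, function) associations into three relationship types.

    Returns (versatile_folds, multi_fold_functions, one_to_one), where
      versatile_folds      = folds linked to more than one function,
      multi_fold_functions = functions linked to more than one fold,
      one_to_one           = (fold, function) pairs in which each side has
                             exactly one partner.
    """
    fold_to_funcs = defaultdict(set)
    func_to_folds = defaultdict(set)
    for fold, func in pairs:
        fold_to_funcs[fold].add(func)
        func_to_folds[func].add(fold)

    versatile_folds = {f for f, funcs in fold_to_funcs.items() if len(funcs) > 1}
    multi_fold_functions = {fn for fn, folds in func_to_folds.items() if len(folds) > 1}
    one_to_one = set()
    for fold, funcs in fold_to_funcs.items():
        if len(funcs) == 1:
            (func,) = funcs  # unpack the single function partner
            if len(func_to_folds[func]) == 1:
                one_to_one.add((fold, func))
    return versatile_folds, multi_fold_functions, one_to_one


# Hypothetical placeholder labels, not data from the study.
edges = [
    ("fold_A", "func_1"), ("fold_A", "func_2"),  # one fold, two functions
    ("fold_B", "func_3"), ("fold_C", "func_3"),  # one function, two folds
    ("fold_D", "func_4"),                         # one-to-one
]
print(classify_fold_function(edges))
```

Using sets on both sides means repeated (fold, function) observations, for example from several PDB entries, are counted once.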
Supplementary Figures S1-S4 and Supplementary Tables S1-S2.
Indian Institute of Science, 2018
Table S5. Pfam 31 families now associated with structural folds and consensus with our application dataset. (XLSX 21 kb)
Table S4. TM-Align scores for Pfam families with known structure information but no fold association available in SCOP. (XLSX 14 kb)
Table S3. List of structural associations by our approach: details of the structural fold associations and the confidence of the assignments are given for queries from 1372 families (20 families for which structures are available in Pfam 31 have been moved to Table S5). (XLSX 234 kb)
Figure S4. False vs. true positives for queries in the assessment dataset: the distribution of true positives vs. false positives as a function of a) query and target coverage and b) query and target alignment length (number of residues in the alignment). (PNG 363 kb)
Figure S3. The normalized fold frequency of correct vs. incorrect associations for the assessment dataset: 'correct' associated folds (in green) predominate at higher normalized fold frequencies than 'incorrect' fold associations (in red). (PNG 92 kb)
Table S2. List of Pfam family queries with structural fold annotation available and validated in the assessment of our approach. (XLSX 355 kb)
Figure S2. Performance of our approach as a function of different parameters: a) Query length: performance as a function of the number of amino acids; points above 0.8 are annotated for Sensitivity (621*), Specificity (1110*), Precision (1077*) and MCC (782*). b) Repeat-containing folds: performance of folds in our assessment dataset that contain structural repeats, compared with other folds. c) Secondary structure based SCOP classes: performance metrics evaluated across SCOP classes "a" through "g", namely a (All-α), b (All-β), c (α/β), d (α+β), e (Multi-domain proteins), f (Membrane and cell surface proteins and peptides) and g (Small proteins). (PNG 395 kb)
Figure S1. The frequency distribution of sequence query coverage for correct and incorrect fold associations: blue represents the incorrect and red the correct associations. The median of the distribution of incorrect associations corresponds to 30.13% query coverage, represented by the dotted line; 3.18% of the correct fold associations lie to the left of this median value. (PNG 72 kb)
Table S1. List of Pfam families and the associated SCOP fold annotations obtained through mapping onto PDB entries. (XLSX 125 kb)
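Figure S2 reports sensitivity, specificity, precision and MCC, and Figure S1 describes the share of correct associations falling below the median query coverage of the incorrect ones. The sketch below shows the standard definitions behind those numbers. It assumes only true/false positive and negative counts are available; how the study defined positives and negatives is not stated in these captions, the helper names are hypothetical, and all numbers are toy values.

```python
import math
import statistics


def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; a zero denominator yields 0.0."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0    # positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation coefficient
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "mcc": mcc}


def fraction_below_median(reference, values):
    """Return (median of reference, fraction of values strictly below that median)."""
    cutoff = statistics.median(reference)
    below = sum(1 for v in values if v < cutoff)
    return cutoff, (below / len(values) if values else 0.0)


# Toy counts and coverages (percent), not results from the study.
print(classification_metrics(tp=8, fp=2, tn=9, fn=1))
print(fraction_below_median([10, 20, 30, 40, 50], [25, 35, 45, 55]))  # (30, 0.25)
```

MCC is included alongside the other three because it uses all four confusion-matrix cells, so it stays informative when correct and incorrect associations are heavily imbalanced.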
BLM helicase, the protein mutated in Bloom syndrome, is involved in signal transduction cascades after damage. BLM is phosphorylated on multiple residues by different kinases either after stress induction or during mitosis. Here, we have provided evidence that both Chk1 and Chk2 phosphorylated the NH2-terminal 660 amino acids of BLM. An internal region within the DExH motif of BLM negatively regulated the Chk1/Chk2-dependent NH2-terminal phosphorylation event. Using in silico analysis involving the Chk1 structure and its known substrate specificity, we predicted that Chk1 should preferentially phosphorylate BLM on serine 646 (Ser646). The prediction was validated in vitro by phosphopeptide analysis on BLM mutants and by use of a newly generated phosphospecific polyclonal antibody. We showed that the phosphorylation of Ser646 on BLM was constitutive and decreased rapidly after exposure to DNA damage. This resulted in the diminished interaction of BLM with nucleolin and PML isoforms, and consequently decreased BLM accumulation in t...
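The site prediction in this abstract relies partly on Chk1's known substrate specificity. A commonly cited simplification of that specificity is an arginine three residues N-terminal to the phosphorylated serine or threonine (R-X-X-S/T). The sketch below scans a sequence for that motif only; treating the consensus as the sole rule is our assumption, the study additionally used the Chk1 structure, and the helper name and toy sequences are hypothetical (the BLM sequence is not used).

```python
import re

# Assumed simplified consensus: R at position -3 relative to the phosphoacceptor S/T.
# The lookahead makes each match zero-width, so overlapping sites are all reported.
CHK1_MOTIF = re.compile(r"(?=R..([ST]))")


def candidate_sites(sequence):
    """Return (1-based position, residue) for every S/T preceded by R at -3."""
    return [(m.start(1) + 1, m.group(1)) for m in CHK1_MOTIF.finditer(sequence)]


# Toy sequences, not BLM.
print(candidate_sites("AAARAASAARQQTA"))  # [(7, 'S'), (13, 'T')]
print(candidate_sites("RRASS"))           # [(4, 'S'), (5, 'S')]
```

A motif scan like this only shortlists candidates; as the abstract notes, the predicted site still had to be validated experimentally with mutants and a phosphospecific antibody.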
Journal of biomolecular structure & dynamics, 2015
Nucleic Acids Research, 2014
NrichD (http://proline.biochem.iisc.ernet.in/NRICHD/) is a database of computationally designed protein-like sequences which, when augmented into natural sequence databases, can perform hops in protein sequence space to assist in the detection of remote relationships. Establishing protein relationships in the absence of structural evidence or natural 'intermediately related sequences' is a challenging task. We have recently demonstrated that the computational design of artificial intermediary sequences/linkers is an effective approach to fill naturally occurring voids in protein sequence space. Through a large-scale assessment, we have shown that such sequences can be plugged into commonly employed search databases to improve the performance of routinely used sequence search methods in detecting remote relationships. Since such data sets are expected to be used to establish protein relationships, two databases that already capture these relationships at the structural and functional domain level, namely SCOP and Pfam, have been 'enriched' with these artificial intermediary sequences. The NrichD database currently contains 3 611 010 artificial sequences generated between 27 882 pairs of families from 374 SCOP folds. The data sets are freely available for download. Additional features include the design of artificial sequences between any two protein families of interest to the user.
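The abstract does not describe how NrichD designs its intermediary sequences, so the sketch below swaps in a deliberately naive stand-in to illustrate the "hop" intuition: build a chimera by taking each column of a pairwise alignment from one family member or the other. By construction, the chimera is at least as identical to each parent as the parents are to each other. This is not the NrichD method; the helper names and the toy aligned sequences are hypothetical.

```python
import random


def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def naive_intermediate(aligned_a, aligned_b, frac_from_a=0.5, seed=None):
    """Hypothetical stand-in: take each alignment column from aligned_a with
    probability frac_from_a, otherwise from aligned_b.

    Returns the aligned (gapped) chimera; strip '-' to get the raw sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("inputs must be aligned to the same length")
    rng = random.Random(seed)  # seeded generator for reproducible chimeras
    return "".join(x if rng.random() < frac_from_a else y
                   for x, y in zip(aligned_a, aligned_b))


a = "ACDEFGHIKL"  # toy aligned sequences, not real family members
b = "ACNEYGHRKV"
mid = naive_intermediate(a, b, seed=0)
print(mid, percent_identity(mid, a), percent_identity(mid, b), percent_identity(a, b))
```

Adding such chimeras to a search database gives a query from one family a chance to match the intermediate first, and then reach the other family through it, which is the kind of database "hop" the abstract describes.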
International Journal of Knowledge Discovery in Bioinformatics, 2011
In the post-genomic era, biological databases are growing at a tremendous rate. Despite the rapid accumulation of biological information, the functions and other biological properties of many putative gene products of various organisms remain unknown or obscure. This paper examines how strategic integration of large biological databases and combinations of various types of biological information help address some of the fundamental questions on protein structure, function and interactions. New developments in function recognition by remote homology detection and strategic use of sequence databases aid in recognizing the functions of newly discovered proteins. Knowledge of 3-D structures, and the combined use of sequences and 3-D structures of homologous protein domains, greatly expands the ability to detect remote homology. The authors also demonstrate how combined consideration of the functions of individual domains of multi-domain proteins helps in recognizing gross biological attributes. This pap...