Silvina Fornasari | Universidad Nacional de Quilmes
Papers by Silvina Fornasari
Frontiers in Bioinformatics
One of the main topics of cardiovascular research is the study of calcium (Ca2+) handling, as even small changes in Ca2+ concentration can alter cell functionality (Bers, Annu Rev Physiol, 2014, 76, 107–127). Ionic calcium (Ca2+) plays the role of a second messenger in eukaryotic cells, associated with cellular functions such as cell cycle regulation, transport, motility, gene expression, and metabolism. The use of fluorometric techniques in isolated cells loaded with Ca2+-sensitive fluorescent probes allows quantitative measurement of dynamic events occurring in living, functioning cells. The Cardiomyocytes Images Analyzer Python (CardIAP) application addresses the need to analyze and retrieve information from confocal microscopy images systematically, accurately, and rapidly. Here we present CardIAP, an open-source tool developed entirely in Python, freely available and usable as an interactive web application. In addition, CardIAP can be used as a standalone Python library and f...
Motivation: Conformational diversity is a key concept in understanding different issues related to protein function, such as catalytic processes in enzymes, protein-protein recognition, protein evolution and the origins of new biological functions. Here we present a database of proteins with different degrees of conformational diversity. CoDNaS (Conformational Diversity of Native State) is a redundant collection of three-dimensional structures of the same protein derived from the Protein Data Bank. Structures of the same protein obtained under different crystallographic conditions have been associated with snapshots of protein dynamics and could therefore characterize protein conformers. CoDNaS allows the user to explore global and local structural differences among conformers as a function of parameters such as the presence of ligands, post-translational modifications, changes in oligomeric state, and differences in pH and temperature. Additionally, Co...
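A minimal sketch of how the structural difference between two conformers in a collection like CoDNaS can be quantified: C-alpha RMSD after optimal (Kabsch) superposition, with the maximum pairwise RMSD over all conformers used as one possible summary of conformational diversity. This is an illustrative Python/NumPy sketch, not the CoDNaS pipeline; it assumes the C-alpha coordinates of each conformer are already extracted and residue-matched as (N, 3) arrays, and the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """C-alpha RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays with the same N")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch algorithm).
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the row vectors of p and measure the residual deviation from q.
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def max_pairwise_rmsd(conformers: list[np.ndarray]) -> float:
    """Conformational diversity summarized as the largest RMSD between any two conformers."""
    return max(kabsch_rmsd(a, b) for a, b in combinations(conformers, 2))
```

Because the superposition removes rigid-body motion, only internal rearrangements between conformers contribute to the reported value.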
Journal of Molecular Biology, 2020
Intrinsically disordered proteins (IDPs) lack stable tertiary structure under physiological conditions. The unique composition and complex dynamical behaviour of IDPs make them a challenge for structural biology and molecular evolution studies. Using NMR ensembles, we found that IDPs evolve under strong site-specific evolutionary rate heterogeneity, which arises mainly from the different constraints imposed by their inter-residue contacts. Evolutionary rate profiles correlate with the experimentally observed conformational diversity of the protein, allowing the description of different conformational patterns possibly related to their structure-function relationships. The correlation between evolutionary rates and contact information improves when structural information is taken not from any individual conformer or the whole ensemble, but from a combination of a limited number of conformers. Our results suggest that residue contacts in disordered regions constrain evolutionary rates to conserv...
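The contrast between contacts taken from one conformer, from the whole ensemble, or from a combination of a few conformers can be illustrated with a hypothetical sketch: compute a C-alpha contact set per conformer and pool the contacts across a chosen subset. The 8 Å cutoff and the minimum sequence separation are assumptions for illustration, not the parameters used in the study.

```python
import numpy as np


def contact_set(ca: np.ndarray, cutoff: float = 8.0, min_sep: int = 1) -> set[tuple[int, int]]:
    """Residue pairs (i, j), j - i >= min_sep, whose C-alpha atoms lie closer than cutoff (Å)."""
    diff = ca[:, None, :] - ca[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep only the upper triangle, starting min_sep positions above the diagonal.
    i_idx, j_idx = np.nonzero(np.triu(dist < cutoff, k=min_sep))
    return set(zip(i_idx.tolist(), j_idx.tolist()))


def pooled_contacts(conformers: list[np.ndarray], cutoff: float = 8.0,
                    min_sep: int = 1) -> set[tuple[int, int]]:
    """Union of the contacts observed in any of the selected conformers."""
    pooled: set[tuple[int, int]] = set()
    for ca in conformers:
        pooled |= contact_set(ca, cutoff, min_sep)
    return pooled
```

Passing a single conformer, a handful of conformers, or the full ensemble to `pooled_contacts` gives the three kinds of structural input the abstract compares.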
After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions have appeared that remain unanswered. The ensemble nature of proteins, for example, challenges structure prediction methods, because models should represent a set of conformers rather than a single structure. The evolutionary and structural features captured by effective deep learning techniques may hold the information needed to generate several diverse conformations from a single sequence. Here we assess the performance of AlphaFold2 predictions under this ensemble paradigm. Using a curated collection of apo-holo conformations, we found that AlphaFold2 predicts the holo form of a protein in 70% of the cases, but is unable to reproduce the observed conformational diversity with an error equivalent to that obtained when estimating a single conformation. More importantly, we found that AlphaFold2's performance worsens as the conformational diversity of the studied protein increases. This impairm...
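A hypothetical sketch of the apo/holo comparison: given the RMSD of each AlphaFold2 model to the experimental apo and holo conformers, label the model by the closer one and report the fraction labelled holo. The RMSD values in the usage example are invented for illustration, and counting ties as apo is an arbitrary choice.

```python
def closest_state(rmsd_to_apo: float, rmsd_to_holo: float) -> str:
    """Label a predicted model by the experimental conformer it is closer to (ties count as apo)."""
    return "holo" if rmsd_to_holo < rmsd_to_apo else "apo"


def holo_fraction(rmsd_pairs: list[tuple[float, float]]) -> float:
    """Fraction of (rmsd_to_apo, rmsd_to_holo) pairs whose model is closer to the holo form."""
    labels = [closest_state(apo, holo) for apo, holo in rmsd_pairs]
    return labels.count("holo") / len(labels)


# Invented RMSD values (Å) for four hypothetical proteins.
example = [(2.0, 1.0), (1.0, 2.0), (3.0, 0.5), (0.9, 1.2)]
print(holo_fraction(example))  # 0.5
```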
Nucleic Acids Research
The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is a hierarchical classification that combines top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) that require sequence similarity and describe repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structur...
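The five-level RepeatsDB 3.0 hierarchy can be pictured as a simple record in which the three structure-based levels are always present and the two sequence-based levels may be missing when no sequence similarity supports them (an assumption). This is a hypothetical Python data structure for illustration, not the RepeatsDB schema or API; the labels in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RepeatClassification:
    """Hypothetical record for one RepeatsDB-style classification path."""
    # Structure-based levels (Class > Topology > Fold).
    repeat_class: str
    topology: str
    fold: str
    # Sequence-based levels (Clan > Family), assumed optional here.
    clan: Optional[str] = None
    family: Optional[str] = None

    def path(self) -> str:
        """Render the path from Class downwards, skipping unassigned sequence-based levels."""
        levels = [self.repeat_class, self.topology, self.fold, self.clan, self.family]
        return " > ".join(level for level in levels if level is not None)

    @property
    def has_sequence_levels(self) -> bool:
        """True when at least one sequence-based level has been assigned."""
        return self.clan is not None or self.family is not None


entry = RepeatClassification("class-A", "topology-1", "fold-1", "clan-x", "family-y")
print(entry.path())  # class-A > topology-1 > fold-1 > clan-x > family-y
```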
Motivation: Ionic calcium (Ca2+) plays the role of a second messenger in eukaryotic cells, associated with cellular functions such as cell cycle regulation, transport, motility, gene expression, and metabolism (Permyakov and Kretsinger, 2009). The use of fluorometric techniques in isolated cells loaded with Ca2+-sensitive fluorescent probes allows the quantitative measurement of dynamic events that occur in living, functioning cells. The Cardiomyocytes Images Analyzer Application (CardIAP) addresses the need for tools to analyze and retrieve information from confocal microscopy images in a systematic, accurate, and fast way. Results: Here we present the CardIAP web app, an automated method for the identification of spatio-temporal patterns in a calcium fluorescence imaging sequence. Through this tool, users can analyze single or multiple Ca2+ transients from confocal line-scan images and obtain quantitative information on the dynamic response of the stimulated myocyte. Our...
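As a rough illustration of the kind of quantitative information that can be extracted from a line-scan image, the sketch below averages fluorescence along the scan line, normalizes it to a resting baseline (F/F0), and reports peak amplitude, time to peak, and time to 50% decay. It is not CardIAP's implementation; it assumes a NumPy array with space along rows and time along columns, a stimulus applied right after the baseline window, and hypothetical parameter names.

```python
from typing import Optional

import numpy as np


def transient_metrics(linescan: np.ndarray, line_interval_ms: float,
                      baseline_lines: int = 20) -> dict[str, Optional[float]]:
    """Summarize one Ca2+ transient from a (space, time) line-scan image."""
    # Spatially averaged fluorescence for each scanned line (one value per time point).
    profile = linescan.mean(axis=0)
    # Resting fluorescence F0 from the pre-stimulus baseline window.
    f0 = profile[:baseline_lines].mean()
    ratio = profile / f0
    peak_idx = int(np.argmax(ratio))
    peak = float(ratio[peak_idx])
    # First time point at or after the peak where F/F0 has fallen halfway back to baseline.
    half_level = 1.0 + 0.5 * (peak - 1.0)
    below = np.nonzero(ratio[peak_idx:] <= half_level)[0]
    return {
        "peak_f_over_f0": peak,
        # Measured from the assumed stimulus time (end of the baseline window).
        "time_to_peak_ms": (peak_idx - baseline_lines) * line_interval_ms,
        "decay_50_ms": float(below[0] * line_interval_ms) if below.size else None,
    }
```

Real recordings are noisy, so a practical tool would typically smooth the profile before locating the peak; that step is omitted here to keep the sketch short.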
Current Research in Structural Biology
Conformational changes in RNA native ensembles are central to fulfilling many of their biological roles. Systematic knowledge of the extent and possible modulators of this conformational diversity is desirable to better understand the relationship between RNA dynamics and function. We have developed CoDNaS-RNA as the first database of conformational diversity in RNA molecules. Known RNA structures are retrieved and clustered to identify alternative conformers of each molecule. Pairwise structural comparisons within each cluster allow the variability of the molecule to be measured. Additional data on structural features, molecular interactions and functional annotations are provided. CoDNaS-RNA is implemented as a public resource that may be of interest to computational and bench scientists alike.
Availability: CoDNaS-RNA is freely accessible at http://ufq.unq.edu.ar/codnasrna
Contact: npalopoli@unq.edu.ar
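The retrieve, cluster, and compare workflow can be sketched in a simplified form. For illustration only, the sketch below clusters entries by identical sequence and measures within-cluster variability as the maximum base-pair distance between dot-bracket secondary structures; CoDNaS-RNA compares 3D structures, so both the clustering criterion and the metric here are swapped-in stand-ins, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Set of (i, j) base pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def bp_distance(a: str, b: str) -> int:
    """Base-pair distance: number of pairs present in exactly one of the two structures."""
    return len(base_pairs(a) ^ base_pairs(b))


def cluster_variability(entries: list[tuple[str, str, str]]) -> dict[str, int]:
    """Group (entry_id, sequence, dot_bracket) records by sequence; report max within-group distance."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for _entry_id, sequence, structure in entries:
        clusters[sequence].append(structure)
    # Clusters with a single conformer have no pairs, so their variability is reported as 0.
    return {
        seq: max((bp_distance(a, b) for a, b in combinations(structs, 2)), default=0)
        for seq, structs in clusters.items()
    }
```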
Database
Revenant is a database of resurrected proteins from extinct organisms. It currently contains a manually curated collection of 84 resurrected proteins derived from bibliographic data. Each protein is extensively annotated with structural, biochemical and biophysical information. Revenant provides a browsing interface designed as a timeline from which the different proteins can be accessed. The oldest Revenant entries date from between 4200 and 3500 million years ago, while the youngest date from between 8.8 and 6.3 million years ago. These proteins have been resurrected using computational ancestral sequence reconstruction techniques combined with wet-laboratory synthesis and expression. Resurrected proteins are commonly used, increasingly so in recent years, to explore and test different evolutionary hypotheses such as protein stability, to explore the origin of new functions, to gain biochemical insights into past metabolisms and to explore spec...
PLOS ONE
In the Author Contributions section, María Silvina Fornasari should be listed as one of the persons involved in conceptualization.
PLOS Computational Biology
The dynamic nature of technological developments invites us to rethink learning spaces. In this context, science education can be enriched by new computational resources, making the educational process more up-to-date, challenging, and attractive. Bioinformatics is a key interdisciplinary field that contributes to the understanding of biological processes, yet it is often underrated in secondary schools. As a resource for learning activities, bioinformatics could help engage students in integrating multiple fields of knowledge (logical-mathematical, biological, computational, etc.) and generate an enriched and long-lasting learning environment. Here, we report our recent project in which high school students learned basic programming concepts applied to solving biological problems. The students were taught Python syntax, and they coded simple tools to answer biological questions using the resources at hand. Notably, these tools were built mostly on the students' own smartphones, which proved to be capable, readily available, and relevant complementary teaching tools. This project resulted in an empowering and inclusive experience that challenged differences in social background and technological accessibility.
Inter-residue contacts determine the structural properties of each conformer in the ensembles describing the native state of proteins. Structural constraints during evolution could then provide biologically relevant information about conformational ensembles and their relationship with protein function. Here, we studied the proportion of sites evolving under structural constraints in two very different types of ensembles: those from ordered proteins and those from disordered proteins. Using a structurally constrained model of protein evolution, we found that both types of ensembles show a comparable proportion of positions evolving under structural constraints, near 40%. Among these sites, ~68% are in disordered regions and ~57% of them show long-range inter-residue contacts. We also found that disordered ensembles are redundant with respect to their structurally constrained evolutionary information and could be described on average with ~11 conformers. Despite the different complexity of the stu...
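Contacts are often binned by sequence separation; one common convention (used, for example, in CASP contact-prediction assessment) treats separations of 6 to 11, 12 to 23, and 24 or more residues as short-, medium-, and long-range. The hypothetical sketch below applies that convention to compute the fraction of a given set of sites that take part in at least one long-range contact; the study's own definition of long-range contacts may differ.

```python
def contact_range(i: int, j: int) -> str:
    """Classify a contact by sequence separation (CASP-style bins; an assumed convention)."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return "local"


def long_range_site_fraction(sites: list[int], contacts: list[tuple[int, int]]) -> float:
    """Fraction of the given sites that take part in at least one long-range contact."""
    long_range_sites = {k for i, j in contacts if contact_range(i, j) == "long" for k in (i, j)}
    return sum(site in long_range_sites for site in sites) / len(sites)
```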
Journal of Computational Chemistry
European Journal of Haematology, Jan 10, 2018
Hemoglobinopathies are the most common autosomal recessive disorders and are mostly inherited in a recessive manner. However, certain mutations can affect globin chain stability, leading to dominant forms of thalassemia. The aim of this work was the molecular and structural characterization of two heterozygous in-frame deletions leading to β-globin variants in pediatric patients in Argentina. The HBB gene of the probands and their parents was sequenced, and other markers of globin chain imbalance were analyzed. Several structural analyses were performed to assess the effect of the mutations on globin chain stability. In Hb JC-Paz, HBB:c.29_37delCTGCCGTTA (p.Ala10_Thr12del), detected in an Argentinean boy, one α-helix turn is expected to be lost. In Hb Tavapy, HBB:c.182_187delTGAAGG (p.Val60_Lys61del), the deleted residues are close to the distal histidine (His63) in the heme pocket. Both mutations are predicted to have a destabilizing effect. The development of computati...
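The in-frame character of both deletions follows from simple arithmetic on the coding coordinates: c.29_37 removes 9 nucleotides, a multiple of 3, so the reading frame is kept and 3 residues are lost, consistent with p.Ala10_Thr12del; likewise c.182_187 removes 6 nucleotides and 2 residues (p.Val60_Lys61del). The sketch below encodes this check. It is an illustrative helper, not a full HGVS parser: it handles only simple `c.start_enddel` notation with an optional reference prefix and no intronic offsets.

```python
import re
from typing import Optional

# Matches simple coding deletions such as "HBB:c.29_37delCTGCCGTTA" (no intronic offsets).
_SIMPLE_DEL = re.compile(r"(?:\w+:)?c\.(\d+)_(\d+)del([ACGT]*)")


def deletion_effect(hgvs_c: str) -> tuple[int, bool, Optional[int]]:
    """Return (deleted nucleotides, in-frame?, residues removed if in-frame)."""
    match = _SIMPLE_DEL.fullmatch(hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c}")
    start, end, bases = int(match.group(1)), int(match.group(2)), match.group(3)
    length = end - start + 1
    # If the deleted bases are spelled out, they must agree with the coordinates.
    if bases and len(bases) != length:
        raise ValueError("deleted sequence length does not match coordinates")
    in_frame = length % 3 == 0
    return length, in_frame, (length // 3 if in_frame else None)


print(deletion_effect("HBB:c.29_37delCTGCCGTTA"))  # (9, True, 3)
print(deletion_effect("HBB:c.182_187delTGAAGG"))   # (6, True, 2)
```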
PLOS ONE, 2017
Epidermal Growth Factor Receptor (EGFR), a tyrosine kinase receptor, is one of the main tumor markers in different types of cancer. The kinase native state is mainly composed of two populations of conformers: active and inactive. Several sequence variations in the EGFR kinase region promote the differential enrichment of conformers with higher activity. Some structural characteristics have been proposed to differentiate kinase conformations, but these criteria can lead to ambiguous classifications. We present a structural characterisation of EGFR kinase conformers, focused on active site pocket comparisons and the mapping of known pathological sequence variations. Structure-based clustering of this pocket accurately discriminates active from inactive well-characterised conformations. Furthermore, this main pocket contains, or is in close contact with, ≈65% of cancer-related variation positions. Although the relevance of protein dynamics to explain biological function has b...
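The abstract does not specify the clustering method, so the sketch below uses a swapped-in technique: single-linkage clustering of conformers from a precomputed pairwise pocket distance matrix (for example, a pocket RMSD), joining any two conformers whose distance is at or below a threshold. The function name, the matrix, and the threshold are hypothetical.

```python
def single_linkage_clusters(dist: list[list[float]], threshold: float) -> list[int]:
    """Cluster labels from a symmetric distance matrix using single linkage at a fixed cutoff."""
    n = len(dist)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every pair of conformers closer than the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots: dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Single linkage chains through intermediate conformers, so two conformations can share a cluster even if their direct distance exceeds the threshold; a stricter linkage would avoid that at the cost of more clusters.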
Briefings in Bioinformatics
Major scientific challenges that are beyond the capability of individuals need to be addressed by multi-disciplinary and multi-institutional consortia. Examples of these endeavours include the Human Genome Project and, more recently, the Structural Genomics (SG) initiative. The SG initiative pursues the expansion of structural coverage to include at least one structural representative for each protein family, so that the remaining structures can be derived using homology modelling. However, biological function is inherently connected with protein dynamics, which can be studied by knowing different structures of the same protein. This ensemble of structures provides snapshots of protein conformational diversity under native conditions. Sequence redundancy in the Protein Data Bank (PDB) (i.e. crystallization of the same protein under different conditions) is therefore an essential input for experimentally based studies of protein dynamics and provides insights into protein function. In this work, we show that sequence redundancy, a key concept for exploring protein dynamics, is highly biased and fundamentally incomplete in the PDB. Additionally, our results show that the dynamical behaviour of proteins cannot be inferred from homologous proteins: minor to moderate changes in sequence can produce large differences in dynamical behaviour. Nonetheless, the structural and dynamical incompleteness of the PDB appear to be unrelated concepts in SG. While the former could be reversed by promoting the extension of structural coverage, we would like to emphasize that further focused efforts will be needed to amend the incompleteness of the PDB in terms of dynamical information content, which is essential to fully understand protein function.
PLOS Computational Biology, 2017
Protein motions are key to understanding biological function. Recently, a large-scale analysis of protein conformational diversity showed a positively skewed distribution with a peak at 0.5 Å C-alpha root-mean-square deviation (RMSD). To understand this distribution in terms of structure-function relationships, we studied a large, well-curated dataset of 5,000 proteins with experimentally determined conformational diversity. We searched for global behaviour patterns by studying how structure-based features change across the available conformer population of each protein. This procedure allowed us to describe the RMSD distribution in terms of three main protein classes sharing given properties. The largest of these protein subsets (~60%), which we call "rigid" (average RMSD = 0.83 Å), has no disordered regions and shows low conformational diversity, the largest tunnels, and smaller and buried cavities. The two additional subsets contain disordered regions, but with differential sequence composition and behaviour. Partially disordered proteins have on average 67% of their conformers with disordered regions, an average RMSD of 1.1 Å, the highest number of hinges and the longest disordered regions. In contrast, malleable proteins have on average only 25% of disordered conformers, an average RMSD of 1.3 Å, flexible cavities whose size is affected by the presence of disordered regions, and the highest diversity of cognate ligands. Proteins in each set are mostly non-homologous to each other and share neither fold class nor functional similarity, but they do share features derived from their conformer population. These shared features could represent conformational mechanisms related to biological functions.
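For reference, the C-alpha RMSD behind these distributions is conventionally computed after optimal rigid-body superposition of two conformers A and B with N matched residues, where a_i and b_i are the C-alpha coordinates of residue i, R is a proper rotation and t a translation. One common way to summarize a protein's conformational diversity D over its K conformers C_1, ..., C_K is the maximum pairwise RMSD; this summary is stated here as an assumption, not necessarily the exact statistic used in the study.

```latex
\mathrm{RMSD}(A,B) = \min_{R \in SO(3),\ \mathbf{t} \in \mathbb{R}^3}
  \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\lVert \mathbf{a}_i - \left(R\,\mathbf{b}_i + \mathbf{t}\right) \right\rVert^2},
\qquad
D = \max_{1 \le k < l \le K} \mathrm{RMSD}(C_k, C_l)
```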