Elodie Laine - Academia.edu
Papers by Elodie Laine
Trends in Pharmacological Sciences, 2021
Solute carrier (SLC) transporters are emerging drug targets. Identifying the molecular determinants responsible for their specific and selective transport activities and describing key interactions with their ligands are crucial steps towards the design of potential new drugs. A general functional mapping across more than 400 human SLC transporters would pave the way to the rational and systematic design of molecules modulating cellular transport.
Challenging Drug Targets
SLC transporters mediate the transport of a broad range of solutes, such as ions, nutrients, and metabolites, across biological membranes. In humans, dysregulation of the homeostasis of the transported substrates has been associated with multiple diseases and disorders, such as cancers. Additionally, SLCs play an essential role in the absorption, distribution, metabolism, and elimination of therapeutic drugs. Thus, SLCs are key drug targets [1,2] that remained understudied until recently [3]. Understanding these complex biological systems requires the description of many aspects of their functioning (e.g., interactions with ligands and with protein partners, conformational changes and kinetics of transport, response to cofactors, and differential expression in different cell types). These different aspects can be probed by various technologies, including structural determination, genetic editing, metabolomics, various animal models, chemical biology, and basic biochemistry. Among them, structure-based techniques are commonly used to ... Horizon 2020 research and innovation program and EFPIA.
The complexity underlying protein-protein interaction (PPI) networks calls for the development of comprehensive knowledge bases organizing PPI-related data. The constant growth and high reliability of structural data make them a suitable source of evidence for the determination of PPIs. We present LEVELNET, a fully automated and scalable environment designed to integrate, explore, and infer protein interactions and non-interactions based on physical contacts and other PPI sources, including user-defined annotations. LEVELNET helps to break down the complexity of PPI networks by representing them as multi-layered graphs and allowing the selection of subnetworks and their direct comparison. LEVELNET offers an interactive visualisation based on a user-friendly web interface. LEVELNET has multiple applications. It allows users to explore PPIs of biological processes, identify co-localised partners, assess PPI predictions from computational or experimental sources, unravel cross-interactions,...
Proteins ensure their biological functions by interacting with each other, and with other molecules. Determining the relative position and orientation of protein partners in a complex remains challenging. Here, we address the problem of ranking candidate complex conformations toward identifying near-native conformations. We propose a deep learning approach relying on a local representation of the protein interface with an explicit account of its geometry. We show that the method is able to recognise certain pattern distributions in specific locations of the interface. We compare and combine it with a physics-based scoring function and a statistical pair potential.
Structure, 2018
Several models estimating the strength of the interaction between proteins in a complex have been proposed. By exploring the geometry of contact distribution at protein-protein interfaces, we provide an improved model of binding energy. Local Interaction Signal Analysis (LISA) is a radial function based on terms describing favorable and non-favorable contacts obtained by Density Functional Theory, the Support-Core-Rim interface residue distribution, non-interacting charged residues and secondary structure contributions. The three-dimensional organisation of the contacts and their contribution to localised hot-sites over the entire interaction surface were numerically evaluated. LISA achieves a correlation of 0.81 (and an RMSE of 2.35 ± 0.38 kcal/mol) when tested on 125 complexes for which experimental measurements were performed. LISA's performance is stable for subsets defined by functional composition and extent of conformational changes upon complex formation. A large-scale comparison with 17 other functions demonstrated the power of the geometrical model in the understanding of complex binding.
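The two figures quoted in this abstract, a correlation and an RMSE between predicted and measured binding energies, can be illustrated with a minimal sketch. It assumes the correlation is a Pearson coefficient, which the abstract does not state; the function names and the numerical values are hypothetical and are not taken from the paper or from LISA itself.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def rmse(predicted, measured):
    """Root-mean-square error between predictions and measurements."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)

# Hypothetical binding energies in kcal/mol, for illustration only.
measured = [-10.0, -8.0, -12.0, -9.0]
predicted = [-9.0, -8.5, -11.0, -10.0]

print("correlation:", pearson_r(predicted, measured))
print("RMSE (kcal/mol):", rmse(predicted, measured))
```

Reporting both quantities is useful because a scoring function can rank complexes well (high correlation) while still being offset in absolute energy (high RMSE).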
PLoS Computational Biology, 2013
Large-scale analyses of protein-protein interactions based on coarse-grain molecular docking simulations and binding site predictions resulting from evolutionary sequence analysis are possible and realizable on hundreds of proteins with varied structures and interfaces. We demonstrated this on the 168 proteins of the Mintseris Benchmark 2.0. On the one hand, we evaluated the quality of the interaction signal and the contribution of docking information compared to evolutionary information, showing that the combination of the two improves partner identification. On the other hand, since protein interactions usually occur in crowded environments with several competing partners, we performed a thorough analysis of the interactions of proteins with true partners but also with non-partners to evaluate whether proteins in the environment, competing with the true partner, affect its identification. We found three populations of proteins: strongly competing, never competing, and interacting with different levels of strength. Populations and levels of strength are numerically characterized and provide a signature for the behavior of a protein in the crowded environment. We showed that partner identification, to some extent, does not depend on the competing partners present in the environment, that certain biochemical classes of proteins are intrinsically easier to analyze than others, and that small proteins are not more promiscuous than large ones. Our approach brings to light that knowledge of the binding site can be used to reduce the high computational cost of docking simulations with no consequence for the quality of the results, demonstrating the possibility of applying coarse-grain docking to datasets made of thousands of proteins. A comparison with all available large-scale analyses aimed at partner prediction is performed. We release the complete decoy set produced by coarse-grain docking simulations of both true and false interacting partners, and their evolutionary sequence analysis leading to binding site predictions.
Proteins: Structure, Function, and Bioinformatics, 2021
PLOS Computational Biology, 2022
Proteins ensure their biological functions by interacting with each other. Hence, characterising protein interactions is fundamental for our understanding of the cellular machinery, and for improving medicine and bioengineering. Over the past years, a large body of experimental data has been accumulated on who interacts with whom and in what manner. However, these data are highly heterogeneous and sometimes contradictory, noisy, and biased. Ab initio methods provide a means to a “blind” protein-protein interaction network reconstruction. Here, we report on a molecular cross-docking-based approach for the identification of protein partners. The docking algorithm uses a coarse-grained representation of the protein structures and treats them as rigid bodies. We applied the approach to a few hundred proteins, in the unbound conformations, and we systematically investigated the influence of several key ingredients, such as the size and quality of the interfaces, and the scoring functi...
The virulence of the Gram-positive bacterium Bacillus anthracis, the causative agent of anthrax, is due to the presence of a capsule and two toxins. Each toxin results from the assembly of the protective antigen (PA) with one of two factors, lethal (LF) or edema (EF), in the cytoplasm of the host cell. EF is an adenylyl cyclase that converts ATP into cAMP in an uncontrolled manner, causing cellular dysregulation. It is activated by calmodulin (CaM), which is involved in numerous calcium signalling pathways. Crystallographic structures and an NMR study have shown that the stability of the EF-CaM complex depends on the level of calcium bound to CaM. Molecular dynamics simulations of the complex, with 0, 2 or 4 calcium ions, made it possible to characterise the effect of calcium on the conformational plasticity of the two partners and to propose a model of the EF-CaM interaction. The joint analysis of the dynamical correlations...
BMC Bioinformatics, 2020
Background: Coiled-coils are described as stable structural motifs, where two or more helices wind around each other. However, coiled-coils are associated with local mobility and intrinsic disorder. Intrinsically disordered regions in proteins are characterized by a lack of stable secondary and tertiary structure under physiological conditions in vitro. They are increasingly recognized as important for protein function. However, characterizing their behaviour in solution and determining precisely the extent of disorder of a protein region remains challenging, both experimentally and computationally. Results: In this work, we propose a computational framework to quantify the extent of disorder within a coiled-coil in solution and to help design substitutions modulating such disorder. Our method relies on the analysis of conformational ensembles generated by relatively short all-atom Molecular Dynamics (MD) simulations. We apply it to the phosphoprotein multimerisation domains (PMD) of Me...
Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalising the definition of splicing graph. We provide a practical solution to building parsimonious evolutionary splicing graphs where each node is a minimal transcript building block defined across species. We show a clear link between the functional relevance, tissue-regulation and conservation of alternative splicing (AS) events on a set of 50 genes. By scaling up to the whole human protein-coding genome, we identify a few thousand genes where alternative splicing modulates the number and composition of pseudo-repeats. We have implemented our approach in ThorAxe, an efficient, versatile, and robust computational tool freely available at https://github.com/PhyloSofS-Team/thoraxe. The results are accessible and can be browsed interactively at http://www.lcqb.upmc.fr/ThorAxe.
The development and application of formal methods to biological problems imply a nonstandard assessment of their validity. Indeed, in this context, the verifiers or checkers of the methods are the biological data. Formal models are used to understand the functioning and emergent properties of biological systems and they are revised/improved based on their agreement with the empirical observations obtained on these systems. They also produce data that predict what has not yet been observed and that cannot be obtained by current experimental methods. This mini-symposium highlights recent mathematical developments toward elucidating biological questions or modeling complex biological systems from this perspective.
The Journal of Physical Chemistry B
PLOS Computational Biology
Interactions between proteins and nucleic acids are at the heart of many essential biological processes. Despite increasing structural information about how these interactions may take place, our understanding of the usage made of protein surfaces by nucleic acids is still very limited. This is in part due to the inherent complexity associated with protein surface deformability and evolution. In this work, we present a method that helps decipher such complexity by predicting protein-DNA interfaces and characterizing their properties. It relies on three biologically and physically meaningful descriptors, namely evolutionary conservation, physico-chemical properties and surface geometry. We carefully assessed its performance on several hundred protein structures and compared it to several state-of-the-art machine-learning methods. Our approach achieves a higher sensitivity compared to the other methods, with a similar precision. Importantly, we show that it is able to unravel 'hidden' binding sites by applying it to unbound protein structures and to proteins binding to DNA via multiple sites and in different conformations. It is also applicable to the detection of RNA-binding sites, without significant loss of performance. This confirms that DNA- and RNA-binding sites share similar properties. Our method is implemented as a fully automated tool, JET2DNA, freely accessible at: http://www.lcqb.upmc.fr/JET2DNA. We also provide a new dataset of 187 protein-DNA complex structures, along with a subset of 82 associated unbound structures. The set represents the largest body of high-resolution crystallographic structures of protein-DNA complexes, uses biological protein assemblies as DNA-binding units, and covers all major types of protein-DNA interactions.
Scientific Reports
Characterizing a protein mutational landscape is a very challenging problem in Biology. Many disease-associated mutations do not seem to produce any effect on the global shape or motions of the protein. Here, we use relatively short all-atom biomolecular simulations to predict mutational outcomes and we quantitatively assess the predictions on several hundred mutants. We perform simulations of the wild type and 175 mutants of PSD95's third PDZ domain in complex with its cognate ligand. By recording residue displacement correlations and interactions, we identify "communication pathways" and quantify them to predict the severity of the mutations. Moreover, we show that by exploiting simulations of the wild type, one can detect 80% of the positions highly sensitive to mutations with a precision of 89%. Importantly, our analysis describes the role of these positions in the inter-residue communication and dynamical architecture of the complex. We assess our approach on three different systems using data from deep mutational scanning experiments and high-throughput exome sequencing. We refer to our analysis as "infostery", from "info" (information) and "steric" (arrangement of residues in space). We provide a fully automated tool, COMMA2 (www.lcqb.upmc.fr/COMMA2), that can be used to guide medicinal research by selecting important positions/mutations.
The question of which and how amino acid sequence variations (re-)shape the conformational landscape of proteins and impact their function is one of outstanding importance in Biology. Disease-associated mutations can impair protein function in various ways, by destabilizing the protein structure, by shifting the equilibrium of conformation populations, or by modulating the binding affinity of the protein for its cellular partner(s), to name a few.
Recent biotechnological advances have opened the way to systematically estimating the functional consequences of single-point mutations at every position in a protein, through deep mutational scanning [1]. So far, such analysis has been conducted on fewer than twenty proteins (see [2] for a list of proteins and associated experiments), including the third PDZ domain of the brain synaptic protein PSD-95 (PSD95 pdz3) [3] and the β-lactamase TEM-1 [4,5]. These experiments have revealed that a relatively small number of positions in a protein are highly sensitive to mutations [3,4]: a substitution of the amino acid at any of these highly sensitive positions by almost any other amino acid produces a deleterious phenotype. They have also stimulated the development of sequence-analysis-based methods to predict mutational outcomes at large scale (Fig. 1a, black arrow), some of them being much more accurate than widely used methods combining sequence and structure information [2,6]. Even though sequence-based methods can yield very accurate predictions of mutational phenotypic outcomes, they cannot shed light on the molecular mechanisms underlying them. Structure-based methods provide a way to do so, and many studies have investigated the global and/or local effect of mutations on protein thermodynamic stability, hydrogen bond network and conformational dynamics [7-22]. There are few reported cases where crystallized protein mutants provide clear insights into the effects of the mutations (e.g. p53 cancer mutations affecting the arginines in contact with DNA [21,22]). However, in the vast majority of cases, the global shape of the protein remains unchanged upon mutations, even when the latter result in deleterious phenotypes [9]. This is very well exemplified
Proteins: Structure, Function, and Bioinformatics
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as
Blood
Context: Germline mutations in genes involved in the hypoxia sensing pathway (VHL, PHD2/EGLN1, HIF2A/EPAS1) predispose patients to erythrocytosis associated with normal or high serum erythropoietin level. The most frequent mutation, VHL-R200W (R200W), has been identified in homozygous carriers with a congenital erythrocytosis named Chuvash polycythemia. Survival in the Chuvash patients was found to be reduced compared to control groups due to higher rates of arterial and venous thromboses, and to haemorrhagic events. Notably, a characteristic of these patients and their parents, heterozygous for the mutation, is the total absence of tumor development, in contrast to heterozygous carriers of other VHL mutations. Indeed, VHL is a tumor suppressor gene and heterozygous carriers of VHL mutations have von Hippel-Lindau disease and are at high risk of multiple tumors (e.g. CNS hemangioblastomas, pheochromocytoma, renal cell carcinoma). The absence of tumor development in patients ...
Journal of Chemical Information and Modeling
INTerface Builder (INTBuilder) is a fast, easy-to-use program to compute protein-protein interfaces. It is designed to retrieve interfaces from molecular docking software outputs in an empirically determined linear complexity. INTBuilder directly reads the output formats of popular docking programs like ATTRACT, HEX, MAXDo, and ZDOCK, as well as a more generic format and Protein Data Bank (PDB) files. It identifies interacting surfaces at both residue and atom resolutions.
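As an illustration of what computing an interface at residue resolution can mean, here is a minimal sketch using a plain distance cutoff between atoms of two chains. This is a common convention, not INTBuilder's algorithm: the function name, the atom tuple layout, and the 5 Å default are assumptions made for this example, and the double loop is quadratic rather than linear in the number of atoms.

```python
import math
from typing import List, Set, Tuple

# Hypothetical atom record: (residue identifier, x, y, z), coordinates in angstroms.
Atom = Tuple[str, float, float, float]

def interface_residues(
    chain_a: List[Atom], chain_b: List[Atom], cutoff: float = 5.0
) -> Tuple[Set[str], Set[str]]:
    """Return the residues of each chain that have at least one atom
    within `cutoff` of an atom of the other chain (naive all-pairs scan)."""
    residues_a: Set[str] = set()
    residues_b: Set[str] = set()
    for res_a, xa, ya, za in chain_a:
        for res_b, xb, yb, zb in chain_b:
            if math.dist((xa, ya, za), (xb, yb, zb)) <= cutoff:
                residues_a.add(res_a)
                residues_b.add(res_b)
    return residues_a, residues_b

# Toy coordinates, for illustration only.
receptor = [("A1", 0.0, 0.0, 0.0), ("A2", 20.0, 0.0, 0.0)]
ligand = [("B1", 3.0, 0.0, 0.0), ("B2", 50.0, 0.0, 0.0)]
print(interface_residues(receptor, ligand))
```

A spatial grid or cell list would avoid comparing every atom pair, which is the usual way to bring such a search closer to linear time.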
Large macromolecules, including proteins and their complexes, very often adopt multiple conformations. Some of them can be seen experimentally, for example with X-ray crystallography or cryo-electron microscopy. This structural heterogeneity is not occasional and is frequently linked with specific biological function. Thus, the accurate description of macromolecular conformational transitions is crucial for understanding fundamental mechanisms of life's machinery. We report on a real-time method to predict such transitions by extrapolating from instantaneous eigen-motions, computed using normal mode analysis, to a series of twists. We demonstrate the applicability of our approach to the prediction of a wide range of motions, including large collective opening-closing transitions and conformational changes induced by partner binding. We also highlight particularly difficult cases of very small transitions between crystal and solution structures. Our method guarantees preserva...
The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering and medicine. Recent progress has been achieved by leveraging the increasing wealth of genomic data and by modeling inter-site dependencies within biological sequences. However, state-of-the-art methods require numerous highly variable sequences and remain time-consuming. Here, we present GEMME (www.lcqb.upmc.fr/GEMME), a method that overcomes these limitations by explicitly modeling the evolutionary history of natural sequences. This allows accounting for all positions in a sequence when estimating the effect of a given mutation. Assessed against 41 experimental high-throughput mutational scans, GEMME overall performs similarly to or better than existing methods and runs faster by several orders of magnitude. It greatly improves predictions for viral sequences and, more generally, for very conserved families. It uses only a few biologically meaningf...
Trends in Pharmacological Sciences, 2021
Solute carrier (SLC) transporters are emerging drug targets. Identifying the molecular determinan... more Solute carrier (SLC) transporters are emerging drug targets. Identifying the molecular determinants responsible for their specific and selective transport activities and describing key interactions with their ligands are crucial steps towards the design of potential new drugs. A general functional mapping across more than 400 human SLC transporters would pave the way to the rational and systematic design of molecules modulating cellular transport. Challenging Drug Targets SLC transporters mediate the transport of a broad range of solutes, such as ions, nutrients, and metabolites across biological membranes. In human, dysregulation of the homeostasis of the transported substrates, has been associated with multiple diseases and disorders, such as cancers. Additionally, SLCs play an essential role in the absorption, distribution, metabolism, and elimination, of therapeutic drugs. Thus, SLCs are key drug targets [1,2], that remained understudied until recently [3]. Understanding these complex biological systems requires the description of many aspects of their functioning (e.g., interactions with ligands and with protein partners, conformational changes and kinetics of transport, response to cofactors, and differential expression in different cell types). These different aspects can be probed by various technologies, including structural determination, genetic editing, metabolomics, various animal models, chemical biology, basic biochemistry, etc. Among them, structurebased techniques are commonly used to Horizon 2020 research and innovation program and EFPIA.
The complexity underlying protein-protein interaction (PPI) networks calls for the development of... more The complexity underlying protein-protein interaction (PPI) networks calls for the development of comprehensive knowledge bases organizing PPI-related data. The constant growth and high reliability of structural data make them a suitable source of evidence for the determination of PPI. We present LEVELNET, a fully-automated and scalable environment designed to integrate, explore, and infer protein interactions and non-interactions based on physical contacts and other PPI sources, including user-defined annotations. LEVELNET helps to break down the complexity of PPI networks by representing them as multi-layered graphs and allowing the selection of subnetworks and their direct comparison. LEVELNET proposes an interactive visualisation based on a user-friendly web interface. LEVELNET applications are multiple. It allows to explore PPIs of biological processes, identify co-localised partners, assess PPI predictions from computational or experimental sources, unravel cross-interactions,...
Proteins ensure their biological functions by interacting with each other, and with other molecul... more Proteins ensure their biological functions by interacting with each other, and with other molecules. Determining the relative position and orientation of protein partners in a complex remains challenging. Here, we address the problem of ranking candidate complex conformations toward identifying near-native conformations. We propose a deep learning approach relying on a local representation of the protein interface with an explicit account of its geometry. We show that the method is able to recognise certain pattern distributions in specific locations of the interface. We compare and combine it with a physics-based scoring function and a statistical pair potential.
Structure, 2018
Several models estimating the strength of the interaction between proteins in a complex have been... more Several models estimating the strength of the interaction between proteins in a complex have been proposed. By exploring the geometry of contact distribution at protein-protein interfaces, we provide an improved model of binding energy. Local Interaction Signal Analysis (LISA) is a radial function based on terms describing favorable and non-favorable contacts obtained by Density Functional Theory, the Support-Core-Rim interface residue distribution, non-interacting charged residues and secondary structures contribution. The three-dimensional organisation of the contacts and their contribution on localised hot-sites over the entire interaction surface were numerically evaluated. LISA achieves a correlation of 0.81 (and RMSE of 2.35 ± 0.38 kcal/mol) when tested on 125 complexes for which experimental measurements were realised. LISA's performance is stable for subsets defined by functional composition and extent of conformational changes upon complex formation. A large-scale comparison with 17 other functions demonstrated the power of the geometrical model in the understanding of complex binding.
PLoS Computational Biology, 2013
Large-scale analyses of protein-protein interactions based on coarse-grain molecular docking simu... more Large-scale analyses of protein-protein interactions based on coarse-grain molecular docking simulations and binding site predictions resulting from evolutionary sequence analysis, are possible and realizable on hundreds of proteins with variate structures and interfaces. We demonstrated this on the 168 proteins of the Mintseris Benchmark 2.0. On the one hand, we evaluated the quality of the interaction signal and the contribution of docking information compared to evolutionary information showing that the combination of the two improves partner identification. On the other hand, since protein interactions usually occur in crowded environments with several competing partners, we realized a thorough analysis of the interactions of proteins with true partners but also with non-partners to evaluate whether proteins in the environment, competing with the true partner, affect its identification. We found three populations of proteins: strongly competing, never competing, and interacting with different levels of strength. Populations and levels of strength are numerically characterized and provide a signature for the behavior of a protein in the crowded environment. We showed that partner identification, to some extent, does not depend on the competing partners present in the environment, that certain biochemical classes of proteins are intrinsically easier to analyze than others, and that small proteins are not more promiscuous than large ones. Our approach brings to light that the knowledge of the binding site can be used to reduce the high computational cost of docking simulations with no consequence in the quality of the results, demonstrating the possibility to apply coarse-grain docking to datasets made of thousands of proteins. Comparison with all available largescale analyses aimed to partner predictions is realized. 
We release the complete decoys set issued by coarse-grain docking simulations of both true and false interacting partners, and their evolutionary sequence analysis leading to binding site predictions.
Proteins: Structure, Function, and Bioinformatics, 2021
PLOS Computational Biology, 2022
Proteins ensure their biological functions by interacting with each other. Hence, characterising ... more Proteins ensure their biological functions by interacting with each other. Hence, characterising protein interactions is fundamental for our understanding of the cellular machinery, and for improving medicine and bioengineering. Over the past years, a large body of experimental data has been accumulated on who interacts with whom and in what manner. However, these data are highly heterogeneous and sometimes contradictory, noisy, and biased. Ab initio methods provide a means to a “blind” protein-protein interaction network reconstruction. Here, we report on a molecular cross-docking-based approach for the identification of protein partners. The docking algorithm uses a coarse-grained representation of the protein structures and treats them as rigid bodies. We applied the approach to a few hundred of proteins, in the unbound conformations, and we systematically investigated the influence of several key ingredients, such as the size and quality of the interfaces, and the scoring functi...
La virulence de la bacterie Gram+ Bacillus anthracis, responsable de la maladie du charbon, est d... more La virulence de la bacterie Gram+ Bacillus anthracis, responsable de la maladie du charbon, est due a la presence d'une capsule et deux toxines. Chaque toxine resulte de l'assemblage de l'antigene protecteur (PA) avec l'un des deux facteurs, letal (LF) ou oedemateux (EF), dans le cytoplasme de la cellule hote. EF est une adenylyl cyclase, qui transforme l'ATP en AMPc de maniere incontrolee, provoquant des dereglements cellulaires. Elle est activee par la calmoduline (CaM), impliquee dans de nombreuses voies de signalisation du calcium. Des structures cristallographiques et une etude par RMN ont montre que la stabilite du complexe EF-CaM depend du niveau de calcium fixe a CAM. Des simulations de dynamique moleculaire du complexe, avec 0, 2 ou 4 ions calcium, ont permis de caracteriser l'effet du calcium sur la plasticite conformationnelle des deux partenaires et de proposer un modele de l'interaction EF-CaM. L'analyse conjointe des correlations dynamiq...
BMC Bioinformatics, 2020
Background Coiled-coils are described as stable structural motifs, where two or more helices wind... more Background Coiled-coils are described as stable structural motifs, where two or more helices wind around each other. However, coiled-coils are associated with local mobility and intrinsic disorder. Intrinsically disordered regions in proteins are characterized by lack of stable secondary and tertiary structure under physiological conditions in vitro. They are increasingly recognized as important for protein function. However, characterizing their behaviour in solution and determining precisely the extent of disorder of a protein region remains challenging, both experimentally and computationally. Results In this work, we propose a computational framework to quantify the extent of disorder within a coiled-coil in solution and to help design substitutions modulating such disorder. Our method relies on the analysis of conformational ensembles generated by relatively short all-atom Molecular Dynamics (MD) simulations. We apply it to the phosphoprotein multimerisation domains (PMD) of Me...
Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalising the definition of splicing graph. We provide a practical solution to building parsimonious evolutionary splicing graphs where each node is a minimal transcript building block defined across species. We show a clear link between the functional relevance, tissue regulation and conservation of alternative splicing (AS) events on a set of 50 genes. By scaling up to the whole human protein-coding genome, we identify a few thousand genes where alternative splicing modulates the number and composition of pseudo-repeats. We have implemented our approach in ThorAxe, an efficient, versatile, and robust computational tool freely available at https://github.com/PhyloSofS-Team/thoraxe. The results are accessible and can be browsed interactively at http://www.lcqb.upmc.fr/ThorAxe.
The development and application of formal methods to biological problems imply a nonstandard assessment of their validity. Indeed, in this context, the verifiers or checkers of the methods are the biological data. Formal models are used to understand the functioning and emergent properties of biological systems, and they are revised or improved based on their agreement with the empirical observations obtained on these systems. They also produce data that predict what has not yet been observed and that cannot be obtained by current experimental methods. This mini-symposium highlights recent mathematical developments toward elucidating biological questions or modeling complex biological systems from this perspective.
The Journal of Physical Chemistry B
PLOS Computational Biology
Interactions between proteins and nucleic acids are at the heart of many essential biological processes. Despite increasing structural information about how these interactions may take place, our understanding of how nucleic acids use protein surfaces is still very limited. This is in part due to the inherent complexity associated with protein surface deformability and evolution. In this work, we present a method that helps decipher such complexity by predicting protein-DNA interfaces and characterizing their properties. It relies on three biologically and physically meaningful descriptors, namely evolutionary conservation, physico-chemical properties and surface geometry. We carefully assessed its performance on several hundred protein structures and compared it to several state-of-the-art machine-learning methods. Our approach achieves a higher sensitivity compared to the other methods, with a similar precision. Importantly, we show that it is able to unravel 'hidden' binding sites by applying it to unbound protein structures and to proteins binding to DNA via multiple sites and in different conformations. It is also applicable to the detection of RNA-binding sites, without significant loss of performance. This confirms that DNA- and RNA-binding sites share similar properties. Our method is implemented as a fully automated tool, JET2DNA, freely accessible at: http://www.lcqb.upmc.fr/JET2DNA. We also provide a new dataset of 187 protein-DNA complex structures, along with a subset of 82 associated unbound structures. The set represents the largest body of high-resolution crystallographic structures of protein-DNA complexes, uses biological protein assemblies as DNA-binding units, and covers all major types of protein-DNA interactions.
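The abstract above compares methods by sensitivity and precision. As a minimal sketch of how these two measures are commonly computed for a set of predicted binding-site residues, the example below uses a hypothetical helper `sensitivity_precision` and made-up residue numbers; it is not taken from JET2DNA and does not reproduce its evaluation protocol.

```python
def sensitivity_precision(predicted, actual):
    """Return (sensitivity, precision) for two collections of residue identifiers.

    sensitivity = true positives / experimentally observed interface residues
    precision   = true positives / predicted interface residues
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    sensitivity = true_positives / len(actual) if actual else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Hypothetical residue numbers, for illustration only.
predicted_sites = {10, 11, 12, 15, 40}
experimental_sites = {10, 11, 12, 13}
sens, prec = sensitivity_precision(predicted_sites, experimental_sites)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

A method can therefore gain sensitivity by recovering more of the true interface while keeping precision similar, provided it does not add proportionally more false positives.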
Scientific Reports
Characterizing a protein mutational landscape is a very challenging problem in Biology. Many disease-associated mutations do not seem to produce any effect on the global shape or motions of the protein. Here, we use relatively short all-atom biomolecular simulations to predict mutational outcomes and we quantitatively assess the predictions on several hundred mutants. We perform simulations of the wild type and 175 mutants of PSD95's third PDZ domain in complex with its cognate ligand. By recording residue displacement correlations and interactions, we identify "communication pathways" and quantify them to predict the severity of the mutations. Moreover, we show that by exploiting simulations of the wild type, one can detect 80% of the positions highly sensitive to mutations with a precision of 89%. Importantly, our analysis describes the role of these positions in the inter-residue communication and dynamical architecture of the complex. We assess our approach on three different systems using data from deep mutational scanning experiments and high-throughput exome sequencing. We refer to our analysis as "infostery", from "info" (information) and "steric" (arrangement of residues in space). We provide a fully automated tool, COMMA2 (www.lcqb.upmc.fr/COMMA2), that can be used to guide medicinal research by selecting important positions/mutations. The question of which amino acid sequence variations (re-)shape the conformational landscape of proteins and impact their function, and how they do so, is one of outstanding importance in Biology. Disease-associated mutations can impair protein function in various ways, by destabilizing the protein structure, by shifting the equilibrium of conformation populations, or by modulating the binding affinity of the protein for its cellular partner(s), to name a few.
Recent biotechnological advances have opened the way to systematically estimating the functional consequences of single-point mutations at every position in a protein, through deep mutational scanning [1]. So far, such analysis has been conducted on fewer than twenty proteins (see [2] for a list of proteins and associated experiments), including the third PDZ domain of the brain synaptic protein PSD-95 (PSD95 pdz3) [3] and the β-lactamase TEM-1 [4,5]. These experiments have revealed that a relatively small number of positions in a protein are highly sensitive to mutations [3,4]: a substitution of the amino acid at any of these highly sensitive positions by almost any other amino acid produces a deleterious phenotype. They have also stimulated the development of sequence-analysis-based methods to predict mutational outcomes at large scale (Fig. 1a, black arrow), some of them being much more accurate than widely used methods combining sequence and structure information [2,6]. Even though sequence-based methods can yield very accurate predictions of mutational phenotypic outcomes, they cannot shed light on the molecular mechanisms underlying them. Structure-based methods provide a way to do so, and many studies have investigated the global and/or local effect of mutations on protein thermodynamic stability, hydrogen bond network and conformational dynamics [7-22]. There are few reported cases where crystallized protein mutants provide clear insights into the effects of the mutations (e.g. p53 cancer mutations affecting the arginines in contact with DNA [21,22]). However, in the vast majority of cases, the global shape of the protein remains unchanged upon mutations, even when the latter result in deleterious phenotypes [9]. This is very well exemplified
Proteins: Structure, Function, and Bioinformatics
This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process which may lead to differences between this version and the Version of Record. Please cite this article as
Blood
Context: Germline mutations in genes involved in the hypoxia sensing pathway (VHL, PHD2/EGLN1, HIF2A/EPAS1) predispose patients to erythrocytosis associated with normal or high serum erythropoietin level. The most frequent mutation, VHL-R200W (R200W), has been identified in homozygous carriers with a congenital erythrocytosis named Chuvash polycythemia. Survival in the Chuvash patients was found to be reduced compared to control groups due to higher rates of arterial and venous thromboses, and to haemorrhagic events. Notably, a characteristic of these patients and their parents, who are heterozygous for the mutation, is the total absence of tumor development, in contrast to heterozygous carriers of other VHL mutations. Indeed, VHL is a tumor suppressor gene, and heterozygous carriers of VHL mutations have von Hippel-Lindau disease and are at high risk of multiple tumors (e.g. CNS hemangioblastomas, pheochromocytoma, renal cell carcinoma). The absence of tumor development in patients ...
Journal of Chemical Information and Modeling
INTerface Builder (INTBuilder) is a fast, easy-to-use program to compute protein−protein interfaces. It is designed to retrieve interfaces from molecular docking software outputs in an empirically determined linear complexity. INTBuilder directly reads the output formats of popular docking programs like ATTRACT, HEX, MAXDo, and ZDOCK, as well as a more generic format and Protein Data Bank (PDB) files. It identifies interacting surfaces at both residue and atom resolutions.
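To make the notion of a protein-protein interface concrete, here is a minimal sketch of a common distance-cutoff definition: a residue belongs to the interface if any of its atoms lies within a cutoff of an atom of the partner. This is a brute-force illustration, quadratic in the number of atoms, and is not INTBuilder's algorithm, which the abstract describes as running in empirically linear complexity. The function name `interface_residues`, the input layout, and the 5 Å cutoff are all assumptions made for the example.

```python
import math


def interface_residues(receptor, ligand, cutoff=5.0):
    """Return the residue identifiers of each partner that lie at the interface.

    receptor, ligand: lists of (residue_id, (x, y, z)) tuples, one per atom.
    cutoff: maximum inter-atomic distance (in angstroms) for a contact;
            the default of 5.0 is an assumed, illustrative value.
    """
    receptor_interface, ligand_interface = set(), set()
    # Brute-force comparison of every receptor atom with every ligand atom.
    for rec_id, rec_xyz in receptor:
        for lig_id, lig_xyz in ligand:
            if math.dist(rec_xyz, lig_xyz) <= cutoff:
                receptor_interface.add(rec_id)
                ligand_interface.add(lig_id)
    return receptor_interface, ligand_interface


# Toy coordinates, for illustration only.
receptor_atoms = [("A1", (0.0, 0.0, 0.0)), ("A2", (10.0, 0.0, 0.0))]
ligand_atoms = [("B1", (3.0, 0.0, 0.0)), ("B2", (30.0, 0.0, 0.0))]
rec_if, lig_if = interface_residues(receptor_atoms, ligand_atoms)
print(sorted(rec_if), sorted(lig_if))
```

Keeping the atom pairs instead of only the residue identifiers would give the atom-resolution view of the same interface; a spatial grid or cell list is the usual way to avoid the all-against-all comparison.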
Large macromolecules, including proteins and their complexes, very often adopt multiple conformations. Some of them can be seen experimentally, for example with X-ray crystallography or cryo-electron microscopy. This structural heterogeneity is not incidental and is frequently linked with specific biological function. Thus, the accurate description of macromolecular conformational transitions is crucial for understanding fundamental mechanisms of life's machinery. We report on a real-time method to predict such transitions by extrapolating from instantaneous eigen-motions, computed using normal mode analysis, to a series of twists. We demonstrate the applicability of our approach to the prediction of a wide range of motions, including large collective opening-closing transitions and conformational changes induced by partner binding. We also highlight particularly difficult cases of very small transitions between crystal and solution structures. Our method guarantees preserva...
The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering and medicine. Recent progress has been achieved by leveraging the increasing wealth of genomic data and by modeling inter-site dependencies within biological sequences. However, state-of-the-art methods require numerous highly variable sequences and remain time-consuming. Here, we present GEMME (www.lcqb.upmc.fr/GEMME), a method that overcomes these limitations by explicitly modeling the evolutionary history of natural sequences. This allows accounting for all positions in a sequence when estimating the effect of a given mutation. Assessed against 41 experimental high-throughput mutational scans, GEMME overall performs similarly to or better than existing methods and runs faster by several orders of magnitude. It greatly improves predictions for viral sequences and, more generally, for very conserved families. It uses only a few biologically meaningf...