Shoshana Wodak - Academia.edu

Papers by Shoshana Wodak

Transcriptional regulation of protein complexes in yeast

HAL (Le Centre pour la Communication Scientifique Directe), 2003

Multiprotein complexes play an essential role in many cellular processes. But our knowledge of the mechanism of their formation, regulation and lifetimes is very limited. We investigated transcriptional regulation of protein complexes in yeast using two approaches. First, known regulons, manually curated or identified by genome-wide screens, were mapped onto the components of multiprotein complexes. The complexes comprised manually curated ones and those characterized by high-throughput analyses. Second, putative regulatory sequence motifs were identified in the upstream regions of the genes involved in individual complexes and regulons were predicted on the basis of these motifs.

Submit a Topic Page to PLOS Computational Biology and Wikipedia

PLOS Computational Biology, 2018

Back in March 2012, PLOS Computational Biology launched its 'Topic Pages' project as a way to help fill important gaps in Wikipedia's coverage of computational biology content and to credit authors for their contributions. Topic Pages are written in the style of a Wikipedia article and are openly and publicly peer reviewed on the PLOS Wiki before being published in our PLOS journals, with a second, editable version posted to Wikipedia. Six years on, PLOS Computational Biology has published 11 Topic Pages covering a good range of subjects, from the Hypercycle to Approximate Bayesian Computation. The published articles have been widely viewed on Wikipedia as well as in the journal and well received by the community. We are welcoming submissions for further PLOS Computational Biology Topic Pages. We are looking for topics in computational biology that are of interest to our readership, the broader scientific community, and the public at large and that are not yet covered or insufficiently covered (i.e., exist as a 'stub') in Wikipedia. Last year, PLOS Genetics joined the Topic Pages initiative, as detailed in this blog post. We are also exploring how the Topic Pages approach could be extended to include Wikidata, the community-curated database connecting concepts covered in any Wikipedia article with the Semantic Web [1]. For instance, data from more and more research-related databases are being integrated with Wikidata or its semantic core, Wikibase. This creates the need to formalize data models: How should concepts like a disease outbreak, a cell-cycle checkpoint, a sequencer, biomineralization, or a functional magnetic resonance imaging (fMRI) data set be modelled in Wikidata or Wikibase?
Conversely, what workflows allow us to collect information about such concepts in Wikidata, to interlink it with related information, to validate it, and to keep it up to date? Or, how can the data from Wikidata be explored or put to use in other contexts relevant to computational biology? We are working on establishing the editorial workflows to handle such Wikidata-focused Topic Pages and would welcome submissions to test these waters. For some inspiration, we suggest taking a look at Wikidata-based tools for browsing microbial genomes [2], scholarly publications [3], or software and file formats [4]. The Author Guidelines for Wikipedia-focused Topic Pages are available here. If you've noticed a gap in Wikipedia's coverage of particular computational biology topics, we want to hear from you! Please send ideas for Topic Pages to ploscompbiol@plos.org.

Faculty of 1000 evaluation for Extracting insight from noisy cellular networks

F1000 - Post-publication peer review of the biomedical literature, 2018

Blind predictions of protein interfaces by docking calculations in CAPRI

Proteins: Structure, Function, and Bioinformatics, 2010

Reliable prediction of the amino acid residues involved in protein–protein interfaces can provide valuable insight into protein function and inform mutagenesis studies and drug design applications. A fast‐growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation, or by integrating multiple criteria and approaches. Overall, however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein–protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each...

Protein structure prediction by threading methods: Evaluation of current techniques

Proteins: Structure, Function, and Bioinformatics, 1995

This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these prot...

Sesam: A relational database for structure and sequence of macromolecules

Proteins: Structure, Function, and Bioinformatics, 1991

A system is described that provides ways of integrating data on protein structure, sequence, and survey results, with molecular graphics and molecular mechanics software. Its major component is the relational database SESAM, presently implemented under the commercial package SYBASE. By design, the database allows full integration, within the same data organization, of raw data on protein structure, sequence, ligands, and heterogroups, obtained from the Brookhaven Protein Databank, with pure sequence information available from other databanks such as SWISS‐PROT. It contains in addition higher level descriptions of structural and topological properties, as well as survey results, obtained by executing specialized computer programs. Aside from the very useful attribute of closely combining structural and nonstructural information, other important features distinguish it from analogous systems developed elsewhere. It includes a molecular dictionary with complete description of geometric pr...

The design of idealized α/β‐barrels: Analysis of β‐sheet closure requirements

Proteins: Structure, Function, and Bioinformatics, 1990

The 8‐fold parallel α/β‐barrel topology is encountered in proteins that display an impressive variety of functions, suggesting that this topology may be a rather nonspecific and stable folding motif. Consequently, this motif can be considered as an interesting framework to design novel proteins. It has been shown that the shape of the β‐sheet portion of the barrel can be approximated by a hyperboloid. This geometric object may therefore be used as a scaffold to construct an idealized eight‐stranded β‐barrel. To facilitate the de novo design of such structures, a collection of modelling tools has been developed allowing secondary structure elements to be mapped onto the scaffold surface and rotation and translation operations to be performed about user‐defined axes while evaluating their contribution to the conformational energy of the system. These tools have been applied in a systematic study assessing the ϕ, ψ requirements to design symmetric eight‐stranded β‐barrels with optimal h...

Recurrent αβ loop structures in TIM barrel motifs show a distinct pattern of conserved structural features

Proteins: Structure, Function, and Bioinformatics, 1992

A systematic survey of seven parallel α/β barrel protein domains, based on exhaustive structural comparisons, reveals that a sizable proportion of the αβ loops in these proteins (20 out of a total of 49) belong to either one of two loop types previously described by Thornton and co‐workers. Six loops are of the αβ1 type, with one residue between the α‐helix and β‐strand, and 13 are of the αβ3 type, with three residues between the helix and the strand. Protein fragments embedding the identified loops, and termed αβ connections since they contain parts of the flanking helix and strand, have been analyzed in detail, revealing that each type of connection has a distinct set of conserved structural features. The orientation of the β‐strand relative to the helix and loop portions is different owing to a very localized difference in backbone conformation. In αβ1 connections, the chain enters the β‐strand via a residue adopting an extended conformation, while in αβ3 it does so via a residue in...

Modelling the polypeptide backbone with ‘spare parts’ from known protein structures

Protein Engineering, Design and Selection, 1989

An automatic procedure for building a protein polyalanine backbone from C alpha positions and 'spare parts' retrieved from a data base of 66 high-resolution protein structures is described. Protein backbones are constructed from overlapping fragments of variable length, which allows the backbone of regular secondary structure elements to be built in one block. The procedure is shown to yield backbones which compare very favourably with those from highly refined X-ray structures (r.m.s. deviation between generated and crystal structures less than 1 Å). The method is furthermore quite insensitive to experimental errors in C alpha positions as well as to the size of the data base, and is seen to yield valuable insight into the relationships between sequence and 3-D structure: one example on triose phosphate isomerase, a beta-barrel protein, shows that beta alpha loops can be considered as structurally more uncommon than alpha beta loops. The 'spare parts' approach is also found to be useful for general-purpose modelling of local structural changes produced by insertion or deletion of residues. It should, however, be used with caution. Crude selection criteria based solely on fragment length and geometric fit to the loop base regions yield realistic backbones in about two-thirds of the test cases (r.m.s. deviations from refined crystal structure approximately 1 Å). In the remaining cases, sequence information, in particular the presence of glycine residues which tend to adopt more unusual backbone conformations, must be considered to obtain comparable results.

Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins

Protein Engineering, Design and Selection, 1995

A fully automatic procedure for aligning two protein structures is presented. It uses as sole structural similarity measure the root mean square (r.m.s.) deviation of superimposed backbone atoms (N, C alpha, C and O) and is designed to yield optimal solutions with respect to this measure. In a first step, the procedure identifies protein segments with similar conformations in both proteins. In a second step, a novel multiple linkage clustering algorithm is used to identify segment combinations which yield optimal global structure alignments. Several structure alignments can usually be obtained for a given pair of proteins, which are exploited here to define automatically the common structural core of a protein family. Furthermore, an automatic analysis of the clustering trees is described which enables detection of rigid-body movements between structure elements. To illustrate the performance of our procedure, we apply it to families of distantly related proteins. One groups the three alpha + beta proteins ubiquitin, ferredoxin and the B1-domain of protein G. Their common structure motif consists of four beta-strands and the only alpha-helix, with one strand and the helix being displaced as a rigid body relative to the remaining three beta-strands. The other family consists of beta-proteins from the Greek key group, in particular actinoxanthin, the immunoglobulin variable domain and plastocyanin. Their consensus motif, composed of five beta-strands and a turn, is identified, mostly intact, in all Greek key proteins except the trypsins, and interestingly also in three other beta-protein families, the lipocalins, the neuraminidases and the lectins. This result provides new insights into the evolutionary relationships in the very diverse group of all beta-proteins.

Are database-derived potentials valid for scoring both forward and inverted protein folding?

Protein Engineering, Design and Selection, 1995

Database-derived potentials, compiled from frequencies of sequence and structure features, are often used for scoring the compatibility of protein sequences and conformations. It is often believed that these scores correspond to differences in free energy with, in addition, a term containing the partition function of the system. Since this function does not depend on the conformation, the potentials are considered to be valid for scoring the compatibility of different conformations with a given sequence ('forward folding'), but not of sequences with a given structure ('inverted folding'). This interpretation is questioned here. It is argued that when many-body effects, which dominate frequencies compiled from the protein database, are corrected for, the potentials approximate a physically meaningful free energy difference from which the partition function term cancels out. It is the difference between the free energy of a given sequence in a specific conformation and that of the same sequence in a denatured-like state. Two examples of denatured-like states are discussed. Depending on the considered state, the free energy difference reduces to the commonly used scoring scheme, or contains additional terms that depend on the sequence. In both cases, all the terms can be derived from sequence-structure frequencies in the database. Such a free energy difference, commonly defined as the folding free energy, is a measure of protein stability and can be used for scoring both forward and inverted protein folding. The implications for the use of knowledge-based potentials in protein structure prediction are described. Finally, the difficulty of designing tests that could validate the proposed approach, and the inherent limitations of such tests, are discussed.

CYGD: the Comprehensive Yeast Genome Database

Nucleic Acids Research, 2004

The Comprehensive Yeast Genome Database (CYGD) compiles a comprehensive data resource for information on the cellular functions of the yeast Saccharomyces cerevisiae and related species, chosen as the best understood model organism for eukaryotes. The database serves as a common resource generated by a European consortium, going beyond the provision of sequence information and functional annotations on individual genes and proteins. In addition, it provides information on the physical and functional interactions among proteins as well as other genetic elements. These cellular networks include metabolic and regulatory pathways, signal transduction and transport processes as well as co-regulated gene clusters. As more yeast genomes are published, their annotation becomes greatly facilitated using S.cerevisiae as a reference. CYGD provides a way of exploring related genomes with the aid of the S.cerevisiae genome as a backbone and SIMAP, the

Computer simulations of liquid water: treatment of long-range interactions

Molecular Physics, 1990

Computer simulations by the molecular-dynamics method are used to study the physical properties of liquid water. Two three-point-charge models (SPC and TIPS) are analysed and compared using the Ewald-Kornfeld summation method to calculate long-range electrostatic interactions. Although these two models are not very different considering their geometry and energy parameters, they lead to different physical properties of liquid water. Simulations

Reaction pathway for the quaternary structure change in hemoglobin

Biopolymers, 1985

We perform a computer simulation of the quaternary structure change during the allosteric transition of hemoglobin. The simulation is based on a docking procedure by which αβ dimers of human hemoglobin are associated into tetramers after being rotated in various orientations. The stability of tetramers thus reconstituted is estimated from the values of a simplified energy function describing nonbonded interactions and from the area of the surface buried in dimer–dimer contacts (their interface area), which we take to represent stabilizing interactions and solvent contribution. A systematic analysis of tetramers reconstituted with twofold symmetry reveals that when the dimers have the R tertiary structure, only tetramers having R‐like quaternary structures are stable. When the dimers have the T tertiary structure, they may associate into T‐like tetramers or a variety of quaternary structures ranging from T to near R, thus tracing a plausible reaction pathway for the allosteric transi...

Arginine residues as stabilizing elements in proteins

Biochemistry, 1992

Site-specific substitutions of arginine for lysine in the thermostable D-xylose isomerase (XI) from Actinoplanes missouriensis are shown to impart significant heat stability enhancement in the presence of sugar substrates, most probably by interfering with nonenzymatic glycation. The same substitutions are also found to increase heat stability in the absence of any sugar derivatives, where a mechanism based on prevention of glycation can no longer be invoked. This rather conservative substitution is moreover shown to improve thermostability in two other structurally unrelated proteins, human copper, zinc-superoxide dismutase (CuZnSOD) and D-glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from Bacillus subtilis. The stabilizing effect of Lys-Arg substitutions is rationalized on the basis of a detailed analysis of the crystal structures of wild-type XI and of engineered variants with Lys-Arg substitution at four distinct locations, residues 253, 309, 319, and 323. Molecular model building analysis of the structures of wild-type and mutant CuZnSOD (K9R) and GAPDH (G281K and G281R) is used to explain the observed stability enhancement in these proteins. In addition to demonstrating that even thermostable proteins can lend themselves to further stability improvement, our findings provide direct evidence that arginine residues are important stabilizing elements in proteins. Moreover, the stabilizing role of electrostatic interactions, particularly between subunits in oligomeric proteins, is documented. Enhancing protein stability by rational design has been one of the great challenges of protein engineering. Site-directed mutagenesis combined with X-ray diffraction studies as well as denaturation experiments has already been very useful in determining the contributions of specific amino acids to protein stability (Alber, 1989).
From these studies several tentative rules have emerged. The contribution of buried residues to protein stability is correlated to their hydrophobicity as derived from their free energy of transfer (Matsumura et al., 1988; Kellis et al., 1988). Protein stability is more strongly affected by residues in the well-packed protein interior than by those on the surface (

SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model

Acta Crystallographica Section D Biological Crystallography, 1999

The impact of structural bioinformatics tools and resources on SARS-CoV-2 research and therapeutic strategies

Briefings in Bioinformatics, Dec 22, 2020

Vaishali P. Waman holds a Ph.D. in Bioinformatics. She is a postdoctoral research fellow in the Orengo group, UCL, working on antimicrobial resistance and protein structural domain chopping. Neeladri Sen holds a Ph.D. in Biology. He is working as a postdoctoral fellow with Prof. Orengo, UCL, on improvement of sequence alignment to model proteins and identify functional residues. Mihaly Varadi joined the PDBe team in 2015, where he worked first as a scientific programmer and more recently as the project leader of the Protein Data Bank in Europe-Knowledge Base. Antoine Daina holds a Ph.D. in pharmaceutical sciences. He is Senior Scientist in the Molecular Modeling Group at SIB, Swiss Institute of Bioinformatics, in charge of developing and applying computational methods to academic and industrial drug discovery. Shoshana J. Wodak holds a PhD in Biophysics, Columbia University, USA. She is a computational structural biologist and bioinformatician, specializing in modeling protein interactions. She directed research teams at the Free University of Brussels, EMBL-EBI and University of Toronto. Vincent Zoete holds a PhD in organic chemistry and is assistant professor in the Department of Fundamental Oncology at the University of Lausanne and Group leader at SIB, where he develops and applies CADD and in silico protein engineering methods. Sameer Velankar leads the Protein Data Bank in Europe team and is interested in macromolecular structure archiving and in the integration of macromolecular structure data with biomedical data resources. Christine Orengo is professor of Bioinformatics at University College London. Her group develop algorithms for structure comparison/classification and for protein-function prediction, also for the analysis of functional genomics data.

Discriminating physiological from non-physiological interfaces in structures of protein complexes: a community-wide study

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13...

Who checks the checkers? Four validation tools applied to eight atomic resolution structures (edited by I. A. Wilson)

Journal of Molecular Biology, 1998

Eight protein crystal structures, which have been refined against X-ray diffraction data extending to atomic resolution, 1.2 Å or better, were inspected using four different validation tools, PROCHECK, PROVE, SQUID and WHATCHECK. Two general questions were addressed. (1) Do the structures imply changes in "expected" stereochemical properties and are the target values used for restraints in the validation programs and the refinement protocol appropriate? (2) Can errors in models be detected and how reliable are the coordinates after refinement? Preliminary analysis by members of the network led to modifications both to the validation programs and to the refinement protocols. The results of the final analyses are reported here. Apparent discrepancies in cell dimensions were identified. Most stereochemical properties are shown to be more tightly clustered than for lower resolution analyses. In contrast, the ω angle has a wider distribution. The validation software is generally available and can be accessed at servers listed at the end of the paper.

In-depth characterization of HINT1 pathogenic variants

bioRxiv (Cold Spring Harbor Laboratory), Dec 1, 2023

Loss-of-function variants in HINT1 were identified to cause axonal recessive peripheral neuropathy with neuromyotonia (NMAN). Patients suffer from motor-greater-than-sensory polyneuropathy with an age of onset mainly within the first decade of life. Currently, nearly 30 NMAN disease-causing variants have been described, predominantly in sporadic cases and small families, most of them with limited functional evidence of pathogenicity. We systematically characterized all reported pathogenic missense mutations in HINT1, aiming to dissect their underlying loss-of-function mechanism. Individual variants were mapped onto the crystal structure of the HINT1 protein, and their potential effect on protein stability was computed. These variants were grouped into three main clusters: around the catalytic pocket, at the dimer interface, and in the β-sheet behind the catalytic pocket of the protein. The stability and functionality of the corresponding altered proteins were tested in vivo using HINT1 KO cells and a yeast model deficient for the orthologous gene (HNT1), providing insights into the structure-function relations. Our findings support the pathogenic character of most of the variants and uncover their differential effect on HINT1 function. Classifying the variants in three clusters sets the basis for patient stratification strategies for future therapeutic development.

Research paper thumbnail of Transcriptional regulation of protein complexes in yeast

HAL (Le Centre pour la Communication Scientifique Directe), 2003

Transcriptional regulation of protein complexes in yeast

Multiprotein complexes play an essent... more Transcriptional regulation of protein complexes in yeast

Multiprotein complexes play an essential role in many cellular processes. But our knowledge of the mechanism of their formation, regulation and lifetimes is very limited. We investigated transcriptional regulation of protein complexes in yeast using two approaches. First, known regulons, manually curated or identified by genome-wide screens, were mapped onto the components of multiprotein complexes. The complexes comprised manually curated ones and those characterized by high-throughput analyses. Second, putative regulatory sequence motifs were identified in the upstream regions of the genes involved in individual complexes and regulons were predicted on the basis of these motifs.

Research paper thumbnail of Submit a Topic Page to PLOS Computational Biology and Wikipedia

PLOS Computational Biology, 2018

Back in March 2012, PLOS Computational Biology launched its 'Topic Pages' project as a way to hel... more Back in March 2012, PLOS Computational Biology launched its 'Topic Pages' project as a way to help fill important gaps in Wikipedia's coverage of computational biology content and to credit authors for their contributions. Topic Pages are written in the style of a Wikipedia article and are openly and publicly peer reviewed on the PLOS Wiki before being published in our PLOS journals, with a second, editable version posted to Wikipedia. Six years on, PLOS Computational Biology has published 11 Topic Pages covering a good range of subjects, from the Hypercycle to Approximate Bayesian Computation. The published articles have been widely viewed on Wikipedia as well as in the journal and well received by the community. We are welcoming submissions for further PLOS Computational Biology Topic Pages. We are looking for topics in computational biology that are of interest to our readership, the broader scientific community, and the public at large and that are not yet covered or insufficiently covered (i.e., exist as a 'stub') in Wikipedia. Last year, PLOS Genetics joined the Topic Pages initiative, as detailed in this blog post. We are also exploring how the Topic Pages approach could be extended to include Wikidata, the community-curated database connecting concepts covered in any Wikipedia article with the Semantic Web [1]. For instance, data from more and more research-related databases are being integrated with Wikidata or its semantic core, Wikibase. This creates the need to formalize data models: How should concepts like a disease outbreak, a cell-cycle checkpoint, a sequencer, biomineralization, or a functional magnetic resonance imaging (fMRI) data set be modelled in Wikidata or Wikibase? 
Conversely, what workflows allow us to collect information about such concepts in Wikidata, to interlink it with related information, to validate it, and to keep it up to date? Or, how can the data from Wikidata be explored or put to use in other contexts relevant to computational biology? We are working on establishing the editorial workflows to handle such Wikidata-focused Topic Pages and would welcome submissions to test these waters. For some inspiration, we suggest taking a look at Wikidata-based tools for browsing microbial genomes [2], scholarly publications [3], or software and file formats [4]. The Author Guidelines for Wikipedia-focused Topic Pages are available here. If you've noticed a gap in Wikipedia's coverage of particular computational biology topics, we want to hear from you! Please send ideas for Topic Pages to ploscompbiol@plos.org.

Faculty of 1000 evaluation for Extracting insight from noisy cellular networks

F1000 - Post-publication peer review of the biomedical literature, 2018

Blind predictions of protein interfaces by docking calculations in CAPRI

Proteins: Structure, Function, and Bioinformatics, 2010

Reliable prediction of the amino acid residues involved in protein–protein interfaces can provide valuable insight into protein function and inform mutagenesis studies and drug design applications. A fast‐growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation or by integrating multiple criteria and approaches. Overall, however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein–protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each...

Protein structure prediction by threading methods: Evaluation of current techniques

Proteins: Structure, Function, and Bioinformatics, 1995

This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these prot...

Sesam: A relational database for structure and sequence of macromolecules

Proteins: Structure, Function, and Bioinformatics, 1991

A system is described that provides ways of integrating data on protein structure, sequence, and survey results, with molecular graphics and molecular mechanics software. Its major component is the relational database SESAM, presently implemented under the commercial package SYBASE. By design, the database allows full integration, within the same data organization, of raw data on protein structure, sequence, ligands, and heterogroups, obtained from the Brookhaven Protein Databank, with pure sequence information available from other databanks such as SWISS‐PROT. It contains in addition higher level descriptions of structural and topological properties, as well as survey results, obtained by executing specialized computer programs. Aside from the very useful attribute of closely combining structural and nonstructural information, other important features distinguish it from analogous systems developed elsewhere. It includes a molecular dictionary with complete description of geometric pr...

The design of idealized α/β‐barrels: Analysis of β‐sheet closure requirements

Proteins: Structure, Function, and Bioinformatics, 1990

The 8‐fold parallel α/β‐barrel topology is encountered in proteins that display an impressive variety of functions, suggesting that this topology may be a rather nonspecific and stable folding motif. Consequently, this motif can be considered as an interesting framework to design novel proteins. It has been shown that the shape of the β‐sheet portion of the barrel can be approximated by a hyperboloid. This geometric object may therefore be used as a scaffold to construct an idealized eight‐stranded β‐barrel. To facilitate the de novo design of such structures, a collection of modelling tools has been developed allowing secondary structure elements to be mapped onto the scaffold surface and rotation and translation operations to be performed about user‐defined axes while evaluating their contribution to the conformational energy of the system. These tools have been applied in a systematic study assessing the ϕ, ψ requirements to design symmetric eight‐stranded β‐barrels with optimal h...

Recurrent αβ loop structures in TIM barrel motifs show a distinct pattern of conserved structural features

Proteins: Structure, Function, and Bioinformatics, 1992

A systematic survey of seven parallel α/β barrel protein domains, based on exhaustive structural comparisons, reveals that a sizable proportion of the αβ loops in these proteins (20 out of a total of 49) belong to either one of two loop types previously described by Thornton and co‐workers. Six loops are of the αβ1 type, with one residue between the α‐helix and β‐strand, and 13 are of the αβ3 type, with three residues between the helix and the strand. Protein fragments embedding the identified loops, and termed αβ connections since they contain parts of the flanking helix and strand, have been analyzed in detail revealing that each type of connection has a distinct set of conserved structural features. The orientation of the β‐strand relative to the helix and loop portions is different owing to a very localized difference in backbone conformation. In αβ1 connections, the chain enters the β‐strand via a residue adopting an extended conformation, while in αβ3 it does so via a residue in...

Modelling the polypeptide backbone with ‘spare parts’ from known protein structures

Protein Engineering, Design and Selection, 1989

An automatic procedure for building a protein polyalanine backbone from C alpha positions and 'spare parts' retrieved from a data base of 66 high-resolution protein structures is described. Protein backbones are constructed from overlapping fragments of variable length, which allows the backbone of regular secondary structure elements to be built in one block. The procedure is shown to yield backbones which compare very favourably with those from highly refined X-ray structures (r.m.s. deviation between generated and crystal structures less than 1 Å). The method is furthermore quite insensitive to experimental errors in C alpha positions as well as to the size of the data base, and is seen to yield valuable insight into the relationships between sequence and 3-D structure: one example on triose phosphate isomerase, a beta-barrel protein, shows that beta alpha loops can be considered as structurally more uncommon than alpha beta loops. The 'spare parts' approach is also found to be useful for general-purpose modelling of local structural changes produced by insertion or deletion of residues. It should, however, be used with caution. Crude selection criteria based solely on fragment length and geometric fit to the loop base regions yield realistic backbones in about two-thirds of the test cases (r.m.s. deviations from refined crystal structure approximately 1 Å). In the remaining cases, sequence information, in particular the presence of glycine residues which tend to adopt more unusual backbone conformations, must be considered to obtain comparable results.

Optimal protein structure alignments by multiple linkage clustering: application to distantly related proteins

Protein Engineering, Design and Selection, 1995

A fully automatic procedure for aligning two protein structures is presented. It uses as sole structural similarity measure the root mean square (r.m.s.) deviation of superimposed backbone atoms (N, C alpha, C and O) and is designed to yield optimal solutions with respect to this measure. In a first step, the procedure identifies protein segments with similar conformations in both proteins. In a second step, a novel multiple linkage clustering algorithm is used to identify segment combinations which yield optimal global structure alignments. Several structure alignments can usually be obtained for a given pair of proteins, which are exploited here to define automatically the common structural core of a protein family. Furthermore, an automatic analysis of the clustering trees is described which enables detection of rigid-body movements between structure elements. To illustrate the performance of our procedure, we apply it to families of distantly related proteins. One groups the three alpha + beta proteins ubiquitin, ferredoxin and the B1-domain of protein G. Their common structure motif consists of four beta-strands and the only alpha-helix, with one strand and the helix being displaced as a rigid body relative to the remaining three beta-strands. The other family consists of beta-proteins from the Greek key group, in particular actinoxanthin, the immunoglobulin variable domain and plastocyanin. Their consensus motif, composed of five beta-strands and a turn, is identified, mostly intact, in all Greek key proteins except the trypsins, and interestingly also in three other beta-protein families, the lipocalins, the neuraminidases and the lectins. This result provides new insights into the evolutionary relationships in the very diverse group of all beta-proteins.

Are database-derived potentials valid for scoring both forward and inverted protein folding?

Protein Engineering, Design and Selection, 1995

Database-derived potentials, compiled from frequencies of sequence and structure features, are often used for scoring the compatibility of protein sequences and conformations. It is often believed that these scores correspond to differences in free energy with, in addition, a term containing the partition function of the system. Since this function does not depend on the conformation, the potentials are considered to be valid for scoring the compatibility of different conformations with a given sequence ('forward folding'), but not of sequences with a given structure ('inverted folding'). This interpretation is questioned here. It is argued that when many-body effects, which dominate frequencies compiled from the protein database, are corrected for, the potentials approximate a physically meaningful free energy difference from which the partition function term cancels out. It is the difference between the free energy of a given sequence in a specific conformation and that of the same sequence in a denatured-like state. Two examples of denatured-like states are discussed. Depending on the considered state, the free energy difference reduces to the commonly used scoring scheme, or contains additional terms that depend on the sequence. In both cases, all the terms can be derived from sequence-structure frequencies in the database. Such a free energy difference, commonly defined as the folding free energy, is a measure of protein stability and can be used for scoring both forward and inverted protein folding. The implications for the use of knowledge-based potentials in protein structure prediction are described. Finally, the difficulty of designing tests that could validate the proposed approach, and the inherent limitations of such tests, are discussed.

CYGD: the Comprehensive Yeast Genome Database

Nucleic Acids Research, 2004

The Comprehensive Yeast Genome Database (CYGD) compiles a comprehensive data resource for information on the cellular functions of the yeast Saccharomyces cerevisiae and related species, chosen as the best understood model organism for eukaryotes. The database serves as a common resource generated by a European consortium, going beyond the provision of sequence information and functional annotations on individual genes and proteins. In addition, it provides information on the physical and functional interactions among proteins as well as other genetic elements. These cellular networks include metabolic and regulatory pathways, signal transduction and transport processes as well as co-regulated gene clusters. As more yeast genomes are published, their annotation becomes greatly facilitated using S.cerevisiae as a reference. CYGD provides a way of exploring related genomes with the aid of the S.cerevisiae genome as a backbone and SIMAP, the

Computer simulations of liquid water: treatment of long-range interactions

Molecular Physics, 1990

Computer simulations by the molecular-dynamics method are used to study the physical properties of liquid water. Two three-point-charge models (SPC and TIPS) are analysed and compared using the Ewald-Kornfeld summation method to calculate long-range electrostatic interactions. Although these two models are not very different considering their geometry and energy parameters, they lead to different physical properties of liquid water. Simulations

Reaction pathway for the quaternary structure change in hemoglobin

Biopolymers, 1985

We perform a computer simulation of the quaternary structure change during the allosteric transition of hemoglobin. The simulation is based on a docking procedure by which αβ dimers of human hemoglobin are associated into tetramers after being rotated in various orientations. The stability of tetramers thus reconstituted is estimated from the values of a simplified energy function describing nonbonded interactions and from the area of the surface buried in dimer–dimer contacts (their interface area), which we take to represent stabilizing interactions and solvent contribution. A systematic analysis of tetramers reconstituted with twofold symmetry reveals that when the dimers have the R tertiary structure, only tetramers having R‐like quaternary structures are stable. When the dimers have the T tertiary structure, they may associate into T‐like tetramers or a variety of quaternary structures ranging from T to near R, thus tracing a plausible reaction pathway for the allosteric transi...

Arginine residues as stabilizing elements in proteins

Biochemistry, 1992

Site-specific substitutions of arginine for lysine in the thermostable D-xylose isomerase (XI) from Actinoplanes missouriensis are shown to impart significant heat stability enhancement in the presence of sugar substrates, most probably by interfering with nonenzymatic glycation. The same substitutions are also found to increase heat stability in the absence of any sugar derivatives, where a mechanism based on prevention of glycation can no longer be invoked. This rather conservative substitution is moreover shown to improve thermostability in two other structurally unrelated proteins, human copper, zinc-superoxide dismutase (CuZnSOD) and D-glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from Bacillus subtilis. The stabilizing effect of Lys-Arg substitutions is rationalized on the basis of a detailed analysis of the crystal structures of wild-type XI and of engineered variants with Lys-Arg substitution at four distinct locations, residues 253, 309, 319, and 323. Molecular model building analysis of the structures of wild-type and mutant CuZnSOD (K9R) and GAPDH (G281K and G281R) is used to explain the observed stability enhancement in these proteins. In addition to demonstrating that even thermostable proteins can lend themselves to further stability improvement, our findings provide direct evidence that arginine residues are important stabilizing elements in proteins. Moreover, the stabilizing role of electrostatic interactions, particularly between subunits in oligomeric proteins, is documented. Enhancing protein stability by rational design has been one of the great challenges of protein engineering. Site-directed mutagenesis combined with X-ray diffraction studies as well as denaturation experiments has already been very useful in determining the contributions of specific amino acids to protein stability (Alber, 1989). 
From these studies several tentative rules have emerged. The contribution of buried residues to protein stability is correlated to their hydrophobicity as derived from their free energy of transfer (Matsumura et al., 1988; Kellis et al., 1988). Protein stability is more strongly affected by residues in the well-packed protein interior than by those on the surface (

SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model

Acta Crystallographica Section D Biological Crystallography, 1999

The impact of structural bioinformatics tools and resources on SARS-CoV-2 research and therapeutic strategies

Briefings in Bioinformatics, Dec 22, 2020

Vaishali P. Waman holds a Ph.D. in Bioinformatics. She is a postdoctoral Research fellow in the Orengo group, UCL, working on antimicrobial resistance and protein structural domain chopping. Neeladri Sen holds a Ph.D. in Biology. He is working as a postdoctoral fellow with Prof. Orengo, UCL on improvement of sequence alignment to model proteins and identify functional residues. Mihaly Varadi joined the PDBe team in 2015 where he worked first as a scientific programmer and more recently as the project leader of the Protein Data Bank in Europe-Knowledge Base. Antoine Daina holds a Ph.D. in pharmaceutical sciences. He is Senior Scientist in the Molecular Modeling Group at SIB, Swiss Institute of Bioinformatics, in charge of developing and applying computational methods to academic and industrial drug discovery. Shoshana J. Wodak holds a PhD in Biophysics, Columbia University, USA. She is a computational structural biologist and bioinformatician, specializing in modeling protein interactions. She directed research teams at the Free University of Brussels, EMBL-EBI and University of Toronto. Vincent Zoete holds a PhD in organic chemistry and is assistant professor in the Department of Fundamental Oncology at the University of Lausanne and Group leader at SIB, where he develops and applies CADD and in silico protein engineering methods. Sameer Velankar leads the Protein Data Bank in Europe team and is interested in macromolecular structure archiving and in the integration of macromolecular structure data with biomedical data resources. Christine Orengo is professor of Bioinformatics at University College London. Her group develop algorithms for structure comparison/classification and for protein-function prediction, also for the analysis of functional genomics data.

Discriminating physiological from non-physiological interfaces in structures of protein complexes: a community-wide study

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13...

Who checks the checkers? Four validation tools applied to eight atomic resolution structures (Edited by I. A. Wilson)

Journal of Molecular Biology, 1998

Eight protein crystal structures, which have been refined against X-ray diffraction data extending to atomic resolution, 1.2 Å or better, were inspected using four different validation tools, PROCHECK, PROVE, SQUID and WHATCHECK. Two general questions were addressed. (1) Do the structures imply changes in "expected" stereochemical properties and are the target values used for restraints in the validation programs and the refinement protocol appropriate? (2) Can errors in models be detected and how reliable are the coordinates after refinement? Preliminary analysis by members of the network led to modifications both to the validation programs and to the refinement protocols. The results of the final analyses are reported here. Apparent discrepancies in cell dimensions were identified. Most stereochemical properties are shown to be more tightly clustered than for lower resolution analyses. In contrast, the ω angle has a wider distribution. The validation software is generally available and can be accessed at servers listed at the end of the paper.

In-depth characterization of HINT1 pathogenic variants

bioRxiv (Cold Spring Harbor Laboratory), Dec 1, 2023

Loss-of-function variants in HINT1 were identified to cause axonal recessive peripheral neuropathy with neuromyotonia (NMAN). Patients suffer from motor-greater-than-sensory polyneuropathy with an age of onset mainly within the first decade of life. Currently, nearly 30 NMAN disease-causing variants have been described, predominantly in sporadic cases and small families, most of them with limited functional evidence of pathogenicity. We systematically characterized all reported pathogenic missense mutations in HINT1, aiming to dissect their underlying loss-of-function mechanism. Individual variants were mapped onto the crystal structure of the HINT1 protein, and their potential effect on protein stability was computed. These variants were grouped into three main clusters: around the catalytic pocket, at the dimer interface, and in the β-sheet behind the catalytic pocket of the protein. The stability and functionality of the corresponding altered proteins were tested in vivo using HINT1 KO cells and a yeast model deficient for the orthologous gene (HNT1), providing insights into the structure-function relations. Our findings support the pathogenic character of most of the variants and uncover their differential effect on HINT1 function. Classifying the variants in three clusters sets the basis for patient stratification strategies for future therapeutic development.