Firas Khatib | UMASS Dartmouth
Papers by Firas Khatib
Bioinformatics, 2009
Our focus has been on detecting topological properties that are rare in real proteins but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, which successfully decreased the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared equally un-protein-like and yet do not contain a mathematical knot. These topological features are commonly referred to as slipknots and are caused by the same mechanisms that produce knotted models. Slipknots are undetectable by the original Knotfind algorithm. We have generalized our algorithm to detect them and analyzed CASP6 models built using the Rosetta loop modeling method. Results: After analyzing known protein structures in the PDB, we found that slipknots do occur in certain proteins, but they are rare and fall into a small number of specific classes. Our group used this new Pokefind algorithm to distinguish between these rare real slipknots and the numerous classes of slipknots that we discovered in Rosetta models and in models submitted by the various CASP7 servers. The goal of this work is to improve future models created by protein structure prediction methods. Both algorithms are able to detect un-protein-like features that current metrics such as GDT are unable to identify, so these topological filters can also be used as additional assessment tools.
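GDT scores a model only by per-residue distances to the native structure, so it has no notion of chain topology, and a knotted or slip-knotted model can still score well. The Python sketch below illustrates the GDT_TS idea for that reason only. It assumes C-alpha coordinates that have already been superimposed on the native (the real GDT calculation searches over superpositions for each cutoff), and the function name `gdt_ts` is hypothetical.

```python
import math


def gdt_ts(model, native, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for a model already superimposed on the native.

    model, native: equal-length lists of (x, y, z) C-alpha coordinates.
    Returns the mean, over the distance cutoffs (in angstroms), of the
    percentage of residues lying within that cutoff of their native position.
    """
    if not model or len(model) != len(native):
        raise ValueError("coordinate lists must be non-empty and equal length")
    dists = [math.dist(m, n) for m, n in zip(model, native)]
    fractions = [sum(d <= c for d in dists) / len(dists) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

Nothing in this calculation inspects how the chain threads through itself, which is the gap that topological filters are meant to cover.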
Computational biology and chemistry, Jan 12, 2016
Protein structure prediction is considered one of the most challenging and computationally intractable combinatorial problems. Thus, efficient modeling of the convoluted search space, clever use of energy functions, and, more importantly, effective sampling algorithms become crucial to addressing this problem. For protein structure modeling, an off-lattice model offers limited scope to exercise and evaluate algorithmic developments because of its astronomically large set of data points. In contrast, an on-lattice model widens that scope and permits the study of relatively larger proteins because of its finite set of data points. In this work, we took full advantage of an on-lattice model by using a face-centered cubic (FCC) lattice, which has the highest packing density with the maximum degree of freedom. We proposed a genetic algorithm (GA) based on a graded energy that strategically mixes the Miyazawa-Jernigan (MJ) energy with the hydrophobic-polar (HP) energy for conformational ...
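As a rough illustration of the lattice side of this work, the Python sketch below scores a conformation on the FCC lattice with the plain HP energy (-1 per H-H pair that are lattice neighbours but not consecutive in the chain). It is a simplification under stated assumptions: it omits the MJ term and the graded mixing described above, and all names (`FCC_NEIGHBOURS`, `is_valid_fcc_walk`, `hp_energy`) are hypothetical.

```python
from itertools import permutations, product

# The 12 nearest-neighbour vectors of the FCC lattice: permutations of (±1, ±1, 0).
FCC_NEIGHBOURS = {
    perm
    for s1, s2 in product((1, -1), repeat=2)
    for perm in permutations((s1, s2, 0))
}


def is_valid_fcc_walk(coords):
    """True if the walk is self-avoiding and every step is an FCC neighbour vector."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        (b[0] - a[0], b[1] - a[1], b[2] - a[2]) in FCC_NEIGHBOURS
        for a, b in zip(coords, coords[1:])
    )


def hp_energy(sequence, coords):
    """Plain HP energy: -1 for every H-H pair that are lattice neighbours
    but not consecutive in the chain. Hypothetical helper; no MJ term."""
    if len(sequence) != len(coords) or not is_valid_fcc_walk(coords):
        raise ValueError("sequence/coordinates do not form a valid FCC conformation")
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in FCC_NEIGHBOURS:
            j = index_at.get((x + dx, y + dy, z + dz))
            # j > i + 1 counts each contact once and skips chain neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

For example, the three-residue conformation `[(0, 0, 0), (1, 1, 0), (0, 1, 1)]` places the first and third residues in lattice contact, so `hp_energy("HHH", ...)` returns -1. A GA would use a score like this (or a graded mix with MJ) as its fitness function.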
IEEE Transactions on Evolutionary Computation, 2015
High-resolution structure of a retroviral protease folded as a monomer
Proceedings of the 6th International Conference on Foundations of Digital Games - FDG '11, 2011
As games grow in complexity, gameplay needs to provide players with powerful means of managing this complexity. One approach is to give automation tools to players. In this paper, we analyze an in-game automation tool, the Foldit cookbook, for the scientific ...
The protein structure prediction problem continues to elude scientists. Even though many new methods have been introduced, certain classes of prediction targets such as free modeling targets remain a challenge based on blind predictions in the several previous Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments [1]. To meet this challenge, a large-scale collaborative effort called WeFold was undertaken by thirteen labs, each with their own specialties and approaches in addressing the problem. In this talk, we will present the different methods or branches collaboratively designed and tested during the WeFold experiment, as well as their predictive ability, outcomes, and lessons learned. Independent branches involved in the collaborative effort yielded several high-ranking predictions among all group and method submissions in CASP10 for human, free modeling (template free), and refinement targets. Remarkably, two WeFold methods were able to produce t...
The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media-based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was made possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and to create new hybrid pipelines, which they tested during CASP10. This manuscript describes both the successes and the areas needing improvement identified throughout the first WeFold experiment and discusses the efforts underway to advance this initiative. A footprint of all contributions and structures is publicly accessible at http://www.wefold.org.
Proceedings of the Fifth International Conference on the Foundations of Digital Games - FDG '10, 2010
Incorporating the individual and collective problem-solving skills of non-experts into the scientific discovery process could potentially accelerate the advancement of science. This paper discusses the design process used for Foldit, a multiplayer online biochemistry game that presents players with computationally difficult protein folding problems in the form of puzzles, allowing ordinary players to gain expertise and help solve these problems. The principal challenge of designing such scientific discovery games is harnessing the enormous collective problem-solving potential of the game-playing population, who have not previously been introduced to the specific problem or, often, to the entire scientific discipline. To address this challenge, we took an iterative approach to designing the game, incorporating feedback from players and biochemical experts alike. Feedback was gathered both before and after releasing the game to create the rules, interactions, and visualizations in Foldit that maximize contributions from game players. We present several examples of how this approach guided the game's design and allowed us to improve both the quality of the gameplay and the application of player problem-solving.
Structure, 2013
Public participation in scientific research can be a powerful supplement to more traditional approaches. We discuss aspects of the public participation project Foldit that may help others interested in starting their own projects.
Proceedings of the National Academy of Sciences, 2011
Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high-performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as "recipes" and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.
Nature Structural & Molecular Biology, 2011
Following the failure of a wide range of attempts to solve the crystal structure of M-PMV retroviral protease by molecular replacement, we challenged players of the protein folding game Foldit to produce accurate models of the protein. Remarkably, Foldit players were able to generate models of sufficient quality for successful molecular replacement and subsequent structure determination. The refined structure provides new insights for the design of antiretroviral drugs.
Nature Biotechnology, 2012
Computational enzyme design holds promise for the production of renewable fuels, drugs and chemicals. De novo enzyme design has generated catalysts for several reactions, but with lower catalytic efficiencies than naturally occurring enzymes. Here we report the use of game-driven crowdsourcing to enhance the activity of a computationally designed enzyme through the functional remodeling of its structure. Players of the online game Foldit were challenged to remodel the backbone of a computationally designed bimolecular Diels-Alderase to enable additional interactions with substrates. Several iterations of design and characterization generated a 24-residue helix-turn-helix motif, including a 13-residue insertion, that increased enzyme activity >18-fold. X-ray crystallography showed that the large insertion adopts a helix-turn-helix structure positioned as in the Foldit model. These results demonstrate that human creativity can extend beyond the macroscopic challenges encountered in everyday life to molecular-scale design problems.
Nature, 2010
People exert significant amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully crowdsourced through games [i, ii, iii], but it is not clear whether more complex scientific problems can be similarly solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages nonscientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology [iv], while they compete and collaborate to optimize the computed energy. We show that top Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally limited scientific problems.
Bioinformatics, 2006
Motivation: Knots in polypeptide chains have been found in very few proteins and consequently should generally be avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself but seek only to obtain the final native state correctly. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models. Here we describe Knotfind, a simple algorithm for knot detection that is fast enough for structure prediction, where tens or hundreds of thousands of conformations may be sampled during the course of a prediction. We have used this algorithm to characterize knots in large populations of model structures generated for targets in CASP5 and CASP6 using the Rosetta homology-based modeling method.
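Knot detection on a C-alpha trace is often done by triangle reduction in the style of Koniaris-Muthukumar and Taylor: a residue is deleted whenever the triangle it forms with its two chain neighbours is not pierced by any other segment, and an open chain that shrinks to a single segment contains no knot. The Python sketch below illustrates that general idea only. It is not presented as Knotfind's exact procedure, it treats segments lying in a triangle's plane as non-intersecting, and all function names are hypothetical.

```python
# Hypothetical sketch of KMT-style triangle reduction for an open C-alpha chain.

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def segment_hits_triangle(p, q, a, b, c, eps=1e-9):
    """Moller-Trumbore intersection test restricted to the segment p->q.

    Segments parallel to (including lying in) the triangle's plane are
    treated as non-intersecting; a careful implementation would handle them.
    """
    d = _sub(q, p)
    e1, e2 = _sub(b, a), _sub(c, a)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < eps:
        return False
    inv = 1.0 / det
    s = _sub(p, a)
    u = inv * _dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    qv = _cross(s, e1)
    v = inv * _dot(d, qv)
    if v < 0.0 or u + v > 1.0:
        return False
    t = inv * _dot(e2, qv)
    return 0.0 <= t <= 1.0


def reduce_chain(points):
    """Repeatedly delete interior points whose triangle no other segment pierces."""
    pts = [tuple(float(x) for x in p) for p in points]
    changed = True
    while changed and len(pts) > 2:
        changed = False
        i = 1
        while i < len(pts) - 1:
            a, b, c = pts[i - 1], pts[i], pts[i + 1]
            # Segment j joins pts[j] and pts[j + 1]; skip the triangle's own two
            # edges (j = i-1, i) and the two edges touching its corners (j = i-2, i+1).
            blocked = any(
                segment_hits_triangle(pts[j], pts[j + 1], a, b, c)
                for j in range(len(pts) - 1)
                if not (i - 2 <= j <= i + 1)
            )
            if blocked:
                i += 1
            else:
                del pts[i]  # the chain now runs straight from a to c
                changed = True
    return pts


def looks_unknotted(points):
    """True if the open chain reduces to a single segment (no knot found)."""
    return len(reduce_chain(points)) == 2
```

Each deletion only moves the chain across an empty triangle, so it cannot create or remove a knot. A chain that fails to reduce should be read as "possibly knotted" rather than as proof of a knot, since this sketch ignores degenerate crossings.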
Acta Crystallographica Section D Biological Crystallography, 2011
PDB Reference: monomeric M-PMV retroviral protease, 3sqf. Gilski et al., Acta Cryst. (2011). D67, 907-914.