Firas Khatib | UMass Dartmouth

Papers by Firas Khatib

Pokefind: a novel topological filter for use with protein structure prediction

Bioinformatics, 2009

Our focus has been on detecting topological properties that are rare in real proteins but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, which successfully decreased the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared equally un-protein-like yet do not contain a mathematical knot. These topological features are commonly referred to as slipknots and are caused by the same mechanisms that produce knotted models. Slipknots are undetectable by the original Knotfind algorithm. We have generalized our algorithm to detect them and analyzed CASP6 models built using the Rosetta loop modeling method. Results: After analyzing known protein structures in the PDB, we found that slipknots do occur in certain proteins, but they are rare and fall into a small number of specific classes. Our group used the new Pokefind algorithm to distinguish between these rare real slipknots and the numerous classes of slipknots that we discovered in Rosetta models and in models submitted by the various CASP7 servers. The goal of this work is to improve future models created by protein structure prediction methods. Both algorithms detect un-protein-like features that current metrics such as GDT cannot identify, so these topological filters can also serve as additional assessment tools.

Guided macro-mutation in a graded energy based genetic algorithm for protein structure prediction

Computational Biology and Chemistry, Jan 12, 2016

Protein structure prediction is considered one of the most challenging and computationally intractable combinatorial problems. Thus, efficient modeling of the convoluted search space, clever use of energy functions, and, more importantly, effective sampling algorithms become crucial to addressing this problem. For protein structure modeling, an off-lattice model offers limited scope to exercise and evaluate algorithmic developments because of its astronomically large set of data points. In contrast, an on-lattice model widens that scope and permits the study of relatively larger proteins because of its finite set of data points. In this work, we took full advantage of an on-lattice model by using a face-centered-cubic (FCC) lattice, which has the highest packing density with the maximum degree of freedom. We proposed a graded-energy-based genetic algorithm (GA), which strategically mixes the Miyazawa-Jernigan (MJ) energy with the hydrophobic-polar (HP) energy, for conformational ...
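The HP energy named above can be sketched on an FCC lattice as follows. This is a minimal illustration under common textbook assumptions, not the paper's code: FCC sites are taken as integer points with an even coordinate sum, and each pair of hydrophobic (H) residues that are lattice neighbours but not consecutive in the chain contributes -1. The helper names are hypothetical, and the paper's graded MJ/HP mixing and genetic operators are not reproduced.

```python
from itertools import product

# Assumed convention: FCC sites are integer points (x, y, z) with an even
# coordinate sum; the 12 nearest-neighbour steps have exactly two +/-1 components.
FCC_NEIGHBOURS = [v for v in product((-1, 0, 1), repeat=3)
                  if sum(abs(c) for c in v) == 2]
NEIGHBOUR_SET = set(FCC_NEIGHBOURS)


def is_valid_fcc_walk(coords):
    """True if the walk is self-avoiding and every step is an FCC neighbour step."""
    if len(set(coords)) != len(coords):
        return False
    return all(tuple(b - a for a, b in zip(p, q)) in NEIGHBOUR_SET
               for p, q in zip(coords, coords[1:]))


def hp_energy(sequence, coords):
    """HP energy: -1 for every H-H pair that are lattice neighbours but not
    consecutive in the chain (each pair is counted once, via j > i + 1)."""
    if len(sequence) != len(coords) or not is_valid_fcc_walk(coords):
        raise ValueError("not a valid FCC conformation for this sequence")
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (residue, (x, y, z)) in enumerate(zip(sequence, coords)):
        if residue != "H":
            continue
        for dx, dy, dz in FCC_NEIGHBOURS:
            j = index.get((x + dx, y + dy, z + dz))
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


# Residues i and i+2 can touch on FCC, which a simple cubic lattice forbids.
print(hp_energy("HHH", [(0, 0, 0), (1, 1, 0), (1, 0, 1)]))  # -1
print(hp_energy("PHH", [(0, 0, 0), (1, 1, 0), (1, 0, 1)]))  # 0
```

Replacing the constant -1 with a residue-pair lookup table would give an MJ-style contact energy; how the two terms are graded is not stated in the truncated abstract and is not modeled here.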

An Enhanced Genetic Algorithm for Ab initio Protein Structure Prediction

IEEE Transactions on Evolutionary Computation, 2015

Research paper thumbnail of research papers Acta Crystallographica Section D Biological

High-resolution structure of a retroviral protease folded as a monomer

Analysis of social gameplay macros in the Foldit cookbook

Proceedings of the 6th International Conference on Foundations of Digital Games - FDG '11, 2011

As games grow in complexity, gameplay needs to provide players with powerful means of managing this complexity. One approach is to give automation tools to players. In this paper, we analyze an in-game automation tool, the Foldit cookbook, for the scientific ...

313371 Wefold: A Collaborative Protein Structure Prediction Experiment

The protein structure prediction problem continues to elude scientists. Even though many new methods have been introduced, certain classes of prediction targets, such as free modeling targets, remain a challenge, as shown by blind predictions in several previous Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments [1]. To meet this challenge, a large-scale collaborative effort called WeFold was undertaken by thirteen labs, each with its own specialties and approaches to the problem. In this talk, we will present the different methods, or branches, collaboratively designed and tested during the WeFold experiment, as well as their predictive ability, outcomes, and lessons learned. Independent branches involved in the collaborative effort yielded several high-ranking predictions among all group and method submissions in CASP10 for human, free modeling (template-free), and refinement targets. Remarkably, two WeFold methods were able to produce t...

WeFold Paper Final

The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media-based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was made possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and to create new hybrid pipelines that they tested during CASP10. This manuscript describes both the successes and the areas needing improvement identified throughout the first WeFold experiment, and discusses the efforts underway to advance this initiative. A footprint of all contributions and structures is publicly accessible at http://www.wefold.org.

The challenge of designing scientific discovery games

Proceedings of the Fifth International Conference on the Foundations of Digital Games - FDG '10, 2010

Incorporating the individual and collective problem-solving skills of non-experts into the scientific discovery process could potentially accelerate the advancement of science. This paper discusses the design process used for Foldit, a multiplayer online biochemistry game that presents players with computationally difficult protein folding problems in the form of puzzles, allowing ordinary players to gain expertise and help solve these problems. The principal challenge of designing such scientific discovery games is harnessing the enormous collective problem-solving potential of the game-playing population, who have not previously been introduced to the specific problem or, often, to the entire scientific discipline. To address this challenge, we took an iterative approach to designing the game, incorporating feedback from players and biochemical experts alike. Feedback was gathered both before and after releasing the game to create the rules, interactions, and visualizations in Foldit that maximize contributions from game players. We present several examples of how this approach guided the game's design and allowed us to improve both the quality of the gameplay and the application of player problem-solving.

Increasing Public Involvement in Structural Biology

Structure, 2013

Public participation in scientific research can be a powerful supplement to more-traditional approaches. We discuss aspects of the public participation project Foldit that may help others interested in starting their own projects.

Algorithm discovery by protein folding game players

Proceedings of the National Academy of Sciences, 2011

Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high-performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as "recipes" and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm, independently discovered by scientists and by Foldit players, outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems but also to discover and formalize effective new strategies and algorithms.

Crystal structure of a monomeric retroviral protease solved by protein folding game players

Nature Structural & Molecular Biology, 2011

Following the failure of a wide range of attempts to solve the crystal structure of M-PMV retroviral protease by molecular replacement, we challenged players of the protein folding game Foldit to produce accurate models of the protein. Remarkably, Foldit players were able to generate models of sufficient quality for successful molecular replacement and subsequent structure determination. The refined structure provides new insights for the design of antiretroviral drugs.

Increased Diels-Alderase activity through backbone remodeling guided by Foldit players

Nature Biotechnology, 2012

Computational enzyme design holds promise for the production of renewable fuels, drugs and chemicals. De novo enzyme design has generated catalysts for several reactions, but with lower catalytic efficiencies than naturally occurring enzymes. Here we report the use of game-driven crowdsourcing to enhance the activity of a computationally designed enzyme through the functional remodeling of its structure. Players of the online game Foldit were challenged to remodel the backbone of a computationally designed bimolecular Diels-Alderase to enable additional interactions with substrates. Several iterations of design and characterization generated a 24-residue helix-turn-helix motif, including a 13-residue insertion, that increased enzyme activity >18-fold. X-ray crystallography showed that the large insertion adopts a helix-turn-helix structure positioned as in the Foldit model. These results demonstrate that human creativity can extend beyond the macroscopic challenges encountered in everyday life to molecular-scale design problems.

Predicting protein structures with a multiplayer online game

Nature, 2010

People exert significant amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully crowd-sourced through games [i, ii, iii], but it is not clear whether more complex scientific problems can be similarly solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology [iv], while they compete and collaborate to optimize the computed energy. We show that top Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally limited scientific problems.

Rapid knot detection and application to protein structure prediction

Bioinformatics, 2006

Motivation: Knots in polypeptide chains have been found in very few proteins and consequently should generally be avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself, but rather seek only to obtain the correct final native state. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models. Here we describe Knotfind, a simple algorithm for knot detection that is fast enough for structure prediction, where tens or hundreds of thousands of conformations may be sampled during the course of a prediction. We have used this algorithm to characterize knots in large populations of model structures generated for targets in CASP5 and CASP6 using the Rosetta homology-based modeling method.
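Knot detection of this kind can be illustrated with a chain-reduction sketch. It assumes a Taylor-style smoothing rule: an interior point is deleted whenever the triangle it forms with its two neighbours is not pierced by any other segment of the chain, and a chain that cannot be reduced to one straight segment is reported as knotted. The function names, the tolerance, and the two test chains (a helix and a trefoil cut open at its highest point with both ends pulled far upward) are hypothetical choices for illustration, not the published Knotfind code.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p, q, a, b, c, eps=1e-12):
    """Moller-Trumbore ray/triangle test, limited to the segment p->q."""
    d = sub(q, p)
    e1, e2 = sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:  # parallel to the plane; coplanar overlap ignored here
        return False
    inv = 1.0 / det
    s = sub(p, a)
    u = inv * dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    qv = cross(s, e1)
    v = inv * dot(d, qv)
    if v < 0.0 or u + v > 1.0:
        return False
    t = inv * dot(e2, qv)
    return 0.0 <= t <= 1.0  # hit point lies between p and q


def reduce_chain(points):
    """Delete interior points whose triangle (prev, point, next) is not
    pierced by any other segment; the two end points are always kept."""
    chain = [tuple(float(x) for x in p) for p in points]
    changed = True
    while changed:
        changed = False
        i = 1
        while i < len(chain) - 1:
            a, b, c = chain[i - 1], chain[i], chain[i + 1]
            # Segment k joins chain[k] and chain[k+1]; skip the two segments
            # being replaced (i-1, i) and the two sharing a corner (i-2, i+1).
            blocked = any(
                segment_hits_triangle(chain[k], chain[k + 1], a, b, c)
                for k in range(len(chain) - 1)
                if k not in (i - 2, i - 1, i, i + 1)
            )
            if blocked:
                i += 1
            else:
                del chain[i]  # re-test the point that now sits at index i
                changed = True
    return chain


def looks_knotted(points):
    return len(reduce_chain(points)) > 2


def helix(n=40):
    # z rises monotonically, so every deletion is allowed and the chain straightens
    return [(math.cos(0.5 * k), math.sin(0.5 * k), 0.2 * k) for k in range(n)]


def open_trefoil(n=72, tail=1000.0):
    # Trefoil x = sin t + 2 sin 2t, y = cos t - 2 cos 2t, z = -sin 3t, cut open
    # beside its highest point (t = pi/2); both ends are pulled straight up.
    pts = []
    for k in range(n):
        t = math.pi / 2 + 2 * math.pi * (k + 0.5) / n
        pts.append((math.sin(t) + 2 * math.sin(2 * t),
                    math.cos(t) - 2 * math.cos(2 * t),
                    -math.sin(3 * t)))
    start = [(pts[0][0], pts[0][1], tail)]
    end = [(pts[-1][0], pts[-1][1], tail)]
    return start + pts + end


print(looks_knotted(helix()))         # False
print(looks_knotted(open_trefoil()))  # True
```

Each deletion sweeps the chain only through an empty triangle, so the chain never passes through itself; pulling the trefoil's ends far outward keeps the knot from slipping off the termini during reduction. This sketch is cubic in chain length in the worst case and makes no claim about the speed of the published method.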

High-resolution structure of a retroviral protease folded as a monomer

Acta Crystallographica Section D Biological Crystallography, 2011

PDB reference: monomeric M-PMV retroviral protease, 3sqf. Gilski et al., Acta Cryst. (2011). D67, 907-914.

WeFold: A coopetition for protein structure prediction
