Ionel Rata - Academia.edu
Papers by Ionel Rata
BMC Structural Biology, 2010
Improving predicted protein loop structure ranking using a Pareto-optimality consensus method
Accurately modeling protein loops is critical to predicting three-dimensional structures and understanding functions of many proteins. Local interactions between neighboring residue configurations contribute significantly to forming loop conformations. In this paper, we improve our statistical potential energy function (J. Phys. Chem. B 114(5): 1859-1869, 2010) for loop backbone torsion angle conformations by taking influences from the second nearest neighbor effect (SNNE) into consideration. This is based on our study showing that the second nearest neighbors along a protein sequence still have non-negligible influences on the torsion angles conformation of a loop residue while such correlations from further neighbors are much weakened. A biologically meaningful reference state is also introduced. Accuracy and sensitivity enhancements of the new loop torsion potential energy are observed on a decoy set with 4- to 12-residue loop targets.
The Journal of Physical Chemistry B, Feb 11, 2010
Native proteins have been optimized by evolution simultaneously for structure and sequence. Structural databases reflect this interdependency. In this paper, we present a new statistical potential for a reduced backbone representation that has both structure and sequence characteristics as variables. We use information from structural data available in the Protein Coil Library, selected on the basis of resolution and refinement factor. In these structures, the nonlocal interactions are randomly distributed and, thus, average out in statistics, so structural propensities due to local backbone-based interactions can be studied separately. We collect data in the form of local sequence-specific φ-ψ backbone dihedral pairs. From these data, we construct dihedral probability density functions (DPDFs) that quantify any adjacent φ-ψ pair distribution in the context of all possible combinations of local residue types. We use a probabilistic analysis to deduce how the correlations encoded in the various DPDFs as well as in residue frequencies propagate along the sequence and can be cumulated in a statistical potential capable of efficiently scoring a loop by its backbone conformation and sequence only. Our potential is able to identify with high accuracy the native structure of a loop with a given sequence among possible alternative conformations from sets of well-constructed decoys. Conversely, the potential can also be used for sequence prediction problems and is shown to score the native sequence of a given loop structure among the most fit of the possible sequence combinations. Applications for both structure prediction and sequence design are discussed.
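As a rough illustration of how a knowledge-based dihedral score of this general kind can be assembled, the sketch below bins (φ, ψ) pairs into a histogram and scores a conformation by a log-odds sum against a uniform reference. It is not the potential described in the abstract: it ignores residue types and neighbor correlations, and the bin width `BIN`, the pseudocount, the uniform reference state, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

BIN = 30                      # hypothetical bin width in degrees
N_BINS = (360 // BIN) ** 2    # number of (phi, psi) cells

def bin_of(phi, psi):
    """Map a (phi, psi) pair in degrees, each in [-180, 180], to a histogram cell."""
    side = 360 // BIN
    return (int((phi + 180) // BIN) % side, int((psi + 180) // BIN) % side)

def build_density(samples, pseudocount=1.0):
    """Return a function giving the smoothed probability of the cell containing (phi, psi)."""
    counts = Counter(bin_of(phi, psi) for phi, psi in samples)
    total = len(samples) + pseudocount * N_BINS
    return lambda phi, psi: (counts[bin_of(phi, psi)] + pseudocount) / total

def loop_score(conformation, density):
    """Sum of -log(P_observed / P_reference) over residues; lower means more favorable."""
    reference = 1.0 / N_BINS  # assumed uniform reference state
    return -sum(math.log(density(phi, psi) / reference) for phi, psi in conformation)

# Toy training data: mostly helical-region angles, a few extended ones.
samples = [(-60.0, -45.0)] * 8 + [(-120.0, 130.0)] * 2
density = build_density(samples)
favorable = loop_score([(-60.0, -45.0)], density)
unfavorable = loop_score([(60.0, 60.0)], density)
```

With this toy data the frequently observed region receives a negative (favorable) score and an unobserved region a positive one, which is the qualitative behavior a decoy-ranking score needs.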
We have conducted an extensive search of the configuration space for nineteen and twenty atom Si clusters in an attempt to determine the density of the local minima in the respective energy surfaces that are energetically close to the ground state. We did this using a combination of a fast but accurate tight-binding method to identify the low-lying regions of the energy surfaces, and a fully first-principles density functional theory method to study the low-lying regions. We have found a number of structures that are slightly lower in energy than the structures recently proposed by Ho et al. (K. M. Ho et al., Nature 392, 582 (1998)). The structures all lie within 0.1 eV of each other. We find that the compact structures are generally lower at the local density approximation level of theory, while the prolate structures lie lower using the generalized gradient approximation. The properties we have calculated for the various clusters include the average bond length, the HOMO-LUMO ga...
2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2010
In this article, we present a new protein structure modeling approach based on multi-scoring functions sampling. The rationale is to integrate multiple carefully selected physics- or knowledge-based scoring functions to tolerate insensitivity and inaccuracy existing in an individual scoring function so as to improve protein structure modeling accuracy. We apply the multi-scoring function sampling approach to protein loop backbone structure modeling. Our computational results show that sampling the scoring function space of a physics-based soft-sphere potential function and a knowledge-based scoring function based on pairwise atom distances has led to resolution improvement in the predicted decoy populations in a set of 12-residue benchmark loop targets.
2013 IEEE 3rd International Conference on Computational Advances in Bio and medical Sciences (ICCABS), 2013
Modeling and Optimization in Science and Technologies, 2014
Physical Review Letters, 2000
We describe a novel method for the structural optimization of molecular systems. Similar to genetic algorithms (GA), our approach involves an evolving population in which new members are formed by cutting and pasting operations on existing members. Unlike previous GA's, however, the population in each generation has a single parent only. This scheme has been used to optimize Si clusters with 13-23 atoms. We have found a number of new isomers that are lower in energy than any previously reported and have properties in much better agreement with experimental data.
Journal of Chemical Information and Modeling, 2011
Accurately predicting loop structures is important for understanding functions of many proteins. In order to obtain loop models with high accuracy, efficiently sampling the loop conformation space to discover reasonable structures is a critical step. In loop conformation sampling, coarse-grain energy (scoring) functions coupled with reduced protein representations are often used to reduce the number of degrees of freedom as well as sampling computational time. However, because reduced representations consider many factors only implicitly, the coarse-grain scoring functions may have potential insensitivity and inaccuracy, which can mislead the sampling process and consequently ignore important loop conformations. In this paper, we present a new computational sampling approach to obtain reasonable loop backbone models, called the Pareto Optimal Sampling (POS) method. The rationale of the POS method is to sample the function space of multiple, carefully selected scoring functions to discover an ensemble of diversified structures yielding Pareto optimality to all sampled conformations. The POS method can efficiently tolerate insensitivity and inaccuracy in individual scoring functions and thereby lead to significant accuracy improvement in loop structure prediction. We apply the POS method to a set of 4- to 12-residue loop targets using a function space composed of backbone-only Rosetta, DFIRE, and a triplet backbone dihedral potential developed in our lab. Our computational results show that in 501 out of 502 targets, the model sets generated by POS contain structure models within subangstrom resolution. Moreover, the top-ranked models have Root Mean Square Deviation (RMSD) less than 1 Å in 96.8%, 84.1%, and 72.2% of the short (4-6 residues), medium (7-9 residues), and long (10-12 residues) targets, respectively, when the all-atom models are generated by local optimization from the backbone models and are ranked by our recently developed Pareto Optimal Consensus (POC) method. Similar sampling effectiveness can also be found in a set of 13-residue loop targets.
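The notion of Pareto optimality over several scoring functions can be made concrete with a small sketch. The code below only filters a fixed list of score vectors down to its non-dominated set; it is not the POS sampling procedure itself. It assumes that lower scores are better for every function, and the names `dominates` and `pareto_front` and the example numbers are hypothetical.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every score and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of conformations that no other conformation dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]

# Each tuple holds one decoy's scores under three made-up scoring functions.
decoys = [(1.0, 5.0, 3.0), (2.0, 4.0, 3.0), (2.0, 6.0, 4.0), (0.5, 7.0, 2.0)]
front = pareto_front(decoys)  # decoy 2 is dominated by decoy 0 and is dropped
```

Keeping the whole non-dominated set, rather than the minimum of any single function, is what lets a conformation survive when one scoring function misjudges it but the others do not.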
Journal of Chemical Information and Modeling, 2013
The rapidly increasing number of protein crystal structures available in the Protein Data Bank (PDB) has naturally made statistical analyses feasible in studying complex high-order inter-residue correlations. In this paper, we report a context-based secondary structure potential (CSSP) for assessing the quality of predicted protein secondary structures generated by various prediction servers. CSSP is a sequence-position-specific knowledge-based potential generated based on the potentials of mean force approach, where high-order inter-residue interactions are taken into consideration. The CSSP potential is effective in identifying secondary structure predictions with good quality. In 56% of the targets in the CB513 benchmark, the optimal CSSP potential is able to recognize the native secondary structure or a prediction with Q3 accuracy higher than 90% as best scored in the predicted secondary structures generated by 10 popularly used secondary structure prediction servers. In more than 80% of the CB513 targets, the predicted secondary structures with the lowest CSSP potential values yield higher than 80% Q3 accuracy. Similar performance of CSSP is found on the CASP9 targets as well. Moreover, our computational results also show that the CSSP potential using triplets outperforms the CSSP potential using doublets and is currently better than the CSSP potential using quartets.
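For readers unfamiliar with the Q3 measure quoted above, it is the percentage of residues whose three-state secondary structure label (helix H, strand E, coil C) matches the native assignment. A minimal sketch follows; the function name `q3_accuracy` and the example strings are made up for illustration.

```python
def q3_accuracy(predicted: str, native: str) -> float:
    """Percentage of positions where the predicted H/E/C label equals the native one."""
    if len(predicted) != len(native):
        raise ValueError("predicted and native strings must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == n for p, n in zip(predicted, native))
    return 100.0 * matches / len(native)

native = "CCHHHHEEEC"
predicted = "CCHHHEEEEC"   # one residue mislabeled (H predicted as E)
accuracy = q3_accuracy(predicted, native)
```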
Biophysical Journal, 2012
The Nuclear Pore Complex (NPC, ~50 MDa) is the sole passageway for the transport of macromolecules across the nuclear envelope. The NPC plays a key role in numerous critical cellular processes such as transcription, and many of its components are implicated in human diseases such as cancer. Previous work (ref 1, 2) defined the relative positions of its 456 constituent proteins (nucleoporins, or Nups), based on spatial restraints derived from biophysical, electron microscopy, and proteomic data. Further elucidation of the evolutionary origin, transport mechanism, and assembly of the NPC will require higher resolution information. As part of an effort to improve upon the resolution and accuracy of the NPC structure, we set out to determine the atomic structures of the NPC components. Because it proved difficult to determine the atomic structures of whole Nups by X-ray crystallography alone, we are relying on multiple datasets that are combined computationally by our Integrative Modeling Platform (IMP) package (http://salilab.org/imp). In particular, we developed an integrative modeling approach that benefits from crystallographic structures of fragments of the protein or its homologs, Solution Small Angle X-ray Scattering (SAXS) profiles of the protein and its fragments (ref 3), NMR, and negative stain Electron Microscopy (EM) micrographs of the protein. Each dataset is converted into a set of spatial restraints on the protein structure, followed by finding a model that satisfies the restraints as well as possible using a Monte Carlo / molecular dynamics optimization procedure. The approach will be illustrated by its application to yeast Nup133.