Comparative protein structure modeling by iterative alignment, model building and model assessment

Comparative Study

Nucleic Acids Res. 2003 Jul 15;31(14):3982-92.

doi: 10.1093/nar/gkg460.

Bino John et al. Nucleic Acids Res. 2003.

Abstract

Comparative or homology protein structure modeling is severely limited by errors in the alignment of a modeled sequence with related proteins of known three-dimensional structure. To ameliorate this problem, we have developed an automated method that optimizes both the alignment and the model implied by it. This task is achieved by a genetic algorithm protocol that starts with a set of initial alignments and then iterates through re-alignment, model building and model assessment to optimize a model assessment score. During this iterative process: (i) new alignments are constructed by application of a number of operators, such as alignment mutations and cross-overs; (ii) comparative models corresponding to these alignments are built by satisfaction of spatial restraints, as implemented in our program MODELLER; (iii) the models are assessed by a variety of criteria, partly depending on an atomic statistical potential. When testing the procedure on a very difficult set of 19 modeling targets sharing only 4-27% sequence identity with their template structures, the average final alignment accuracy increased from 37 to 45% relative to the initial alignment (the alignment accuracy was measured as the percentage of positions in the tested alignment that were identical to the reference structure-based alignment). Correspondingly, the average model accuracy increased from 43 to 54% (the model accuracy was measured as the percentage of the Cα atoms of the model that were within 5 Å of the corresponding Cα atoms in the superposed native structure). The present method also compares favorably with two of the most successful previously described methods, PSI-BLAST and SAM. The accuracy of the final models would be increased further if a better method for ranking of the models were available.
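The two accuracy measures defined in the abstract can be illustrated with a minimal Python sketch. The function names, the representation of an alignment as (target index, template index) pairs, the use of the tested alignment's pair count as the denominator, and the inclusive 5 Å cutoff are all assumptions made here for illustration; the sketch also assumes the model has already been superposed on the native structure.

```python
import math


def alignment_accuracy(tested_pairs, reference_pairs):
    """Percent of aligned (target, template) index pairs in the tested
    alignment that also occur in the reference alignment.

    Assumption: the denominator is the number of pairs in the tested alignment.
    """
    tested = set(tested_pairs)
    if not tested:
        return 0.0
    shared = tested & set(reference_pairs)
    return 100.0 * len(shared) / len(tested)


def native_overlap(model_ca, native_ca, cutoff=5.0):
    """Percent of model C-alpha atoms within `cutoff` angstroms of the
    equivalent native C-alpha atom.

    Assumption: both coordinate lists are in the same order and the two
    structures are already superposed.
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("coordinate lists must have the same length")
    if not model_ca:
        return 0.0
    close = sum(1 for m, n in zip(model_ca, native_ca)
                if math.dist(m, n) <= cutoff)
    return 100.0 * close / len(model_ca)
```

Representing an alignment as index pairs keeps the comparison independent of how gaps are written in either alignment.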

Figures

Figure 1

Overview of modeling by iterative alignment, model building and model assessment. An initial set of alignments is generated using sequence profiles of the target and template sequences (steps 1 and 2). Comparative models implied by the alignments are built by MODELLER (step 4) and ranked by the GA341 score (step 5). If the predicted model accuracy is low (GA341 score <0.6), genetic algorithm operators are applied to the selected initial alignments to generate new alignments (step 7). The cycle of alignment (steps 7–9), model building (step 4) and model assessment (step 10) is continued for up to 25 iterations (step 11). A composite model assessment score is used at the end to assess the accuracy of the models corresponding to all of the representative alignments from all 25 iterations (step 12). The top model is selected as the final output from the protocol (step 13). Refer to Methods for a detailed description of the steps.
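The control flow in this caption can be sketched in Python as follows. This is a simplified illustration, not the published protocol: `build_model`, `score_model` and `make_children` are hypothetical callables standing in for MODELLER, the model assessment score and the genetic algorithm operators. The population size of 10, the early stop once the best score reaches the cutoff, and the assumption that a higher score means a better model are choices made here. The final composite-score selection over the alignments from all iterations is omitted.

```python
def optimize(initial_alignments, build_model, score_model, make_children,
             score_cutoff=0.6, max_iterations=25, population_size=10):
    """Iterate re-alignment, model building and model assessment.

    Returns (best_alignment, best_score, iterations_run).
    All three callables are hypothetical stand-ins supplied by the caller.
    """

    def rank(alignments):
        # Build and score one model per alignment; keep the best few.
        scored = [(score_model(build_model(a)), a) for a in alignments]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:population_size]

    ranked = rank(initial_alignments)
    iterations = 0
    # Apply the operators only while the predicted accuracy is low.
    while ranked[0][0] < score_cutoff and iterations < max_iterations:
        parents = [alignment for _, alignment in ranked]
        children = make_children(parents)
        ranked = rank(parents + children)
        iterations += 1

    best_score, best_alignment = ranked[0]
    return best_alignment, best_score, iterations
```

Passing the three stages in as functions keeps the loop testable with toy stand-ins before any real modeling program is attached.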

Figure 2

Genetic algorithm operators used in iterative alignment, model building and model assessment. The five genetic algorithm operators that transform parent alignment(s) on the left into child alignment(s) on the right are illustrated in (A–E). Alignment segments shown in bold italic type are altered by the operation. See Methods for details.
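The five operators are not specified in this caption, so the Python sketch below shows only two generic examples of the kind of operation involved: a single-point cross-over of two parent alignments and a mutation that shifts one aligned segment. Both functions, their names and the pair-list representation of an alignment are illustrative assumptions and are not taken from the paper's Methods.

```python
def is_valid(pairs, target_len, template_len):
    """True if the (target, template) index pairs are strictly increasing
    in both sequences and lie within both sequence lengths."""
    prev_i = prev_j = -1
    for i, j in pairs:
        if not (prev_i < i < target_len and prev_j < j < template_len):
            return False
        prev_i, prev_j = i, j
    return True


def crossover(parent_a, parent_b, cut):
    """Child takes parent_a's pairs before target position `cut` and
    parent_b's pairs from `cut` onward, dropping any pair from parent_b
    that would reuse or step back over a template residue."""
    head = [(i, j) for i, j in parent_a if i < cut]
    last_j = head[-1][1] if head else -1
    tail = [(i, j) for i, j in parent_b if i >= cut and j > last_j]
    return head + tail


def shift_segment(parent, start, end, delta, template_len):
    """Shift the template index by `delta` for pairs whose target index is
    in [start, end); pairs that would break the ordering are dropped, which
    leaves those residues unaligned (gapped)."""
    child = []
    last_j = -1
    for i, j in parent:
        if start <= i < end:
            j += delta
        if last_j < j < template_len:
            child.append((i, j))
            last_j = j
    return child
```

Both operations return an alignment that still passes `is_valid`, so every child can be handed directly to model building.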

Figure 3

Efficiency of the genetic algorithm protocol relying on an ideal fitness function (Results). CE overlap of the evolving alignments is plotted against the iteration index (step 11 in Fig. 1). The evolution is shown for one of the testing template–target pairs, the 1MOL–1CEW pair; similar results were obtained for the other testing pairs (not shown). Closed circles, the top ranking alignment; open circles, the average of the top 10 alignments.

Figure 4

Accuracy of the genetic algorithm protocol as a function of the optimization progress. All three panels show the averages over the 19 target–template pairs in the ‘difficult’ testing set. The model accuracy is measured both by the Cα RMSD of a model from the native structure (A) and by the native overlap (B). The alignment accuracy is measured by the CE overlap (C). Closed circles, the highest ranking model in step 10; closed triangles, the most accurate model in step 9; closed squares, the average of the 10 highest ranking models in step 10; open triangles, the most accurate model among the 10 highest ranking models in step 10; open circles, the most accurate model generated in any of the steps 1–11 up to the current iteration index; diamonds, the final model (step 13) selected using the composite score in step 12.
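For reference, the Cα RMSD used in panel (A) can be computed as in the short Python sketch below. It assumes the model has already been superposed on the native structure and that both coordinate lists contain equivalent atoms in the same order; the function name is an illustrative choice.

```python
import math


def ca_rmsd(model_ca, native_ca):
    """Root-mean-square deviation, in the units of the input coordinates,
    between equivalent C-alpha atoms of two pre-superposed structures."""
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(m, n) ** 2 for m, n in zip(model_ca, native_ca))
    return math.sqrt(total / len(model_ca))
```

Unlike the native overlap, which counts atoms under a fixed distance cutoff, the RMSD is sensitive to a few badly placed atoms.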

Figure 5

The statistical potential score of a model of 1LTS based on 1BOV as a function of its Cα RMSD error. The model at the crossing of the vertical and horizontal lines corresponds to the best model according to the composite model accuracy criterion.

Figure 6

Accuracies of the 1LTS alignment and model as a function of the optimization progress. (A–C) See Figure 4 for a description of the symbols. (D) Statistical potential score. The bottom panels (a–c) show models (red) of representative iterations superposed on the native structure (blue). (d) The final model superposed on the native structure. The Cα RMSD errors for these models are 10.1 (a), 3.8 (b), 4.3 (c) and 3.6 Å (d).
