Computational prediction of native protein ligand-binding and enzyme active site sequences

Comparative Study

Proc Natl Acad Sci U S A. 2005 Jul 19;102(29):10153-8.

doi: 10.1073/pnas.0504023102. Epub 2005 Jul 5.

Raj Chakrabarti et al. Proc Natl Acad Sci U S A. 2005.

Abstract

Recent studies reveal that the core sequences of many proteins were nearly optimized for stability by natural evolution. Surface residues, by contrast, are not so optimized, presumably because protein function is mediated through surface interactions with other molecules. Here, we sought to determine the extent to which the sequences of protein ligand-binding and enzyme active sites could be predicted by optimization of scoring functions based on protein ligand-binding affinity rather than structural stability. Optimization of binding affinity under constraints on the folding free energy correctly predicted 83% of amino acid residues (94% similar) in the binding sites of two model receptor-ligand complexes, streptavidin-biotin and glucose-binding protein. To explore the applicability of this methodology to enzymes, we applied an identical algorithm to the active sites of diverse enzymes from the peptidase, beta-gal, and nucleotide synthase families. Although simple optimization of binding affinity reproduced the sequences of some enzyme active sites with high precision, imposition of additional, geometric constraints on side-chain conformations based on the catalytic mechanism was required in other cases. With these modifications, our sequence optimization algorithm correctly predicted 78% of residues from all of the enzymes, with 83% similar to native (90% correct, with 95% similar, excluding residues with high variability in multiple sequence alignments). Furthermore, the conformations of the selected side chains were often correctly predicted within crystallographic error. These findings suggest that simple selection pressures may have played a predominant role in determining the sequences of ligand-binding and active sites in proteins.
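
The "correct" and "similar" percentages quoted above can be illustrated with a minimal sketch. The similarity pairs below are an assumption for illustration only, taken from the example pairs in the Fig. 1 legend (Tyr/Phe, Gln/Glu, Asp/Glu, Lys/Arg); the paper's criteria also require the same mode of interaction with the substrate, which this sketch does not check. The function name `score_prediction` is hypothetical.

```python
from typing import Tuple

# Hypothetical similarity pairs (one-letter codes), taken from the examples
# in the Fig. 1 legend; NOT the paper's full similarity definition.
SIMILAR_PAIRS = {
    frozenset("YF"),  # Tyr vs. Phe
    frozenset("QE"),  # Gln vs. Glu
    frozenset("DE"),  # Asp vs. Glu
    frozenset("KR"),  # Lys vs. Arg
}


def score_prediction(native: str, predicted: str) -> Tuple[float, float]:
    """Return (percent identical, percent identical-or-similar)."""
    if not native or len(native) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(native, predicted))
    # A position is "correct" only if the residue identity matches exactly.
    identical = sum(n == p for n, p in pairs)
    # A position is "similar" if it is identical or in an allowed pair.
    similar = sum(n == p or frozenset((n, p)) in SIMILAR_PAIRS for n, p in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)
```

For a four-residue site with native `YDKS` and prediction `FDRA`, one position is identical and three are identical or similar, giving 25% correct and 75% similar.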

Figures

Fig. 1.

Comparison of native and computationally optimized active-site sequences. For each receptor-ligand or enzyme-substrate complex, residues forming essential contacts with the ligand/substrate or in the catalytic mechanism are listed (bold denotes computationally repredicted; italic denotes catalytic, conformationally optimized under constraints with fixed identity; purple denotes functionally promiscuous or displaying high variability in MSAs). Complementary moieties on substrates are listed above the native residues. Computationally predicted active site sequences are listed in the gray bars. The first sequence is that displaying the highest binding affinity while satisfying all geometric constraints. The second sequence is that displaying highest sequence identity to the native active site within the top 0.6 kcal/mol of ranked sequences. Designed number corresponds to rank in calculated sequence list. Blue amino acids, identical to native; red, isosteric to the native and engages in same mode of interaction with substrate (e.g., Tyr vs. Phe, Gln vs. Glu); green amino acids, same type as native and engaging in same mode of interaction (e.g., Asp vs. Glu, Lys vs. Arg); black, none of the above. Native energy corresponds to binding affinity of native sequence/structure after side-chain conformational optimization. Catalytic constraints: β-gal, residue 200 capable of acid/base catalysis and within 3.0 Å of C1-OH, Glu-299 within 3.5 Å of scissile C; R61 DD-peptidase, Lys-65 ε-N within 3.0 Å of Tyr-159's O, Tyr-159's O within 3.0 Å of Ser-62's O. Des, designed; Cat, catalytic; nucl, nucleophile; pim, pimelyl; ter, terminus. Results for thymidylate synthase are in Fig. 7. Ligands/substrates are listed in Table 1.
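
The two selection rules in this legend (the highest-affinity sequence satisfying all geometric constraints, and the most native-like sequence within the top 0.6 kcal/mol) can be sketched as follows. This is an illustrative reading, not the authors' code: the `Candidate` type, the sign convention (lower energy means tighter binding), and measuring the window from the best constraint-satisfying sequence are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str          # active-site residues, one-letter codes
    energy: float          # hypothetical binding energy in kcal/mol (lower = tighter)
    constraints_ok: bool   # True if all catalytic geometric constraints hold


def satisfies_distance(atom_a, atom_b, cutoff):
    """Distance constraint, e.g. Lys-65 N-epsilon within 3.0 A of Tyr-159 O."""
    return math.dist(atom_a, atom_b) <= cutoff


def identity(a, b):
    """Number of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def pick_sequences(candidates, native, window=0.6):
    """Return (best, most_native_like, rank_of_most_native_like)."""
    allowed = sorted((c for c in candidates if c.constraints_ok),
                     key=lambda c: c.energy)
    if not allowed:
        raise ValueError("no candidate satisfies the constraints")
    best = allowed[0]
    # Sequences within `window` kcal/mol of the best-ranked one.
    in_window = [c for c in allowed if c.energy - best.energy <= window]
    # max() keeps the first (lowest-energy) candidate on ties.
    most_native = max(in_window, key=lambda c: identity(c.sequence, native))
    return best, most_native, allowed.index(most_native) + 1
```

The returned rank plays the role of the "designed number" in the legend, under the assumption that ranking is done over constraint-satisfying sequences only.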

Fig. 2.

Computed (A) and MSA (B) amino acid frequencies for residue 256 of glucose-binding protein, representative of positions where the highest-affinity predicted sequence did not match the native active site sequence. Dark blue bars designate the native amino acid at that residue position. Computed frequencies are derived from the sequences displaying binding affinities within 2 kcal/mol of the tightest binding sequence (subject to catalytic constraints).
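
A minimal sketch of how computed frequencies of this kind could be tallied: keep the sequences whose binding energy lies within 2 kcal/mol of the tightest binder, then count residues at one position. Unweighted counting, the lower-is-tighter sign convention, and the function name `position_frequencies` are assumptions; the legend does not say whether the sequences were weighted.

```python
from collections import Counter


def position_frequencies(scored_sequences, position, window=2.0):
    """Amino acid frequencies at `position` among near-optimal sequences.

    scored_sequences: iterable of (sequence, energy) pairs, energy in
    kcal/mol with lower values meaning tighter binding (assumed convention).
    """
    scored = list(scored_sequences)
    if not scored:
        raise ValueError("no sequences supplied")
    best = min(energy for _, energy in scored)
    # Residues at the chosen position for sequences inside the energy window.
    residues = [seq[position] for seq, energy in scored
                if energy - best <= window]
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}
```

The same function applied to the columns of a multiple sequence alignment (with all energies set equal) would give the MSA frequencies used for comparison in panel B.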

Fig. 3.

Similarity of predicted sequence distributions to native (A and B), sequence (site) entropies for predicted and MSA residue distributions (C and D), and the effect of catalytic constraints. Purple traces correspond to sequence distributions drawn from binding affinity windows of +2 kcal/mol (relative to the highest affinity predicted sequence), blue traces to +1 kcal/mol, and red traces to +1 kcal/mol with catalytic constraints. Constrained residues are depicted as heavy dots. (A) R61 DD-peptidase. (B and C) Calculated sequence distribution and site entropies for Penicillium sp. β-gal. (D) Site entropies of β-gal residues derived from MSA using an E-value cutoff of 10.
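
The site entropies in panels C and D can be understood as the Shannon entropy of the residue distribution at each position. The sketch below assumes the standard definition, H = -Σ p log p; the logarithm base (2 here) and the function name `site_entropy` are assumptions, since the legend does not state them.

```python
import math
from collections import Counter


def site_entropy(residues, base=2.0):
    """Shannon entropy of the residues observed at one sequence position.

    residues: iterable of one-letter codes, one per sequence in the
    distribution (predicted sequences in a window, or an MSA column).
    """
    counts = Counter(residues)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues supplied")
    # A fully conserved site gives 0; a uniform mix of k residues gives log(k).
    return sum(-(n / total) * math.log(n / total, base)
               for n in counts.values())
```

A fully conserved position has entropy 0, while a position split evenly between two residues has entropy 1 bit, so low-entropy positions are the ones where a mismatch with the native residue is most informative.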

Fig. 4.

Structural accuracy of side-chain conformation prediction for correctly selected active site residues in Streptomyces R61 DD-peptidase. The rmsds were calculated over all side-chain heavy atoms. Purple bars denote conformationally optimized (but not sequence-optimized) catalytic residues.
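
An rmsd over side-chain heavy atoms can be sketched as below. The sketch assumes both structures share a coordinate frame (no superposition step), matches atoms by PDB-style name, and does not handle symmetry-equivalent atoms such as Asp OD1/OD2; these choices and the helper names are assumptions, not details given in the legend.

```python
import math

BACKBONE = {"N", "CA", "C", "O", "OXT"}


def is_side_chain_heavy(name):
    """True for non-backbone, non-hydrogen atom names (PDB-style naming)."""
    # Hydrogen names may carry a leading digit (e.g. "1HB"), so strip digits.
    return name not in BACKBONE and not name.lstrip("0123456789").startswith("H")


def side_chain_rmsd(predicted, reference):
    """RMSD (same units as the input, e.g. angstroms) over side-chain heavy atoms.

    predicted, reference: dicts mapping atom name -> (x, y, z).
    """
    names = [n for n in reference if is_side_chain_heavy(n)]
    if not names:
        raise ValueError("residue has no side-chain heavy atoms (e.g. Gly)")
    missing = [n for n in names if n not in predicted]
    if missing:
        raise ValueError(f"atoms missing from prediction: {missing}")
    squared = sum(math.dist(predicted[n], reference[n]) ** 2 for n in names)
    return math.sqrt(squared / len(names))
```

Backbone atoms and hydrogens are ignored even when present in the input, so the value depends only on the placement of the side chain itself.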

Fig. 5.

Comparison of crystallographic and predicted active-site side-chain identities/geometries for R61 DD-peptidase (DD-peptidase) bound to the d-Ala-d-Ala peptide substrate. Crystallographic conformations of side chains directly involved in binding the ligand are shown in orange; predicted side chain conformations at these positions from the most similar high-affinity optimized sequence (Fig. 1) are shown in blue where residue identities match the native sequence and in purple where they do not. Side chains involved in catalysis but not directly in binding (and subjected to the geometric constraints listed in Fig. 1) are shown in red (crystallographic conformation) or white (predicted conformation). The substrate was fixed in its crystallographic conformation for this calculation. The prediction of Asn-123 in place of Thr-123 was corrected upon iteratively redocking the substrate to the active site and repredicting the conformations of the respective side chains.
