Predicting absolute ligand binding free energies to a simple model site (original) (raw)

J Mol Biol. Author manuscript; available in PMC 2008 Aug 24.

Published in final edited form as:

PMCID: PMC2104542

NIHMSID: NIHMS28543

David L. Mobley,a,⋆ Alan P. Graves,b,⋆ John D. Chodera,b,⋆⋆ Andrea C. McReynolds,a Brian K. Shoichet,a,* and Ken A. Dilla,*

David L. Mobley

a Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, CA 94143-2518

Alan P. Graves

b Graduate Group in Biophysics, University of California at San Francisco, San Francisco, CA 94143-2518

John D. Chodera

b Graduate Group in Biophysics, University of California at San Francisco, San Francisco, CA 94143-2518

Andrea C. McReynolds

a Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, CA 94143-2518

Brian K. Shoichet

a Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, CA 94143-2518

Ken A. Dill

a Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, CA 94143-2518

b Graduate Group in Biophysics, University of California at San Francisco, San Francisco, CA 94143-2518

⋆These authors contributed equally to this work

⋆⋆Current address: Department of Chemistry, Stanford University, Stanford, CA 94305-5080

Abstract

A central challenge in structure-based ligand design is the accurate prediction of binding free energies. Here, we apply alchemical free energy calculations in explicit solvent to predict ligand binding in a model cavity in T4 lysozyme. Even in this simple site, there are challenges. We made systematic improvements, beginning with single poses from docking, then including multiple poses, additional protein conformational changes, and using an improved charge model. Computed absolute binding free energies had an RMS error of 1.9 kcal/mol relative to previously determined experimental values. In blind prospective tests, the methods correctly discriminated between several true ligands and decoys in a set of putative binders identified by docking. In these prospective tests, the RMS error in predicted binding free energies relative to those subsequently determined experimentally was only 0.6 kcal/mol. X-ray crystal structures of the new ligands bound in the cavity corresponded closely to predictions from the free energy calculations, but sometimes differed from those predicted by docking. Finally, we examined the impact of holding the protein rigid, as in docking, with a view to learning how approximations made in docking affect accuracy and how they may be improved.

Keywords: free energy calculation, docking, alchemical free energy, conformational change, isothermal titration calorimetry, cavity binding site

1 Introduction

A central problem in ligand discovery and design is the prediction of ligand–receptor binding free energies. Current methods cover a spectrum of physical rigor and computational cost. Among physics-based methods, physics-based docking and scoring is computationally the least expensive. In this approach, ligand orientations (poses) are assigned scores, related to the intermolecular interaction energy, and ranked relative to other poses and other ligands1. A few scoring functions include an explicit or implicit estimate of the desolvation free energy of the receptor and ligand2. Receptor flexibility, strain energies3,1, and various entropies are usually neglected, as is any reference to the unbound protein and ligand states. These approximations put estimation of binding affinities well out of the reach of docking methods, although these methods can often correctly rank-order candidate molecules for testing.

At a higher level of theory are MM-GBSA/PBSA methods4,5,6. These methods estimate the absolute free energies of bound and unbound reference states. Enthalpies are estimated using average energies from a molecular mechanics force field, and combined with an entropy estimate and a solvation free energy from an implicit solvent model. The difficulty is that the binding free energy is a small difference between very large absolute free energies, requiring either very high accuracy in computing these large numbers, or cancellation of errors. Thus, while these approaches have had successes5,6, they also have several drawbacks, such as sensitivity to details of the implicit solvent model used7,8 and to the method used for estimating the entropy term. Such methods perform poorly on some test sets9,10.

At the highest level of rigor are various free energy methods, including the alchemical free energy calculations described below, and PMF-based methods11,12 (for recent reviews of free energy methods, see Refs.13,14,15). Here, we focus on alchemical free energy methods, which evaluate ratios of partition functions to estimate binding free energies, and thus include entropic and other contributions neglected at lower levels of theory. These methods, combined with some theoretical developments first laid out in the mid-1990s16,17,18 and refined later19, now allow absolute binding free energies to be computed rigorously and exactly, provided that the molecular mechanics forcefield used accurately describes the underlying physics, and that enough sampling can be performed that the estimates of the relevant thermodynamic averages are converged20.

If, in principle, alchemical free energy calculations allow for exact prediction of binding free energies, the requirements of accurate force fields and adequate sampling introduce error into the computed free energies. This error is often difficult to isolate in the complicated environment of protein active sites. In such sites, failures of sampling or force fields are exacerbated by binding-induced conformational changes, titratable groups, metal ions, and ordered waters, among other complications. Furthermore, when sampling is inadequate, alchemical free energy methods can easily give biased results; for example, computed free energies are often sensitive to the choice of the initial receptor or ligand structure21,22,23,24,25,26,20.

Here, to isolate sources of error, we study a highly simplified binding site using alchemical free energy methods and molecular dynamics. We focus on the binding of small aromatic ligands to the small, buried hydrophobic binding site in an engineered mutant of T4 lysozyme (the L99A site; Figure 1) that has been studied extensively experimentally27,28,29,30,31,32,33, with docking methods31,33, and in some previous computational free energy studies18,34,19,24. Here, we systematically evaluate the effect of various approximations on computed binding free energies. This model binding site provides a good starting point because it is simple and has been thoroughly characterized experimentally.

The model hydrophobic binding site in the L99A mutant of T4 lysozyme

The enclosed molecular surface of the cavity is shown (brown) as is the crystallographic geometry of a bound benzene ligand (green), within the context of the overall structure of T4 lysozyme (green ribbons, PDB code 181L). The sidechain of Met102 is also shown for reference.

A second advantage of this model binding site also is that it provides an excellent opportunity for prospective predictions, since it is relatively easy to find new compounds that bind31. This is valuable, because it can be far easier to suggest explanations for previous observations than to actually make new predictions, and predictive ability provides a fundamental test for methods.

2 Results

2.1 Overview

Here, we performed two sets of studies: Retrospective, in which we studied binding of ligands with previously measured affinities, and prospective, in which we predicted, in a blind test, the binding modes and affinities of several previously uncharacterized small molecules. After making predictions, we tested them experimentally, using isothermal titration calorimetry to measure affinities and X-ray crystallography to determine structures of the complexes.

2.2 Retrospective studies: Comparison with previous experimental results

We first computed binding free energies for a test set of 13 small neutral compounds using alchemical free energy calculations, as described in Methods (Table 1). Of these, binding affinities for 11 had previously been measured by isothermal titration calorimetry30, and two had previously been determined not to bind more strongly than an affinity of 10 mM using a thermal denaturation assay29,33.

2.2.1 Binding affinities are underestimated from single docking poses

We started with a simple approach. We used the best-scoring docking pose for each compound as a starting structure from which to simply compute binding free energies using our standard free energy calculation protocol discussed in Methods. We previously found that this single-pose approach often results in ligands remaining trapped in the vicinity of their starting orientation on simulation timescales20. Thus, with this approach, the free energy calculations effectively become an expensive re-scoring of docking poses, including conformational averaging and entropic effects, but only for a single orientation. We present results using this approach as ΔGsingleo (Table 1). In keeping with the approximations typically made in docking, we consider only single orientations for these results, meaning that we also neglect symmetry-equivalent orientations for molecules like toluene, phenol, and benzene, which in reality also contribute to binding17,19,24,20. The RMS error for ΔGsingleo relative to experiment is 3.51±0.04 kcal/mol, and the correlation coefficient (R) between computed free energies and experiment is 0.51±0.05. (These RMS and correlation calculations do not include the nonbinders, since free energies of association for these are not known.) (Figure 2(a)). This approach underestimates all the binding affinities. This is likely due to undersampling, a failure to adequately sample the most optimal binding conformations. Previous work had suggested this approach would fail in the case where the best docking pose is not the orientation that actually contributes most to binding20, so this outcome was expected.

Calculated binding free energies compared with experiment

Calculated and experimental binding free energies are shown with error bars; the calculated error bars represent one standard deviation. The two points shown as larger diamonds are the non-ligands phenol and 2-fluorobenzaldehyde; for these, only a lower limit on the experimental binding free energy is known, as denoted by a large experimental error bar to the right. The diagonal x = y line denotes perfect agreement with experiment. (a) Calculated ΔGsingleo, single-orientation binding free energies, including only the contribution from the single best docking orientation. (b) ΔGcalc.o, binding free energies, including all relevant ligand orientations, and contributions from releasing Val111 from its kinetic confinement.

Table 1

Calculated and experimental binding free energies for ligands of the apolar binding site considered here. Experimental values (denoted by ΔGexp.o) are from Ref.30, except for 2-fluorobenzaldehyde and phenol, where no binding was observed (by a CD Δ_Tm_ upshift assay29,33), giving only a lower bound on the binding free energy. Calculated values shown are ΔGsingleo, the free energy computed using only the single best-scoring docking orientation; ΔGmultipleo, the full computed binding free energy using all orientations considered; and ΔGcalc.o, the computed binding free energy including multiple orientations and using the confine-and-release approach to account for protein conformational change at Val111. The final column is the difference from experiment. At the bottom is the RMS error relative to experiment across binders for each set of free energies, and the correlation coefficient, R, between calculated and experimental values. Experimental binding affinities were measured at 302K; binding free energies were computed at 300K. Calculated values used AM1-CM2 charges.

Molecule	ΔGexp.o kcal/mol	ΔGsingleo kcal/mol	ΔGmultipleo kcal/mol	ΔGcalc.o kcal/mol	ΔGcalc.o−ΔGexp.o kcal/mol

2,3-benzofuran	−5.46 ± 0.03	−2.77 ± 0.08	−3.45 ± 0.06	−3.53 ± 0.06	1.93 ± 0.07
benzene	−5.19 ± 0.16	−3.05 ± 0.20	−4.53 ± 0.20	−4.56 ± 0.20	0.63 ± 0.26
ethylbenzene	−5.76 ± 0.07	−4.95 ± 0.10	−5.36 ± 0.10	−6.36 ± 0.18	−0.60 ± 0.19
indene	−5.13 ± 0.01	−0.63 ± 0.11	−1.56 ± 0.06	−1.75 ± 0.07	3.38 ± 0.07
indole	−4.89 ± 0.06	0.06 ± 0.10	−0.24 ± 0.07	−0.42 ± 0.08	4.47 ± 0.10
isobutylbenzene	−6.51 ± 0.06	−0.16 ± 0.15	−4.14 ± 0.12	−5.01 ± 0.20	1.50 ± 0.21
_n_-butylbenzene	−6.70 ± 0.02	−4.03 ± 0.11	−4.44 ± 0.11	−4.87 ± 0.14	1.83 ± 0.14
_n_-propylbenzene	−6.55 ± 0.02	−5.29 ± 0.10	−5.70 ± 0.10	−5.88 ± 0.11	0.67 ± 0.12
_o_-xylene	−4.60 ± 0.06	−0.15 ± 0.10	−0.56 ± 0.10	−1.27 ± 0.18	3.33 ± 0.19
_p_-xylene	−4.67 ± 0.06	−2.13 ± 0.09	−2.96 ± 0.09	−3.54 ± 0.17	1.13 ± 0.18
toluene	−5.52 ± 0.06	−3.76 ± 0.09	−4.17 ± 0.09	−4.58 ± 0.12	0.94 ± 0.14
phenol	> −2.74	−0.86 ± 0.09	−1.27 ± 0.09	−1.26 ± 0.09	N/A
2-fluorobenzaldehyde	> −2.74	0.99 ± 0.25	−2.43 ± 0.10	−2.92 ± 0.14	N/A

Statistics

RMS error:	3.51±0.04	2.55±0.03	2.24±0.04
Correlation, R:	0.51±0.05	0.72±0.05	0.72±0.05

2.2.2 Accounting for multiple potential bound orientations reduces the error in computed binding affinities

Next, we account for the presence of multiple potential ligand binding modes separated by kinetic barriers. We compute binding free energies of different possible binding modes separately, and combine their contributions to get an overall binding free energy20 (see also Section 4.1). We refer to these free energies, which also include the contributions of orientations related by symmetry, as ΔGmultipleo (Table 1). With this approach, the computed binding free energies are substantially closer to experiment in several cases (and are never worse) than those computed using the single-orientation approach, since occasionally the best-scoring pose is not the pose that contributes most to the binding free energy. This inclusion of these contributions reduces the RMS error in the computed free energies relative to experiment, from 3.51±0.04 kcal/mol to 2.55±0.03 kcal/mol, and raises the correlation coefficient, R, from 0.51±0.05 to 0.72±0.05.

The improvements with this approach come for several reasons. For three of the ligands (indene, indole, and 2,3-benzofuran) multiple orientations are within kT of one another and all contribute substantially. For isobutylbenzene, the best pose from docking is not in the orientation that contributes most to binding, so including multiple candidate orientations results in inclusion of the dominant orientation. For the remainder of the compounds, improvements come from inclusion of symmetry number corrections. These issues have been addressed in more detail in work on a related binding site20. In general, it is extremely difficult to predict in advance whether multiple orientations may be relevant.

2.2.3 Accounting for more protein conformational change further improves computed binding free energies

The section above describes our treatment of relevant ligand orientations. However, the protein may also have relevant slow degrees of freedom which can be difficult to sample29,35. Here, a key change is the reorientation of the Val111 sidechain observed in X-ray structures in response to ligands such as _n_-butylbenzene, isobutylbenzene, _o_-xylene, and _p_-xylene (Ref. 29 and Figure 3). The energy barriers associated with this reorientation are large enough to prevent the sidechain from rotating on simulation timescales35. This leads to an apparent dependence of computed free energies on the initial protein structure used in simulations. For example, binding free energies that are computed from the holo protein structure are too negative (favorable) if the sidechain does not have time to re-orient as the ligand is removed, because the protein strain energy (the energetic cost of deforming the protein on binding) is not properly accounted for. On the other hand, if the apo protein structure is used, as we did here, binding free energies are too positive (unfavorable), as the ligand sterically clashes to some degree with the protein35. This dependence on the starting structure is simply due to kinetic trapping of the protein in conformations near its starting conformation.

Val111 reorients on ligand binding

Val111 is observed to adopt a different sidechain rotamer from the apo crystallographic structure in co-crystal structures with several different ligands. Shown here is the benzene-bound structure (PDB code 181L), green, which is virtually identical to the apo structure of the protein. Also shown is the _p_-xylene bound structure (PDB code 187L) in magenta. The sticks at left show the reorientation of the Val111 sidechain on binding to _p_-xylene by roughly 120° relative to the benzene-bound and apo structures.

To overcome the kinetic trapping of Val111, we use a recently-developed “confine-and-release” framework to obtain correct binding free energies that are independent of the starting structure35. Specifically, when the Val111 remains trapped, computed binding free energies are really “confined” binding free energies, with Val111 confined to a particular orientation, so we use umbrella sampling to compute the free energy of releasing the valine from its confinement in the bound and unbound states. Here, this is accomplished by forcing sampling of alternative orientations using a harmonic biasing potential, and recovering the free energy landscape for this degree of freedom35. We do this for all of the compounds considered here, although for many compounds, only the apo orientation of the valine is found to be relevant, as observed experimentally29. This is a rigorous way to account for kinetic trapping. The confine-and-release framework is a generalization (to protein degrees of freedom) of the biasing potential approaches applied previously to ligands in a number of studies16,18,19,20,22,24,11.

The confine-and-release approach, which yields the total estimated binding free energy Δ_Gbind_, further improves the agreement of computed binding free energies with experiment (Table 1 and Figure 2(b)). With this approach, the RMS error relative to experiment further decreases from 2.55±0.03 kcal/mol to 2.24±0.04 kcal/mol, while the correlation with experiment remains unchanged (R=0.72±0.05). There is significant improvement in the agreement with experiment for a number of the ligands, especially isobutylbenzene, _p_-xylene, _n_-butylbenzene, _o_-xylene, and ethylbenzene. As mentioned above, for the first four of these, Val111 re-orientation on binding is observed in the co-crystal structure29. For ethylbenzene, the deposited structure does not show the Val111 rotated relative to the benzene bound structure, but the electron density seems to allow the possibility of either orientation29. A prediction of this work is that, if the crystal structure with ethylbenzene can be solved at higher resolution, the Val111 sidechain should be observed to adopt a conformation similar to that in the _p_-xylene bound structure.

The confine-and-release approach can be applied to a variety of protein conformational changes. Here, we chose to apply it specifically to a single Val111 dihedral. This choice was motivated by the fact that Val111 was previously observed (experimentally) to reorient on ligand binding29. Also, previous computational work led us to believe that sampling of this degree of freedom could be very slow24. We therefore examined our initial simulations to look for Val111 reorientation, and found the kinetic trapping described above35. This led us to apply the confine-and-release approach to this particular degree of freedom.

It is worth noting that this is not simply an issue of predicting a correct bound structure. Rather, using either the apo or holo structure leads to biased binding free energies (Tables 1 and 2) when the confine-and-release approach is not used. Only when we account for Val111 reorientation using confine-and-release do computed free energies become consistent between simulations beginning from apo and holo starting structures (below and Ref 35).

Table 2

Binding free energies calculated for selected ligands using their holo structures as a starting point. Shown are the calculated binding free energies with and without including any Val111 reorientation, and the difference from the calculated binding free energies using the apo structure as a starting point (including the Val111 reorientation) in Table 1. PDB codes for the starting structures are 183L, 186L, 184L, 187L, and 188L, in order.

Molecule	ΔGmultipleo,holo (kcal/mol)	ΔGcalc.o,holo (kcal/mol)	ΔGcalc.o,holo−ΔGcalc.o (kcal/mol)
indene	−1.44 ± 0.07	−1.64 ± 0.09	0.10 ± 0.11
n-butylbenzene	−9.17 ± 0.13	−5.32 ± 0.22	−0.45 ± 0.26
isobutylbenzene	−8.98 ± 0.13	−4.80 ± 0.21	0.20 ± 0 29
p-xylene	−7.27 ± 0.09	−3.31 ± 0.20	0.23 ± 0.26
o-xylene	−5.93 ± 0.12	−1.91 ± 0.21	−0.64 ± 0.28

2.2.4 Binding free energies from holo structures agree with those from apo after accounting for Val111 reorientation

After the confine-and-release calculations, there was still particularly poor agreement with experiment for several ligands – especially _o_-xylene, indene, indole, isobutylbenzene, and 2,3-benzofuran. (The first three of these are the only binders in Figure 2(b) with computed binding free energies worse than −2 kcal/mol). One possible explanation is inadequate sampling – perhaps due to additional protein conformational rearrangements that are not being sampled. For example, for indene, isobutylbenzene, and _o_-xylene, helix F, which forms one side of the cavity, shifts around 2 Å on ligand binding, making the binding site larger29. Additionally, previous free energy calculations on the same system tended to overestimate binding free energies for some of these same compounds when beginning from the holo structures24.

However, inspection of simulation trajectories suggests that this helix motion is being sampled. As a more quantitative test, we repeated the calculations for selected compounds beginning from the holo structures. If computed binding free energies are different starting from apo and holo structures, even after applying the confine-and-release approach for Val111, it would indicate inadequate sampling. While computed binding free energies are significantly different for calculations started from the apo and holo structure before accounting for Val111 reorientation, the differences are essentially negligible (within uncertainty) when the confine-and-release approach is used to account for this change (Table 2). (The largest difference, using the confine-and-release approach, is for o_-xylene, −0.64 ± 0.28 kcal/mol; since the uncertainty represents one standard deviation, this is still only a 2_σ variation). This implies that sampling of these conformational changes is probably sufficient and that the error lies elsewhere.

Table 3

Calculated and experimental binding free energies for ligands of the apolar binding site considered here, as in Table 1 except using AM1-BCC charges. Shown are ΔGBCCo – full binding free energies done with AM1-BCC including contributions from multiple ligand orientations and any Val111 reorientation. These are equivalent to ΔGcalc.o from that table but done with AM1-BCC charges. Also shown are the difference between the AM1-BCC results and experiment (next to last column), and between the AM1-BCC and AM1-CM2 results (last column). When the values in the last two columns have the same sign, AM1-BCC charges improved the agreement with experiment. At the bottom is RMS error relative to experiment across binders for each set of free energies, and the correlation coefficient, R, between calculated and experimental values.

Molecule	ΔGexp.o kcal/mol	ΔGBCCo kcal/mol	ΔGBCCo−ΔGexp.o kcal/mol	ΔGAM1−CM2o−ΔGBCCo kcal/mol

2,3-benzofuran	−5.46 ± 0.03	−3.66 ± 0.06	1.80 ± 0.06	0.13 ± 0.08
Benzene	−5.19 ± 0.16	−3.95 ± 0.20	1.24 ± 0.26	−0.61 ± 0.28
Ethylbenzene	−5.76 ± 0.07	−5.82 ± 0.14	−0.06 ± 0.16	−0.54 ± 0.23
Indene	−5.13 ± 0.01	−1.63 ± 0.07	3.50 ± 0.07	−0.12 ± 0.09
Indole	−4.89 ± 0.06	−1.37 ± 0.10	3.52 ± 0.12	0.96 ± 0.13
isobutylbenzene	−6.51 ± 0.06	−8.09 ± 0.18	−1.58 ± 0.19	3.09 ± 0.27
_n_-butylbenzene	−6.70 ± 0.02	−5.70 ± 0.20	1.00 ± 0.20	0.83 ± 0.25
_n_-propylbenzene	−6.55 ± 0.02	−5.44 ± 0.10	1.11 ± 0.11	−0.44 ± 0.16
_o_-xylene	−4.60 ± 0.06	−3.23 ± 0.25	1.37 ± 0.25	1.96 ± 0.31
_p_-xylene	−4.67 ± 0.06	−3.59 ± 0.12	1.08 ± 0.14	0.05 ± 0.21
Toluene	−5.52 ± 0.06	−4.07 ± 0.10	1.45 ± 0.12	−0.51 ± 0.16
Phenol	> −2.74	−1.07 ± 0.20	N/A	−0.19 ± 0.22
2-fluorobenzaldehyde	> −2.74	−3.14 ± 0.13	N/A	0.22 ± 0.19

Statistics

RMS error:	1.89±0.04
Correlation, R:	0.79±0.07

Free energies computed using the holo starting structures also show that the holo protein structure of several of these ligands is unfavorable by roughly 4 kcal/mol in the absence of bound ligand (Table 2 and Ref.35). This is presumably because of steric clashes with the protein, and is the reason why only some of the ligands induce this conformational change on binding.

2.2.5 The AM1-BCC charge model further increases the accuracy of binding free energies

Next, we considered another possible source of error: the simulation parameters. There are different methods for assign partial charges for small molecules (for a recent discussion, see36). In the work reported above, we used AM1-CM237 partial atomic charges for the small molecules, as in docking studies. However, we found previously that AM1-BCC charges performed better than AM1-CM2 charges for hydration free energies, perhaps because they are more similar to the HF 6/31G* charges the force field was parameterized with36. Therefore, we tested the AM1-BCC charges here as well. Table 3 shows that AM1-BCC charges further reduce the RMS error between computed and experimental binding free energies from 2.24±0.04 to 1.89±0.04 kcal/mol, and the correlation coefficient, R, increases from 0.72±0.05 to 0.79±0.07.

2.2.6 Alchemical methods are more accurate than docking

One major challenge for docking methods is to discriminate between binders and non-binders. We have included two known non-binders (with affinities worse than 10 mM) in the set of molecules examined here: phenol and 2-fluorobenzaldehyde. For these two compounds, computed binding free energies indicate only weak affinity: −2.9 ± 0.1 kcal/mol and weaker (more positive) (Table 1). A 10mM detection threshold in binding affinity corresponds to a binding free energy of roughly −2.7 kcal/mol. Thus, the computed binding free energies for these two compounds are at the detection limit, essentially consistent with the experimental observation that they are non-binders.

Our free energy calculations are computationally expensive. Are the results any more accurate than those that can be obtained from molecular docking? As shown in Figure 4, DOCK scores for the ligands studied here correlate poorly with experimental binding free energies. In fact, they are anti-correlated (R=−0.69), the opposite of what one would like. Moreover, the two non-binders have DOCK scores similar to those for the majority of the true ligands (and much more favorable scores than several ligands), hence it is impossible to discriminate between binders and non-binders. In fairness to docking, it is worth noting that these nonbinders were included in our test set because they have proven challenging for docking to discriminate from the binders. Also, the first goal of docking is to separate likely from unlikely ligands, and it does seem to be performing remarkably well in this binding site, where about 80% of the top 100 docking hits would probably bind. Additionally, we find that docking also performs quite well in this site at generating sterically reasonable potential bound orientations. That said, the free energy calculations give substantially better affinity estimates and correlations than docking does, and are better at recognizing nonbinders.

DOCK scores for the best-ranked pose for each molecule versus experimental binding free energies

The correlation coefficient (R) is −0.69, meaning that compounds that DOCK predicts should bind strongly tend to bind weakly. Additionally, the two nonbinders have similar DOCK scores to a number of the binders.

The docking results discussed here used the benzene-bound protein structure, which is virtually identical to the apo structure. However, docking to alternate protein conformations seems not to result in significant improvements in quality of docking results, except when many different crystallographic protein conformations are used32.

2.3 A large source of error in docking is the rigid-protein approximation

Docking typically treats proteins as rigid. How big is the error introduced by this assumption? To test this, we held the protein rigid and repeated our free energy calculations, including the effects of ligand symmetries and multiple ligand orientations (and using AM1-CM2 charges). This led to essentially zero correlation (R=−0.05±0.09) between computed free energies and experimental values, and an RMS error of 19.78±0.06 kcal/mol (Figure 5(a)).

Comparison of calculated and experimental binding free energies with the protein held rigid

(a) Binding free energies with the protein completely rigid. The RMS error relative to experiment is 19.78±0.06 kcal/mol and the correlation coefficient (R) is −0.05±0.09. (b) Binding free energies with the whole protein minimized separately for each ligand. The RMS error relative to experiment is 4.92±0.07 kcal/mol and the correlation coefficient (R) is 0.82±0.09. (c) Binding free energies with only the binding site minimized for each ligand. The RMS error relative to experiment is 4.06±0.06 kcal/mol and the correlation coefficient (R) is 0.32±0.08. The x = y indicates perfect agreement with experiment.

As a simple improvement on this rigid protein approximation, we also allowed the protein to relax to a different structure for each ligand. First, we minimized the entire protein in the presence of each ligand, using the initial docking geometry, and subsequently held the protein rigid during the simulations. This resulted in a correlation (R) of 0.82±0.09 and an RMS error of 4.92±0.07 kcal/mol relative to experiment (Figure 5(b)). Second, we minimized only a region of the protein around the binding site (Section 4.1.10) in the presence of each ligand, before holding the protein fixed during the simulations. This resulted in a correlation (R) of 0.32±0.08 and an RMS error of 4.06±0.06 kcal/mol relative to experiment (Figure 5(c)).

Overall, it appears that keeping the protein rigid while estimating binding free energies is detrimental to binding free energy estimation, even if minimization is performed separately for each ligand.

2.4 Prospective studies: Predictions and experimental tests

2.4.1 Distinguishing binders from nonbinders

We also performed a blind test of these free energy methods. We selected five small molecules that were among the top-scoring molecules from a docking screen of a compound library (using protocols described previously33 ). We calculated binding free energies in the same manner as above, and compared the resulting dissociation constants with the detection threshold of 10 mM for the experimental thermal denaturation assay. We predicted that four of the molecules would bind and one would not 1 (the 10 mM threshold fell just between affinities for two of the compounds). We then tested these predictions experimentally using the upshift in thermal denaturation, in which the melting temperatures of the protein in the presence and absence of the ligand were compared27. All molecules were tested in their neutral forms, using either circular dichroism or fluorescence to monitor the transition from the folded to the unfolded state, and resulting Tm values were compared to that of the apo protein. All melts occurred reversibly in a manner consistent with two-state unfolding. 1,2-dichlorobenzene, n_-methylaniline, 1-methylpyrrole and 1,2-benzenedithiol increased the Tm significantly, between 1.0 and 2.9°_C (Table 4). Conversely, thieno-[2,3c]pyridine was not observed to increase the Tm, even at 2.5 mM concentration, consistent with the predictions of the free energy calculations. In contrast, docking had predicted all five would bind, but the free energy methods correctly identified the nonbinder (thieno[2,3-c]pyridine).

Table 4

Novel compounds for which predictions were made and later tested experimentally. DOCK scores suggested all five should bind. Alchemical free energy calculations were initially used to predict whether or not these molecules would bind, then Δ_Tm_ values were determined experimentally to test these predictions. Following this, final binding free energy predictions (ΔGcalco) were tested experimentally with isothermal titration calorimetry; results are as shown (ΔGexpo). The RMS difference between predicted Δ_Go_ and experiment for the three compounds tested with ITC is 0.57 kcal/mol.

Molecule	DOCK Score (kcal/mol)	Alchemical Prediction1	Δ_Tm_ (°C)	Experiment (kcal/mol)	ΔGcalco2	ΔGexp.o3 (kcal/mol)

1,2-dichlorobenzene	−19.99	Binder	2.90	Binder	−5.66 ± 0.15	−6.37
_n_-methylaniline	−17.29	Binder	1.00	Binder	−5.37 ± 0.11	−4.7
1-methylpyrrole	−15.27	Binder	2.20	Binder	−4.32 ± 0.08	−4.44
1,2-benzenedithiol	−18.51	Binder	2.50	Binder	−2.79 ± 0.13	N.D.
thieno-[2,3-c]pyridine	−18.81	Nonbinder	−0.40	Nonbinder	−2.56 ± 0.07	N.D.

2.4.2 Predicting bound orientations was successful

We then obtained crystal structures (Table 5 to determine how well these free energy calculations could predict the bound ligand conformations. We soaked three of the ligands into L99A lysozyme protein crystals. The crystals diffracted to between 1.7 Å and 2.07 Å on a home source. In all three structures, initial Fo_−_Fc electron density unambiguously identified the orientation of the ligand in the site; for dichlorobenzene, two orientations of the ligand were apparent.

Table 5

X-ray data collection and refinement.

Ligands bound to L99A
1,2-dichlorobenzene	n-methylaniline	1-methylpyrrole
Cell dimensions
a=b (Å)	60.2	59.9	60.2
c (Å)	97.0	96.1	96.4
Resolution (Å)	1.70 (1.76)a	2.07 (2.14)a	1.94 (2.01)a
Reflections	18469 (2263)a	12102 (1169)a	15355 (1484)a
Rmerge (%)	9.8 (64.5)a	13.5 (52.8)a	11.2 (56.9)a
Completeness (%)	99.8 (99.9)a	95.0 (92.8)a	99.4 (98.4)a
〈I〉 / 〈σ(I)〉	9.8 (2.3)a	7.4 (2.3)a	8.7 (2.1)a
R-factor (%)	19.8	21.9	19.1
R_free_ (%)	23.5	23.4	25.8
Resolution range (Å)	50.0–1.70	50.0–2.07	50.0–1.94
Δ_bondlengths_ (Å)	0.008	0.007	0.009
Δ bondangles (°)	1.004	0.916	1.074
PDB code	2OTY	2OTZ	2OU0

In parallel, we predicted dominant bound orientations for each of these ligands (Section 4.1). Then we determined the structures for these three ligands, and compared with our predicted structures, the best predicted DOCK poses, and the electron density (Figure 7). These structures are deposited in the Protein Data Bank under codes 2OTY, 2OTZ, and 2OU0.

Predicted and experimental ligand orientations

Stereo images comparing the experimental and predicted poses for three ligands bound to L99A. (a) The two observed configurations of 1,2-dichlorobenzene, structure determined to 1.70 Å resolution. (b) 1-methylpyrrole, structure determined to 1.94 Å A resolution and (c) n-methylaniline, structure determined to 2.07 Å resolution. The crystallographic carbon atoms of protein residue M102 and each ligand are colored grey. The carbon atoms of the docking predictions are colored yellow, and the carbon atoms of the free energy predictions are colored magenta. The carbon atoms of the second free energy prediction in (c) are colored cyan. The Fo_−_Fc density maps are contoured at 3_σ_ (green mesh). PDB codes are 2OTY, 2OU0, and 2OTZ, respectively.

For 1-methylpyrrole, the X-ray, docking, and molecular dynamics poses were quite similar (Figure 7). The particular molecular dynamics snapshot that was selected as representative appears slightly twisted relative to the other two structures, but this is simply due to the arbitrary selection of a single MD conformation from an ensemble. The X-ray pose falls well within the range of structures sampled by the simulation from which this snapshot was chosen (and the underlying electron density is consistent with a range of structures seen in simulation). The RMSD between the docking pose and X-ray structure is 0.39 Å and that between the free energy snapshot and X-ray is 0.94 Å

For _n_-methylaniline, free energy methods predicted two orientations, with essentially equal occupancy probabilities (Figure 7). The X-ray and docking poses match well with one of these orientations, but there is no evidence in the electron density for the second orientation. The two orientations differ only by rotation around the C1–N bond. The RMSD between the docking pose and X-ray is 0.63 Å and that between the lower RMSD free energy snapshot and X-ray is 0.69 Å (the higher RMSD orientation is 1.29 Å away from X-ray).

For 1,2-dichlorobenzene, two separate orientations were observed in the crystal structure, differening by a rotation of around 60° in the plane of the aromatic ring. DOCK failed to properly identify either of these orientations as the best-scoring pose, whereas our free energy methods picked out one but indicated that the second was energetically unfavorable. In particular, the ligand would occasionally transition into this alternate orientation, but it would typically remain only transiently. We further tested this by conducting a separate set of calculations where the ligand was restrained to remain in the alternate orientation, but, with our parameter set, it appears substantially less favorable for binding than the orientation predicted to dominate using our free energy methods. This could be a force-field failure, inadequate sampling, or a difference between experimental and simulation conditions. The docking pose is 2.8–2.9 Å away from the X-ray orientations, while the free energy snapshot is only 0.77 Å away from the most similar X-ray orientation.

2.4.3 Predicting binding affinities

Next, we predicted binding free energies and then measured them by isothermal titration calorimetry. Since the AM1-BCC charge model worked best in retrospective studies, we used these charges for binding free energy predictions. Ligand titrations gave easily modeled curves using a low c-value protocol38 (Figure 8). Not only did binding free energy calculations give the correct rank ordering of binders, but computed binding free energies for these compounds were remarkably accurate (an RMS error of 0.57±0.09 kcal/mol, Table 4).

Representative ITC data

Data and fit for T4 lysozyme L99A (0.063 mM) titrated with 1,2-dichlorobenzene (~ 0.6 mM). An initial injection of 2.5_μ_L was followed by 29 injections of 10_μ_L of the ligand solution made every 2.5 min into the 1.4 mL reaction cell. After subtraction of blank runs, titrations were fit as described under Experimental Procedures to obtain the results in Table 4

3 Discussion

3.1 Accuracy of free energy calculations in retrospective and prospective studies

Alchemical free energy calculations using molecular dynamics can be used to compute fairly accurate binding free energies of ligands in the T4 lysozyme L99A binding site, with an RMS error of 1.89±0.04 kcal/mol in retrospective tests. This is a much higher accuracy than docking, where scores were inversely correlated with experiment, at least among the top-scoring ligands here. Admittedly, the docking program, DOCK, was never designed to predict binding affinities, and performs remarkably well at ranking likely ligands highly in large libraries33. Also, in these calculations, we are comparing with previously known results. A more rigorous test is to compare genuinely new predictions on untested candidate ligands with subsequent experiment.

Therefore, in a blind test, we predicted affinities and binding orientations for five previously uncharacterized compounds predicted by DOCK to bind, then tested these predictions experimentally. With alchemical free energy calculations, we correctly recognized the one non-binder, accurately predicted ligand bound orientations, and quantitatively predicted binding free energies. In each of these areas, free energy methods agreed better with experiment than docking did. Free energy methods, unlike docking, also correctly ranked the ligand binding affinities in these prospective tests. Thus, it appears that alchemical free energy methods can be truly predictive and can rescue docking failures.

The approach described here, including the retrospective study of previously published data, required no knowledge of the bound structure of the protein and ligand, and used the apo protein structure. Previous work on this same binding site (for example, in Ref.24) has required an accurate bound structure of the complex of interest as a starting point.

3.2 Essential ingredients for accuracy

This present study is limited to a simplified model binding site, where many complications of other binding sites are absent. The L99A cavity studied here has no interface with bulk water, no ordered waters to displace, is small, and the dominant interactions appear to be mainly non-polar. Nevertheless, the use of this system has allowed us to be systematic in isolating and solving various sampling problems. We identified several keys to obtaining accurate binding free energies: First, multiple potential ligand orientations must be included; one cannot rely on the single top docking pose to be the dominant ligand orientation. There can be large energetic barriers between different ligand orientations which make timescales for orientational change long compared to simulation times. Second, even seemingly small protein conformational changes on ligand binding, such as the reorientation of a single sidechain, Val111, can be difficult to sample correctly in free energy calculations, yet it is essential to include these conformational changes to get correct binding free energies.

On this second point, it is interesting to note that previous computational work on this same binding site gave free energies that were more than 2 kcal/mol too negative for four of 10 known binders in calculations initiated from the holo structures24. We performed similar free energy calculations for these compounds, found that our calculations overestimated binding affinities for the same ligands, despite the fact that we use a different force field and different parameters – but only when we failed to account for the free energy associated with Val111 reorientation. Indeed, in the previous work, results were sensitive to the starting orientation of the Val111 sidechain for at least one ligand, indicating that this sidechain degree of freedom was inadequately sampled24. Our results indicate that if the free energy associated with this reorientation is not included, it can bias computed binding free energies by as much as 4 kcal/mol.

Given that these small conformational changes can contribute so substantially to overall binding free energies, it will likely prove essential to develop improved methods for computing free energies associated with larger-scale protein conformational changes, such as loop motions on ligand binding.

3.3 Lessons for docking

To address the rigid protein assumption typically used in docking, we also tried free energy calculations with the protein held rigid. Using a rigid apo structure for all ligands resulted in very large errors (RMS error 20 kcal/mol; zero correlation). Minimizing the protein separately in the presence of each ligand worked better, but RMS errors remained high (above 4 kcal/mol), and the approach lacked the ability to recognize non-binders. Apparently, for this binding site, holding the protein rigid cannot easily produce binding affinities that agree quantitatively (even within 4 kcal/mol RMS error) with experiment, but strategies involving minimization of the protein can provide some improvement over treating it as completely rigid. But for high accuracy, it may be necessary to include not only protein conformational change, but a correct accounting for the free energy costs associated with these protein conformational changes – which can be substantial, even at the single sidechain level. Lastly, our results indicated that higher-quality charges can lead to substantial improvements in binding affinities; thus, the AM1-BCC charges that performed best here may also be a better choice for docking.

3.4 Conclusions

Overall, our results indicate that free energy methods are reaching the point where they can be useful when used predictively. However, in the relatively simple system examined here, this reasonably high level of accuracy depends on carefully accounting for the presence of multiple potential ligand bound orientations, and the possibility of protein conformational changes on ligand binding. In principle extremely long molecular dynamics simulations can handle both of these, but in practice, the computational cost of such simulations is often prohibitive. For now, treating both problems requires deliberate sampling of the relevant degrees of freedom, and so some pre-knowledge of these degrees of freedom. We suspect that challenges observed in this model binding site will be found in biologically relevant binding sites as well. Despite these limitations, alchemical free energy methods hold great promise, both in predictive power, and in guiding improvement of more approximate physics-based methods.

4 Methods

4.1 Simulation Methods

4.1.1 Overview

We begin with the benzene-bound crystal structure of the T4 lysozyme L99A mutant30,29 (PDB entry 181L, which is virtually identical to the apo structure, 1L90, from which it has an RMSD of 0.14 Å), dock our ligands into the binding site, and determine which poses or orientations are kinetically stable and distinct. We then begin free energy calculations from stable, distinct orientations, as described previously20, to compute binding free energies. Finally, for each ligand, we combine contributions from these different orientations to rigorously estimate binding free energies.

4.1.2 System preparation

Except where indicated otherwise, the initial protein structure for molecular dynamics simulations and free energy calculations is the benzene-bound structure of the L99A lysozyme mutant, which is essentially identical to the apo structure. This was prepared with the GROMACS39,40 3.3 utility PDB2GMX with default protonation states, using a GROMACS port41 of the AMBER 9642 force field. Since the cavity that makes up the binding site is completely hydrophobic without any nearby titratable groups, these protonation states present no difficulties. Following preparation, the protein was placed in a dodecahedral simulation box and surrounded by roughly 6,000 water molecules which were pre-equilibrated for 1 ns with the protein held fixed prior to the equilibration of the full system, which is discussed below.

4.1.3 Docking

We used DOCK 3.5.54 to fit the molecules of interest into the protein structure (Tables 1 and 4). We retained all of the generated poses (numbering in the thousands) and scores of the molecules, then sorted these by score. We then began with the best scoring pose and worked towards the worst, retaining every pose that was different by more than 2 Å RMSD from a better scoring pose, to generate a set of the best scoring, distinct ligand orientations. This typically resulted in 10–40 distinct poses, of which we retained only the group of top-scoring poses (typically 3–8).

4.1.4 Identifying candidate orientations

From these poses, we generated general AMBER force field (GAFF)43 parameters for each ligand using ANTECHAMBER version 1.2.444, and AM1-CM2 charges45 as discussed previously20. These charges were employed in docking studies on the same system33, and we sought to separate parameter differences from methodology differences as much as possible. We also present results in this work where we use AM1-BCC46,47 charges computed with ANTECHAMBER.

From the resulting small-molecule AMBER topology and coordinate files, we generated GROMACS topology and coordinate files using the amb2gmx conversion utility48. These ligand topologies and coordinates were then merged with those for the pre-solvated protein system prior to simulation.

To further reduce the number of ligand orientations we consider in free energy calculations, we initiated separate 1 ns molecular dynamics simulations from all candidate orientations to identify those that are kinetically distinct20. We only retained one orientation of each set of orientations that interconvert easily within simulation timescales. This typically resulted in 1–4 kinetically distinct orientations which were used for the calculations presented in Section 2.

4.1.5 Choosing restrained orientations

We then chose reference orientations for restraining the ligand in the binding site relative to the protein for subsequent free energy calculations. These are defined by picking a specific value in each of six relative protein-ligand degrees of freedom to which to restrain the ligand20. These values were chosen as the most probable value of each degree of freedom as determined from histograms computed during the 1 ns simulations, although in principle this choice is arbitrary19,20. The degrees of freedom used are as described previously19,20.

4.1.6 Binding free energy calculations

We carried out independent binding free energy calculations for each kinetically distinct orientation. Using the orientational decomposition procedure described previously20, we combined the effective binding free energies of each orientation into an overall binding free energy (ΔGmultipleo). We also computed binding free energies that would have resulted had we only considered a single potential bound orientation and neglected symmetry number corrections, as done in docking (ΔGsingleo).

Binding free energy calculations were carried out in GROMACS 3.3 (with several crucial bugfixes described previously20 ) using the Bennett acceptance ratio (BAR)49,50 method to estimate free energy differences. To calculate absolute binding free energies, we used a thermodynamic cycle developed previously17,19,20. In this cycle, we begin with the ligand bound to the protein, then restrain the ligand harmonically to a reference orientation within the binding site. We then annihilate the ligand’s partial charges, then decouple its Lennard-Jones interactions with the rest of the system. The final ligand state is equivalent to a non-interacting ligand with no electrostatics, restrained, in vacuum or water. We then analytically calculate the free energy of removing the restraints, and compute the free energy of restoring first the Lennard-Jones and then the electrostatic interactions in water. This entire process forms a thermodynamic cycle that transfers the ligand from the binding site to bulk water in the standard state. If all of the component calculations are converged, this rigorously provides a measurement of the absolute binding free energy, Δ_Go_, for the forcefield and solvent model used 2. As part of each of the steps in the cycle, independent free energy calculations were conducted at a number of intermediate alchemical states (denoted by the parameter λ) which were the same as those described in our previous work20.

Following these binding free energy calculations and the predictions discussed below, we also computed binding free energies for the set of small molecules using AM1-BCC charges. To do this, we computed the free energy of changing AM1-CM2 to AM1-BCC charges in water for each compound, and then repeated the restraining and charging calculations for the compound in the protein for each orientation. Since the Lennard-Jones decoupling is done with compound’s electrostatics already turned off, it was unnecessary to repeat these calculations.

4.1.7 Simulation protocols

For all of the simulations discussed here (at each λ value), equilibration was performed as follows. First, velocities were assigned from a Maxwell-Boltzmann distribution at 300 K and the the system was subjected to 10 ps of isothermal molecular dynamics. This was followed by 100 ps of isothermal-isobaric dynamics with pressure regulated by the Berendsen weak-coupling scheme51 as discussed previously20. Following this, the simulation cell size was fixed and production simulations were run with isothermal dynamics, using the Langevin integrator for temperature control with a friction coefficient of 1 ps−1. Production simulations were 1 ns in length for simulations of the complex (at each alchemical intermediate state, or λ value), and 5 ns for the ligand in water, except where noted otherwise.

All remaining protocols are as discussed previously20, with several exceptions: First, PME parameters were modified from those used previously to increase accuracy. Here, we used a PME spline order of 6, a relative tolerance of 10−6, and a Fourier spacing of as close as possible to 1.0 Å. Additionally, we applied a long range van der Waals correction (in addition to the analytical correction employed previously20 ) to correct for the effect of truncating the long-range dispersive interactions at a finite cutoff. These interactions are everywhere attractive, and can contribute significantly to binding free energies due to the fact that proteins have a higher density of attractive sites than water22. While this issue will be discussed in detail elsewhere52, the approach used here was, briefly, to run as usual the set of simulations where the ligand Lennard-Jones interactions are decoupled (that is, the ligand-environment Lennard-Jones interactions are turned off). These simulations were then reprocessed with long (24 Å) cutoffs for Lennard-Jones interactions, and the weighted histogram analysis method (WHAM)53 was used to reweight the data from the simulations conducted with the short cutoff in order to estimate what the decoupling free energy would have been had we run with the longer cutoff. This is a relatively small correction (0.2–0.8 kcal/mol) in the direction of increased binding affinity. This correction would be larger had not an approximate analytical dispersion correction already been included in the original runs by using the GROMACS correction option ENERPRES, and tends to be larger for larger ligands22,52.

To aid convergence of the calculations for benzene, which has a very high symmetry number, we used an approach employed previously24 and restricted benzene to stay within a single symmetric orientation during our free energy calculations, then simply included the effect of symmetry as a symmetry number correction to the binding free energy20.

4.1.8 Confine-and-Release for Val111

For some ligands, a valine sidechain (Val111) in the binding cavity is observed experimentally to change rotameric states on ligand binding. This conformational change is not typically sampled (or not well sampled) during our molecular dynamics simulations (discussed in Section 2). Neglect of this change leads to an underestimate of binding free energies for those ligands when the apo protein structure is used, and an overestimate when the holo protein structure is used. In the former case, the protein typically remains trapped in a conformation where the valine interferes with ligand binding; in the latter, when the ligand is removed from the binding site, the protein remains kinetically trapped with the sidechain in an unfavorable orientation, leading to neglect of protein strain energy in the free energy calculation. One way to properly account for the presence of these multiple conformations is to use the “confine-and-release” strategy35. The basic idea is to compute the binding free energy of the ligand to the protein, with the protein conformation confined (either kinetically or with artificial restraints) to a particular region of configuration space, then to compute the free energy of releasing the protein from confinement in the bound and unbound states. This provides a rigorous approach for computing binding free energies which include contributions from these conformational changes35.

We apply the confine-and-release approach to compute the binding free energies of all the compounds considered here. To do this, we first compute the binding free energy with the valine sidechain kinetically trapped in the orientation from the apo structure (which we check by monitoring the dihedral angle throughout all of our simulations). In some cases, the valine actually manages to briefly escape from its kinetic trap at one or several λ values in the alchemical part of the calculation; we discard any simulation snapshots where it had done so from our data analysis in order to apply the confine-and-release approach. Once we have confined binding free energies, computed with the sidechain trapped, we use umbrella sampling and WHAM53 to compute the potential of mean force (PMF) for rotating the valine sidechain in the bound and unbound states. From this, we compute the free energy of releasing the protein from confinement, and thus the binding free energies. We present our results with and without the confine-and-release approach, which provides a rigorous way to account for inadequate sampling.

Simulation details for the umbrella sampling calculations are as described previously35.

Since free energy calculations were conducted with two charge sets, two sets of umbrella sampling calculations for Val111 were carried out for each ligand –one where the ligand had AM1-BCC charges, and one where it had AM1-CM2 charges. This was important since the details of the ligand electrostatics can influence the free energy landscape associated with this reorientation.

4.1.9 Predictions

To predict whether untested compounds would bind, we selected five small molecules predicted by docking (using protocols described previously33 ) to be binders. We followed the same protocols described above — docking to the binding site, retaining a number of different kinetically distinct starting orientations, running separate binding free energy calculations for each of these, and then combining them to get a total binding free energy. We also applied the confine-and-release approach to account for any reorientation of the Val111 sidechain on binding of these molecules. From computed binding free energies, we calculated dissociation constants, and then predict that those compounds with dissociation constants less than 10 mM (the experimental detection threshold for the thermal upshift assay) should bind. These predictions – and those of bound structures, below – were made with AM1-CM2 charges, as AM1-BCC charges were only examined later.

Predicting bound structures of these unknown ligands is challenging, since our method is intended to provide an accurate estimate of the binding free energy which includes contributions from a variety of different ligand and protein structures, and neglects any effects of the crystal environment which are present in X-ray structures, as well as other differences. Here, we attempted to identify the dominant bound structure by identifying which kinetically distinct ligand binding orientation contributes most favorably to the total binding free energy; to do this, we calculated occupancy probabilities of the different dominant orientations from their estimated binding free energies. The predicted orientation was the orientation with the highest probability of occupancy. Then, we predicted a bound structure by taking a representative snapshot from a simulation where the ligand remains, without restraints, stably within the region of configuration space corresponding to that orientation. Our predicted bound orientations, then, were single snapshots from molecular dynamics simulations. For one ligand, we predicted that two orientations would have nearly comparable occupancy probabilities, so we predicted that both would be observed.

After these predictions, we continued retrospective studies and found that the AM1-BCC charge model gave more accurate binding free energies. Therefore, we used the AM1-BCC charge set to make predictions for binding free energies prior to measuring these calorimetrically.

4.1.10 Binding free energies to a rigid protein

To compare our methodology more closely with docking, we repeated the free energy calculations using essentially the same protocols, but with the protein held rigid in its prepared starting structure, as is often done in docking. To do so, we used the GROMACS option of defining frozen groups that are held fixed during dynamics. Because there are so many fewer degrees of freedom when the protein is completely rigid, convergence was more rapid, and production simulations required only 100 ps at each λ value. Protocols were otherwise the same, and these calculations used AM1-CM2 charges.

We considered several choices for the rigid protein structure. First, we held the protein rigid in its starting structure (prepared from PDB code 181L). Second, motivated by testing approaches that could easily be applied in docking and scoring, we minimized the entire protein in the presence of each ligand individually in vacuum, and used each of these structures for the appropriate ligand. The RMSD to the starting prepared structure is typically around 0.5 Å with this approach. Finally, we modified this second protocol to allow only residues near the binding site (residues 78, 84, 85, 87, 88, 91, 98, 99, 100, 102, 103, 106, 111, 118, 121, 133, and 153) to move during minimization. With this protocol, changes in structure were very minor (often less than 0.01 Å RMSD (the RMSD reported is for the protein as a whole)).

Minimization protocols were as discussed above and previously20, except the order of minimization was reversed (steepest descents followed by L-BFGS) and, since minimization was done in vacuum, cutoff electrostatics was used instead of PME, using a cutoff of 11 Å.

4.1.11 Error analysis

Calculated uncertainties reported here are one standard deviation of the mean over 40 block bootstrap trials, where the block length is taken to be equal to the autocorrelation time, as described previously20.

4.2 Experimental Methods

4.2.1 Binding detection by upshift of thermal denaturation temperature

To detect binding, L99A protein was denatured reversibly by temperature in the presence and absence of the putative ligand. Molecules that bind preferentially to the folded cavity-containing protein should stabilize it relative to the apo protein, raising its temperature of melting29.

Thermal denaturation experiments were carried out in a Jasco J-715 spectropolarimeter with a Jasco PTC-348WI Peltier-effect in-cell temperature control device and in-cell stirring. Each compound was screened in its neutral form. 1,2-benzenedithiol was assayed in a pH 3 buffer containing 25mM KCl, 2.9 mM phosphoric acid and 17 mM KH2PO4. Compounds 1,2-dichlorobenzene and 1-methylpyrrole were screened in a pH 5.4 buffer containing 100 mM sodium chloride, 8.6 mM sodium acetate, and 1.6 mM acetic acid. Compounds thieno[2,3-c]pyridine and n_-methylaniline were screened in a pH 6.8 buffer composed of 50 mM potassium chloride, and 38% (v/v) ethylene glycol. All buffers are as previously described29. Thermal denaturation of the protein in the presence of compounds 1,2-dichlorobenzene, thieno[2,3-c]pyridine, and n_-methylaniline were monitored by circular dichroism (CD) between 223 and 234 nm (although the 223 nm wavelength is the ideal wavelength for measuring the helical signal of T4 lysozyme, the higher wavelengths, which were less affected by absorbance from some of the compounds, can be used to monitor the edge of the helical signal). For 1,2-benzenedithiol and 1-methylpyrrole, which have high absorbance in the far UV region, thermal denaturation was measured by the intensity of the integrated fluorescence emission for all wavelengths above 300 nm, exciting at 290nm. Thermal melts were performed at a temperature ramp rate of 2 K/min. A least-squares fit of the two-state transition model was performed with the program EXAM54 to calculate Tm and van’t Hoff Δ_H values for the thermal denaturations. The Δ_Cp was set to 1.94 kcal mol−1K−1.

Thermal denaturation of apo T4 lysozyme L99A was carried out with 0.02–0.04 mg/ml protein in the same buffer conditions described above. Compounds were included at concentrations as high as 10 mM. Each denaturation experiment was performed at least three times.

4.2.2 Isothermal titration calorimetry

Quantitative estimates of association for ligand binding to L99A T4 lysozyme were obtained by isothermal titration calorimetry (ITC) using a Microcal VP-ITC calorimeter55 operated at 10°C with a reference power of 10 _μ_cal/sec, a stirring speed of 300 rpm, and a data collection interval of four minutes per injection. An initial injection of 2 _μ_L of ligand was followed by an additional 29 injections of 10 _μ_L totaling 292 _μ_L. These were added to 0.05 to 0.13 mM protein in the 1.4266 mL sample cell. The concentration of small molecule ligands in the syringe was adjusted such that the final molar ratio of ligand to protein was at least twofold by the end of the titrations. Protein concentrations were determined by molar absorptivity at 280 nm in 0.5 M NaCl, 0.1 M sodium phosphate buffer, pH 6.8. Ligand concentrations were determined by volume of material added to a known volume of buffer. Baseline mixing heats were estimated by injection of ligand into buffer. Reaction heat profiles were fit to the single binding site model using the ITC worksheet of ORIGIN version 7.0.

4.2.3 Protein preparation and crystallography

T4 lysozyme mutant L99A was overexpressed, purified, and crystals grown as described previously56. The crystals belong to space group _P_3221. Crystals were soaked overnight to four days in crystallization buffer containing as much as 50 mM compound. In addition, drops of neat compound were added to the cover slip surrounding the drop containing the crystal. After soaking, the crystals were cryoprotected with a 50:50 Paraton-N (Hampton Research, Aliso Viejo, CA), mineral oil mix. X-ray data were collected at 110 K with an in-house Raxis IV detector. Reflections were indexed, integrated, and scaled using the HKL package57. The complex structures were refined using REFMAC558. For model building and water placement, we used Coot59. The X-ray crystal structures have been deposited in the PDB as 2OTY, 2OTZ, and 2OU0.

Five compounds for which binding predictions were made

(a) 1,2-dichlorobenzene; (b) n-methylaniline; (c) 1-methylpyrrole; (d) 1,2-benzenedithiol; and (e) thieno[2,3-c]pyridine.

Acknowledgments

We thank Sarah Boyce for help with protein preparation. We thank Benoît Roux (University of Chicago), Bill Swope (IBM Almaden), Jed Pitera (IBM Almaden), Guha Jayachandran (Stanford University), and Vijay Pande (Stanford University) for helpful discussions, and the reviewers for insightful comments on the manuscript. JDC was supported in part by HHMI and IBM pre-doctoral fellowships. DLM and KAD acknowledge NIH grant GM63592, Anteon Corporation grant USAF-5408-04-SC-0008, and a UCSF Sandler Award. BKS acknowledges NIH grant GM59957. This work was performed in part with the UCSF QB3 Shared Computing Facility.

Footnotes

1Initial predictions were made with AM1-CM2 charges, but AM1-BCC charges were tried later and led to the same outcome

2The sum of the components around the cycle is actually the negative of Δ_Go_.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Halperin I, Ma B, Wolfson H, Nussinov R. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins: Str, Funct, and Genet. 2002;47:409–443. [PubMed] [Google Scholar]

2. Shoichet BK, Leach AR, Kuntz ID. Ligand solvation in molecular docking. Proteins: Struct, Funct, and Genet. 1999;34:4–16. [PubMed] [Google Scholar]

3. Huang N, Kalyanaraman C, Bernacki K, Jacobson MP. Molecular mechanics methods for predicting protein-ligand binding. Phys Chem Chem Phys. 2006;8:5166–5177. [PubMed] [Google Scholar]

4. Srinivasan J, Cheatam TE, III, Cieplak P, Kollman PA, Case DA. Continuum solvent studies of the stability of DNA, RNA, and Phosphoramidate-DNA helices. J Am Chem Soc. 1998;120:9401–9409. [Google Scholar]

5. Kuhn B, Kollman PA. Binding of a diverse set of ligands to avidin and streptavidin: An accurate quantitative prediction of their relative affinities by a combination of molecular mechanics and continuum solvent models. J Med Chem. 2000;43:3786–3791. [PubMed] [Google Scholar]

6. Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, Lee M, Lee T, Duan Y, Wang W, Donini O, Cieplak P, Srinivasan J, Case DA, Cheatham TE., III Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Accounts of Chemical Research. 2000;33:889–897. [PubMed] [Google Scholar]

7. Allen TW, Andersen OS, Roux B. On the importance of atomic fluctuations, protein flexibility, and solvent in ion permeation. J Gen Physiol. 2004;124:679–690. [PMC free article] [PubMed] [Google Scholar]

8. Lee MS, Olson MA. Calculation of absolute protein-ligand binding affinity using path and endpoint approaches. Biophys J. 2006;90:864–877. [PMC free article] [PubMed] [Google Scholar]

9. Pearlman DA. Evaluating the molecular mechanics poisson-boltzmann surface area free energy method using a congeneric series of ligands to p38 map kinase. J Med Chem. 2005;48:7796–7807. [PubMed] [Google Scholar]

10. Steinbrecher T, Case DA, Labahn A. A multistep approach to structure-based drug design: Studying ligand binding at the human neutrophil elastase. J Med Chem. 2006;49:1837–1844. [PubMed] [Google Scholar]

11. Woo HJ, Roux B. Calculation of absolute protein-ligand binding free energy from computer simulations. Proc Nat Acad Sci USA. 2005;102:6825–6830. [PMC free article] [PubMed] [Google Scholar]

12. Zhang Y, McCammon JA. Potentials of mean force for acetylcholine unbinding from the alpha7 nicotinic acetylcholine receptor ligand-binding domain. J Am Chem Soc. 2006;128:3019–3026. [PMC free article] [PubMed] [Google Scholar]

13. Rodinger T, Pomès R. Enhancing the accuracy, the efficiency and the scope of free energy simulations. Curr Opin Struct Biol. 2005;15:164–170. [PubMed] [Google Scholar]

14. Kofke DA. Free energy methods in molecular simulation. Fluid Phase Equilibria. 2005;228–229:41–48. [Google Scholar]

15. Shirts MR, Mobley DL, Chodera JD. Alchemical free energy calculations: Ready for prime time? Ann Rep Comp Chem. In press. [Google Scholar]

16. Roux B, Nina M, Pomès R, Smith JC. Thermodynamic stability of water molecules in the bacteriorhodopsin proton channel: A molecular dynamics free energy perturbation study. Biophys J. 1996;71:670–681. [PMC free article] [PubMed] [Google Scholar]

17. Gilson MK, Given JA, Bush BL, McCammon JA. The statistical-thermodynamic basis for computation of binding affinities: A critical review. Biophys J. 1997;72:1047–1069. [PMC free article] [PubMed] [Google Scholar]

18. Hermans J, Wang L. Inclusion of the loss of translational and rotational freedom in theoretical estimates of free energies of binding. application to a complex of benzene and mutant T4 Lysozyme. J Am Chem Soc. 1997;119:2707–2714. [Google Scholar]

19. Boresch S, Tettinger F, Leitgeb M, Karplus M. Absolute binding free energies: A quantitative approach for their calculation. J Phys Chem B. 2003;107:9535–9551. [Google Scholar]

20. Mobley DL, Chodera JD, Dill KA. On the use of orientational restraints and symmetry corrections in alchemical free energy calculations. J Chem Phys. 2006;125:084902. [PMC free article] [PubMed] [Google Scholar]

21. Wang J, Deng Y, Roux B. Absolute binding free energy calculations using molecular dynamics simulations with restraining potentials. Biophysical Journal. 2006;91:2798–2814. [PMC free article] [PubMed] [Google Scholar]

22. Shirts MR. Calculating Precise and Accurate Free Energies in Biomolecular Systems, Stanford University, 2004, doctoral dissertation. available on the web at http://www.columbia.edu/mrs2144/

23. Fujitani H, Tanida Y, Ito M, Jayachandran G, Snow CD, Shirts MR, Sorin EJ, Pande VS. Direct calculation of the binding free energies of FKBP ligands. J Chem Phys. 2005;123:084108. [PubMed] [Google Scholar]

24. Deng Y, Roux B. Calculation of standard binding free energies: Aromatic molecules in the internal cavity of the T4 Lysozyme L99A mutant. Journal of Chemical Theory and Computation. 2006;2:1255–1273. [PubMed] [Google Scholar]

25. van den Bosch M, Swart M, Snijderst JG, Berendsen HJC, Mark AE, Oostenbrink C, van Gunsteren WF, Canters GW. Calculation of the redox potential of the protein azurin and some mutants. Chem Bio Chem. 2005;6:738–746. [PubMed] [Google Scholar]

26. Zhou Y, Oostenbrink C, Jongejan A, Hagen WR, de Leeuw SW, Longejan JA. Computational study of ground-state chiral induction in small peptides: Comparison of the relative stability of selected amino acid dimers and oligomers in homochiral and heterochiral combinations. J Comp Chem. 2006;27:857–867. [PubMed] [Google Scholar]

27. Eriksson AE, Baase WA, Zhang XJ, Heinz DW, Blaber M, Baldwin EP, Matthews BW. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science. 1992;255:178–183. [PubMed] [Google Scholar]

28. Eriksson AE, Baase WA, Matthews BW. Similar hydrophobic replacements of leu99 and phe153 within the core of t4 lysozyme have different structural and thermodynamic consequences. J Mol Biol. 1993;229:747–769. [PubMed] [Google Scholar]

29. Morton A, Matthews BW. Specificity of ligand binding in a buried non-polar cavity of T4 lysozyme: linkage of dynamics and structural plasticity. Biochemistry. 1995;34:8576–8588. [PubMed] [Google Scholar]

30. Morton A, Baase WA, Matthews BW. Energetic origins of specificity of ligand binding in an interior nonpolar cavity of T4 lysozyme. Biochemistry. 1995;34:8564–8575. [PubMed] [Google Scholar]

31. Wei BQ, Baase WA, Weaver LH, Matthews BW, Shoichet BK. A model binding site for testing scoring functions in molecular docking. J Mol Biol. 2002;322:339–355. [PubMed] [Google Scholar]

32. Wei BQ, Weaver LH, Ferrari AM, Matthews BW, Shoichet BK. Testing a flexible-receptor docking algorithm in a model binding site. J Mol Biol. 2004;337:1161–1182. [PubMed] [Google Scholar]

34. Mann G, Hermans J. Modeling protein-small molecule interactions: Structure and thermodynamics of noble gases binding in a cavity in mutant phage T4 lysozyme L99A. J Mol Biol. 2000;302:979–989. [PubMed] [Google Scholar]

35. Mobley DL, Chodera JD, Dill KA. Confine and release: Obtaining correct binding free energies in the presence of protein conformational change. J Chem Theor Comput. In press. [PMC free article] [PubMed] [Google Scholar]

36. Mobley DL, Dumont E, Chodera JD, Dill KA. Comparison of charge models for fixed-charge force fields: Small-molecule hydration free energies in explicit solvent. J Phys Chem B. Accepted. [PubMed] [Google Scholar]

37. Chambers CC, Hawkins GD, Cramer CJ, Truhlar DG. Model for aqueous solvation based on class iv atomic charges and first solvation-shell effects. J Phys Chem A. 1996;100:16385–16398. [Google Scholar]

38. Turnbull WB, Daranas AH. On the value of c: can low affinity systems be studied by isothermal titration calorimetry? J Am Chem Soc. 2003;125:14859–14866. [PubMed] [Google Scholar]

39. Lindahl E, Hess B, van der Spoel D. GROMACS 3.0: A package for molecular simulation and trajectory analysis. J Mol Mod. 2001;7:306–317. [Google Scholar]

40. van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJC. GROMACS: Fast, flexible, and free. J Comput Chem. 2005;26:1701–1718. [PubMed] [Google Scholar]

41. Sorin EJ, Pande VS. Exploring the helix-coil transition via all-atom equilibrium ensemble simulations. Biophys J. 2005;88:2472–2493. [PMC free article] [PubMed] [Google Scholar]

42. Kollman PA. Advances and continuing challenges in achieving realistic and predictive simulations of the properties of organic and biological molecules. Accounts of Chemical Research. 1996;29:461–469. [Google Scholar]

43. Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. Development and testing of a general AMBER force field. J Comp Chem. 2004;25:1157–1174. [PubMed] [Google Scholar]

44. Wang J, Wang W, Kollman PA, Case DA. Automatic atom type and bond type perception in molecular mechanical calculations. Journal of Molecular Graphics and Modelling. 2006;26:247260. [PubMed] [Google Scholar]

45. Li J, Zhu T, Cramer CJ, Truhlar DG. New class IV charge model for extracting accurate partial charges from wave functions. J Phys Chem A. 1998;102:1820–1831. [Google Scholar]

46. Jakalian A, Jack DB, Bayly CI. Fast, efficient generation of high-quality atomic charges. am1-bcc model: Ii. parameterization and validation. J Comput Chem. 2002;23:1623–1641. [PubMed] [Google Scholar]

47. Jakalian A, Bush BL, Jack DB, Bayly CI. Fast, efficient generation of high-quality atomic charges. am1-bcc model: I. method. J Comput Chem. 2000;21:132–146. [PubMed] [Google Scholar]

48. Sorin EJ, Chodera JD, Mobley DL. the conversion script was originally written by Eric Sorin in the laboratory of Vijay Pande, and modified and updated by the present authors. 2006. It is available online at http://folding.stanford.edu/1amber.

49. Bennett CH. Efficient Estimation of Free Energy Differences from Monte Carlo Data. J Comput Phys. 1976;22:245–268. [Google Scholar]

50. Shirts MR, Pande VS. Comparison of efficiency and bias of free energies computed by exponential averaging, the Bennett Acceptance Ratio, and thermodynamic integration. J Chem Phys. 2005;122:144107. [PubMed] [Google Scholar]

51. Berendsen HJC, Postma JPM, van Gunsteren WF, DiNola A, Haak JR. Molecular dynamics with coupling to an external bath. J Chem Phys. 1984;81:3584–3690. [Google Scholar]

52. Shirts MR, Mobley DL, Chodera JD, Pande VS. Accurate and efficient corrections for missing dispersion interactions in molecular simulations. Submitted. [PubMed] [Google Scholar]

53. Kumar S, Rosenberg JM, Bouzida D, Swendsen RH, Kollman PA. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J Comp Chem. 1992;13:1011–1021. [Google Scholar]

54. Kirchhoff W. EXAM: A two-state thermodynamic analysis program. Gaithersburg, Maryland: NIST; 1993. [Google Scholar]

55. Plotnikov VV, Brandts JF, Lin LN. A new ultrasensitive scanning calorimeter. Anal Biochem. 1997;250:237–244. [PubMed] [Google Scholar]

56. Poteete AR, Dao-Pin S, Nicholson H, Matthews BW. Second-site revertants of an inactive t4 lysozyme mutant restore activity by restructuring the active site cleft. Biochemistry. 1991;30:1425–1432. [PubMed] [Google Scholar]

57. Otwinowski Z, Minor W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 1997;276:307–326. [PubMed] [Google Scholar]

58. Murshudov GN, Vagin AA, Dobson EJ. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr. 1997;53:240–255. [PubMed] [Google Scholar]

59. Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004;60:2126–2132. [PubMed] [Google Scholar]