Implementation of molecular replacement in AMoRe

research papers

BIOLOGICAL CRYSTALLOGRAPHY

ISSN: 1399-0047


(Received 5 March 2001; accepted 12 June 2001)

An account is given of the molecular replacement method as implemented in the package AMoRe. The overall strategy of the method is presented and the main functions used in the package are described. The most important features of AMoRe are the quality of its fast rotation and translation functions and the possibility of passing multiple inputs to the translation and rigid-body refinement functions, which allows many crystal configurations to be explored quickly and with a high level of automation.

1. Introduction

The idea of molecular replacement (MR) is to build a tentative crystal structure, using known molecular models similar to the molecules that actually constitute the crystal, as a starting point for model building or refinement. The problem is to determine the positions of the models within the crystal cell. This is ultimately done by comparing observed and calculated structure factors for selected positions of the independent molecules within the cell. In AMoRe, the comparison is essentially based on the correlation coefficient between amplitudes. This criterion was chosen in the light of the results available a decade ago, results that would now be regarded as corresponding to easy or moderately difficult MR problems. At that time, an exhaustive positional search with this simple but robust criterion, involving in general six variables per independent model, could not be envisaged. Today a full six-dimensional search is feasible, but it would still be too lengthy. This perhaps explains why the original idea of Rossmann & Blow (1962), i.e. splitting the search into two consecutive three-dimensional searches, still underlies most MR packages.

The main programs in AMoRe select a certain number of positions, obtained through the exhaustive exploration of three-dimensional domains with fast functions, and compute the correlation coefficients associated with these positions. The idea is to assess many crystal configurations, since it is the contrast between the values of the criterion that gives confidence in a solution. The fast functions (rotation and translation functions) are either improved versions of previously proposed functions or new ones. Accurate and fast algorithms are used throughout the package to save computing time. In particular, molecular scattering factors replace atomic coordinates, which are used only once in the whole procedure.

The central data stream in AMoRe is the set of values of the variables that specify the positions of the independent models within the crystal; from these values the structure factors and the inputs to the fast functions are calculated. We first define these variables and their relationship to the calculated structure factors, and then describe the strategy for selecting configurations.

2. Positional variables and crystal configurations

The position of a molecular model within the crystal is determined by the rotation R and the translation T that move the model from a reference initial position, specified by the atomic vectors [\{{\bf r}^{o}\}], to the current position, specified by the atomic vectors [\{{\bf r}\}],

[{\bf r} = {\bf R} {\bf r}^{o} + {\bf T}. \eqno (1)]

The translation T is usually given in fractional coordinates (x, y, z) in the crystal cell. The rotation R is parameterized with the Euler angles (φ, θ, ψ) associated with an orthonormal frame (X, Y, Z). Several conventions exist for the names of angles and definitions of the axes involved in this parameterization. We will follow the convention by which (φ, θ, ψ) denotes a rotation of ψ about the Z axis, followed by a rotation of θ about the Y axis and finally a rotation of φ about the Z axis,

[{\bf R}(\varphi,\theta,\psi) = {\bf R}(\varphi,{\bf Z}) {\bf R}(\theta,{\bf Y}) {\bf R}(\psi,{\bf Z}). \eqno (2)]

The angles take values within the parallelepiped {0 ≤ φ < 360°, 0 ≤ θ ≤ 180°, 0 ≤ ψ < 360°}. For θ = 0 or 180°, only the combination φ + ψ or φ − ψ, respectively, is independent.

The initial position of the model is usually chosen with its center of mass at the origin and its principal axes of inertia parallel to the axes of the orthonormal frame, as this leads to an efficient sampling of configurations. A good choice for the orthonormal frame is Z parallel to the highest-order crystal symmetry axis (nort = 0 in AMoRe). This choice restricts the orientational search to 0 ≤ φ < 360°/n, where n is the order of the rotational symmetry around Z.
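The convention of equation (2) and the two facts stated above (the degeneracy at θ = 0 or 180° and the reduced φ range when Z is an n-fold axis) can be checked numerically. This is a minimal sketch, assuming active, right-handed rotation matrices acting on column vectors; the text does not fix this detail, and AMoRe's internal matrix conventions may differ. The helper names `rot_z`, `rot_y` and `euler_zyz` are hypothetical.

```python
import numpy as np

def rot_z(angle_deg):
    """Active right-handed rotation by angle_deg about Z (acts on column vectors)."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(angle_deg):
    """Active right-handed rotation by angle_deg about Y."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def euler_zyz(phi, theta, psi):
    """Equation (2): R(phi, theta, psi) = R(phi, Z) R(theta, Y) R(psi, Z)."""
    return rot_z(phi) @ rot_y(theta) @ rot_z(psi)

R = euler_zyz(30.0, 50.0, 70.0)

# theta = 0: only phi + psi matters.
assert np.allclose(euler_zyz(25.0, 0.0, 40.0), rot_z(65.0))
# theta = 180: only phi - psi matters.
assert np.allclose(euler_zyz(25.0, 180.0, 40.0), euler_zyz(-15.0, 180.0, 0.0))
# Z along an n-fold axis: the symmetry rotation only shifts phi by 360/n.
n = 4
assert np.allclose(rot_z(360.0 / n) @ R, euler_zyz(30.0 + 360.0 / n, 50.0, 70.0))
```

The last check is the reason for the restricted φ range: an orientation and its image under the n-fold rotation about Z differ only by 360°/n in φ, so one period of φ is enough.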

Therefore, given the models' initial positions, the crystal unit-cell parameters, the space-group symmetry and the orientation of the orthonormal frame, a crystal configuration is uniquely determined by giving the positions of the independent molecular models within the unit cell, expressed in terms of the positional variables,

[\matrix {\#m' & \varphi_{m'}& \theta_{m'}& \psi_{m'}& x_{m'}& y_{m'}& z_{m'}\cr \#\ldots& \ldots& \ldots&\ldots &\ldots & \ldots& \ldots\cr\#m& \varphi_{m}& \theta_{m}& \psi_{m}&x_{m} & y_{m}& z_{m}.}]

The labels m′, …, m identify the molecules and the molecular models. Note that some of these models may coincide, i.e. the same model may be used for several molecules.

3. Structure-factor calculation

The calculated structure factors are conveniently written in terms of the individual molecular scattering factors [f_m({\bf s})], i.e. the Fourier transforms of the electron densities of the isolated molecules in their initial positions. These molecular scattering factors are computed with the TABLING program, which translates the model coordinates so that the center of mass is at the origin and rotates them so that the model's principal axes of inertia are parallel to the axes of the model box. An electron density is then constructed and Fourier transformed using FFT techniques. One feature of AMoRe is that the model may equally well be an electron density or an electron-microscopy reconstruction, since only the Fourier coefficients are used.
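The first step described for TABLING (center of mass to the origin, principal axes of inertia along the frame axes) can be sketched as follows. This is a minimal illustration assuming point masses; it does not construct the electron density or perform the FFT, and TABLING's own axis ordering and handedness are not known from the text. The function names `inertia_tensor` and `center_and_align` are hypothetical.

```python
import numpy as np

def inertia_tensor(coords, masses):
    """Inertia tensor sum_j m_j (|r_j|^2 * Identity - r_j r_j^T) about the origin."""
    r2 = (coords ** 2).sum(axis=1)
    return (masses * r2).sum() * np.eye(3) - np.einsum("j,ja,jb->ab", masses, coords, coords)

def center_and_align(coords, masses):
    """Move the center of mass to the origin and put the principal axes along X, Y, Z."""
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    centered = coords - com
    # eigh returns the eigenvectors of the symmetric tensor as the columns of 'axes'.
    _, axes = np.linalg.eigh(inertia_tensor(centered, masses))
    if np.linalg.det(axes) < 0:       # keep a proper rotation (no reflection)
        axes[:, 2] *= -1.0
    return centered @ axes            # coordinates expressed in the principal-axis frame

rng = np.random.default_rng(0)
xyz = rng.normal(size=(50, 3)) * [8.0, 4.0, 2.0] + [10.0, -5.0, 3.0]
m = rng.uniform(1.0, 16.0, size=50)
aligned = center_and_align(xyz, m)
```

In the aligned frame the inertia tensor is diagonal and interatomic distances are unchanged, since only a translation and a rotation have been applied.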

If [{\bf R}_m] and [{\bf T}_m] denote the rotation and translation that define the current position of molecule m, [{\bf M}_g] and [{\bf t}_g] the matrix and translation vector of the gth space-group symmetry operation, and H the coordinates (Miller indices) of a crystal reciprocal-lattice vector, the contribution of molecule m to the calculated crystal structure factor is

[\textstyle \sum \limits_{g = 1}^{G} f_{m}({\bf H}{\bf M}_{g}{\bf D}{\bf R}_{m}{\bf O}_{m}) \exp[2\pi i{\bf H}({\bf M}_{g}{\bf T}_{m} + {\bf t}_{g})]. \eqno (3)]

D and [{\bf O}_m] are the orthogonalizing and deorthogonalizing matrices, respectively. In fact, [{\bf D}{\bf R}_m{\bf O}_m] is simply the rotation matrix [{\bf R}_m] expressed in a mixed basis: applied (from left to right) to reciprocal coordinates in the crystal (Miller indices), it produces reciprocal coordinates in the model box. If there are M independent molecules, M such terms have to be added. Assuming that the individual molecular scattering factors [f_m({\bf s})] have been put on a common scale, we have

[F^{\rm cal}_{\bf H} = \textstyle \sum \limits_{m = 1}^{M} \sum \limits_{g = 1}^{G} f_{m}({\bf H}{\bf M}_{g}{\bf D}{\bf R}_{m}{\bf O}_{m}) \exp[2\pi i{\bf H}({\bf M}_{g}{\bf T}_{m} + {\bf t}_{g})]. \eqno (4)]
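Equations (3) and (4) can be checked against a direct atomic summation, which also makes the mixed-basis product D R_m O_m concrete. This is a sketch under stated assumptions, not AMoRe's code: D is taken as A^{-1} and O_m as B_m, where A and B_m are the fractional-to-Cartesian matrices of the crystal cell and of the model box (in the common "a along X" convention, not AMoRe's nort = 0 frame); unit point atoms stand in for the tabulated f_m; and the P2_1 cell, box and coordinates are hypothetical.

```python
import numpy as np

def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> Cartesian matrix, assuming a along X and b in the XY plane (degrees)."""
    ca, cb, cg = (np.cos(np.radians(x)) for x in (alpha, beta, gamma))
    sg = np.sin(np.radians(gamma))
    vol = a * b * c * np.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)
    return np.array([[a, b * cg, c * cb],
                     [0.0, b * sg, c * (ca - cb * cg) / sg],
                     [0.0, 0.0, vol / (a * b * sg)]])

def rot_z(deg):
    t = np.radians(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0], [0.0, 0.0, 1.0]])

def rot_x(deg):
    t = np.radians(deg)
    return np.array([[1.0, 0.0, 0.0], [0.0, np.cos(t), -np.sin(t)], [0.0, np.sin(t), np.cos(t)]])

def model_transform(k, box_frac):
    """Hypothetical stand-in for f_m: unit point atoms at model-box fractional coordinates."""
    return np.exp(2j * np.pi * (box_frac @ k)).sum()

def f_calc_eq3(H, ops, D, R, O, T, box_frac):
    """Equation (3): sum over the symmetry operations (M_g, t_g)."""
    total = 0j
    for M, t in ops:
        k = H @ M @ D @ R @ O            # Miller indices -> model-box reciprocal coordinates
        total += model_transform(k, box_frac) * np.exp(2j * np.pi * (H @ (M @ T + t)))
    return total

def f_calc_direct(H, ops, A, R, B, T, box_frac):
    """Reference: place atoms with equation (1), expand by symmetry, sum exp(2 pi i H.x)."""
    frac = np.linalg.solve(A, R @ (B @ box_frac.T)).T + T   # crystal fractional coordinates
    total = 0j
    for M, t in ops:
        total += np.exp(2j * np.pi * ((frac @ M.T + t) @ H)).sum()
    return total

# Hypothetical P2_1 cell (screw axis along b) and a cubic 30 A model box.
A = orthogonalization_matrix(40.0, 50.0, 60.0, 90.0, 100.0, 90.0)
B = 30.0 * np.eye(3)
D, O = np.linalg.inv(A), B
ops = [(np.eye(3), np.zeros(3)), (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.0]))]
R = rot_z(30.0) @ rot_x(40.0)
T = np.array([0.1, 0.2, 0.3])
box_frac = np.random.default_rng(0).uniform(-0.2, 0.2, size=(20, 3))

H = np.array([1, -2, 3])
assert np.isclose(f_calc_eq3(H, ops, D, R, O, T, box_frac),
                  f_calc_direct(H, ops, A, R, B, T, box_frac))
```

Summing f_calc_eq3 over several models (one call per model with its own R, T and coordinates) gives equation (4). As a further sanity check, the 0k0 reflections with k odd vanish for this P2_1 setting.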

4. Correlation coefficient

As stated in the introduction, the agreement criterion to assess crystal configurations is the (linear) correlation coefficient between observed and calculated amplitudes,

[{\rm CC}_{F} = \left(\textstyle \sum \limits_{\bf H} \overline{\overline{|F^{\rm obs}_{\bf H}|}} \times \overline{\overline{|F^{\rm cal}_{\bf H}|}} \right) \bigg/ \left[\left(\textstyle \sum \limits_{\bf H} \overline{\overline{|F^{\rm obs}_{\bf H}|}}^{2} \right) \times \left(\textstyle \sum \limits_{\bf H} \overline{\overline{|F^{\rm cal}_{\bf H}|}}^{2} \right)\right]^{1/2}, \eqno (5)]

where [\overline{\overline{|F_{\bf H}|}}] denotes a `centered' variable, i.e.

[\overline{\overline{|F_{\bf H}|}} = |F_{\bf H}| - \langle |F_{\bf H}| \rangle, \eqno (6)]

and [\langle \cdots \rangle] denotes the average over reflections. CC_F takes values between −1 and 1.
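Equations (5) and (6) translate directly into a few lines of code. This is a minimal sketch; the function name `correlation_coefficient` is hypothetical. Complex calculated structure factors are reduced to amplitudes, and passing intensities instead of amplitudes gives CC_I of equation (7).

```python
import numpy as np

def correlation_coefficient(obs, calc):
    """Equations (5)-(6): linear correlation of centered amplitudes (or intensities)."""
    x = np.abs(np.asarray(obs))            # |F_obs|
    y = np.abs(np.asarray(calc))           # |F_cal|; phases of complex input are discarded
    xc, yc = x - x.mean(), y - y.mean()    # 'centered' variables
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

fo = np.array([10.0, 25.0, 7.0, 40.0, 13.0])
cc_scaled = correlation_coefficient(fo, 3.0 * fo)
```

Because the amplitudes are centered and normalized, the coefficient is insensitive to an overall scale factor (cc_scaled is 1), so no scaling between observed and calculated data is needed when screening configurations.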

5. Strategy

The overall strategy of MR as implemented in AMoRe is easily understood if we consider the correlation coefficient between intensities

[{\rm CC}_{I} = \left(\textstyle \sum \limits_{\bf H} \overline{\overline{I^{\rm obs}_{\bf H}}} \times \overline{\overline{I^{\rm cal}_{\bf H}}} \right) \bigg/ \left[\left(\textstyle \sum \limits_{\bf H} \overline{\overline{I^{\rm obs}_{\bf H}}}^{2} \right) \times \left(\textstyle \sum \limits_{\bf H} \overline{\overline{I^{\rm cal}_{\bf H}}}^{2} \right) \right]^{1/2} \eqno (7)]

as the target function for screening. The calculated total intensity is given by

[\eqalignno {I^{\rm cal}_{\bf H} = &\textstyle \sum \limits_{m,m' = 1}^{M} \sum \limits_{g,g' = 1}^{G} f_{m}({\bf H}{\bf M}_{g}{\bf D}{\bf R}_{m}{\bf O}_{m}) \overline{f_{m'}({\bf H}{\bf M}_{g'}{\bf D}{\bf R}_{m'}{\bf O}_{m'})} \cr & \ {\times}\ \exp [2\pi i{\bf H} ({\bf M}_{g}{\bf T}_{m} + {\bf t}_{g} - {\bf M}_{g'}{\bf T}_{m'} - {\bf t}_{g'})], & (8)}]

where the overline denotes the complex conjugate. The positional variables entering this expression are determined successively by using different approximations to [I^{\rm cal}_{\bf H}] and, accordingly, to CC_I. The protocol consists of three main steps.

The actual protocol in AMoRe differs from the one above mainly in the rotational search. The ROTING program, based on the fast rotation function proposed by Crowther (1972), is used to determine the possible orientations of the models. Also, as stated previously, crystal configurations are assessed with CC_F instead of CC_I. The translations of the oriented models (one-body and n-body searches) are determined with the TRAING program. Several translation functions have been incorporated, among them the one described in the above protocol, i.e. CC_I as a function of [{\bf T}_m]. The positional variables are refined with the fast rigid-body refinement program FITING (Castellano et al., 1992). These fast functions are described in the following section.

Situations in which this protocol fails are often ones in which a six-dimensional search would fail too. As a rule, they correspond to a search model of poor quality or to a search fragment that is small relative to the content of the asymmetric unit.

The fast structure-factor calculation of equation (4), the performance of ROTING and the possibility of giving multiple inputs to TRAING and FITING allow many configurations to be explored quickly. Linking the inputs and outputs of these programs allows for automation; in fact, three levels of automation may be distinguished.

6. Description of the fast search programs

6.1. The ROTING program

The rotations R that superimpose a search molecule on the homologous molecules in the target crystal can be determined by calculating the overlap, within a conveniently chosen region Ω of volume v, between the observed Patterson function (the target function [P_t]) and a rotated version of the Patterson function of the isolated search molecule (the search function [P_s]),

[{\cal R}({\bf R}) = {{1}\over{v}} \textstyle \int \limits _{\Omega} P_{t}({\bf r}) P_{s}({\bf R}^{-1}{\bf r}) \,{\rm d}^{3} {\bf r} \eqno (12)]

(Rossmann & Blow, 1962). [\cal R] should display a local maximum at the sought rotations. Note that when the search function [P_s] is rotated by R, its argument contains [{\bf R}^{-1}].

It may be useful to compare rotation functions obtained under different conditions, which requires some kind of normalization. For this purpose, [\cal R] is cast into the form of a correlation coefficient by dividing (12) by the norms of the truncated Patterson functions,

[{\cal R}_{N}({\bf R}) = \textstyle \int \limits_{\Omega} P_{t}({\bf r}) P_{s}({\bf R}^{-1}{\bf r}) \,{\rm d}^{3} {\bf r} \bigg/ \left[\textstyle \int \limits_{\Omega} P_{t}({\bf r})^{2} \,{\rm d}^{3} {\bf r} \int \limits_{\Omega} P_{s}({\bf r})^{2} \,{\rm d}^{3} {\bf r}\right]^{1/2}. \eqno (13)]
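The normalized rotation function (13) can be evaluated by brute force on a real-space grid, which shows why it reaches 1 at the correct rotation. This is a sketch with hypothetical data, not the fast method used by ROTING: the "Patterson" is a sum of Gaussians at the interatomic vectors of four made-up atoms (origin peak omitted), and the target is constructed as the search function rotated by 90° about Z. That rotation maps the cubic grid onto itself, so the discrete norms in the denominator match exactly.

```python
import numpy as np

def rot_z(angle_deg):
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical toy model and its interatomic vectors (origin peak left out).
atoms = np.array([[0.0, 0.0, 0.0], [3.0, 1.0, 0.5], [1.0, 4.0, 2.0], [-2.0, 1.0, 3.0]])
vectors = np.array([p - q for i, p in enumerate(atoms) for j, q in enumerate(atoms) if i != j])

def p_search(points, sigma=0.5):
    """Toy search Patterson: Gaussians of width sigma at the interatomic vectors."""
    d2 = ((points[:, None, :] - vectors[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)

# Cubic integration grid restricted to the sphere Omega of radius b.
b, step = 6.0, 0.3
axis = np.arange(-20, 21) * step
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)
grid = grid[(grid ** 2).sum(axis=1) <= b ** 2]

# Target: the search function rotated by R_true, i.e. P_t(r) = P_s(R_true^{-1} r).
R_true = rot_z(90.0)
p_target_values = p_search(grid @ R_true)     # rows r^T R = (R^{-1} r)^T for orthogonal R

def rotation_function_normalized(R):
    """Grid version of equation (13); the volume element cancels in the ratio."""
    ps_rotated = p_search(grid @ R)           # P_s(R^{-1} r)
    ps = p_search(grid)
    num = (p_target_values * ps_rotated).sum()
    return float(num / np.sqrt((p_target_values ** 2).sum() * (ps ** 2).sum()))
```

By the Cauchy-Schwarz inequality the value never exceeds 1 and equals 1 at R_true, while a wrong rotation such as the identity gives a much smaller overlap.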

The reciprocal-space formulation of (12) is obtained by replacing the Patterson functions by their Fourier summations

[P({\bf r}) = {\textstyle \sum \limits_{\bf h}} {{I({\bf h})}\over{V}} \exp (-2\pi i{\bf hr}). \eqno (14)]

Taking into account that I(−h) = I(h), we obtain

[\eqalignno {{\cal R}({\bf R}) & = {\textstyle \sum \limits_{\bf h} \sum \limits_{\bf k}} {{I_{t}({\bf h})}\over{V_{t}}} {{I_{s}({\bf k})}\over{V_{s}}} {{1}\over{v}} \textstyle \int \limits_{\Omega} \exp [2\pi i({\bf h} - {\bf k}{\bf R}^{-1}){\bf r}] \,{\rm d}^{3}{\bf r} \cr & = {\textstyle \sum \limits_{\bf h} \sum \limits_{\bf k}} {{I_{t}({\bf h})}\over{V_{t}}} {{I_{s}({\bf k})}\over{V_{s}}} \chi_{\Omega}({\bf h} - {\bf k}{\bf R}^{-1}). & (15)}]

[\chi_{\Omega}] is the Fourier transform, divided by v, of the function that takes the value 1 within Ω and 0 outside. In principle, the domain of integration could have any shape; however, to take full advantage of the properties of the rotation group, Ω is usually chosen as a sphere of radius b. Writing [{\bf s} = {\bf h} - {\bf k}{\bf R}^{-1}] and s = |s| for short, we have

[\eqalignno { \chi_b({\bf s}) & = {{3}\over{4\pi b^{3}}} \textstyle \int \limits_{0}^{b} \int \limits_{0}^{\pi} \int \limits_{0}^{2\pi} \exp (2\pi i{\bf s r}) r^{2} \sin(\theta) \,{\rm d} r\, {\rm d}\theta \,{\rm d}\varphi \cr & = 3 {{\sin(2\pi sb) - 2\pi sb \cos(2 \pi sb)}\over{(2 \pi sb)^{3}}}. & (16)}]
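Equation (16) can be checked numerically. After the angular integration, the integral over the sphere reduces to the radial form (3/b³)∫₀ᵇ r² j₀(2πsr) dr, with j₀(x) = sin(x)/x. The sketch below compares that integral, evaluated with Simpson's rule, against the closed form. The function names are hypothetical, and the small-argument branch uses the series 1 − x²/10.

```python
import math

def chi_b_closed(s, b):
    """Equation (16): normalized transform of a sphere of radius b at |s| = s."""
    x = 2.0 * math.pi * s * b
    if abs(x) < 1e-4:
        return 1.0 - x * x / 10.0          # series limit; chi -> 1 as s -> 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def chi_b_numeric(s, b, n=2000):
    """(3/b^3) * integral_0^b r^2 j0(2 pi s r) dr by Simpson's rule (n even)."""
    def integrand(r):
        x = 2.0 * math.pi * s * r
        j0 = math.sin(x) / x if x != 0.0 else 1.0
        return r * r * j0
    h = b / n
    total = integrand(0.0) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 3.0 / b ** 3 * total * h / 3.0

values = [(s, chi_b_closed(s, 20.0), chi_b_numeric(s, 20.0)) for s in (0.005, 0.02, 0.05, 0.1)]
```

The function equals 1 at s = 0 and falls off and oscillates with increasing sb, so the radius b controls how strongly pairs of reciprocal-lattice points h and kR⁻¹ are coupled in (15).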

Although simple, the resulting expression for the rotation function has the disadvantage of entangling the h, k and R contributions, which makes its computation time-consuming if the whole domain of rotations has to be explored. The difficulty may be overcome by expanding the exponentials in (15) in spherical harmonics [Y_{l,m}]. Taking advantage of their transformation properties under rotations and using recurrence relationships between spherical Bessel functions [j_l], we obtain

[\eqalignno {\chi_{b}(&{\bf h} - {\bf k}{\bf R}^{-1}) \cr &= {\textstyle \sum \limits_{l = 0}^{\infty}} 12\pi {{ j_{l}(2\pi hb)\,j_{l-1}(2\pi kb)\,2\pi kb - j_{l}(2\pi kb)\,j_{l-1}(2\pi hb)\,2\pi hb }\over{ (2\pi hb)^{2} - (2\pi kb)^{2} }} \cr &\ \quad {\times}\ {\textstyle \sum \limits_{m,m' = -l}^{l}} \overline{Y_{l,m}({\bf h}/h)}\, Y_{l,m'}({\bf k}/k)\, D^{l}_{m,m'}({\bf R}) \cr &= {\textstyle \sum \limits_{l = 0}^{\infty}} \left\{ {\textstyle \sum \limits_{n = 1}^{\infty}} 12\pi [2(l+2n)-1] {{j_{l+2n-1}(2\pi hb)}\over{2\pi hb}} {{ j_{l+2n-1}(2\pi kb)}\over{2\pi kb}} \right\} \cr &\ \quad {\times}\ {\textstyle \sum \limits_{m,m' = -l}^{l}} \overline{Y_{l,m}({\bf h}/h)}\, Y_{l,m'}({\bf k}/k)\, D^{l}_{m,m'}({\bf R}), & (17)}]

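where [D^{l}_{m,m'}] are the matrices of the irreducible representations of the rotation group. The awkwardness of (17) is apparent rather than real: in the second form, the radial factor is a sum of products of a function of h alone and a function of k alone, and the rotation enters only through [D^{l}_{m,m'}({\bf R})], so the h- and k-dependent parts can be accumulated once, independently of R.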

This formulation is referred to as the fast rotation function (Crowther, 1972).
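The two forms of the radial factor in (17) can be compared numerically. With x = 2πhb and y = 2πkb, both equal 12π times ∫₀¹ u² j_l(xu) j_l(yu) du. The first is a Lommel-type closed form; the second is a series whose terms separate x and y. The sketch below evaluates spherical Bessel functions from their power series (adequate for the moderate arguments used here, with j₋₁(x) = cos(x)/x) and checks both forms against Simpson quadrature. All function names are hypothetical.

```python
import math

def sph_jn(l, x):
    """Spherical Bessel j_l(x) from its power series (moderate x only); j_{-1} = cos(x)/x."""
    if l == -1:
        return math.cos(x) / x
    term = 1.0
    for i in range(1, l + 1):
        term *= x / (2 * i + 1)                 # x^l / (2l+1)!!
    total, k = term, 0
    while abs(term) > 1e-17 * abs(total) or k < 2:
        term *= -x * x / (2.0 * (k + 1) * (2 * l + 2 * k + 3))
        total += term
        k += 1
    return total

def radial_closed(l, x, y):
    """First form of (17) without the 12*pi factor (requires x != y)."""
    num = y * sph_jn(l, x) * sph_jn(l - 1, y) - x * sph_jn(l, y) * sph_jn(l - 1, x)
    return num / (x * x - y * y)

def radial_series(l, x, y, n_terms=40):
    """Second form of (17) without the 12*pi factor: products of x-only and y-only terms."""
    return sum((2 * (l + 2 * n) - 1) * sph_jn(l + 2 * n - 1, x) * sph_jn(l + 2 * n - 1, y)
               for n in range(1, n_terms + 1)) / (x * y)

def radial_simpson(l, x, y, n=2000):
    """Reference value: integral_0^1 u^2 j_l(x u) j_l(y u) du by Simpson's rule."""
    f = lambda u: u * u * sph_jn(l, x * u) * sph_jn(l, y * u)
    h = 1.0 / n
    total = f(0.0) + f(1.0) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return total * h / 3.0
```

The series form is the one that allows the h-dependent and k-dependent sums to be precomputed separately, at the cost of truncating the sum over n, which converges quickly once the Bessel order exceeds the argument.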

6.2. Computing the fast rotation function

The calculations are organized as follows.

[{\cal R}_N] is used in AMoRe only to select a certain number of peaks. For each selected orientation, the output of ROTING contains, besides the value of [{\cal R}_N], the correlation coefficients CC_F and CC_I, computed as in P1. In general, CC_F is the more efficient of the two.

6.3. The locked rotation function

The rotational NCS, determined with the help of the self-rotation function, may be used to enhance the signal-to-noise ratio of cross-rotation functions (Rossmann et al., 1972; Tong & Rossmann, 1990). If [{\bf S}_n], n = 1, …, N, denotes the set of NCS rotations, including the identity, and R is a correct orientation in the cross rotation function, then [{\bf S}_n{\bf R}] must also correspond to a correct orientation. Here we assume that the rotational NCS forms a group; otherwise, either [{\bf S}_n{\bf R}] or [{\bf S}_n^{-1}{\bf R}], but not both, corresponds to another correct orientation. A function may therefore be defined, the locked cross rotation function, whose values are the averages of the values of [\cal R] at orientations related by the NCS,

[{\cal R}_L({\bf R}) = \textstyle \sum \limits_{n = 1}^{N} {\cal R}({\bf S}_{n}{\bf R}) / N. \eqno (24)]

By redefining the target function, [{\cal R}_L] can be computed as an ordinary cross rotation function. Indeed, [{\cal R}_L] may be written in a form similar to (12),

[\eqalignno {{\cal R}_{L}({\bf R}) & = {\textstyle \sum \limits_{n = 1}^{N}} {{1}\over{v}} {\textstyle \int \limits_{\Omega}} P_{t}({\bf r}) P_{s}({\bf R}^{-1}{\bf S}_{n}^{-1}{\bf r}) \,{\rm d}^{3} {\bf r} / N \cr & = {{1}\over{v}} {\textstyle \int \limits_{\Omega}} \left[{\textstyle \sum \limits_{n = 1}^{N}} P_{t}({\bf S}_{n}{\bf r}) / N \right] P_{s}({\bf R}^{-1}{\bf r}) \,{\rm d}^{3} {\bf r}, & (25)}]

with the target Patterson function substituted by the average over the NCS of the rotated target functions. The computation of (25) is particularly simple in the case of the fast rotation function. The substitution

[e^{(t)}_{l,m,n} \rightarrow \textstyle \sum \limits_{m' = -l}^{l} \left[\textstyle \sum \limits_{n = 1}^{N} D^{l}_{m,m'}({\bf S}_{n}) / N \right] e^{(t)}_{l,m',n}, \eqno (26)]

where we replaced the sum over [{\bf S}_n^{-1}] by a sum over Sn, because of the rearrangement theorem of group theory, gives the required target coefficients.
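The effect of the averaging in (24) can be illustrated with a toy cross rotation function defined directly on rotation matrices. This is a sketch with entirely hypothetical values, not output from ROTING: a two-fold NCS axis along X, two NCS-related correct peaks of height 1 at Q0 and S·Q0, and a spurious peak of height 1.2 at an unrelated orientation Q1. The peak shape exp[κ(tr(Q Q0ᵀ) − 3)] is an arbitrary choice.

```python
import numpy as np

def axis_angle(axis, angle_deg):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    a = np.radians(angle_deg)
    return np.eye(3) + np.sin(a) * K + (1.0 - np.cos(a)) * (K @ K)

def peak(Q, Q0, kappa=50.0):
    """Narrow peak in rotation space centered at Q0 (trace(Q Q0^T) = 3 only at Q = Q0)."""
    return np.exp(kappa * (np.trace(Q @ Q0.T) - 3.0))

S = axis_angle([1, 0, 0], 180.0)        # hypothetical NCS two-fold along X
ncs = [np.eye(3), S]                    # the NCS group {identity, S}
Q0 = axis_angle([0, 0, 1], 20.0)        # a correct orientation
Q1 = axis_angle([0, 1, 0], 90.0)        # a spurious orientation

def cross_rotation(Q):
    """Toy cross rotation: correct peaks at Q0 and S Q0, a higher noise peak at Q1."""
    return peak(Q, Q0) + peak(Q, S @ Q0) + 1.2 * peak(Q, Q1)

def locked_rotation(Q):
    """Equation (24): average of the cross rotation over NCS-related orientations."""
    return sum(cross_rotation(Sn @ Q) for Sn in ncs) / len(ncs)
```

In the raw function the spurious peak is the highest, whereas after locking the correct orientation wins, because only orientations whose NCS mates are also peaks keep their full height. Since the NCS set is a group, the locked function takes the same value at Q and S·Q.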

6.4. The TRAING program

The possible translations of an oriented model are selected in AMoRe by means of fast translation functions computed with the TRAING program. For each selected translation, the output of this program contains, besides the value of the fast translation function, the values of CC_F, CC_I and the R factor. Several fast translation functions may be calculated. If we write the Fourier coefficient of the oriented model, rotated by a given [{\bf R}_m] and placed at T, as

[\eqalignno {F^{\rm cal}_{\bf H}({\bf T}) &= \textstyle \sum \limits_{g = 1}^{G} [f_{m} ({\bf H}{\bf M}_{g}{\bf D}{\bf R}_{m}{\bf O}_{m}) \exp (2\pi i{\bf H}{\bf t}_{g})] \exp (2\pi i{\bf H}{\bf M}_{g}{\bf T})\cr &= \textstyle \sum \limits_{g = 1}^{G} u_{g}^{m}({\bf H}) \exp(2\pi i{\bf H}{\bf M}_{g}{\bf T}) & (27)}]

(see equation 3) and the corresponding intensity as

[\eqalignno {I^{\rm cal}_{\bf H}({\bf T}) =& \textstyle \sum \limits_{g,g' = 1}^{G} f_{m}({\bf H}{\bf M}_{g}{\bf D}{\bf R}_{m}{\bf O}_{m}) \overline{f_{m}({\bf H}{\bf M}_{g'}{\bf D}{\bf R}_{m}{\bf O}_{m})} \cr &\times \exp\{2\pi i{\bf H} [({\bf M}_{g} - {\bf M}_{g'}){\bf T} + {\bf t}_{g} - {\bf t}_{g'}] \} \cr =& \textstyle \sum \limits_{g,g' = 1}^{G} u_{g}^{m}({\bf H}) \overline{u_{g'}^{m}({\bf H})} \exp[2\pi i{\bf H}({\bf M}_{g} - {\bf M}_{g'}){\bf T}] & (28)}]

(see equation 10), then the options are (same notation as in equations 6 and 7)

SCAL is a scale factor used to subtract the contribution of the phasing position. The complex exponentials in (29) to (32) depend on the reciprocal vectors [{\bf H}({\bf M}_g - {\bf M}_{g'})], which belong to the reciprocal lattice of the Cheshire cell (Hirshfeld, 1968).
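The dependence of (28) on T only through H(M_g − M_g′)T can be checked numerically. The sketch below uses a hypothetical space group P2 (two-fold along b, t_g = 0), with arbitrary complex numbers standing in for u_g^m(H). For this example the intensity, and hence any translation function built from it, is unchanged by any shift along b and by half-cell shifts along a and c, which is the Cheshire-cell periodicity. The function names are hypothetical.

```python
import numpy as np

# Hypothetical P2 setting: operations (M_g, t_g) with the two-fold along b.
ops = [(np.eye(3), np.zeros(3)), (np.diag([-1.0, 1.0, -1.0]), np.zeros(3))]

def f_of_t(H, u, T):
    """Equation (27): F(T) = sum_g u_g(H) exp(2 pi i H M_g T)."""
    return sum(ug * np.exp(2j * np.pi * H @ M @ T) for ug, (M, _) in zip(u, ops))

def i_of_t(H, u, T):
    """Equation (28): double sum over (g, g') with exponents H (M_g - M_g') T."""
    return sum((ug * np.conj(ugp) * np.exp(2j * np.pi * H @ (M - Mp) @ T)).real
               for ug, (M, _) in zip(u, ops)
               for ugp, (Mp, _) in zip(u, ops))

H = np.array([1, -1, 3])
u = [0.7 + 0.2j, -0.3 + 0.9j]           # arbitrary stand-ins for u_g^m(H)
T = np.array([0.13, 0.41, 0.27])
base = i_of_t(H, u, T)
```

A quarter-cell shift along a, by contrast, reverses the sign of the cross terms and changes the intensity, so the search over T can be restricted to the Cheshire cell.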

6.5. The FITING program

Although FITING is not a search program, we include it here because it is one of the main programs of the molecular-replacement procedure. It performs rigid-body refinement with a fast technique first proposed by Huber & Schneider (1985). The quadratic misfit

[\eqalignno { & {\textstyle \sum \limits_{\bf H}} \biggl\{ |F^{\rm obs}_{\bf H}| - {{\exp(B|{\bf H}|^{2})}\over{\lambda}} \biggl| \textstyle \sum \limits_{m = 1}^{M} \sum \limits_{g = 1}^{G} f_{m}({\bf H}{\bf M}_{g}{\bf D}{\bf R}_{m}{\bf O}_{m}) \cr & \quad \times \exp[2\pi i{\bf H}({\bf M}_{g}{\bf T}_{m} + {\bf t}_{g})] \biggr| \biggr\}^{2} & (36)}]

is minimized with respect to the positional variables [\{{\bf R}_m, {\bf T}_m\}], the overall scale factor λ and the overall temperature factor B.
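For fixed positional variables and fixed B, the misfit (36) is quadratic in 1/λ, so the optimal scale has a closed form. The sketch below evaluates (36) from precomputed amplitudes |F_cal| and shows that scale step. It is not FITING's algorithm, which also refines the positional variables; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def misfit(f_obs, f_calc_amp, h2, scale, bfac):
    """Equation (36) given |F_cal|, |H|^2 values, the scale lambda and B."""
    model = np.exp(bfac * h2) / scale * f_calc_amp
    return float(((f_obs - model) ** 2).sum())

def best_scale(f_obs, f_calc_amp, h2, bfac):
    """Lambda minimizing (36) for fixed B: linear least squares in 1/lambda."""
    g = np.exp(bfac * h2) * f_calc_amp
    inv_lambda = (f_obs * g).sum() / (g * g).sum()
    return 1.0 / inv_lambda

# Synthetic data generated with lambda = 2.5 and B = -5.0 (hypothetical values).
rng = np.random.default_rng(1)
fc = rng.uniform(10.0, 100.0, 200)
h2 = rng.uniform(0.0, 0.25, 200)
fo = np.exp(-5.0 * h2) / 2.5 * fc
lam = best_scale(fo, fc, h2, -5.0)
```

Eliminating λ this way leaves a misfit that depends only on B and the positional variables, which is one natural way to organize a rigid-body refinement of (36).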

References

Castellano, E., Oliva, G. & Navaza, J. (1992). J. Appl. Cryst. 25, 281–284.
Crowther, R. A. (1972). The Molecular Replacement Method, edited by M. G. Rossmann, pp. 173–178. New York: Gordon & Breach.
DeLano, W. L. & Brünger, A. T. (1995). Acta Cryst. D51, 740–748.
Harada, Y., Lifchitz, A., Berthou, J. & Jolles, P. (1981). Acta Cryst. A37, 398–406.
Hirshfeld, F. L. (1968). Acta Cryst. A24, 301–311.
Huber, R. & Schneider, M. (1985). J. Appl. Cryst. 18, 165–169.
Navaza, J. & Vernoslova, E. (1995). Acta Cryst. A51, 445–449.
Rossmann, M. G. & Blow, D. M. (1962). Acta Cryst. 15, 24–31.
Rossmann, M. G., Ford, G. C., Watson, H. C. & Banaszak, L. J. (1972). J. Mol. Biol. 64, 237–245.
Tong, L. & Rossmann, M. G. (1990). Acta Cryst. A46, 783–792.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
