Data mining of metal ion environments present in protein structures

Heping Zheng et al. J Inorg Biochem. 2008 Sep.

Abstract

Analysis of metal-protein interaction distances, coordination numbers, B-factors (displacement parameters), and occupancies of metal-binding sites in protein structures determined by X-ray crystallography and deposited in the PDB shows many unusual values and unexpected correlations. By measuring the frequency of each amino acid in metal ion-binding sites, the positive or negative preferences of each residue for each type of cation were identified. Our approach may be used for fast identification of metal-binding structural motifs that cannot be identified on the basis of sequence similarity alone. The analysis compares data derived separately from high- and medium-resolution structures in the PDB with data from very high-resolution small-molecule structures in the Cambridge Structural Database (CSD). For high-resolution protein structures, the distribution of metal-protein or metal-water interaction distances agrees quite well with data from the CSD, but the distribution is unrealistically wide for medium-resolution (2.0–2.5 Å) data. Our analysis of cation B-factors versus average B-factors of atoms in the cation environment reveals that a substantial number of structures contain either an incorrect metal ion assignment or an unusual coordination pattern. A correlation between data resolution and completeness of the metal coordination spheres is also found.
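The residue-preference idea in the abstract can be illustrated with a short sketch. The abstract does not state how preferences were scored, so the log2 ratio of binding-site frequency to background frequency used here is an assumption, as are the function name `residue_preferences` and the toy residue lists.

```python
import math
from collections import Counter

def residue_preferences(site_residues, background_residues):
    """Score each residue by log2(site fraction / background fraction).

    ASSUMPTION: this log-ratio is an illustrative scoring choice, not the
    method described in the paper. Positive values suggest enrichment in
    binding sites, negative values suggest depletion.
    """
    site = Counter(site_residues)
    background = Counter(background_residues)
    n_site = sum(site.values())
    n_background = sum(background.values())
    prefs = {}
    for residue, bg_count in background.items():
        if site[residue] == 0:
            # The log ratio is undefined for residues never seen at a site;
            # this sketch simply leaves them out.
            continue
        site_fraction = site[residue] / n_site
        bg_fraction = bg_count / n_background
        prefs[residue] = math.log2(site_fraction / bg_fraction)
    return prefs

# Toy data, invented for illustration only.
site_residues = ["ASP", "ASP", "GLU", "ASP"]
background_residues = ["ASP", "GLU", "LEU", "LEU"]
prefs = residue_preferences(site_residues, background_residues)
```

With these toy lists, ASP scores positive (enriched), GLU scores zero (same fraction in both sets), and LEU is omitted because it never appears at a site.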

Figures

Figure 1

Calcium-oxygen and magnesium-oxygen distance distributions for complete and incomplete coordination spheres. (A) Calcium-oxygen distance distribution for incomplete coordination spheres (CN<5) and (C) calcium-oxygen distance distribution for complete coordination spheres (CN≥5). Magnesium-oxygen distance distributions for incomplete and complete coordination spheres are shown in B and D respectively. The vertical axes give the number of interactions and the horizontal axes the distance between metal and oxygen.
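The complete/incomplete split used in this figure can be sketched in a few lines. The CN≥5 threshold comes from the caption; the 3.0 Å distance cutoff for counting a coordinating oxygen, the function names, and the coordinates are assumptions made for illustration.

```python
import math

# ASSUMPTION: an oxygen counts as coordinating if it lies within this
# distance of the metal. The caption does not give the cutoff used.
CUTOFF_ANGSTROM = 3.0

def coordination_number(metal_xyz, oxygen_xyzs, cutoff=CUTOFF_ANGSTROM):
    """Count oxygens within `cutoff` Å of the metal position."""
    return sum(math.dist(metal_xyz, o) <= cutoff for o in oxygen_xyzs)

def sphere_class(cn):
    """Classify a site as in the caption: CN >= 5 complete, CN < 5 incomplete."""
    return "complete" if cn >= 5 else "incomplete"

# Invented coordinates: five oxygens at 2.4 Å and one distant oxygen at 5.0 Å.
metal = (0.0, 0.0, 0.0)
oxygens = [
    (2.4, 0.0, 0.0), (-2.4, 0.0, 0.0),
    (0.0, 2.4, 0.0), (0.0, -2.4, 0.0),
    (0.0, 0.0, 2.4), (0.0, 0.0, 5.0),
]
cn = coordination_number(metal, oxygens)
label = sphere_class(cn)
```

Here the distant oxygen is excluded, so the site has a coordination number of 5 and is classified as complete.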

Figure 2

Metal-oxygen coordination sphere for various resolutions and coordination sphere components (calcium A,C and magnesium B,D). A,B: The horizontal axes give the coordination number and the vertical axes the percentage of structures for each data set. The cyan bars (left) correspond to structures with a resolution of 1.5Å or better, the violet bars (middle) correspond to structures with resolution between 2.0Å and 2.5Å, and the yellow bars (right) correspond to structures with a resolution worse than 2.5Å. C,D: the relative fractions of coordination sphere components. Cyan fractions (top) correspond to interaction with water, magenta (3rd from top, for 2–8) to bidentate coordination from a carboxyl group from Asp/Glu, yellow (2nd from top) to non-bidentate interaction with amino acid oxygen, and violet (bottom) to interaction with oxygen from a non-proteinaceous ligand (see the online version of this article for the colors).

Figure 3

Scatter plots of mean _B_-factor of coordinating oxygens _versus_ _B_-factor for (A) calcium and (B) magnesium ions. The histograms show the percentage of _B_-factor difference outliers for (C) calcium and (D) magnesium as a function of resolution. The cyan bars (left) show the percentage of points where the difference between the metal _B_-factor and the mean _B_-factor of its coordinating atoms is bigger than 5 Å². The yellow bars (right) show the percentage of points where the difference is bigger than 10 Å² (see the online version of this article for the colors).
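The outlier percentages in panels C and D can be illustrated with a minimal sketch. The 5 Å² and 10 Å² thresholds are taken from the caption; using the absolute value of the difference is an assumption (the caption does not say whether the difference is signed), and the function names and B-factor values are hypothetical.

```python
from statistics import mean

def b_factor_difference(metal_b, oxygen_bs):
    """Difference (Å^2) between a metal's B-factor and the mean B-factor
    of its coordinating oxygens.

    ASSUMPTION: the absolute value is used here.
    """
    return abs(metal_b - mean(oxygen_bs))

def outlier_percentages(sites, thresholds=(5.0, 10.0)):
    """Percentage of sites whose difference exceeds each threshold."""
    diffs = [b_factor_difference(metal_b, oxygen_bs) for metal_b, oxygen_bs in sites]
    return {t: 100.0 * sum(d > t for d in diffs) / len(diffs) for t in thresholds}

# Invented (metal B, coordinating-oxygen B list) pairs, for illustration only.
sites = [
    (20.0, [18.0, 22.0, 20.0]),  # difference 0
    (35.0, [20.0, 24.0, 22.0]),  # difference 13
    (12.0, [18.0, 20.0, 19.0]),  # difference 7
    (25.0, [24.0, 26.0]),        # difference 0
]
result = outlier_percentages(sites)
```

In this toy set, two of the four sites exceed 5 Å² and one exceeds 10 Å². In practice the same calculation would be run separately for each resolution range.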

Figure 4

Distributions of calcium-to-protein-oxygen and calcium-to-water distances for different resolution ranges. The vertical axes give the number of interactions in each distance bin and the horizontal axes give the distance between calcium and oxygen. Distributions are made for both oxygen from protein (A, C, E) and oxygen from water (B, D, F). Data from the CSD (A, B), PDB high resolution data (C, D), and PDB moderate resolution data (E, F) are plotted individually.

Figure 5

Unusual metal atom model parameters. (A) An atom identified as magnesium with unusually long Mg-O distances (PDB code: 1JUB; Mg A850). (B, C) Two atoms identified as magnesium in a structure with multiple geometry problems (PDB code: 1Q9Q) (see the online version of this article for the colors).

Figure 6

Re-interpretation of a magnesium binding site as calcium (PDB code: 2AS8; Mg 1001). (A) The binding site of an atom identified as magnesium with unusually long Mg-O distances. (B) Re-refinement of the same structure, after identifying the metal atom as calcium. The histograms below show the distance distributions of the metal binding site before (C) and after (D) re-interpretation. The vertical axes give the percentage of structures and the horizontal axes the metal-oxygen distance in Å. The blue lines (diamond) represent the Mg-O (C) or Ca-O (D) distance distributions for CSD data. The magenta lines (square) represent the distance distributions for high resolution PDB data. The orange bars are the magnesium-oxygen distances in 2AS8 structure (C), while the cyan bars are the calcium-oxygen distances after re-interpretation of the structure (D) (see the online version of this article for the colors).
