Comparative Study

Predicting deleterious amino acid substitutions

P C Ng et al. Genome Res. 2001 May.

Abstract

Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool that uses sequence homology to predict whether a substitution affects protein function. SIFT, which sorts intolerant from tolerant substitutions, classifies substitutions as tolerated or deleterious. A higher proportion of substitutions predicted to be deleterious by SIFT gives an affected phenotype than substitutions predicted to be deleterious by substitution scoring matrices in three test cases. Using SIFT before mutagenesis studies could reduce the number of functional assays required and yield a higher proportion of affected phenotypes. SIFT may be used to identify plausible disease candidates among the SNPs that cause missense substitutions.

Figures

Figure 1

Sequence conservation corresponds to intolerant positions. (Top) Sequence logo representation (Schneider and Stephens 1990) of the LacI multiple alignment for positions 5–38, a region involved in binding DNA. At each position, the stack of letters indicates which amino acids appear in the alignment, and the total height of the stack is a measure of conservation. (Bottom) Number of substitutions deleterious to LacI function at the corresponding positions (Markiewicz et al. 1994; Suckow et al. 1996). Positions with high conservation, such as 19–23, do not tolerate substitutions. Positions with low conservation, such as 26–28, can tolerate most substitutions. Positions 17 and 18 appear diverse in the alignment but cannot tolerate most substitutions. The side chains of these residues are involved in DNA-specific recognition (Chuprina et al. 1993) that is not conserved among the paralogous sequences.
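The caption's "total height of the stack" can be illustrated with a short calculation. The sketch below assumes the usual sequence-logo measure, log2(20) minus the Shannon entropy of the residues observed in a column, and leaves out the small-sample correction that logo software typically applies. The function names and the toy alignment are hypothetical and are not taken from the paper or the LacI data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(column):
    """Information content (bits) of one alignment column; gaps are ignored."""
    residues = [aa for aa in column.upper() if aa in AMINO_ACIDS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Maximum possible entropy for 20 amino acids minus the observed entropy.
    return math.log2(len(AMINO_ACIDS)) - entropy


def alignment_information(sequences):
    """Per-position information content for equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must have equal length")
    return [
        column_information("".join(seq[i] for seq in sequences))
        for i in range(length)
    ]


# Hypothetical toy alignment: position 1 is invariant, position 3 is diverse.
toy_alignment = ["ARK", "AKE", "AR-", "ARD"]
for position, bits in enumerate(alignment_information(toy_alignment), start=1):
    print(position, round(bits, 2))
```

An invariant column reaches the maximum of log2(20) bits, while a diverse column scores lower. As the caption notes for positions 17 and 18, a low score does not guarantee that a position tolerates substitutions.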

Figure 2

(A) SIFT predictions for substitutions in LacI. The effects of 12–13 substitutions at each position were assayed (Markiewicz et al. 1994; Suckow et al. 1996). Substitutions counted above the _X_-axis gave a wild-type phenotype; those counted below the _X_-axis gave an affected phenotype. SIFT makes a prediction for every possible substitution, but only substitutions predicted correctly by SIFT are depicted here and are colored in black. Gray bars above the _X_-axis indicate false positive error; these substitutions were predicted to be deleterious by SIFT, when experimentally they gave wild-type phenotype. Gray bars below the _X_-axis indicate false negative error; these substitutions were predicted to be neutral, but in fact gave an affected phenotype. Amino acid side chains that have been identified as involved in interactions (Chuprina et al. 1993; Bell and Lewis 2000) are labeled as follows: (double helix) those that interact with DNA, (double cylinders) those participating in the dimer interface. (Hexagons) Positions having six or more substitutions that are unable to respond to the inducer (Markiewicz et al. 1994; Pace et al. 1997). Many of the intolerant positions that were predicted to tolerate substitutions correspond to these query-specific positions. (Asterisks) Positions that can tolerate at least six substitutions, but SIFT predicted more than half of these substitutions as deleterious. The consensus sequence and the original query sequence, LACI_ECOLI, are shown. (B) BLOSUM62 prediction for substitutions in LacI for positions 1–50 and 101–150. BLOSUM62 performs well in the DNA-binding region (residues 1–50) because this region cannot tolerate many substitutions. However, in a region that tolerates substitutions, such as positions 101–150, BLOSUM62 performs poorly, predicting many experimental false positives (large gray bars above the _X_-axis).
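The bar categories in this figure amount to a comparison of each prediction with its assayed phenotype. A minimal sketch of that bookkeeping follows, using the caption's convention that a substitution predicted deleterious but found wild-type is a false positive, and one predicted tolerated but found affected is a false negative in the usual terminology. The function names and the records are hypothetical and are not data from the paper.

```python
from collections import Counter


def classify_outcome(predicted_deleterious, observed_affected):
    """Label one substitution by comparing prediction with assayed phenotype."""
    if predicted_deleterious and observed_affected:
        return "correct_deleterious"
    if not predicted_deleterious and not observed_affected:
        return "correct_tolerated"
    if predicted_deleterious:
        # Predicted deleterious, but the assay gave a wild-type phenotype.
        return "false_positive"
    # Predicted tolerated, but the assay gave an affected phenotype.
    return "false_negative"


def tally_outcomes(records):
    """Count outcome labels over (predicted_deleterious, observed_affected) pairs."""
    return Counter(classify_outcome(pred, obs) for pred, obs in records)


# Hypothetical records for five substitutions at one position.
toy_records = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (False, False),
]
counts = tally_outcomes(toy_records)
print(dict(counts))
```

Tallying the same labels per position would give the black and gray bar heights the caption describes.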

Figure 3

(A) Structure of LacI as a homodimer (light and dark blue strands) with DNA (yellow strand). The N-terminal subdomain whose interface is important for DNA binding and the allosteric mechanism is at the upper part of the figure; the C-terminal domain is at the bottom. The 186 positions tolerant for six or more substitutions are colored in white on one monomer (Markiewicz et al. 1994; Suckow et al. 1996). For 31 of these positions, >50% of the substitutions were predicted to affect phenotype according to SIFT when experimentally they did not (see also Fig. 2, asterisks). These positions are shown as space-fill atoms in red. Noticeably, many of these occurred at the bottom face of the C-terminal domain. This structure is 1EFA from PDB (Bell and Lewis 2000). (B) Same figure rotated 90° about the _Z_-axis.

References

    1. Altschul SF. Amino acid matrices from an information theoretic perspective. J Mol Biol. 1991;219:555–565.
    2. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
    3. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000;28:45–48.
    4. Bell C, Lewis M. A closer view of the conformation of the Lac repressor bound to operator. Nat Struct Biol. 2000;7:209–214.
    5. Bentley A, MacLennan B, Calvo J, Dearolf CR. Targeted recovery of mutations in Drosophila. Genetics. 2000;156:1169–1173.
