Comparative Study
Predicting deleterious amino acid substitutions
P C Ng et al. Genome Res. 2001 May.
Abstract
Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool that uses sequence homology to predict whether a substitution affects protein function. SIFT, which sorts intolerant from tolerant substitutions, classifies substitutions as tolerated or deleterious. A higher proportion of substitutions predicted to be deleterious by SIFT gives an affected phenotype than substitutions predicted to be deleterious by substitution scoring matrices in three test cases. Using SIFT before mutagenesis studies could reduce the number of functional assays required and yield a higher proportion of affected phenotypes. SIFT may be used to identify plausible disease candidates among the SNPs that cause missense substitutions.
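The idea of using homologous sequences to separate tolerated from deleterious substitutions can be sketched in a few lines. This is a toy illustration only, not the published SIFT scoring scheme: the function names, the flat pseudocount, and the 0.2 cutoff are all assumptions chosen for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_tolerance_scores(column, pseudocount=1.0):
    """Score each amino acid at one alignment column (toy scheme, assumed).

    Counts are smoothed with a flat pseudocount and divided by the count of
    the most common residue, so the most common residue scores 1.0.
    """
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    smoothed = {aa: counts.get(aa, 0) + pseudocount for aa in AMINO_ACIDS}
    top = max(smoothed.values())
    return {aa: value / top for aa, value in smoothed.items()}

def toy_predict(column, substitution, cutoff=0.2):
    """Label a substitution using an arbitrary, illustrative cutoff."""
    score = toy_tolerance_scores(column)[substitution]
    return "tolerated" if score >= cutoff else "deleterious"

# Hypothetical column: eight Leu, one Ile, one Val among the homologs.
column = "LLLLLLLLIV"
print(toy_predict(column, "I"), toy_predict(column, "W"))
```

In this sketch a residue already seen among the homologs (Ile) passes the cutoff, while one never observed (Trp) does not; a real tool would weight sequences and use a more careful probability model.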
Figures
Figure 1
Sequence conservation corresponds to intolerant positions. (Top) Sequence logo representation (Schneider and Stephens 1990) of the LacI multiple alignment for positions 5–38, a region involved in binding DNA. At each position, the stack of letters indicates which amino acids appear in the alignment, and the total height of the stack is a measure of conservation. (Bottom) Number of substitutions deleterious to LacI function at the corresponding positions (Markiewicz et al. 1994; Suckow et al. 1996). Positions with high conservation, such as 19–23, do not tolerate substitutions. Positions with low conservation, such as 26–28, can tolerate most substitutions. Positions 17 and 18 appear diverse in the alignment but cannot tolerate most substitutions. The side chains of these residues are involved in DNA-specific recognition (Chuprina et al. 1993) that is not conserved among the paralogous sequences.
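The stack height in a sequence logo can be approximated as the information content of each alignment column. The sketch below assumes a 20-letter amino acid alphabet, treats `-` as a gap, and omits the small-sample correction used in full logo calculations; the function name is hypothetical.

```python
import math
from collections import Counter

def column_information(column):
    """Information content (bits) of one alignment column.

    Computed as log2(20) minus the Shannon entropy of the observed residues.
    Gaps ('-') are ignored; no small-sample correction is applied.
    """
    residues = [res for res in column if res != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(20) - entropy

# A fully conserved column carries the maximum, log2(20) bits;
# a column with all 20 residues equally represented carries about zero.
print(column_information("AAAA"))
print(column_information("ACDEFGHIKLMNPQRSTVWY"))
```

A position such as 17 or 18 in the figure would score low by this measure even though it is functionally intolerant, which is why conservation alone can miss query-specific constraints.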
Figure 2
(A) SIFT predictions for substitutions in LacI. The effects of 12–13 substitutions at each position were assayed (Markiewicz et al. 1994; Suckow et al. 1996). Substitutions counted above the _X_-axis gave a wild-type phenotype; substitutions counted below the _X_-axis gave an affected phenotype. SIFT makes a prediction for every possible substitution, but only the assayed substitutions are depicted here; those predicted correctly by SIFT are colored in black. Gray bars above the _X_-axis indicate false positive error; these substitutions were predicted to be deleterious by SIFT, when experimentally they gave a wild-type phenotype. Gray bars below the _X_-axis indicate false negative error; these substitutions were predicted to be neutral, but in fact gave an affected phenotype. Amino acid side chains that have been identified as involved in interactions (Chuprina et al. 1993; Bell and Lewis 2000) are labeled as follows: (double helix) those that interact with DNA, (double cylinders) those participating in the dimer interface. (Hexagons) Positions having six or more substitutions that are unable to respond to the inducer (Markiewicz et al. 1994; Pace et al. 1997). Many of the intolerant positions that were predicted to tolerate substitutions correspond to these query-specific positions. (Asterisks) Positions that can tolerate at least six substitutions, but SIFT predicted more than half of these substitutions as deleterious. The consensus sequence and the original query sequence, LACI_ECOLI, are shown. (B) BLOSUM62 prediction for substitutions in LacI for positions 1–50 and 101–150. BLOSUM62 performs well in the DNA-binding region (residues 1–50) because this region cannot tolerate many substitutions. However, in a region that tolerates substitutions, such as positions 101–150, BLOSUM62 performs poorly, predicting many experimental false positives (large gray bars above the _X_-axis).
Figure 3
(A) Structure of LacI as a homodimer (light and dark blue strands) with DNA (yellow strand). The N-terminal subdomain whose interface is important for DNA binding and the allosteric mechanism is at the upper part of the figure; the C-terminal domain is at the bottom. The 186 positions tolerant for six or more substitutions are colored in white on one monomer (Markiewicz et al. 1994; Suckow et al. 1996). For 31 of these positions, >50% of the substitutions were predicted to affect phenotype according to SIFT when experimentally they did not (see also Fig. 2, asterisks). These positions are shown as space-fill atoms in red. Noticeably, many of these occurred at the bottom face of the C-terminal domain. This structure is 1EFA from PDB (Bell and Lewis 2000). (B) Same figure rotated 90° about the _Z_-axis.
Similar articles
- Dimerisation mutants of Lac repressor. II. A single amino acid substitution, D278L, changes the specificity of dimerisation. Spott S, Dong F, Kisters-Woike B, Müller-Hill B. J Mol Biol. 2000 Feb 18;296(2):673-84. doi: 10.1006/jmbi.1999.3469. PMID: 10669616.
- SIFT: Predicting amino acid changes that affect protein function. Ng PC, Henikoff S. Nucleic Acids Res. 2003 Jul 1;31(13):3812-4. doi: 10.1093/nar/gkg509. PMID: 12824425. Free PMC article.
- Lactose repressor protein: functional properties and structure. Matthews KS, Nichols JC. Prog Nucleic Acid Res Mol Biol. 1998;58:127-64. doi: 10.1016/s0079-6603(08)60035-5. PMID: 9308365. Review.
- Lac repressor genetic map in real space. Pace HC, Kercher MA, Lu P, Markiewicz P, Miller JH, Chang G, Lewis M. Trends Biochem Sci. 1997 Sep;22(9):334-9. doi: 10.1016/s0968-0004(97)01104-3. PMID: 9301333. Review.
References
- Bell C, Lewis M. A closer view of the conformation of the Lac repressor bound to operator. Nat Struct Biol. 2000;7:209–214.