Comparative Study
Predicting deleterious amino acid substitutions
P C Ng et al. Genome Res. 2001 May.
Abstract
Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool that uses sequence homology to predict whether a substitution affects protein function. SIFT, which sorts intolerant from tolerant substitutions, classifies substitutions as tolerated or deleterious. A higher proportion of substitutions predicted to be deleterious by SIFT gives an affected phenotype than substitutions predicted to be deleterious by substitution scoring matrices in three test cases. Using SIFT before mutagenesis studies could reduce the number of functional assays required and yield a higher proportion of affected phenotypes. SIFT may be used to identify plausible disease candidates among the SNPs that cause missense substitutions.
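To make the idea of a homology-based prediction concrete, here is a minimal sketch in Python. It is not the SIFT algorithm itself: the abstract does not describe SIFT's scoring, so the pseudocount smoothing, the scaling against the most frequent residue, the 0.05 cutoff, and the function names `normalized_probabilities` and `predict` are all assumptions chosen for illustration. The input is one column of a multiple alignment of homologous sequences, written as a string of one-letter amino acid codes.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def normalized_probabilities(column, pseudocount=0.01):
    """Score every amino acid at one alignment column.

    Counts are smoothed with a small pseudocount (an assumption of this
    sketch) and scaled so the most frequent amino acid scores 1.0.
    """
    counts = Counter(residue for residue in column if residue in AMINO_ACIDS)
    smoothed = {aa: counts.get(aa, 0) + pseudocount for aa in AMINO_ACIDS}
    top = max(smoothed.values())
    return {aa: value / top for aa, value in smoothed.items()}


def predict(column, substitution, cutoff=0.05):
    """Label a substitution using a hypothetical cutoff on the scaled score."""
    score = normalized_probabilities(column)[substitution]
    return "deleterious" if score < cutoff else "tolerated"


# Hypothetical column: mostly leucine, with isoleucine and valine also seen.
column = "LLLLLLLLIV"
print(predict(column, "I"))  # seen among the homologs
print(predict(column, "D"))  # never seen among the homologs
```

In this toy column, a residue that appears among the homologs scores well above the cutoff, while an unobserved residue falls below it. A real predictor would also need to weight sequences and handle sparse alignments, which this sketch does not attempt.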
Figures
Figure 1
Sequence conservation corresponds to intolerant positions. (Top) Sequence logo representation (Schneider and Stephens 1990) of the LacI multiple alignment for positions 5–38, a region involved in binding DNA. At each position, the stack of letters indicates which amino acids appear in the alignment, and the total height of the stack is a measure of conservation. (Bottom) Number of substitutions deleterious to LacI function at the corresponding positions (Markiewicz et al. 1994; Suckow et al. 1996). Positions with high conservation, such as 19–23, do not tolerate substitutions. Positions with low conservation, such as 26–28, can tolerate most substitutions. Positions 17 and 18 appear diverse in the alignment but cannot tolerate most substitutions. The side chains of these residues are involved in DNA-specific recognition (Chuprina et al. 1993) that is not conserved among the paralogous sequences.
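The conservation measure behind a sequence logo can be sketched as the information content of each alignment column, in bits. The example below is a simplified illustration, assuming a 20-letter amino acid alphabet and omitting the small-sample correction that logo software usually applies; the function name `column_information` and the example columns are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Return the information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. Gap characters ('-') are ignored.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


# Hypothetical columns: one invariant, one diverse.
print(column_information("KKKKKKKK"))              # maximal: log2(20) bits
print(column_information("ACDEFGHIKLMNPQRSTVWY"))  # close to 0 bits
```

An invariant column reaches the maximum stack height, while a column in which every amino acid appears equally often has a height near zero. As the caption notes for positions 17 and 18, a low value does not guarantee that a position tolerates substitutions.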
Figure 2
(A) SIFT predictions for substitutions in LacI. The effects of 12–13 substitutions at each position were assayed (Markiewicz et al. 1994; Suckow et al. 1996). Substitutions counted above the _X_-axis gave a wild-type phenotype; substitutions counted below the _X_-axis gave an affected phenotype. SIFT makes a prediction for every possible substitution, but only substitutions predicted correctly by SIFT are depicted here and are colored in black. Gray bars above the _X_-axis indicate false positive error; these substitutions were predicted to be deleterious by SIFT, when experimentally they gave a wild-type phenotype. Gray bars below the _X_-axis indicate false negative error; these substitutions were predicted to be neutral, but in fact gave an affected phenotype. Amino acid side chains that have been identified as involved in interactions (Chuprina et al. 1993; Bell and Lewis 2000) are labeled as follows: (double helix) those that interact with DNA, (double cylinders) those participating in the dimer interface. (Hexagons) Positions having six or more substitutions that are unable to respond to the inducer (Markiewicz et al. 1994; Pace et al. 1997). Many of the intolerant positions that were predicted to tolerate substitutions correspond to these query-specific positions. (Asterisks) Positions that can tolerate at least six substitutions, but for which SIFT predicted more than half of these substitutions as deleterious. The consensus sequence and the original query sequence, LACI_ECOLI, are shown. (B) BLOSUM62 prediction for substitutions in LacI for positions 1–50 and 101–150. BLOSUM62 performs well in the DNA-binding region (residues 1–50) because this region cannot tolerate many substitutions. However, in a region that tolerates substitutions, such as positions 101–150, BLOSUM62 performs poorly, predicting many experimental false positives (large gray bars above the _X_-axis).
Figure 3
(A) Structure of LacI as a homodimer (light and dark blue strands) with DNA (yellow strand). The N-terminal subdomain whose interface is important for DNA binding and the allosteric mechanism is at the upper part of the figure; the C-terminal domain is at the bottom. The 186 positions tolerant for six or more substitutions are colored in white on one monomer (Markiewicz et al. 1994; Suckow et al. 1996). For 31 of these positions, >50% of the substitutions were predicted to affect phenotype according to SIFT when experimentally they did not (see also Fig. 2, asterisks). These positions are shown as space-fill atoms in red. Noticeably, many of these occurred at the bottom face of the C-terminal domain. This structure is 1EFA from PDB (Bell and Lewis 2000). (B) Same figure rotated 90° about the _Z_-axis.
Similar articles
- Dimerisation mutants of Lac repressor. II. A single amino acid substitution, D278L, changes the specificity of dimerisation.
Spott S, Dong F, Kisters-Woike B, Müller-Hill B. J Mol Biol. 2000 Feb 18;296(2):673-84. doi: 10.1006/jmbi.1999.3469. PMID: 10669616
- SIFT: Predicting amino acid changes that affect protein function.
Ng PC, Henikoff S. Nucleic Acids Res. 2003 Jul 1;31(13):3812-4. doi: 10.1093/nar/gkg509. PMID: 12824425 Free PMC article.
- Lactose repressor protein: functional properties and structure.
Matthews KS, Nichols JC. Prog Nucleic Acid Res Mol Biol. 1998;58:127-64. doi: 10.1016/s0079-6603(08)60035-5. PMID: 9308365 Review.
- Lac repressor genetic map in real space.
Pace HC, Kercher MA, Lu P, Markiewicz P, Miller JH, Chang G, Lewis M. Trends Biochem Sci. 1997 Sep;22(9):334-9. doi: 10.1016/s0968-0004(97)01104-3. PMID: 9301333 Review.
Cited by
- Cross-ancestry analysis identifies genes associated with obesity risk and protection.
Banerjee D, Girirajan S. medRxiv [Preprint]. 2024 Oct 16:2024.10.13.24315422. doi: 10.1101/2024.10.13.24315422. PMID: 39484254 Free PMC article. Preprint.
- Exome sequencing analysis reveals variants in primary immunodeficiency genes in patients with very early onset inflammatory bowel disease.
Kelsen JR, Dawany N, Moran CJ, Petersen BS, Sarmady M, Sasson A, Pauly-Hubbard H, Martinez A, Maurer K, Soong J, Rappaport E, Franke A, Keller A, Winter HS, Mamula P, Piccoli D, Artis D, Sonnenberg GF, Daly M, Sullivan KE, Baldassano RN, Devoto M. Gastroenterology. 2015 Nov;149(6):1415-24. doi: 10.1053/j.gastro.2015.07.006. Epub 2015 Jul 17. PMID: 26193622 Free PMC article.
- Charcot-Marie-Tooth gene, SBF2, associated with taxane-induced peripheral neuropathy in African Americans.
Schneider BP, Lai D, Shen F, Jiang G, Radovich M, Li L, Gardner L, Miller KD, O'Neill A, Sparano JA, Xue G, Foroud T, Sledge GW Jr. Oncotarget. 2016 Dec 13;7(50):82244-82253. doi: 10.18632/oncotarget.12545. PMID: 27732968 Free PMC article. Clinical Trial.
- Mutations in a TGF-β ligand, TGFB3, cause syndromic aortic aneurysms and dissections.
Bertoli-Avella AM, Gillis E, Morisaki H, Verhagen JMA, de Graaf BM, van de Beek G, Gallo E, Kruithof BPT, Venselaar H, Myers LA, Laga S, Doyle AJ, Oswald G, van Cappellen GWA, Yamanaka I, van der Helm RM, Beverloo B, de Klein A, Pardo L, Lammens M, Evers C, Devriendt K, Dumoulein M, Timmermans J, Bruggenwirth HT, Verheijen F, Rodrigus I, Baynam G, Kempers M, Saenen J, Van Craenenbroeck EM, Minatoya K, Matsukawa R, Tsukube T, Kubo N, Hofstra R, Goumans MJ, Bekkers JA, Roos-Hesselink JW, van de Laar IMBH, Dietz HC, Van Laer L, Morisaki T, Wessels MW, Loeys BL. J Am Coll Cardiol. 2015 Apr 7;65(13):1324-1336. doi: 10.1016/j.jacc.2015.01.040. PMID: 25835445 Free PMC article.
- Genetic basis of pregnancy-associated decreased platelet counts and gestational thrombocytopenia.
Yang Z, Hu L, Zhen J, Gu Y, Liu Y, Huang S, Wei Y, Zheng H, Guo X, Chen GB, Yang Y, Xiong L, Wei F, Liu S. Blood. 2024 Apr 11;143(15):1528-1538. doi: 10.1182/blood.2023021925. PMID: 38064665 Free PMC article.
References
- Bell C, Lewis M. A closer view of the conformation of the Lac repressor bound to operator. Nat Struct Biol. 2000;7:209–214.