How well is enzyme function conserved as a function of pairwise sequence identity?
J Mol Biol. 2003 Oct 31;333(4):863-82.
doi: 10.1016/j.jmb.2003.08.057.
- PMID: 14568541
Weidong Tian et al. J Mol Biol. 2003.
Abstract
Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost's to reduce the database bias; however, we classify enzyme families based not only on sequence similarity but also on functional similarity, i.e. sequences in each family must share all four digits, or the first three digits, of the Enzyme Commission (EC) number. Furthermore, instead of selecting representative sequences for comparison, we calculate the degree of function conservation within each enzyme family and then average it across all enzyme families. Our analysis suggests that for functional transferability, 40% sequence identity can still serve as a reliable threshold for transferring the first three digits of an EC number; however, transferring all four digits with at least 90% accuracy requires more than 60% sequence identity. Moreover, when PSI-BLAST is used, the magnitude of the E-value is found to be only weakly correlated with the extent of enzyme function conservation in the third iteration of PSI-BLAST. As a result, functional annotation based on PSI-BLAST E-values should be used with caution.
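The difference between pooling all sequence pairs and averaging per family can be illustrated with a small sketch. This is not the authors' implementation: the input format (one tuple per sequence pair carrying a family label, percent identity and the two EC numbers), the function names and the identity-bin arguments are hypothetical. The sketch compares EC numbers at either three or four digits and contrasts a pooled fraction, in which large families dominate, with an unweighted mean of per-family fractions.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Hypothetical record: (family_id, percent_identity, ec_a, ec_b) for one sequence pair.
Pair = Tuple[str, float, str, str]


def ec_match(ec_a: str, ec_b: str, digits: int) -> bool:
    """True if two EC numbers such as "3.2.1.4" agree on their first `digits` fields."""
    return ec_a.split(".")[:digits] == ec_b.split(".")[:digits]


def pooled_conservation(pairs: Iterable[Pair], digits: int, lo: float, hi: float) -> Optional[float]:
    """Fraction of all pairs with identity in [lo, hi) whose EC numbers match; large families dominate."""
    selected = [(a, b) for _, ident, a, b in pairs if lo <= ident < hi]
    if not selected:
        return None
    return sum(ec_match(a, b, digits) for a, b in selected) / len(selected)


def family_averaged_conservation(pairs: Iterable[Pair], digits: int, lo: float, hi: float) -> Optional[float]:
    """Match fraction computed per family within [lo, hi), then averaged with equal weight per family."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for family, ident, a, b in pairs:
        if lo <= ident < hi:
            totals[family] += 1
            matches[family] += ec_match(a, b, digits)  # bool counts as 0 or 1
    if not totals:
        return None
    per_family = [matches[f] / totals[f] for f in totals]
    return sum(per_family) / len(per_family)


# Toy data (invented for illustration): one large, fully conserved family and
# one small family whose pairs share only the first three EC digits.
pairs = [("big", 55.0, "1.1.1.1", "1.1.1.1")] * 8 + [
    ("small", 50.0, "3.2.1.4", "3.2.1.21"),
    ("small", 45.0, "3.2.1.4", "3.2.1.37"),
]

print(pooled_conservation(pairs, 4, 40, 60))           # 0.8
print(family_averaged_conservation(pairs, 4, 40, 60))  # 0.5
print(family_averaged_conservation(pairs, 3, 40, 60))  # 1.0
```

Averaging per family first means a family with many near-identical members contributes no more to the estimate than a family with two, which is one way to damp the kind of database bias discussed above.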
We also show that by employing an enzyme-family-specific sequence identity threshold, above which 100% functional conservation is required, functional inference of unknown sequences can be accomplished accurately. However, this comes at a cost: true-positive sequences that fall below this threshold cannot be uniquely identified.
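A family-specific threshold of this kind can be sketched under stated assumptions. The abstract does not say how the threshold is computed; here each family's database hits are assumed to be given as (percent identity, shares family function) pairs, and the threshold is taken as the identity of the most similar functionally different hit, so that every hit above it is 100% conserved. The input format and function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input: for one enzyme family, each database hit as
# (percent_identity, shares_family_function).
Hit = Tuple[float, bool]


def family_threshold(hits: List[Hit]) -> float:
    """Lowest identity t such that every hit with identity > t shares the family's function.

    This equals the identity of the most similar functionally different hit,
    or 0.0 if the family has no such hit.
    """
    return max((ident for ident, same in hits if not same), default=0.0)


def split_by_threshold(hits: List[Hit], threshold: float) -> Tuple[List[float], List[float]]:
    """Return (identities annotated above the threshold, true positives missed at or below it)."""
    annotated = [ident for ident, _ in hits if ident > threshold]
    missed = [ident for ident, same in hits if same and ident <= threshold]
    return annotated, missed


# Toy hits for one family (invented values).
hits = [(85.0, True), (72.0, True), (64.0, False), (58.0, True), (41.0, True), (38.0, False)]
t = family_threshold(hits)
annotated, missed = split_by_threshold(hits, t)
print(t)          # 64.0
print(annotated)  # [85.0, 72.0]
print(missed)     # [58.0, 41.0]
```

Every hit above the threshold is annotated correctly by construction, while the true positives at 58% and 41% identity fall below it and are left unannotated, which is the trade-off described in the abstract.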
Similar articles
- Enzyme function less conserved than anticipated. Rost B. J Mol Biol. 2002 Apr 26;318(2):595-608. doi: 10.1016/S0022-2836(02)00016-5. PMID: 12051862.
- EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference. Tian W, Arakaki AK, Skolnick J. Nucleic Acids Res. 2004 Dec 1;32(21):6226-39. doi: 10.1093/nar/gkh956. Print 2004. PMID: 15576349. Free PMC article.
- A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods. Tan JX, Lv H, Wang F, Dao FY, Chen W, Ding H. Curr Drug Targets. 2019;20(5):540-550. doi: 10.2174/1389450119666181002143355. PMID: 30277150. Review.
- Profiling the orphan enzymes. Sorokina M, Stam M, Médigue C, Lespinet O, Vallenet D. Biol Direct. 2014 Jun 6;9:10. doi: 10.1186/1745-6150-9-10. PMID: 24906382. Free PMC article. Review.
Cited by
- DLKcat cannot predict meaningful kcat values for mutants and unfamiliar enzymes. Kroll A, Lercher MJ. Biol Methods Protoc. 2024 Aug 24;9(1):bpae061. doi: 10.1093/biomethods/bpae061. eCollection 2024. PMID: 39346751. Free PMC article.
- PEZy-miner: An artificial intelligence driven approach for the discovery of plastic-degrading enzyme candidates. Jiang R, Yue Z, Shang L, Wang D, Wei N. Metab Eng Commun. 2024 Sep 5;19:e00248. doi: 10.1016/j.mec.2024.e00248. eCollection 2024 Dec. PMID: 39310048. Free PMC article.
- A large-scale assessment of sequence database search tools for homology-based protein function prediction. Zhang C, Freddolino L. Brief Bioinform. 2024 May 23;25(4):bbae349. doi: 10.1093/bib/bbae349. PMID: 39038936. Free PMC article.
- Metagenomic profiling of halites from the Atacama Desert: an extreme environment with natural perchlorate does not promote high diversity of perchlorate reducing microorganisms. Cadena S, Cerqueda-García D, Uribe-Flores MM, Ramírez SI. Extremophiles. 2024 Apr 25;28(2):25. doi: 10.1007/s00792-024-01342-6. PMID: 38664270.
- The thiol methyltransferase activity of TMT1A (METTL7A) is conserved across species. González Dalmasy JM, Fitzsimmons CM, Frye WJE, Perciaccante AJ, Jewell CP, Jenkins LM, Batista PJ, Robey RW, Gottesman MM. Chem Biol Interact. 2024 May 1;394:110989. doi: 10.1016/j.cbi.2024.110989. Epub 2024 Apr 3. PMID: 38574836.