Using substitution probabilities to improve position-specific scoring matrices
J G Henikoff et al. Comput Appl Biosci. 1996 Apr.
Abstract
Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial 'pseudo-counts' to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method out-performed other pseudo-count methods and was a substantial improvement over the traditional average score method used for constructing profiles.
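The position-based idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' published code. The three-letter alphabet, the substitution probabilities P(a | i), the background frequencies, and the scaling constant `m` are all hypothetical. The rule B = m·R is also an assumption: it sets the total pseudo-count B from R, the number of distinct residue types in the column, as a stand-in for the column's "diversity". Each observed residue i spreads its share of B over every residue a in proportion to P(a | i). As a result, residues that never appear in the column still receive a nonzero, substitution-informed probability. For contrast, the sketch also includes a simplified form of the average-score approach, in which each profile score is the count-weighted mean of substitution scores.

```python
import math
from collections import Counter

# Hypothetical 3-letter alphabet standing in for the 20 amino acids.
ALPHABET = "ACD"
BACKGROUND = [1 / 3, 1 / 3, 1 / 3]  # assumed background frequencies q_a
# Toy substitution probabilities P(a | i): row i = observed residue, column a = substitute.
# Invented numbers, NOT taken from any published matrix; each row sums to 1.
COND_PROB = [
    [0.6, 0.3, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.1, 0.8],
]


def count_vector(column, alphabet=ALPHABET):
    """Represent one alignment column as a vector of residue counts."""
    counts = Counter(column)
    return [counts[a] for a in alphabet]


def position_based_probs(column, cond_prob=COND_PROB, m=5.0, alphabet=ALPHABET):
    """Estimate residue probabilities for one column using substitution-based pseudo-counts.

    Assumption: the total pseudo-count B = m * R, where R is the number of
    distinct residue types observed in the column; m is a hypothetical constant.
    """
    r = count_vector(column, alphabet)
    n = sum(r)
    distinct = sum(1 for c in r if c > 0)
    total_pseudo = m * distinct
    probs = []
    for a in range(len(alphabet)):
        # Pseudo-count for residue a: B * sum_i (r_i / N) * P(a | i).
        pseudo = total_pseudo * sum(r[i] / n * cond_prob[i][a] for i in range(len(alphabet)))
        # Observed plus pseudo-counts, normalized; the pseudo-counts sum to B,
        # so the probabilities sum to 1.
        probs.append((r[a] + pseudo) / (n + total_pseudo))
    return probs


def log_odds_scores(probs, background=BACKGROUND):
    """Convert column probabilities into log-odds scores (bits) against the background."""
    return [math.log2(p / q) for p, q in zip(probs, background)]


def average_scores(column, score_matrix, alphabet=ALPHABET):
    """Simplified average-score profile: count-weighted mean of substitution scores s(i, a)."""
    r = count_vector(column, alphabet)
    n = sum(r)
    return [sum(r[i] / n * score_matrix[i][a] for i in range(len(alphabet)))
            for a in range(len(alphabet))]


if __name__ == "__main__":
    probs = position_based_probs("AAAC")
    print([round(p, 3) for p in probs])  # [0.589, 0.339, 0.071]
    print([round(s, 2) for s in log_odds_scores(probs)])
```

In the usage example, the column "AAAC" never contains D, yet D receives probability 1/14 instead of zero. With raw counts alone, D would get a log-odds score of minus infinity. Scaling B with the number of distinct residues is one simple way to let more variable columns borrow more heavily from substitution probabilities. The constant `m` would need tuning on real data.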
Similar articles
- Improved sensitivity of profile searches through the use of sequence weights and gap excision.
  Thompson JD, Higgins DG, Gibson TJ. Comput Appl Biosci. 1994 Feb;10(1):19-29. doi: 10.1093/bioinformatics/10.1.19. PMID: 8193951
- Rapid protein structure classification using one-dimensional structure profiles on the bioSCAN parallel computer.
  Hoffman DL, Laiter S, Singh RK, Vaisman II, Tropsha A. Comput Appl Biosci. 1995 Dec;11(6):675-9. doi: 10.1093/bioinformatics/11.6.675. PMID: 8808584
- Searching for distantly related protein sequences in large databases by parallel processing on a transputer machine.
  Vogt G, Argos P. Comput Appl Biosci. 1992 Feb;8(1):49-55. doi: 10.1093/bioinformatics/8.1.49. PMID: 1568125
- Protein database searches using compositionally adjusted substitution matrices.
  Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, Schäffer AA, Yu YK. FEBS J. 2005 Oct;272(20):5101-9. doi: 10.1111/j.1742-4658.2005.04945.x. PMID: 16218944. Free PMC article. Review.
- Scores for sequence searches and alignments.
  Henikoff S. Curr Opin Struct Biol. 1996 Jun;6(3):353-60. doi: 10.1016/s0959-440x(96)80055-8. PMID: 8804821. Review.
Cited by
- Post-translational modification prediction via prompt-based fine-tuning of a GPT-2 model.
  Shrestha P, Kandel J, Tayara H, Chong KT. Nat Commun. 2024 Aug 7;15(1):6699. doi: 10.1038/s41467-024-51071-9. PMID: 39107330. Free PMC article.
- An FPGA-based hardware accelerator supporting sensitive sequence homology filtering with profile hidden Markov models.
  Anderson T, Wheeler TJ. BMC Bioinformatics. 2024 Jul 29;25(1):247. doi: 10.1186/s12859-024-05879-3. PMID: 39075359. Free PMC article.
- Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information.
  Milchevskiy YV, Milchevskaya VY, Nikitin AM, Kravatsky YV. Int J Mol Sci. 2023 Oct 27;24(21):15656. doi: 10.3390/ijms242115656. PMID: 37958639. Free PMC article.
- De novo protein design by inversion of the AlphaFold structure prediction network.
  Goverde CA, Wolf B, Khakzad H, Rosset S, Correia BE. Protein Sci. 2023 Jun;32(6):e4653. doi: 10.1002/pro.4653. PMID: 37165539. Free PMC article.
- Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons.
  Oliveira LS, Reyes A, Dutilh BE, Gruber A. Viruses. 2023 Feb 13;15(2):519. doi: 10.3390/v15020519. PMID: 36851733. Free PMC article.