Using substitution probabilities to improve position-specific scoring matrices

J G Henikoff et al. Comput Appl Biosci. 1996 Apr.

Abstract

Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial 'pseudo-counts' to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method out-performed other pseudo-count methods and was a substantial improvement over the traditional average score method used for constructing profiles.
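
The pseudo-count idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula, so the sketch assumes the total number of pseudo-counts for a column grows with the number of distinct residues observed there (scaled by a hypothetical factor `m`), and that this total is distributed according to substitution probabilities weighted by the observed counts. The function names, the three-letter alphabet, and the `TOY_SUBSTITUTION` table are invented for the example; a real application would use all 20 amino acids and probabilities derived from a substitution matrix.

```python
from collections import Counter

# Hypothetical conditional substitution probabilities on a toy 3-letter
# alphabet: TOY_SUBSTITUTION[i][a] is the assumed probability that residue i
# is replaced by residue a. Each row sums to 1. These numbers are made up.
TOY_SUBSTITUTION = {
    "A": {"A": 0.6, "S": 0.3, "W": 0.1},
    "S": {"A": 0.3, "S": 0.6, "W": 0.1},
    "W": {"A": 0.1, "S": 0.1, "W": 0.8},
}


def pseudo_counts(column, cond_prob, m=5.0):
    """Return a dict of pseudo-counts for one alignment column.

    column    -- string of residues observed at this position
    cond_prob -- nested dict of substitution probabilities (rows sum to 1)
    m         -- assumed scale factor; the total pseudo-count is
                 m * (number of distinct residues in the column)
    """
    counts = Counter(column)
    n_total = sum(counts.values())
    diversity = len(counts)            # distinct residues observed
    total_pseudo = m * diversity       # more diverse column, more pseudo-counts
    return {
        a: total_pseudo
        * sum((n / n_total) * cond_prob[i][a] for i, n in counts.items())
        for a in cond_prob
    }


def column_frequencies(column, cond_prob, m=5.0):
    """Combine observed counts and pseudo-counts into estimated frequencies."""
    counts = Counter(column)
    pseudo = pseudo_counts(column, cond_prob, m)
    denom = sum(counts.values()) + sum(pseudo.values())
    return {a: (counts.get(a, 0) + pseudo[a]) / denom for a in cond_prob}


if __name__ == "__main__":
    for col in ("AAAA", "AAAS"):
        print(col, column_frequencies(col, TOY_SUBSTITUTION))
```

In this sketch a fully conserved column such as `AAAA` receives fewer pseudo-counts than a mixed column such as `AAAS`, so the observed residue dominates where the alignment is conserved, while unobserved but plausible substitutions gain weight where the position is variable.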
