Amino acid substitution matrices from an information theoretic perspective

Comparative Study

S F Altschul. J Mol Biol. 1991.

Abstract

Protein sequence alignments have become an important tool for molecular biologists. Local alignments are frequently constructed with the aid of a "substitution score matrix" that specifies a score for aligning each pair of amino acid residues. Over the years, many different substitution matrices have been proposed, based on a wide variety of rationales. Statistical results, however, demonstrate that any such matrix is implicitly a "log-odds" matrix, with a specific target distribution for aligned pairs of amino acid residues. In the light of information theory, it is possible to express the scores of a substitution matrix in bits and to see that different matrices are better adapted to different purposes. The most widely used matrix for protein sequence comparison has been the PAM-250 matrix. It is argued that for database searches the PAM-120 matrix generally is more appropriate, while for comparing two specific proteins with suspected homology the PAM-200 matrix is indicated. Examples discussed include the lipocalins, human alpha 1 B-glycoprotein, the cystic fibrosis transmembrane conductance regulator and the globins.
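The log-odds reading described in the abstract can be illustrated with a minimal sketch. It assumes the standard log-odds form, in which the score for a pair is the base-2 logarithm of the target frequency of that aligned pair divided by the product of the background frequencies of the two residues, giving a score in bits. The two-letter alphabet, the frequencies, and all function and variable names below are hypothetical and chosen only for illustration; they are not taken from the paper or from any PAM matrix.

```python
import math

def log_odds_bits(q_ab, p_a, p_b):
    """Score in bits for aligning residues a and b.

    q_ab: target frequency of the aligned pair (assumed, illustrative)
    p_a, p_b: background frequencies of the two residues
    """
    return math.log2(q_ab / (p_a * p_b))

def relative_entropy_bits(q, p):
    """Average score per aligned pair, in bits, under the target distribution."""
    return sum(q_ab * log_odds_bits(q_ab, p[a], p[b]) for (a, b), q_ab in q.items())

# Hypothetical two-letter alphabet with made-up frequencies.
background = {"A": 0.5, "B": 0.5}
target = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}

# Build the score matrix from the target distribution.
scores = {
    (a, b): log_odds_bits(q_ab, background[a], background[b])
    for (a, b), q_ab in target.items()
}

# Going the other way: scores given in bits imply a target distribution.
implied_target = {
    (a, b): background[a] * background[b] * 2 ** s
    for (a, b), s in scores.items()
}

entropy = relative_entropy_bits(target, background)
```

In this toy case identical pairs receive positive scores and mismatched pairs negative ones, and `implied_target` reproduces `target`, which is the sense in which a score matrix carries its own target distribution. Scores expressed in other units would need a scale factor before the back-conversion; that factor is left out of this sketch.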
