Methods and algorithms for statistical analysis of protein sequences

Abstract

We describe several protein sequence statistics designed to evaluate distinctive attributes of residue content and arrangement in primary structure. Considered are global compositional biases, local clustering of different residue types (e.g., charged residues, hydrophobic residues, Ser/Thr), long runs of charged or uncharged residues, periodic patterns, counts and distribution of homooligopeptides, and unusual spacings between particular residue types. The computer program SAPS (statistical analysis of protein sequences) calculates all the statistics for any individual protein sequence input and is available for the UNIX environment through electronic mail on request to V.B. (volker/genomic@stanford.edu).
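As a rough illustration of three of the statistics listed above (residue composition, long runs of charged or uncharged residues, and homooligopeptide counts), the following Python sketch computes them for a single sequence. It is not the SAPS implementation and does not assess statistical significance. The function names, the 0-based start positions, the `min_len` threshold, and the choice to treat only K/R as positive and D/E as negative (with histidine uncharged) are all assumptions made for the example.

```python
from collections import Counter
from itertools import groupby

# Assumption for this sketch: K and R are positive, D and E are negative,
# and every other residue (including H) is treated as uncharged.
POSITIVE = set("KR")
NEGATIVE = set("DE")


def composition(seq):
    """Return the fraction of each residue type present in seq."""
    n = len(seq)
    if n == 0:
        return {}
    return {aa: count / n for aa, count in Counter(seq).items()}


def is_charged(aa):
    """True if the residue is in either charged class defined above."""
    return aa in POSITIVE or aa in NEGATIVE


def longest_runs(seq):
    """Return the longest charged and uncharged runs as (start, length).

    Start positions are 0-based; the first longest run wins on ties.
    """
    best = {"charged": (0, 0), "uncharged": (0, 0)}
    pos = 0
    for charged, group in groupby(seq, key=is_charged):
        length = len(list(group))
        label = "charged" if charged else "uncharged"
        if length > best[label][1]:
            best[label] = (pos, length)
        pos += length
    return best


def homooligopeptides(seq, min_len=3):
    """List (residue, start, length) for each run of one residue type
    that is at least min_len long. Start positions are 0-based."""
    runs = []
    pos = 0
    for aa, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((aa, pos, length))
        pos += length
    return runs


if __name__ == "__main__":
    example = "MKKDEEAGGGGSLLKR"  # made-up sequence, for illustration only
    print(composition(example))
    print(longest_runs(example))
    print(homooligopeptides(example))
```

Deciding whether a run or cluster is unusual would additionally require a null model, for example random sequences of the same composition, which this sketch leaves out.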

2002
