The statistical distribution of nucleic acid similarities
Comparative Study
Nucleic Acids Res. 1985 Jan 25;13(2):645-56.
doi: 10.1093/nar/13.2.645.
- PMID: 3871073
- PMCID: PMC341021
Free PMC article
T F Smith et al. Nucleic Acids Res. 1985.
Abstract
All pairs of a large set of known vertebrate DNA sequences were searched by computer for most similar segments. Analysis of these data shows that the computed similarity scores are distributed proportionally to the logarithm of the product of the lengths of the sequences involved. This distribution is closely related to recent results of Erdős and others on the longest run of heads in coin tossing. A simple rule is derived for determination of statistical significance of the similarity scores and to assist in relating statistical and biological significance.
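The connection to the longest run of heads can be illustrated with a small simulation. What follows is a minimal Python sketch, not the authors' method: the function names, the fair-coin default, and the trial count are illustrative assumptions. It compares the simulated mean longest head run in n tosses with the leading-order approximation log base 1/p of n. For two sequences of lengths m and n, the analogous leading term uses the product mn, which is the logarithmic dependence the abstract describes.

```python
import math
import random


def longest_run(bits):
    """Return the length of the longest run of consecutive truthy values."""
    best = current = 0
    for b in bits:
        current = current + 1 if b else 0
        best = max(best, current)
    return best


def approx_longest_run(n, p=0.5):
    """Leading-order approximation log_{1/p}(n) for the longest head run.

    Assumption: independent tosses with head probability p; lower-order
    correction terms are ignored.
    """
    return math.log(n) / math.log(1.0 / p)


def simulate_mean_longest_run(n, p=0.5, trials=200, seed=0):
    """Average the longest head run over `trials` simulated sequences."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0
    for _ in range(trials):
        total += longest_run(rng.random() < p for _ in range(n))
    return total / trials


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        sim = simulate_mean_longest_run(n)
        print(n, round(sim, 2), round(approx_longest_run(n), 2))
```

Running the script prints, for each n, the simulated mean next to the logarithmic approximation; the two should grow together as n increases, differing by a roughly constant offset.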
Similar articles
- On the statistical significance of nucleic acid similarities.
  Lipman DJ, Wilbur WJ, Smith TF, Waterman MS. Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):215-26. doi: 10.1093/nar/12.1part1.215. PMID: 6694902. Free PMC article.
- An Erdős-Révész Type Law for the Length of the Longest Match of Two Coin-Tossing Sequences.
  Grill K. Entropy (Basel). 2025 Jan 3;27(1):34. doi: 10.3390/e27010034. PMID: 39851654. Free PMC article.
- The probabilities of similarities in DNA sequence comparisons.
  Brooks LD, Weir BS, Schaffer HE. Genomics. 1988 Oct;3(3):207-16. doi: 10.1016/0888-7543(88)90081-x. PMID: 3224980.
- Statistical methods and insights for protein and DNA sequences.
  Karlin S, Bucher P, Brendel V, Altschul SF. Annu Rev Biophys Biophys Chem. 1991;20:175-203. doi: 10.1146/annurev.bb.20.060191.001135. PMID: 1867715. Review. No abstract available.
- Statistical analysis of DNA sequences.
  Weir BS. J Natl Cancer Inst. 1988 May 18;80(6):395-406. doi: 10.1093/jnci/80.6.395. PMID: 3285010. Review.
Cited by
- Multiple nuclear proteins bind upstream sequences in the promotor region of a T-cell receptor beta-chain variable-region gene: evidence for tissue specificity.
  Royer HD, Reinherz EL. Proc Natl Acad Sci U S A. 1987 Jan;84(1):232-6. doi: 10.1073/pnas.84.1.232. PMID: 3025857. Free PMC article.
- An Eulerian path approach to local multiple alignment for DNA sequences.
  Zhang Y, Waterman MS. Proc Natl Acad Sci U S A. 2005 Feb 1;102(5):1285-90. doi: 10.1073/pnas.0409240102. Epub 2005 Jan 24. PMID: 15668398. Free PMC article.
- Towards drug repositioning: a unified computational framework for integrating multiple aspects of drug similarity and disease similarity.
  Zhang P, Wang F, Hu J. AMIA Annu Symp Proc. 2014 Nov 14;2014:1258-67. eCollection 2014. PMID: 25954437. Free PMC article.
- INDI: a computational framework for inferring drug interactions and their associated recommendations.
  Gottlieb A, Stein GY, Oron Y, Ruppin E, Sharan R. Mol Syst Biol. 2012 Jul 17;8:592. doi: 10.1038/msb.2012.26. PMID: 22806140. Free PMC article.
- Poisson, compound Poisson and process approximations for testing statistical significance in sequence comparisons.
  Goldstein L, Waterman MS. Bull Math Biol. 1992 Sep;54(5):785-812. doi: 10.1007/BF02459930. PMID: 1638260.