Combining evidence using p-values: application to sequence homology searches
T L Bailey et al. Bioinformatics. 1998.
Abstract
Motivation: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.
Results: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
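The distribution of a product of independent p-values described above can be illustrated with a short sketch. It assumes each p-value is uniformly distributed on (0, 1) under the null hypothesis and that the p-values are independent. Under those assumptions the product p of n p-values satisfies the standard closed form P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!. The function name `product_pvalue` is hypothetical, and the code is a minimal illustration of this closed form, not necessarily the QFAST algorithm as published.

```python
import math


def product_pvalue(pvalues):
    """Combined p-value for the product of independent p-values.

    Assumes every input is Uniform(0, 1) under the null hypothesis and
    that the inputs are independent. Returns
        p * sum_{i=0}^{n-1} (-ln p)^i / i!
    where p is the product of the inputs and n is their count.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 <= p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0  # the product is 0, so the combined p-value is 0

    # Work with the log of the product to avoid underflow for small inputs.
    log_product = sum(math.log(p) for p in pvalues)
    x = -log_product  # x >= 0

    # Accumulate sum_{i=0}^{n-1} x^i / i! one term at a time.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    # Clamp against rounding slightly above 1.
    return min(1.0, math.exp(log_product) * total)


if __name__ == "__main__":
    # Three hypothetical motif-match p-values for a single sequence.
    print(product_pvalue([0.01, 0.03, 0.2]))
```

With a single p-value the function returns that p-value unchanged. With several, the combined value is larger than the raw product, which is why the raw product cannot be read as a p-value directly.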
Comment in
- Concerning the accuracy of MAST E-values.
Bailey TL, Gribskov M. Bioinformatics. 2000 May;16(5):488-9. doi: 10.1093/bioinformatics/16.5.488. PMID: 10871274.