BLAST and FASTA similarity searching for multiple sequence alignment
William R Pearson. Methods Mol Biol. 2014.
Abstract
BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry, that is, homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times of 1 to 2 billion years; DNA:DNA searches are 5- to 10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g., mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
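The link between database size and sensitivity can be illustrated with a minimal sketch. It assumes the standard Karlin-Altschul relation between a normalized bit score S' and the expectation value, E = m * n * 2^(-S'), where m is the query length and n is the total database length in residues. The function name and all numbers below are hypothetical, chosen only for illustration, and the sketch ignores the length corrections that real programs apply.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Simplified sketch: omits the effective-length (edge-effect)
    corrections that actual search programs apply.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical values for illustration only (not taken from the article).
QUERY_LEN = 300                      # residues in the query
BIT_SCORE = 45.0                     # bit score of one alignment
DATABASES = {
    "large comprehensive database": 50_000_000_000,   # total residues
    "single-organism protein set": 10_000_000,        # total residues
}

if __name__ == "__main__":
    for name, db_len in DATABASES.items():
        e = expect_value(BIT_SCORE, QUERY_LEN, db_len)
        print(f"{name}: E = {e:.2e}")
```

Under these assumptions the same alignment score yields an expectation value that scales linearly with database size, so an alignment that is not significant against a very large database may become significant against a smaller, comprehensive one.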