A fast and symmetric DUST implementation to mask low-complexity DNA sequences
Aleksandr Morgulis et al. J Comput Biol. 2006 Jun.
Abstract
The DUST module has been used within BLAST for many years to mask low-complexity sequences. In this paper, we present a new implementation of the DUST module that uses the same function to assign a complexity score to a sequence, but uses a different rule by which high-scoring sequences are masked. The new rule masks every nucleotide masked by the old rule and occasionally masks more. The new masking rule corrects two related deficiencies with the old rule. First, the new rule is symmetric with respect to reversing the sequence. Second, the new rule is not context sensitive; the decision to mask a subsequence does not depend on what sequences flank it. The new implementation is at least four times faster than the old on the human genome. We show that both the percentage of additional bases masked and the effect on MegaBLAST outputs are very small.
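The abstract does not spell out the scoring function that both implementations share. As a rough illustration only, the sketch below assumes the commonly described form of the DUST score: count each overlapping triplet in the sequence, sum c(c-1)/2 over the triplet counts c, and divide by the number of triplets minus one. The function name `dust_score` and the parameter `k` are hypothetical, and the masking rule itself (which is what the paper changes) is not implemented here.

```python
from collections import Counter


def dust_score(seq: str, k: int = 3) -> float:
    """Triplet-based complexity score (assumed form, not taken from the abstract).

    Higher values indicate more repeated triplets, i.e. lower complexity.
    """
    seq = seq.upper()
    n_kmers = len(seq) - k + 1  # number of overlapping k-mers (triplets by default)
    if n_kmers < 2:
        return 0.0  # too short to contain any repeated k-mer
    counts = Counter(seq[i:i + k] for i in range(n_kmers))
    # Each k-mer seen c times contributes c*(c-1)/2 repeated pairs.
    repeated_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return repeated_pairs / (n_kmers - 1)


if __name__ == "__main__":
    print(dust_score("AAAAAAAAAA"))  # homopolymer run: 4.0
    print(dust_score("ACGTACGT"))    # short tandem repeat: 0.4
```

Under this assumed definition the score of a sequence equals the score of its reversal, because reversing maps triplets one-to-one and preserves their counts. The asymmetry the abstract describes therefore comes from the old masking rule, not from the score.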