RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data
Yongan Zhao et al. Bioinformatics. 2012.
Abstract
Summary: With the wide application of next-generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In previous work, we developed RAPSearch, an algorithm that achieved a ~20-90-fold speedup relative to BLAST while retaining a similar level of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds because it uses a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The optimized data structure speeds up the similarity search by a further 2-3 times. We also implemented multi-threading in RAPSearch2, and the multi-threaded modes achieve significant acceleration (e.g. 3.5X for the 4-thread mode). RAPSearch2 requires up to 2 GB of memory when running in single-thread mode, or up to 3.5 GB of memory when running in 4-thread mode.
Availability and implementation: Implemented in C++, the source code is freely available for download at the RAPSearch2 website: http://omics.informatics.indiana.edu/mg/RAPSearch2/.
Contact: yye@indiana.edu
Supplementary information: Available at the RAPSearch2 website.
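Illustrative sketch: the abstract does not describe how the collision-free hash table is built, so the following is only a minimal illustration of the general idea, not the RAPSearch2 implementation. It assumes fixed-length k-mers over the 20 standard amino acids, encodes each k-mer as a base-20 integer (so distinct k-mers can never share a slot), and uses that integer as a direct index into a table of database positions. The names `kmer_key`, `build_index` and `find_seeds`, the choice of k = 3, and the toy sequences are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
CODE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def kmer_key(kmer):
    """Encode a k-mer as a base-20 integer; equal-length k-mers never collide."""
    key = 0
    for aa in kmer:
        key = key * len(AMINO_ACIDS) + CODE[aa]
    return key


def build_index(db, k):
    """Build a direct-address table: slot kmer_key(kmer) lists (seq_id, offset)."""
    table = [[] for _ in range(len(AMINO_ACIDS) ** k)]
    for seq_id, seq in db.items():
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            # Skip k-mers containing non-standard residues such as X or B.
            if all(aa in CODE for aa in kmer):
                table[kmer_key(kmer)].append((seq_id, pos))
    return table


def find_seeds(table, query, k):
    """Return (query offset, seq_id, database offset) for each exact k-mer hit."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        kmer = query[qpos:qpos + k]
        if all(aa in CODE for aa in kmer):
            for seq_id, dpos in table[kmer_key(kmer)]:
                seeds.append((qpos, seq_id, dpos))
    return seeds


if __name__ == "__main__":
    toy_db = {"p1": "MKVLAAGIV", "p2": "GGMKVLT"}  # made-up sequences
    index = build_index(toy_db, k=3)
    for seed in find_seeds(index, "AMKVL", k=3):
        print(seed)
```

The table size grows as 20^k, so direct addressing of this kind is only practical for short seeds; how RAPSearch2 itself bounds the table size is not stated in this abstract.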
Similar articles
- RAPSearch: a fast protein similarity search tool for short reads.
  Ye Y, Choi JH, Tang H. BMC Bioinformatics. 2011 May 15;12:159. doi: 10.1186/1471-2105-12-159. PMID: 21575167 Free PMC article.
- SWORD-a highly efficient protein database search.
  Vaser R, Pavlović D, Šikić M. Bioinformatics. 2016 Sep 1;32(17):i680-i684. doi: 10.1093/bioinformatics/btw445. PMID: 27587689
- muBLASTP: database-indexed protein sequence search on multicore CPUs.
  Zhang J, Misra S, Wang H, Feng WC. BMC Bioinformatics. 2016 Nov 4;17(1):443. doi: 10.1186/s12859-016-1302-4. PMID: 27809763 Free PMC article.
- Review of alignment and SNP calling algorithms for next-generation sequencing data.
  Mielczarek M, Szyda J. J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9. PMID: 26055432 Review.
- Identifying local associations in biological time series: algorithms, statistical significance, and applications.
  Ai D, Chen L, Xie J, Cheng L, Zhang F, Luan Y, Li Y, Hou S, Sun F, Xia LC. Brief Bioinform. 2023 Sep 22;24(6):bbad390. doi: 10.1093/bib/bbad390. PMID: 37930023 Review.
Cited by
- A pilot study to disentangle the infant gut microbiota composition and identification of bacteria correlates with high fat mass.
  Mancabelli L, Milani C, Fontana F, Liotto N, Tabasso C, Perrone M, Lugli GA, Tarracchini C, Alessandri G, Viappiani A, Bernasconi S, Roggero P, Mosca F, Turroni F, Ventura M. Microbiome Res Rep. 2023 Jun 25;2(3):23. doi: 10.20517/mrr.2023.11. eCollection 2023. PMID: 38046821 Free PMC article.
- Nitrogen Fertilizer Application Alters the Root Endophyte Bacterial Microbiome in Maize Plants, but Not in the Stem or Rhizosphere Soil.
  Miranda-Carrazco A, Navarro-Noya YE, Govaerts B, Verhulst N, Dendooven L. Microbiol Spectr. 2022 Dec 21;10(6):e0178522. doi: 10.1128/spectrum.01785-22. Epub 2022 Oct 18. PMID: 36255324 Free PMC article.
- The dynamics of the midgut microbiome in Aedes aegypti during digestion reveal putative symbionts.
  Salgado JFM, Premkrishnan BNV, Oliveira EL, Vettath VK, Goh FG, Hou X, Drautz-Moses DI, Cai Y, Schuster SC, Junqueira ACM. PNAS Nexus. 2024 Aug 1;3(8):pgae317. doi: 10.1093/pnasnexus/pgae317. eCollection 2024 Aug. PMID: 39157462 Free PMC article.
- Microbial communities in the tropical air ecosystem follow a precise diel cycle.
  Gusareva ES, Acerbi E, Lau KJX, Luhung I, Premkrishnan BNV, Kolundžija S, Purbojati RW, Wong A, Houghton JNI, Miller D, Gaultier NE, Heinle CE, Clare ME, Vettath VK, Kee C, Lim SBY, Chénard C, Phung WJ, Kushwaha KK, Nee AP, Putra A, Panicker D, Yanqing K, Hwee YZ, Lohar SR, Kuwata M, Kim HL, Yang L, Uchida A, Drautz-Moses DI, Junqueira ACM, Schuster SC. Proc Natl Acad Sci U S A. 2019 Nov 12;116(46):23299-23308. doi: 10.1073/pnas.1908493116. Epub 2019 Oct 28. PMID: 31659049 Free PMC article.
- The human gallbladder microbiome is related to the physiological state and the biliary metabolic profile.
  Molinero N, Ruiz L, Milani C, Gutiérrez-Díaz I, Sánchez B, Mangifesta M, Segura J, Cambero I, Campelo AB, García-Bernardo CM, Cabrera A, Rodríguez JI, González S, Rodríguez JM, Ventura M, Delgado S, Margolles A. Microbiome. 2019 Jul 4;7(1):100. doi: 10.1186/s40168-019-0712-8. PMID: 31272480 Free PMC article.