SeqAn an efficient, generic C++ library for sequence analysis - PubMed

SeqAn an efficient, generic C++ library for sequence analysis

Andreas Döring et al. BMC Bioinformatics. 2008.

Abstract

Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the algorithmic components actually used in widespread tools.

Results: To remedy this trend, we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile.
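The MUMmer example centers on maximal unique matches (MUMs): exact matches that occur exactly once in each of two sequences and cannot be extended to the left or right. The following is a minimal sketch of one textbook way to find them, assuming a naive suffix array (in the sense of Manber and Myers) built over the concatenation `a#b$`. It is not SeqAn code and does not use the SeqAn API; MUMmer itself is built on suffix trees, so the suffix-array scan is a swapped-in technique. The names `Mum` and `findMums` and the separator characters are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// One maximal unique match: a.substr(posA, len) == b.substr(posB, len).
struct Mum {
    std::size_t posA;
    std::size_t posB;
    std::size_t len;
};

// Finds all MUMs of length >= minLen between a and b.
// Assumption: '#' and '$' serve as separators and must not occur in a or b.
std::vector<Mum> findMums(const std::string& a, const std::string& b,
                          std::size_t minLen) {
    assert(a.find_first_of("#$") == std::string::npos);
    assert(b.find_first_of("#$") == std::string::npos);

    const std::string s = a + '#' + b + '$';
    const std::size_t n = s.size();
    const std::size_t bStart = a.size() + 1;  // offset of b inside s

    // Naive suffix array: sort suffix start positions lexicographically.
    // O(n^2 log n) in the worst case; adequate only for small demonstrations.
    std::vector<std::size_t> sa(n);
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&s](std::size_t x, std::size_t y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });

    // lcp[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i].
    std::vector<std::size_t> lcp(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        std::size_t k = 0;
        while (sa[i - 1] + k < n && sa[i] + k < n &&
               s[sa[i - 1] + k] == s[sa[i] + k]) {
            ++k;
        }
        lcp[i] = k;
    }

    std::vector<Mum> mums;
    for (std::size_t i = 1; i < n; ++i) {
        // Right-maximality holds automatically: len is the full common prefix.
        const std::size_t len = lcp[i];
        if (len == 0 || len < minLen) continue;

        const std::size_t p = sa[i - 1];
        const std::size_t q = sa[i];
        // Exactly one suffix must start in a and the other in b.
        if ((p < a.size()) == (q < a.size())) continue;

        // Unique in both: no third suffix shares the first len characters.
        const std::size_t right = (i + 1 < n) ? lcp[i + 1] : 0;
        if (lcp[i - 1] >= len || right >= len) continue;

        const std::size_t inA = std::min(p, q);
        const std::size_t inB = std::max(p, q);  // position inside s
        // Left-maximality: the preceding characters must differ.
        if (inA > 0 && s[inA - 1] == s[inB - 1]) continue;

        mums.push_back({inA, inB - bStart, len});
    }
    // Report matches ordered by their position in a.
    std::sort(mums.begin(), mums.end(),
              [](const Mum& x, const Mum& y) { return x.posA < y.posA; });
    return mums;
}
```

For example, `findMums("AAACCCGGG", "CCCGGGTTT", 3)` yields a single MUM of length 6 at position 3 in the first sequence and position 0 in the second; shorter matches such as `CCGGG` are rejected because they extend to the left. A real tool would replace the naive sort with a linear-time suffix array or suffix tree construction.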

Conclusion: We anticipate that SeqAn will greatly simplify the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components that are fundamental to the field of sequence analysis. This not only facilitates the implementation of new algorithms but also enables a sound analysis and comparison of existing algorithms.


Figures

Figure 1

Genome comparison tools and their algorithmic components.

Figure 2

SeqAn Contents Overview.

Figure 3

Runtimes of String Matching Algorithms. We compared three exact string matching algorithms from SeqAn with the member function basic_string::find of the C++ standard library, as implemented in Microsoft Visual C++. The left panel shows the runtimes (in ms) for searching a DNA sequence (human chromosome 21); the right panel shows the runtimes for searching a protein database. Search patterns were taken at random from the searched sequence. The panels show the average time needed to find all occurrences of patterns of a given length.
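The Figure 3 experiment times how long each method takes to find all occurrences of a pattern. The caption does not name the three SeqAn algorithms, so the sketch below uses Horspool's algorithm as an assumed, representative exact matcher and compares it against a baseline built on `std::basic_string::find`. It is a standalone illustration, not SeqAn's interface; the function names `findAllStd`, `findAllHorspool`, and `averageMs` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Baseline: all (possibly overlapping) occurrences via std::string::find.
std::vector<std::size_t> findAllStd(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> hits;
    if (pat.empty()) return hits;
    for (std::size_t pos = text.find(pat); pos != std::string::npos;
         pos = text.find(pat, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}

// Horspool: compare each window right to left, then shift by the
// bad-character rule applied to the last character of the window.
std::vector<std::size_t> findAllHorspool(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> hits;
    const std::size_t m = pat.size();
    const std::size_t n = text.size();
    if (m == 0 || m > n) return hits;

    // shift[c]: distance from the last occurrence of c in pat[0..m-2] to the
    // end of pat; characters absent from that prefix shift by the full m.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(pat[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos + m <= n) {
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pat[j - 1]) --j;
        if (j == 0) hits.push_back(pos);
        pos += shift[static_cast<unsigned char>(text[pos + m - 1])];
    }
    return hits;
}

// Average wall-clock time in ms over `runs` calls; hitCount accumulates the
// number of occurrences found so the work done is observable to the caller.
template <typename Search>
double averageMs(Search search, const std::string& text, const std::string& pat,
                 int runs, std::size_t& hitCount) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    hitCount = 0;
    const auto start = Clock::now();
    for (int r = 0; r < runs; ++r) hitCount += search(text, pat).size();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}
```

Checking that both functions return identical hit lists before timing them keeps the comparison honest. Horspool can advance by up to m positions after inspecting a single text character, so its shifts tend to be shorter on DNA's four-letter alphabet than on the larger protein alphabet.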

Cited by

Genetic links between ovarian ageing, cancer risk and de novo mutation rates.
Stankovic S, Shekari S, Huang QQ, Gardner EJ, Ivarsdottir EV, Owens NDL, Mavaddat N, Azad A, Hawkes G, Kentistou KA, Beaumont RN, Day FR, Zhao Y, Jonsson H, Rafnar T, Tragante V, Sveinbjornsson G, Oddsson A, Styrkarsdottir U, Gudmundsson J, Stacey SN, Gudbjartsson DF; Breast Cancer Association Consortium; Kennedy K, Wood AR, Weedon MN, Ong KK, Wright CF, Hoffmann ER, Sulem P, Hurles ME, Ruth KS, Martin HC, Stefansson K, Perry JRB, Murray A. Nature. 2024 Sep;633(8030):608-614. doi: 10.1038/s41586-024-07931-x. Epub 2024 Sep 11. PMID: 39261734.
Identification of tumor rejection antigens and the immunologic landscape of medulloblastoma.
Yang C, Trivedi V, Dyson K, Gu T, Candelario KM, Yegorov O, Mitchell DA. Genome Med. 2024 Aug 19;16(1):102. doi: 10.1186/s13073-024-01363-y. PMID: 39160595.
Meiosis-specific decoupling of the pericentromere from the kinetochore.
Pan B, Bruno M, Macfarlan TS, Akera T. bioRxiv [Preprint]. 2024 Jul 22:2024.07.21.604490. doi: 10.1101/2024.07.21.604490. PMID: 39091844.
Multi-level 3D genome organization deteriorates during breast cancer progression.
Rossini R, Oshaghi M, Nekrasov M, Bellanger A, Domaschenz R, Dijkwel Y, Abdelhalim M, Collas P, Tremethick D, Paulsen J. bioRxiv [Preprint]. 2023 Nov 27:2023.11.26.568711. doi: 10.1101/2023.11.26.568711. PMID: 38076897.
invMap: a sensitive mapping tool for long noisy reads with inversion structural variants.
Wei ZG, Bu PY, Zhang XD, Liu F, Qian Y, Wu FX. Bioinformatics. 2023 Dec 1;39(12):btad726. doi: 10.1093/bioinformatics/btad726. PMID: 38058196.

References

1. Venter JC, Reinert K, et al. The sequence of the human genome. Science. 2001;291:1304–1351.
2. Myers EW. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM. 1999;46:395–415.
3. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990;215:403–410.
4. Manber U, Myers E. Suffix arrays: a new method for on-line string searches. In: SODA '90: Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics; 1990. pp. 319–327.
5. Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KHJ, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC. A whole-genome assembly of Drosophila. Science. 2000;287:2196–2204.

