Space-efficient whole genome comparisons with Burrows-Wheeler transforms
Review
Ross A Lippert. J Comput Biol. 2005 May.
Abstract
The starting point for any alignment of mammalian genomes is the computation of exact matches satisfying various criteria. Time-efficient, O(n), data structures for this computation, such as the suffix tree, require O(n log(n)) space, several times the space of the genomes themselves. Thus, any reasonable whole-genome comparative project finds itself requiring tens of Gigabytes of RAM to maintain time-efficiency. This is beyond most modern workstations. With a new data structure, the compressed suffix array (CSA) implemented via the Burrows-Wheeler transform, we can trade time-efficiency for space-efficiency, taking O(n log(n)) time, but running in O(n) space, typically in total space less than or equal to that of the genomes themselves. If space is more expensive than time, this is an appropriate approach to consider. The most space-efficient implementation of this data structure requires 5 bits per nucleotide character to build on-line, in the worst case, and 2.5 bits per character to store once built. We present a description of this data structure and how it is used to obtain matches. An implementation (called bbbwt) is demonstrated by aligning two mammalian genomes on a modest workstation equipped with under 2 GB of free RAM in time superior to that of the implementations of other data structures.
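To make the matching idea in the abstract concrete, here is a minimal sketch of exact-match counting by backward search over a Burrows-Wheeler transform. It is an illustration only, not the bbbwt implementation: the function names `bwt` and `count_matches` are hypothetical, the transform is built by naive suffix sorting rather than the space-efficient on-line construction the abstract describes, and the occurrence table is stored uncompressed, so it does not reach the 2.5 bits per character quoted above.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform by naive suffix sorting (illustration only)."""
    s = text + "$"  # assumption: "$" is a sentinel that sorts before A, C, G, T
    order = sorted(range(len(s)), key=lambda i: s[i:])  # plain suffix array
    # The BWT is the character preceding each suffix in sorted order.
    return "".join(s[i - 1] for i in order)


def count_matches(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern in the original text via backward search."""
    # counts[c]: total occurrences of c in the text (including the sentinel)
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1

    # first[c]: number of characters in the text strictly smaller than c
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    # occ[c][i]: occurrences of c in bwt_str[:i] (uncompressed for clarity)
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    # Maintain the half-open suffix-array interval [lo, hi) of suffixes that
    # start with the part of the pattern processed so far, reading it
    # right to left.
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACAGATTACA")
    print(count_matches(index, "GATTACA"))
    print(count_matches(index, "CAT"))
```

Each pattern character costs two table lookups, so a query never touches the original text; a space-efficient implementation would replace the dense `occ` table with a sampled or compressed rank structure, trading extra time per lookup for memory.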
Similar articles
- A space-efficient construction of the Burrows-Wheeler transform for genomic data.
Lippert RA, Mobarry CM, Walenz BP. J Comput Biol. 2005 Sep;12(7):943-51. doi: 10.1089/cmb.2005.12.943. PMID: 16201914. Review.
- Graphical pan-genome analysis with compressed suffix trees and the Burrows-Wheeler transform.
Baier U, Beller T, Ohlebusch E. Bioinformatics. 2016 Feb 15;32(4):497-504. doi: 10.1093/bioinformatics/btv603. Epub 2015 Oct 26. PMID: 26504144.
- Efficient maximal repeat finding using the Burrows-Wheeler transform and wavelet tree.
Külekci MO, Vitter JS, Xu B. IEEE/ACM Trans Comput Biol Bioinform. 2012;9(2):421-9. doi: 10.1109/TCBB.2011.127. Epub 2011 Sep 27. PMID: 21968959.
- Space efficient computation of rare maximal exact matches between multiple sequences.
Ohlebusch E, Kurtz S. J Comput Biol. 2008 May;15(4):357-77. doi: 10.1089/cmb.2007.0105. PMID: 18361760.
- Computational biology: toward deciphering gene regulatory information in mammalian genomes.
Ji H, Wong WH. Biometrics. 2006 Sep;62(3):645-63. doi: 10.1111/j.1541-0420.2006.00625.x. PMID: 16984301. Review.
Cited by
- CHOP: haplotype-aware path indexing in population graphs.
Mokveld T, Linthorst J, Al-Ars Z, Holstege H, Reinders M. Genome Biol. 2020 Mar 11;21(1):65. doi: 10.1186/s13059-020-01963-y. PMID: 32160922. Free PMC article.
- GSAlign: an efficient sequence alignment tool for intra-species genomes.
Lin HN, Hsu WL. BMC Genomics. 2020 Feb 24;21(1):182. doi: 10.1186/s12864-020-6569-1. PMID: 32093618. Free PMC article.
- Fast and accurate short read alignment with Burrows-Wheeler transform.
Li H, Durbin R. Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18. PMID: 19451168. Free PMC article.
- Entropic Profiler - detection of conservation in genomes using information theory.
Fernandes F, Freitas AT, Almeida JS, Vinga S. BMC Res Notes. 2009 May 5;2:72. doi: 10.1186/1756-0500-2-72. PMID: 19416538. Free PMC article.
- Aging Human Hematopoietic Stem Cells Manifest Profound Epigenetic Reprogramming of Enhancers That May Predispose to Leukemia.
Adelman ER, Huang HT, Roisman A, Olsson A, Colaprico A, Qin T, Lindsley RC, Bejar R, Salomonis N, Grimes HL, Figueroa ME. Cancer Discov. 2019 Aug;9(8):1080-1101. doi: 10.1158/2159-8290.CD-18-1474. Epub 2019 May 13. PMID: 31085557. Free PMC article.