Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Ben Langmead et al. Genome Biol. 2009.
Abstract
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source (http://bowtie.cbcb.umd.edu).
Figures
Figure 1
Burrows-Wheeler transform. (a) The Burrows-Wheeler matrix and transformation for 'acaacg'. (b) Steps taken by EXACTMATCH to identify the range of rows, and thus the set of reference suffixes, prefixed by 'aac'. (c) UNPERMUTE repeatedly applies the last-first (LF) mapping to recover the original text (in red on the top line) from the Burrows-Wheeler transform (in black in the rightmost column).
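The three panels of Figure 1 can be reproduced on the caption's own example with a few lines of Python. This is a minimal sketch, not Bowtie's implementation: the function names `bwt`, `build_tables`, `exact_match`, and `unpermute` are illustrative, the `$` terminator is assumed to sort before every base, row ranges are reported as half-open `(lo, hi)` pairs, and the matrix is built by naively sorting rotations, which would not scale to a genome.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    t = text + "$"  # '$' is assumed to sort before a, c, g, t
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(row[-1] for row in rotations)


def build_tables(bw):
    """Return C (count of smaller characters) and occ (prefix counts) for bw."""
    counts = Counter(bw)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bw[:i]
    occ = {c: [0] for c in counts}
    for ch in bw:
        for c in occ:
            occ[c].append(occ[c][-1] + (c == ch))
    return C, occ


def exact_match(bw, query):
    """Half-open range of matrix rows whose prefix equals query."""
    C, occ = build_tables(bw)
    lo, hi = 0, len(bw)
    for ch in reversed(query):  # consume the query right to left
        if ch not in C:
            return (0, 0)
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:  # empty range: no occurrence
            return (lo, lo)
    return (lo, hi)


def unpermute(bw):
    """Recover the original text by repeatedly applying the LF mapping."""
    C, occ = build_tables(bw)
    row, out = 0, []  # row 0 is the rotation that starts with '$'
    while bw[row] != "$":
        ch = bw[row]
        out.append(ch)
        row = C[ch] + occ[ch][row]  # LF mapping
    return "".join(reversed(out))


transformed = bwt("acaacg")
print(transformed)                      # gc$aaac
print(exact_match(transformed, "aac"))  # (1, 2)
print(unpermute(transformed))           # acaacg
```

The single row in the range `(1, 2)` is the rotation beginning with 'aacg', which corresponds to the one occurrence of 'aac' in 'acaacg'.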
Figure 2
Exact matching versus inexact alignment. Illustration of how EXACTMATCH (top) and Bowtie's aligner (bottom) proceed when there is no exact match for query 'ggta' but there is a one-mismatch alignment when 'a' is replaced by 'g'. Boxed pairs of numbers denote ranges of matrix rows beginning with the suffix observed up to that point. A red X marks where the algorithm encounters an empty range and either aborts (as in EXACTMATCH) or backtracks (as in the inexact algorithm). A green check marks where the algorithm finds a nonempty range delimiting one or more occurrences of a reportable alignment for the query.
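The contrast in Figure 2 can be sketched as a depth-first backward search with a mismatch budget. This is a simplified illustration under stated assumptions, not Bowtie's algorithm: it ignores base qualities, it enumerates every alignment within the budget rather than stopping at the first reportable one, and the reference string `acggtgc` and the names `build_index` and `align` are hypothetical, chosen so that 'ggta' has no exact match but 'ggtg' does.

```python
from collections import Counter


def build_index(text):
    """Suffix array, BWT, and rank tables for a toy reference."""
    t = text + "$"  # '$' is assumed to sort before a, c, g, t
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    bw = "".join(t[i - 1] for i in sa)  # character preceding each suffix
    counts = Counter(bw)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    occ = {c: [0] for c in counts}  # occ[c][i] = occurrences of c in bw[:i]
    for ch in bw:
        for c in occ:
            occ[c].append(occ[c][-1] + (c == ch))
    return sa, bw, C, occ


def align(text, query, max_mismatches=1):
    """Return sorted (offset, matched_reference_string) pairs within the budget."""
    sa, bw, C, occ = build_index(text)
    hits = []

    def extend(pos, lo, hi, mismatches, matched):
        if pos < 0:  # whole query consumed: every row in [lo, hi) is a hit
            hits.extend((sa[r], matched) for r in range(lo, hi))
            return
        # Try the query's own character first, then substitutions.
        candidates = [query[pos]] + [c for c in "acgt" if c != query[pos]]
        for c in candidates:
            cost = mismatches + (c != query[pos])
            if cost > max_mismatches or c not in C:
                continue
            new_lo = C[c] + occ[c][lo]
            new_hi = C[c] + occ[c][hi]
            if new_lo < new_hi:
                extend(pos - 1, new_lo, new_hi, cost, c + matched)
            # An empty range is abandoned here; the loop then backtracks
            # to the next candidate character.

    extend(len(query) - 1, 0, len(bw), 0, "")
    return sorted(hits)


print(align("acggtgc", "ggta", max_mismatches=0))  # []
print(align("acggtgc", "ggta", max_mismatches=1))  # [(2, 'ggtg')]
```

With a budget of zero the search behaves like EXACTMATCH and gives up at the first empty range; with a budget of one it backtracks and finds the alignment in which the final 'a' is replaced by 'g'.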
Figure 3
The three phases of the Bowtie algorithm for the Maq-like policy. A three-phase approach finds alignments for two-mismatch cases 1 to 4 while minimizing backtracking. Phase 1 uses the mirror index and invokes the aligner to find alignments for cases 1 and 2. Phases 2 and 3 cooperate to find alignments for case 3: Phase 2 finds partial alignments with mismatches only in the hi-half, and phase 3 attempts to extend those partial alignments into full alignments. Finally, phase 3 invokes the aligner to find alignments for case 4.
Comment in
- The need for speed.
Flicek P. Genome Biol. 2009;10(3):212. doi: 10.1186/gb-2009-10-3-212. Epub 2009 Mar 27. PMID: 19344490 Free PMC article.
Similar articles
- Fast and accurate long-read alignment with Burrows-Wheeler transform.
Li H, Durbin R. Bioinformatics. 2010 Mar 1;26(5):589-95. doi: 10.1093/bioinformatics/btp698. Epub 2010 Jan 15. PMID: 20080505 Free PMC article.
- Fast and memory efficient approach for mapping NGS reads to a reference genome.
Kumar S, Agarwal S, Ranvijay. J Bioinform Comput Biol. 2019 Apr;17(2):1950008. doi: 10.1142/S0219720019500082. PMID: 31057068
- Ψ-RA: a parallel sparse index for genomic read alignment.
Oğuzhan Külekci M, Hon WK, Shah R, Scott Vitter J, Xu B. BMC Genomics. 2011;12 Suppl 2(Suppl 2):S7. doi: 10.1186/1471-2164-12-S2-S7. Epub 2011 Jul 27. PMID: 21989248 Free PMC article.
- SRmapper: a fast and sensitive genome-hashing alignment tool.
Gontarz PM, Berger J, Wong CF. Bioinformatics. 2013 Feb 1;29(3):316-21. doi: 10.1093/bioinformatics/bts712. Epub 2012 Dec 24. PMID: 23267171
- A survey of sequence alignment algorithms for next-generation sequencing.
Li H, Homer N. Brief Bioinform. 2010 Sep;11(5):473-83. doi: 10.1093/bib/bbq015. Epub 2010 May 11. PMID: 20460430 Free PMC article. Review.
Cited by
- The de novo design and synthesis of yeast chromosome XIII facilitates investigations on aging.
Zhou C, Wang Y, Huang Y, An Y, Fu X, Yang D, Wang Y, Zhang J, Mitchell LA, Bader JS, Cai Y, Dai J, Boeke JD, Cai Z, Xie Z, Shen Y, Huang W. Nat Commun. 2024 Nov 22;15(1):10139. doi: 10.1038/s41467-024-54130-3. PMID: 39578428 Free PMC article.
- Pooled CRISPR screens with joint single-nucleus chromatin accessibility and transcriptome profiling.
Yan RE, Corman A, Katgara L, Wang X, Xue X, Gajic ZZ, Sam R, Farid M, Friedman SM, Choo J, Raimondi I, Ganesan S, Katsevich E, Greenfield JP, Dahmane N, Sanjana NE. Nat Biotechnol. 2024 Nov 21. doi: 10.1038/s41587-024-02475-x. Online ahead of print. PMID: 39572737
- Remodeling of Il4-Il13-Il5 locus underlies selective gene expression.
Nagashima H, Shayne J, Jiang K, Petermann F, Pękowska A, Kanno Y, O'Shea JJ. Nat Immunol. 2024 Dec;25(12):2220-2233. doi: 10.1038/s41590-024-02007-4. Epub 2024 Nov 20. PMID: 39567762
- Single-molecule states link transcription factor binding to gene expression.
Doughty BR, Hinks MM, Schaepe JM, Marinov GK, Thurm AR, Rios-Martinez C, Parks BE, Tan Y, Marklund E, Dubocanin D, Bintu L, Greenleaf WJ. Nature. 2024 Dec;636(8043):745-754. doi: 10.1038/s41586-024-08219-w. Epub 2024 Nov 20. PMID: 39567683
- Enhancers and genome conformation provide complex transcriptional control of a herpesviral gene.
Morgens DW, Gulyas L, Mao X, Rivera-Madera A, Souza AS, Glaunsinger BA. Mol Syst Biol. 2024 Nov 19. doi: 10.1038/s44320-024-00075-0. Online ahead of print. PMID: 39562742
Grants and funding
- R01 GM083873/GM/NIGMS NIH HHS/United States
- R01 GM083873-06/GM/NIGMS NIH HHS/United States
- R01 HG004885/HG/NHGRI NIH HHS/United States
- R01 LM006845/LM/NLM NIH HHS/United States
- R01 LM006845-09/LM/NLM NIH HHS/United States
- R01 LM006845-10/LM/NLM NIH HHS/United States