Minimap2: pairwise alignment for nucleotide sequences
Heng Li. Bioinformatics. 2018.
Abstract
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 Mb in length. Existing alignment programs either cannot process such data at scale or do so inefficiently, which calls for new alignment algorithms.
Results: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, genomic reads of ≥1 kb at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions, and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment.
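The abstract does not define the concave gap cost. As a minimal sketch, one common concave form is a two-piece affine cost: the minimum of two affine functions, one with a cheap opening and expensive extension, the other with an expensive opening and cheap extension. The function names and the parameter values below are illustrative assumptions, not taken from the abstract.

```python
def affine_gap_cost(length: int, gap_open: int, gap_extend: int) -> int:
    """Standard affine cost: a fixed opening penalty plus a per-base extension."""
    return gap_open + abs(length) * gap_extend


def two_piece_gap_cost(length: int,
                       q: int = 4, e: int = 2,      # assumed: short-gap piece
                       q2: int = 24, e2: int = 1    # assumed: long-gap piece
                       ) -> int:
    """Concave gap cost as the minimum of two affine pieces.

    Short gaps are priced by the (q, e) piece; beyond the crossover
    length the (q2, e2) piece is cheaper, so long insertions and
    deletions are penalized less steeply than under one affine cost.
    """
    if length == 0:
        return 0
    return min(affine_gap_cost(length, q, e),
               affine_gap_cost(length, q2, e2))


if __name__ == "__main__":
    for gap_length in (1, 20, 100):
        print(gap_length, two_piece_gap_cost(gap_length))
```

With these assumed values the two pieces cross at a gap length of 20, so a 100-base gap costs 124 rather than the 204 a single affine cost with the first pair of parameters would charge.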
Availability and implementation: https://github.com/lh3/minimap2.
Supplementary information: Supplementary data are available at Bioinformatics online.
Figures
Fig. 1.
Evaluation on aligning simulated reads. Simulated reads were mapped to the primary assembly of human genome GRCh38. A read is considered correctly mapped if its longest alignment overlaps with the true interval and the overlap length is ≥10% of the true interval length. Read alignments are sorted by mapping quality in descending order. For each mapping quality threshold, the fraction of alignments (out of the number of input reads) with mapping quality above the threshold and their error rate are plotted along the curve. (a) Long-read alignment evaluation. 33 088 ≥1000 bp reads were simulated using pbsim (Ono et al., 2013) with error profile sampled from file ‘m131017_060208_42213_*.1.*’ downloaded at […]. The N50 read length is 11 628. Aligners were run under the default setting for SMRT reads. Kart output all alignments at mapping quality 60, so it is not shown in the figure. It mapped nearly all reads with 4.1% of alignments being wrong, making it less accurate than the others. (b) Short-read alignment evaluation. 10 million pairs of 150 bp reads were simulated using mason2 (Holtgrewe, 2010) with option ‘--illumina-prob-mismatch-scale 2.5’. Short-read aligners were run under the default setting except for changing the maximum fragment length to 800 bp.
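The evaluation procedure described in the caption can be sketched roughly as follows. This is a minimal illustration, not the script used in the paper: the `ReadResult` record, its field names, and the choice to treat the threshold as inclusive are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReadResult:
    """Hypothetical record: longest alignment of one read plus its true origin."""
    chrom: str        # contig the longest alignment maps to
    start: int        # 0-based alignment start
    end: int          # alignment end (exclusive)
    mapq: int         # mapping quality reported by the aligner
    true_chrom: str   # contig the read was simulated from
    true_start: int
    true_end: int


def is_correct(r: ReadResult, min_frac: float = 0.10) -> bool:
    """Correct if the alignment overlaps the true interval by >= min_frac of it."""
    if r.chrom != r.true_chrom:
        return False
    overlap = min(r.end, r.true_end) - max(r.start, r.true_start)
    return overlap > 0 and overlap >= min_frac * (r.true_end - r.true_start)


def accuracy_curve(results: List[ReadResult],
                   n_input_reads: int) -> List[Tuple[int, float, float]]:
    """Return (mapq threshold, fraction of input reads mapped, error rate).

    Alignments are visited in descending mapping quality; each point
    accumulates all alignments with mapq >= threshold (inclusive by assumption).
    """
    ordered = sorted(results, key=lambda r: r.mapq, reverse=True)
    curve = []
    mapped = wrong = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i].mapq
        # Consume every alignment that shares this mapping quality.
        while i < len(ordered) and ordered[i].mapq == threshold:
            mapped += 1
            wrong += not is_correct(ordered[i])
            i += 1
        curve.append((threshold, mapped / n_input_reads, wrong / mapped))
    return curve
```

Plotting the second element against the third for each threshold would give a curve of the kind described in the caption; unmapped reads lower the mapped fraction because the denominator is the number of input reads.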
References
- Abouelhoda M.I., Ohlebusch E. (2005) Chaining algorithms for multiple genome comparison. J. Discrete Algorithms, 3, 321–341.
- Altschul S.F., Erickson B.W. (1986) Optimal sequence alignment using affine gap costs. Bull. Math. Biol., 48, 603–616.
- Berlin K. et al. (2015) Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat. Biotechnol., 33, 623–630.