ABySS | Genome Sciences Centre
ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
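The parallel version has to decide which MPI process stores each k-mer. The 2009 ABySS paper outlines an approach in which a k-mer's owning process is computed from the k-mer's own sequence, so no central lookup table is needed. The Python sketch below illustrates that general idea only. The function names, the use of CRC32 as the hash, and folding each k-mer together with its reverse complement are assumptions made for illustration, not ABySS's code.

```python
import zlib

# Complement table for an A/C/G/T alphabet (assumption: no ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement so that
    both strands map to the same owner (an assumption of this sketch)."""
    return min(kmer, reverse_complement(kmer))


def owner_rank(kmer: str, num_ranks: int) -> int:
    """Map a k-mer to an MPI rank with a deterministic hash.

    CRC32 is a hypothetical choice here; any hash that every process computes
    identically would work.
    """
    return zlib.crc32(canonical(kmer).encode("ascii")) % num_ranks


def partition_read(read: str, k: int, num_ranks: int) -> dict[int, list[str]]:
    """Split a read into k-mers and group them by the rank that owns them."""
    buckets: dict[int, list[str]] = {}
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        buckets.setdefault(owner_rank(kmer, num_ranks), []).append(kmer)
    return buckets


if __name__ == "__main__":
    for rank, kmers in sorted(partition_read("ACGTTGCAAGGCT", k=5, num_ranks=4).items()):
        print(rank, kmers)
```

Because every process evaluates the same function, a process that needs to look up a neighbouring k-mer can send the query directly to the owning rank.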
To assemble transcriptome data, see Trans-ABySS.
Awards
June 2015, 12th [BC]2 Conference in Basel, Switzerland: ABySS was the winner of the Swiss Institute of Bioinformatics’ inaugural International Bioinformatics Resource Award.
Publications
- ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter. Jackman SD, Vandervalk BP, Mohamadi H, Chu J, Yeo S, Hammond SA, Jahesh G, Khan H, Coombe L, Warren RL, Birol I. Genome Research, 2017 27: 768-777. (Genome Research, PubMed)
- ABySS: A parallel assembler for short read sequence data. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. Genome Research, 2009-June. (Genome Research, PubMed)
- De novo Transcriptome Assembly with ABySS. İnanç Birol, Shaun D Jackman, Cydney Nielsen, Jenny Q Qian, Richard Varhol, Greg Stazyk, Ryan D Morin, Yongjun Zhao, Martin Hirst, Jacqueline E Schein, Doug E Horsman, Joseph M Connors, Randy D Gascoyne, Marco A Marra and Steven JM Jones. Bioinformatics. 2009-June. (Bioinformatics Advance Access)
- De novo assembly and analysis of RNA-seq data. Gordon Robertson, Jacqueline Schein, Readman Chiu, Richard Corbett, Matthew Field, Shaun D Jackman, Karen Mungall, Sam Lee, Hisanaga Mark Okada, Jenny Q Qian, Malachi Griffith, Anthony Raymond, Nina Thiessen, Timothee Cezard, Yaron S Butterfield, Richard Newsome, Simon K Chan, Rong She, Richard Varhol, Baljit Kamoh, Anna-Liisa Prabhu, Angela Tam, YongJun Zhao, Richard A Moore, Martin Hirst, Marco A Marra, Steven J M Jones, Pamela A Hoodless and İnanç Birol. Nature Methods. 2010-Oct. (Nature)
Current Release
All Releases
Version | Released | Description | Licenses | Status |
---|---|---|---|---|
2.1.5 | Dec 04, 2018 | Compiler fixes and increase stack size limits to avoid stack overflows. | GPLv3 | final |
2.1.4 | Nov 09, 2018 | This release provides major improvements to Bloom filter assembly contiguity and correctness. Bloom filter assemblies now have equivalent scaffold contiguity and better correctness than MPI assemblies of the same data, while still requiring less than 1/10th of the memory. On human, Bloom filter assembly times are still a few hours longer than MPI assemblies (e.g. 17 hours vs. 13 hours, using 48 threads). | GPLv3 | final |
2.1.3 | Nov 05, 2018 | This release fixes a SAM-formatting bug that broke the ABySS-LR pipeline (Tigmint/ARCS). | GPLv3 | final |
2.1.2 | Oct 24, 2018 | This release improves scaffold N50 on human by ~10%, due to implementation of a new `--median` option for `DistanceEst` (thanks to @lcoombe!). This release also adds a new `--max-cost` option for `konnector` and `abyss-sealer` that curbs indefinitely long running times, particularly at low k values. | GPLv3 | final |
2.1.1 | Sep 11, 2018 | This release provides bug fixes and modest improvements to Bloom filter assembly contiguity/correctness. Parallelization of Sealer has also been improved, thanks to contributions by @schutzekatze. | GPLv3 | final |
2.1.0 | Apr 13, 2018 | This release adds support for misassembly correction and scaffolding using linked reads, using Tigmint and ARCS. (Tigmint and ARCS must be installed separately.) In addition, simultaneous optimization of `s` (seed length) and `n` (min supporting read pairs / Chromium barcodes) is now supported during scaffolding. | GPLv3 | final |
2.0.3 | Mar 14, 2018 | This minor release provides bug fixes and improved reliability for both MPI assemblies and Bloom filter assemblies on large datasets. In addition, many usability improvements have been made to the `abyss-samtobreak` program for misassembly assessment. | GPLv3 | final |
2.0.2 | Oct 21, 2016 | Fix compile errors with gcc-6 and boost-1.62. | GPLv3 | final |
2.0.1 | Sep 14, 2016 | This release resolves some licensing issues that were pointed out in 2.0.0. As of 2.0.1, ABySS is now available under a standard GPL-3 license, and the libraries included under `lib/rolling-hash` and `lib/bloomfilter` are now also licensed under GPL-3. For alternative licensing terms, please contact Patrick Rebstein (prebstein at bccancer.bc.ca). | GPLv3 | final |
2.0.0 | Sep 01, 2016 | This release introduces a new Bloom filter assembly mode that enables large genome assemblies with minimal memory (e.g. 34 GB for H. sapiens with 76X coverage bfc-corrected reads). Bloom filter assemblies are currently less contiguous than the default (MPI) assembly mode but are still of high quality (e.g. 3.5 Mbp vs. 4.8 Mbp scaffold NG50 for H. sapiens). Bloom filter assembly mode is enabled by adding three `abyss-pe` parameters (B = *Bloom filter size*, H = *number of Bloom filter hash functions*, kc = *k-mer coverage threshold*). See `README.md` for an example. This release also updates several `abyss-pe` parameter defaults to be more suitable for large genome assemblies with recent Illumina data. In addition, ABySS 2.0.0 includes minor usability improvements for `abyss-sealer` and removes an unnecessary build dependency on sqlite3. | BCCA (academic use) | final |
1.9.0 | May 29, 2015 | This release introduces a new paired de Bruijn graph mode for assembly. In paired de Bruijn graph mode, ordinary k-mers are replaced by k-mer pairs, where each k-mer pair is separated by a fixed-size gap. The primary advantage of paired de Bruijn graph mode is that the span of a k-mer pair can be arbitrarily wide without consuming additional memory, and thus provides improved scalability for assemblies of long sequencing reads. This release also introduces a new tool called Sealer for closing scaffold gaps, new Konnector functionality for producing long pseudo-reads, and support for the DIDA (Distributed Indexing Dispatched Alignment) parallel alignment framework. | BCCA (academic use) | final |
1.5.2 | Jul 10, 2014 | In this release we introduce Konnector, a fast and memory-efficient tool to fill the gap between paired-end reads. Konnector determines the intervening sequence by building a Bloom filter de Bruijn graph and searching for paths between paired-end reads within the graph. A companion tool called abyss-bloom is also provided, which can be used to construct reusable Bloom filter files for input to Konnector; otherwise, Konnector will build an in-memory Bloom filter for one-time use. In addition to Konnector, we have fixed bugs related to compiling with GCC 4.8+ and parsing BWA output SAM files. | GPLv3 for non-commercial usage | final |
1.5.1 | May 08, 2014 | In this release we fix a compatibility issue with Trans-ABySS 1.5.0 where the output of abyss-filtergraph is not strand-specific. Also, we include additional FCC portability fixes. | GPLv3 for non-commercial usage | final |
1.5.0 | May 01, 2014 | In this release we have added full strand-specific RNA-Seq support, so that output contigs are correctly oriented with respect to the original sequenced transcripts. Also, there are new parameters to abyss-pe, xtip and Q, that improve assembly in high-coverage regions such as highly expressed transcripts. Setting xtip=1 will more aggressively remove certain tips. The 'Q' parameter will prevent low quality bases from being used in the assembly. The version has been bumped to 1.5.0 to signify compatibility with Trans-ABySS 1.5.0. | GPLv3 for non-commercial usage | final |
1.3.7 | Dec 11, 2013 | Scaffolds can now be rescaffolded using long sequences such as RNA-Seq assemblies produced from Trans-ABySS. Added support for gcc 4.8+ and Mac OS X 10.9 Mavericks with clang. Finally, we've licensed ABySS under GPL for non-commercial purposes. Please read the LICENSE file for more details. | GPLv3 for non-commercial usage | final |
1.3.6 | Jul 31, 2013 | ABYSS and ABYSS-P are now ~20% faster. Fixed many portability issues and bugs, and improved some error messages. | BCCA (academic use) | final |
1.3.5 | Mar 05, 2013 | This release introduces new tools to merge overlapping read pairs, lay out and merge contigs with perfect sequence overlap, and calculate contig contiguity and correctness metrics. It also updates the existing documentation, fixes bugs, and attempts to fill scaffold gaps with a consensus of all paths between contigs. | BCCA (academic use) | final |
1.3.4 | May 30, 2012 | This release eliminates two sources of misassemblies: one in the path extension logic of SimpleGraph, and the default value of m (the minimum overlap required to merge two contigs), which is increased from 30 to 50. This release also fixes various portability issues. A new script, abyss-fatoagp, is included to create an AGP file for GenBank submission. | BCCA (academic use) | final |
1.3.3 | Mar 13, 2012 | Specify the minimum alignment length when aligning the reads to the contigs with the parameter l. Improve the scaffolding algorithm that identifies repeats. Improve the documentation. | BCCA (academic use) | final |
1.3.2 | Dec 13, 2011 | Improve distance estimates between contigs, enable scaffolding by default, and remove small shim contigs that don't add useful sequence to the assembly. The default aligner is abyss-map. MergePaths uses a non-greedy algorithm that reduces sequence duplication but may reduce contiguity. | BCCA (academic use) | final |
1.3.1 | Oct 24, 2011 | Fix a bug in KAligner and fix a compiler error for Mac OS X. | BCCA (academic use) | final |
1.3.0 | Sep 09, 2011 | Mate-pair data can be used to scaffold contigs. Specify your mate-pair libraries using the `mp` parameter of abyss-pe. | BCCA (academic use) | final |
1.2.7 | Apr 15, 2011 | Support using bwa or bowtie to align reads to contigs. New parameter, d, to specify the acceptable error of a distance estimate. | BCCA (academic use) | final |
1.2.6 | Feb 07, 2011 | Sequence variants are popped if the two variants are at least 90% similar. Contigs that overlap by fewer than k-1 bp are found and may be merged. | BCCA (academic use) | final |
1.2.5 | Nov 15, 2010 | Fix a colour-space-specific bug and a bug causing the error ``Assertion `fstSol.size() == 1' failed.`` | BCCA (academic use) | final |
1.2.4 | Oct 14, 2010 | Replace gaps of Ns that span a region of ambiguous sequence with a consensus sequence of the possible sequences that fill the gap. The consensus sequence uses IUPAC-IUB ambiguity codes. | BCCA (academic use) | final |
1.2.3 | Sep 08, 2010 | Fix two bugs that caused the error messages ``Assertion `m_comm.receiveEmpty()' failed.`` and ``error: unexpected ID``. | BCCA (academic use) | final |
1.2.2 | Aug 25, 2010 | Merge contigs after popping bubbles. Handle multi-line FASTA sequences. Report the amount of memory used. | BCCA (academic use) | final |
1.2.1 | Jul 12, 2010 | Handle mate pair libraries with reverse-forward orientation as produced by circular, large-fragment libraries. Distance estimates are improved. | BCCA (academic use) | final |
1.2.0 | May 26, 2010 | Scaffold over gaps in coverage and unresolved repeats. Read sequence from SAM and BAM files. Set q=3 by default. Set E=0 when coverage is low (<2). Generate a Graphviz dot file of the paired-end assembly. | BCCA (academic use) | final |
1.1.2 | Feb 15, 2010 | Pop bubbles resulting from indels. Read tar files. Fix performance issues in ParseAligns by syncing KAligner threads periodically. | BCCA (academic use) | final |
1.1.1 | Jan 19, 2010 | Pop complex bubbles either completely or not at all. Choose better (typically lower) default values for the parameters e and c. | AFL | final |
1.1.0 | Dec 21, 2009 | ABySS will expand tandem repeats when it is possible to determine the exact number of the repeat. The paired-end path-finding algorithm, SimpleGraph, is multithreaded. Fixed a bug in MergePaths that could misassemble repeats larger than the paired-end fragment size. The output format of AdjList, DistanceEst and SimpleGraph has changed. | AFL | final |
1.0.16 | Nov 13, 2009 | Improve the performance and memory usage of KAligner and AdjList, particularly for very large data sets. | AFL | final |
1.0.15 | Oct 19, 2009 | New parameters, e and E, to set the coverage threshold of the erosion algorithm. Values for the parameters e and the coverage threshold, c, will be chosen automatically if they're not specified. The read length is now an optional parameter. This release also includes two important bug fixes. | AFL | final |
1.0.14 | Sep 08, 2009 | Assemble multiple libraries of different fragment sizes. | AFL | final |
1.0.13 | Aug 26, 2009 | Read files compressed with gzip (.gz) or bzip2 (.bz2). | AFL | final |
1.0.12 | Aug 19, 2009 | Both ABYSS and KAligner are run only once per assembly, which speeds up the paired-end assembly stage by nearly a factor of two. The k-mer coverage information is correct in every contig file. A tool is included to convert colour-space contigs to nucleotide contigs. Discard reads that fail the chastity filter. | AFL | final |
1.0.11 | Jul 21, 2009 | Assemble colour-space reads. Read files in qseq format. KAligner is multithreaded. Integrate with Sun Grid Engine (SGE). Prevent misassemblies mediated by tandem segmental duplications. | AFL | final |
1.0.10 | Jun 18, 2009 | ParseAligns is improved to handle any number of reads as long as mate pairs are found interleaved in the same file. Merge overlapping paired-end contigs that were previously being missed in some situations. Number paired-end contigs so that their IDs do not overlap with the single-end contigs. | AFL | final |
1.0.9 | May 15, 2009 | Significantly reduce the memory usage of KAligner and ParseAligns. abyss-pe can read multiple input files and read FASTA or FASTQ files. | AFL | final |
1.0.8 | Apr 02, 2009 | Fix the bug causing the error ``Assertion `marked == split' failed.`` | AFL | final |
1.0.7 | Mar 31, 2009 | The parallel MPI assembler is now deterministic; it will produce the same result every time. | AFL | final |
1.0.6 | Mar 25, 2009 | Fix a race condition in the erosion algorithm. | AFL | final |
1.0.5 | Mar 11, 2009 | Portability fixes. | AFL | final |
1.0.4 | Mar 09, 2009 | Remove the need to specify the parameters -e,--erode and -b,--bubbles. Use less disk space by using pipes to avoid intermediate files. Many improvements to the paired-end algorithm. | BCCA (academic use) | final |
1.0.3 | Feb 05, 2009 | Tidy up the ends of blunt contigs. Merge blunt contigs that are connected by pairs and overlap. | BCCA (academic use) | final |
1.0.2 | Nov 21, 2008 | Include a parallel binary compiled for OpenMPI. | BCCA (academic use) | final |
1.0 | Aug 07, 2008 | Initial version of abyss. | BCCA (academic use) | final |
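The 2.0.0 release enables Bloom filter assembly mode through three `abyss-pe` parameters: B (Bloom filter size), H (number of hash functions) and kc (k-mer coverage threshold). One way these three quantities fit together is a cascading Bloom filter: a chain of kc filters in which a k-mer is inserted into the next level only if it is already present in the previous one, so the last level holds k-mers seen at least kc times (up to false positives). The Python sketch below is a simplified illustration of that idea, not ABySS's implementation. The class names, the SHA-256-derived hash functions, measuring size in bits, and splitting the bit budget evenly across levels are all assumptions.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter with m bits and h hash functions."""

    def __init__(self, m_bits: int, num_hashes: int) -> None:
        self.m = m_bits
        self.h = num_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive h bit positions from SHA-256 digests salted with the hash index
        # (an illustrative choice; assemblers typically use fast rolling hashes).
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # No false negatives; false positives are possible.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class CascadingBloomFilter:
    """kc Bloom filters in a chain; level i holds k-mers seen at least i+1 times."""

    def __init__(self, total_bits: int, num_hashes: int, kc: int) -> None:
        per_level = max(1, total_bits // kc)  # even split of the budget (assumption)
        self.levels = [BloomFilter(per_level, num_hashes) for _ in range(kc)]

    def add(self, kmer: str) -> None:
        # Record one more occurrence: insert into the first level lacking the k-mer.
        for level in self.levels:
            if kmer not in level:
                level.add(kmer)
                return

    def is_solid(self, kmer: str) -> bool:
        """True if the k-mer was (probably) seen at least kc times."""
        return kmer in self.levels[-1]


cbf = CascadingBloomFilter(total_bits=1 << 16, num_hashes=3, kc=2)
for kmer in ["ACGTA", "ACGTA", "TTTTT"]:
    cbf.add(kmer)
print(cbf.is_solid("ACGTA"))  # True
```

Here "TTTTT" occurs only once, so it is expected to fail the `is_solid` check unless it hits a false positive. Filtering on the last level keeps low-coverage (often erroneous) k-mers out of the graph while storing only bits, never the k-mers themselves.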
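The 1.9.0 release notes describe the paired de Bruijn graph mode, in which ordinary k-mers are replaced by k-mer pairs separated by a fixed-size gap. The Python sketch below shows how such pairs could be extracted from a sequence. The function name, and the convention that `gap` counts the bases between the end of the left k-mer and the start of the right one, are assumptions; ABySS's own definition of the gap may differ.

```python
from typing import Iterator, Tuple


def kmer_pairs(seq: str, k: int, gap: int) -> Iterator[Tuple[str, str]]:
    """Yield (left, right) k-mer pairs whose members are `gap` bases apart.

    Each pair spans 2*k + gap bases of `seq`, but only the two k-mers are
    stored, so widening `gap` does not make an individual pair larger.
    Consecutive pairs are shifted by one base, like edges in a de Bruijn graph.
    """
    span = 2 * k + gap
    for i in range(len(seq) - span + 1):
        left = seq[i:i + k]
        right = seq[i + k + gap:i + span]
        yield left, right


print(list(kmer_pairs("ACGTACGTAC", k=3, gap=2)))
# [('ACG', 'CGT'), ('CGT', 'GTA'), ('GTA', 'TAC')]
```

This is what lets a pair span a wide region cheaply: memory per pair depends only on k, while the span grows with the gap.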
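The 1.5.2 release notes describe Konnector as building a Bloom filter de Bruijn graph and searching for paths between paired-end reads within it. The Python sketch below illustrates that kind of search with a breadth-first walk. The function name, the use of a plain `set` as a stand-in for the Bloom filter, the `max_len` bound, and returning every path found are assumptions; Konnector's actual search strategy and options may differ. (The 2.1.2 release notes mention a `--max-cost` option with a similar purpose of curbing long searches; `max_len` here is not that option.)

```python
from collections import deque
from typing import List, Set


def connect_pair(read1: str, read2: str, kmers: Set[str], k: int,
                 max_len: int) -> List[str]:
    """Find merged sequences that join read1 to read2 through the k-mer graph.

    `kmers` stands in for a Bloom filter: a node exists if its k-mer is a member.
    Both reads are assumed to be on the same strand and at least k bases long.
    Returns every merged sequence no longer than `max_len`.
    """
    goal = read2[:k]
    merged: List[str] = []
    queue = deque([read1])  # each entry is the sequence built so far
    while queue:
        seq = queue.popleft()
        if seq.endswith(goal):
            # Reached the first k-mer of read2: splice on the rest of read2.
            merged.append(seq + read2[k:])
            continue
        if len(seq) + len(read2) - k >= max_len:
            continue  # any extension would exceed max_len
        suffix = seq[len(seq) - k + 1:]  # last k-1 bases
        for base in "ACGT":
            if suffix + base in kmers:  # edge exists if the next k-mer is present
                queue.append(seq + base)
    return merged


genome = "AACGTTGCAGGT"
graph = {genome[i:i + 4] for i in range(len(genome) - 3)}
print(connect_pair("AACGT", "CAGGT", graph, k=4, max_len=20))  # ['AACGTTGCAGGT']
```

The length bound keeps the walk finite even when the graph contains cycles; with a real Bloom filter, false positives would add spurious branches that the bound also limits.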
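The 1.3.4 release raises the default of m, the minimum overlap required to merge two contigs, from 30 to 50. The Python sketch below shows what such a threshold means for a simple exact suffix-prefix merge. The function names are hypothetical, and the sketch ignores reverse complements and graph context, which a real assembler must consider.

```python
from typing import Optional


def longest_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if no such overlap is at least `min_overlap` (>= 1) bases."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def merge_contigs(a: str, b: str, min_overlap: int = 50) -> Optional[str]:
    """Merge `a` and `b` on an exact overlap of at least `min_overlap` bases,
    or return None if they do not overlap enough."""
    olen = longest_overlap(a, b, min_overlap)
    return a + b[olen:] if olen else None


print(merge_contigs("AAAACGTG", "CGTGTTTT", min_overlap=4))  # AAAACGTGTTTT
print(merge_contigs("AAAACGTG", "CGTGTTTT", min_overlap=5))  # None
```

A larger minimum makes a chance exact match between unrelated contig ends less likely, at the cost of leaving some genuine short overlaps unmerged.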
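The 1.2.4 release replaces gaps of Ns with a consensus of the possible gap-filling sequences, written with IUPAC-IUB ambiguity codes. The Python sketch below computes such a consensus column by column. It assumes the candidate sequences are already the same length (sequences of different lengths would first need aligning, which is not shown), and the names `IUPAC` and `iupac_consensus` are hypothetical.

```python
# IUPAC-IUB nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def iupac_consensus(paths: list[str]) -> str:
    """Column-by-column IUPAC consensus of equal-length candidate sequences."""
    if not paths or len({len(p) for p in paths}) != 1:
        raise ValueError("need one or more sequences of the same length")
    # zip(*paths) yields one tuple of bases per column.
    return "".join(IUPAC[frozenset(column)] for column in zip(*paths))


print(iupac_consensus(["ACGT", "ACTT", "GCGT"]))  # RCKT
```

Columns where all candidates agree keep the plain base, so the consensus records ambiguity only where the paths actually differ.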
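The 1.0.11 and 1.0.12 releases add colour-space assembly and a tool to convert colour-space contigs to nucleotide contigs. In the SOLiD two-base encoding, with A=0, C=1, G=2 and T=3, the colour of two adjacent bases is the XOR of their indices, so a colour string can be decoded once its first base is known. The Python sketch below shows that decoding. The function names are hypothetical, and the assumption that the first base is supplied is a simplification; ABySS's converter may determine it differently.

```python
BASES = "ACGT"  # index in this string is the base's 2-bit code: A=0, C=1, G=2, T=3


def colour_to_bases(primer: str, colours: str) -> str:
    """Decode a colour-space string given its known first base.

    The colour of adjacent bases x, y is index(x) XOR index(y),
    so each next base is index(previous base) XOR colour.
    """
    prev = BASES.index(primer)
    out = [primer]
    for c in colours:
        prev ^= int(c)
        out.append(BASES[prev])
    return "".join(out)


def bases_to_colours(seq: str) -> str:
    """Encode a nucleotide sequence as colours (one per adjacent base pair)."""
    return "".join(str(BASES.index(a) ^ BASES.index(b)) for a, b in zip(seq, seq[1:]))


print(colour_to_bases("T", "0123"))  # TTGAT
print(bases_to_colours("TTGAT"))     # 0123
```

Because each base depends on the one before it, a single wrong colour changes every base decoded after it, which is why decoding has to start from a trusted base.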