ABySS | Genome Sciences Centre

ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

To assemble transcriptome data, see Trans-ABySS.

Awards

June 2015, 12th [BC]² Conference in Basel, Switzerland: ABySS won the Swiss Institute of Bioinformatics’ inaugural International Bioinformatics Resource Award.

Publications

Current Release

GitHub release page for ABySS

All Releases

Version Released Description Licenses Status
2.1.5 Dec 04, 2018 Compiler fixes and increased stack size limits to avoid stack overflows. GPLv3 final
2.1.4 Nov 09, 2018 This release provides major improvements to Bloom filter assembly contiguity and correctness. Bloom filter assemblies now have equivalent scaffold contiguity and better correctness than MPI assemblies of the same data, while still requiring less than 1/10th of the memory. On human, Bloom filter assembly times are still a few hours longer than MPI assemblies (e.g. 17 hours vs. 13 hours, using 48 threads). GPLv3 final
2.1.3 Nov 05, 2018 This release fixes a SAM-formatting bug that broke the ABySS-LR pipeline (Tigmint/ARCS). GPLv3 final
2.1.2 Oct 24, 2018 This release improves scaffold N50 on human by ~10%, due to implementation of a new `--median` option for `DistanceEst` (thanks to @lcoombe!). This release also adds a new `--max-cost` option for `konnector` and `abyss-sealer` that curbs indeterminately long running times, particularly at low k values. GPLv3 final
2.1.1 Sep 11, 2018 This release provides bug fixes and modest improvements to Bloom filter assembly contiguity/correctness. Parallelization of Sealer has also been improved, thanks to contributions by @schutzekatze. GPLv3 final
2.1.0 Apr 13, 2018 This release adds support for misassembly correction and scaffolding using linked reads, using Tigmint and ARCS. (Tigmint and ARCS must be installed separately.) In addition, simultaneous optimization of `s` (seed length) and `n` (min supporting read pairs / Chromium barcodes) is now supported during scaffolding. GPLv3 final
2.0.3 Mar 14, 2018 This minor release provides bug fixes and improved reliability for both MPI assemblies and Bloom filter assemblies on large datasets. In addition, many usability improvements have been made to the `abyss-samtobreak` program for misassembly assessment. GPLv3 final
2.0.2 Oct 21, 2016 Fix compile errors with gcc-6 and boost-1.62. GPLv3 final
2.0.1 Sep 14, 2016 This release resolves some licensing issues that were pointed out in 2.0.0. As of 2.0.1, ABySS is available under a standard GPL-3 license, and the libraries included under `lib/rolling-hash` and `lib/bloomfilter` are also licensed under GPL-3. For alternative licensing terms, please contact Patrick Rebstein (prebstein at bccancer.bc.ca). GPLv3 final
2.0.0 Sep 01, 2016 This release introduces a new Bloom filter assembly mode that enables large genome assemblies with minimal memory (e.g. 34 GB for H. sapiens with 76X coverage bfc-corrected reads). Bloom filter assemblies are currently less contiguous than the default (MPI) assembly mode but are still of high quality (e.g. 3.5 Mbp vs. 4.8 Mbp scaffold NG50 for H. sapiens). Bloom filter assembly mode is enabled by adding three 'abyss-pe' parameters (B = *Bloom filter size*, H = *number of Bloom filter hash functions*, kc = *k-mer coverage threshold*). See 'README.md' for an example. This release also updates several 'abyss-pe' parameter defaults to be more suitable for large genome assemblies with recent Illumina data. In addition, ABySS 2.0.0 includes minor usability improvements for 'abyss-sealer' and removes an unnecessary build dependency on sqlite3. BCCA (academic use) final
1.9.0 May 29, 2015 This release introduces a new paired de Bruijn graph mode for assembly. In paired de Bruijn graph mode, ordinary k-mers are replaced by k-mer pairs, where each k-mer pair is separated by a fixed-size gap. The primary advantage of paired de Bruijn graph mode is that the span of a k-mer pair can be arbitrarily wide without consuming additional memory, and thus provides improved scalability for assemblies of long sequencing reads. This release also introduces a new tool called Sealer for closing scaffold gaps, new Konnector functionality for producing long pseudo-reads, and support for the DIDA (Distributed Indexing Dispatched Alignment) parallel alignment framework. BCCA (academic use) final
1.5.2 Jul 10, 2014 In this release we introduce Konnector, a fast and memory-efficient tool to fill the gap between paired-end reads. Konnector determines the intervening sequence by building a Bloom filter de Bruijn graph and searching for paths between paired-end reads within the graph. A companion tool called abyss-bloom is also provided, which can be used to construct reusable Bloom filter files for input to Konnector; otherwise, Konnector will build an in-memory Bloom filter for one-time use. In addition to Konnector, we have fixed bugs related to compiling with GCC 4.8+ and parsing BWA output SAM files. GPLv3 for non-commercial usage final
1.5.1 May 08, 2014 In this release we fix a compatibility issue with Trans-ABySS 1.5.0 where the output of abyss-filtergraph is not strand-specific. Also, we include additional FCC portability fixes. GPLv3 for non-commercial usage final
1.5.0 May 01, 2014 In this release we have added full strand-specific RNA-Seq support, so that output contigs are correctly oriented with respect to the original transcripts sequenced. There are also two new abyss-pe parameters, xtip and Q, that improve assembly in high-coverage regions such as highly expressed transcripts. Setting xtip=1 will more aggressively remove certain tips. The Q parameter prevents low-quality bases from being used in the assembly. The version has been bumped to 1.5.0 to signify compatibility with Trans-ABySS 1.5.0. GPLv3 for non-commercial usage final
1.3.7 Dec 11, 2013 Scaffolds can now be rescaffolded using long sequences such as RNA-Seq assemblies produced from Trans-ABySS. Added support for gcc 4.8+ and Mac OS X 10.9 Mavericks with clang. Finally, we've licensed ABySS under GPL for non-commercial purposes. Please read the LICENSE file for more details. GPLv3 for non-commercial usage final
1.3.6 Jul 31, 2013 ABYSS and ABYSS-P are now ~20% faster. Fixed many portability issues and bugs, and improved some error messages. BCCA (academic use) final
1.3.5 Mar 05, 2013 This release introduces new tools to merge overlapping read pairs, layout and merge contigs with perfect sequence overlap, and calculate contig contiguity and correctness metrics. Also, it includes updates to the existing documentation, bug fixes, and attempts to fill scaffold gaps with a consensus of all paths between contigs. BCCA (academic use) final
1.3.4 May 30, 2012 This release eliminates two sources of misassemblies. The first was in the path extension logic of SimpleGraph. The second is addressed by increasing the default value of m, the minimum overlap required between two contigs to merge them, from 30 to 50. This release also fixes various portability issues. A new script, abyss-fatoagp, is included to create an AGP file for GenBank submission. BCCA (academic use) final
1.3.3 Mar 13, 2012 Specify the minimum alignment length when aligning the reads to the contigs with the parameter l. Improve the scaffolding algorithm that identifies repeats. Improve the documentation. BCCA (academic use) final
1.3.2 Dec 13, 2011 Improve distance estimates between contigs, enable scaffolding by default, and remove small shim contigs that don't add useful sequence to the assembly. The default aligner is abyss-map. MergePaths uses a non-greedy algorithm that reduces sequence duplication but may reduce contiguity. BCCA (academic use) final
1.3.1 Oct 24, 2011 Fix a bug in KAligner and fix a compiler error for Mac OS X. BCCA (academic use) final
1.3.0 Sep 09, 2011 Mate-pair data can be used to scaffold contigs. Specify your mate-pair libraries using the `mp` parameter of abyss-pe. BCCA (academic use) final
1.2.7 Apr 15, 2011 Support using bwa or bowtie to align reads to contigs. New parameter, d, to specify the acceptable error of a distance estimate. BCCA (academic use) final
1.2.6 Feb 07, 2011 Sequence variants are popped if the two variants are at least 90% similar. Contigs that overlap by fewer than k-1 bp are found and may be merged. BCCA (academic use) final
1.2.5 Nov 15, 2010 Fix a colour-space-specific bug and a bug causing the error Assertion `fstSol.size() == 1' failed. BCCA (academic use) final
1.2.4 Oct 14, 2010 Replace gaps of Ns that span a region of ambiguous sequence with a consensus sequence of the possible sequences that fill the gap. The consensus sequence uses IUPAC-IUB ambiguity codes. BCCA (academic use) final
1.2.3 Sep 08, 2010 Fix two bugs that caused the error messages: Assertion `m_comm.receiveEmpty()' failed. and error: unexpected ID BCCA (academic use) final
1.2.2 Aug 25, 2010 Merge contigs after popping bubbles. Handle multi-line FASTA sequences. Report the amount of memory used. BCCA (academic use) final
1.2.1 Jul 12, 2010 Handle mate pair libraries with reverse-forward orientation as produced by circular, large-fragment libraries. Distance estimates are improved. BCCA (academic use) final
1.2.0 May 26, 2010 Scaffold over gaps in coverage and unresolved repeats. Read sequence from SAM and BAM files. Set q=3 by default. Set E=0 when coverage is low (<2). Generate a Graphviz dot file of the paired-end assembly. BCCA (academic use) final
1.1.2 Feb 15, 2010 Pop bubbles resulting from indels. Read tar files. Fix performance issues in ParseAligns by syncing KAligner threads periodically. BCCA (academic use) final
1.1.1 Jan 19, 2010 Pop complex bubbles either completely or not at all. Choose better (typically lower) default values for the parameters e and c. AFL final
1.1.0 Dec 21, 2009 ABySS will expand tandem repeats when it is possible to determine the exact number of the repeat. The paired-end path-finding algorithm, SimpleGraph, is multithreaded. Fixed a bug in MergePaths that could misassemble repeats larger than the paired-end fragment size. The output format of AdjList, DistanceEst and SimpleGraph has changed. AFL final
1.0.16 Nov 13, 2009 Improve the performance and memory usage of KAligner and AdjList, particularly for very large data sets. AFL final
1.0.15 Oct 19, 2009 New parameters, e and E, to set the coverage threshold of the erosion algorithm. Values for the parameters e and the coverage threshold, c, will be chosen automatically if they're not specified. The read length is now an optional parameter. Two important bug fixes, see below. AFL final
1.0.14 Sep 08, 2009 Assemble multiple libraries of different fragment sizes. AFL final
1.0.13 Aug 26, 2009 Read files compressed with gzip (.gz) or bzip2 (.bz2). AFL final
1.0.12 Aug 19, 2009 Both ABYSS and KAligner are run only once per assembly, which speeds up the paired-end assembly stage by nearly a factor of two. The k-mer coverage information is correct in every contig file. A tool is included to convert colour-space contigs to nucleotide contigs. Discard reads that fail the chastity filter. AFL final
1.0.11 Jul 21, 2009 Assemble colour-space reads. Read files in qseq format. KAligner is multithreaded. Integrate with Sun Grid Engine (SGE). Prevent misassemblies mediated by tandem segmental duplications. AFL final
1.0.10 Jun 18, 2009 ParseAligns is improved to handle any number of reads as long as mate pairs are found interleaved in the same file. Merge overlapping paired-end contigs that were previously being missed in some situations. Number paired-end contigs so that their IDs do not overlap with the single-end contigs. AFL final
1.0.9 May 15, 2009 Significantly reduce the memory usage of KAligner and ParseAligns. abyss-pe can read multiple input files and read FASTA or FASTQ files. AFL final
1.0.8 Apr 02, 2009 Fix the bug causing the error Assertion `marked == split' failed. AFL final
1.0.7 Mar 31, 2009 The parallel MPI assembler is now deterministic; it will produce the same result every time. AFL final
1.0.6 Mar 25, 2009 Fix a race condition in the erosion algorithm. AFL final
1.0.5 Mar 11, 2009 Portability fixes. AFL final
1.0.4 Mar 09, 2009 Remove the need to specify the parameters -e,--erode and -b,--bubbles. Use less disk space by using pipes to avoid intermediate files. Many improvements to the paired-end algorithm. BCCA (academic use) final
1.0.3 Feb 05, 2009 Tidy up the ends of blunt contigs. Merge blunt contigs that are connected by pairs and overlap. BCCA (academic use) final
1.0.2 Nov 21, 2008 Include a parallel binary compiled for OpenMPI. BCCA (academic use) final
1.0 Aug 07, 2008 Initial version of ABySS. BCCA (academic use) final
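
Several entries above compare assemblies by scaffold N50 or NG50 (for example 2.1.2 and 2.0.0). As a reminder of what those figures measure, here is a minimal Python sketch; it is not ABySS code. N50 is the largest length L such that sequences of length at least L together cover at least half of the assembly, and NG50 uses an estimated genome size in place of the assembly total. The function name `nx_metric` and the sample lengths are illustrative assumptions.

```python
from typing import Iterable, Optional


def nx_metric(lengths: Iterable[int], fraction: float = 0.5,
              genome_size: Optional[int] = None) -> int:
    """Return N50 (or NG50 when genome_size is given) for sequence lengths.

    Returns 0 if the sequences never reach the target, which can happen for
    NG50 when the assembly is much smaller than the genome.
    """
    ordered = sorted(lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(ordered)
    target = total * fraction
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0


scaffolds = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, sum = 300
print(nx_metric(scaffolds))                   # N50  -> 70
print(nx_metric(scaffolds, genome_size=400))  # NG50 -> 50
```

Because NG50 is measured against a fixed genome size, it allows a fair comparison between assemblies of different total length, which is presumably why the 2.0.0 entry quotes it when comparing the Bloom filter and MPI modes.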
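
The 1.9.0 entry describes paired de Bruijn graph mode, in which each graph node is a pair of k-mers separated by a fixed-size gap. The sketch below only illustrates how such pairs could be enumerated from a read; the function `kmer_pairs` and its `gap` argument are hypothetical names chosen for this example and do not correspond to ABySS internals or `abyss-pe` parameters.

```python
from typing import Iterator, Tuple


def kmer_pairs(seq: str, k: int, gap: int) -> Iterator[Tuple[str, str]]:
    """Yield (left, right) k-mer pairs whose k-mers are `gap` bases apart."""
    span = 2 * k + gap  # total bases covered by one pair
    for start in range(len(seq) - span + 1):
        left = seq[start:start + k]
        right = seq[start + k + gap:start + span]
        yield left, right


read = "ACGTACGGTCA"  # hypothetical 11 bp read
pairs = list(kmer_pairs(read, k=3, gap=2))
print(pairs)
# [('ACG', 'CGG'), ('CGT', 'GGT'), ('GTA', 'GTC'), ('TAC', 'TCA')]
```

Each pair stores only 2k bases however wide the gap is, which is the property the release note highlights: the span can grow without a matching growth in per-node memory.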
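
The 1.2.4 entry mentions replacing a gap with a consensus of the possible filling sequences, written with IUPAC-IUB ambiguity codes. A minimal sketch of that encoding step follows. It assumes the candidate sequences are uppercase and already aligned to equal length; how ABySS aligns candidates of different lengths is not shown here, and `iupac_consensus` is a name invented for the example.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(sequences):
    """Collapse equal-length sequences into one IUPAC-coded consensus."""
    if not sequences:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must already be aligned to equal length")
    # Each column of the alignment maps to the code for its set of bases.
    return "".join(IUPAC[frozenset(column)] for column in zip(*sequences))


paths = ["ACGTA", "ACATA", "ACGTC"]  # hypothetical gap-filling paths
print(iupac_consensus(paths))  # ACRTM
```

Positions where all paths agree keep their base, while a position such as G/A becomes R, so the consensus records the ambiguity instead of hiding it behind a run of Ns.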
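
The 2.0.0 entry introduces Bloom filter assembly mode, configured by a Bloom filter size (B) and a number of hash functions (H). The toy Python class below shows how those two quantities interact in a generic Bloom filter. It is a sketch under stated assumptions: it salts SHA-256 to derive hash positions, whereas ABySS ships its own `lib/rolling-hash` and `lib/bloomfilter` libraries, and it does not model the k-mer coverage threshold (kc). The names `size_bits` and `num_hashes` are illustrative.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array queried through several hash functions."""

    def __init__(self, size_bits: int, num_hashes: int) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = BloomFilter(size_bits=1024, num_hashes=3)
read = "ACGTACGGTCA"  # hypothetical read
k = 5
for i in range(len(read) - k + 1):
    bf.add(read[i:i + k])
print("ACGTA" in bf)  # True: inserted k-mers are always reported present
```

A Bloom filter never reports an inserted k-mer as absent, but it can report an absent k-mer as present. A larger bit array lowers that false-positive rate at the cost of memory, which is the trade-off the size parameter controls.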