GigaDB Dataset - DOI 10.5524/100044

SOAPdenovo2 is the latest de novo genome assembly package from BGI’s SOAP (Short Oligonucleotide Analysis Package) suite of tools (homepage: http://soap.genomics.org.cn/). Compared with SOAPdenovo1, the new version uses a redesigned algorithm that reduces memory consumption during graph construction, resolves more repeat regions during contig assembly, increases coverage and length during scaffold construction, improves gap closure, and is optimized for large genomes.
Using new sequencing data from the diploid genome of YH (Homo sapiens), the first sequenced Han Chinese individual, an updated assembly was produced (see dataset: doi:10.5524/100038). Its contig and scaffold N50 values are 3-fold and 50-fold longer, respectively, than those of the first published version. Genome coverage increased from 81.16% to 93.91%, and peak memory consumption was about two-thirds lower.
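For readers unfamiliar with the metric: the N50 of an assembly is the length L such that contigs (or scaffolds) of length L or longer together contain at least half of all assembled bases, so a larger N50 indicates a more contiguous assembly. The Python sketch below computes it from a list of sequence lengths. The function name `n50` is hypothetical; it is not part of SOAPdenovo2 or of the pipelines distributed with this dataset, and it does not parse FASTA files.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of assembled bases.
    (Hypothetical helper for illustration only.)
    """
    # Ignore empty sequences and sort longest first.
    sorted_lengths = sorted((n for n in lengths if n > 0), reverse=True)
    if not sorted_lengths:
        raise ValueError("no positive sequence lengths given")

    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if running * 2 >= total:
            return length

    # Not reached: the running sum always reaches the total.
    return sorted_lengths[-1]
```

For lengths 2 through 10 (54 bases in total), the running sum of the longest sequences first reaches half the total (27) at the sequence of length 8, so `n50(range(2, 11))` returns 8.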
Benchmarking on the Assemblathon1 and GAGE datasets shows that SOAPdenovo2 greatly surpasses its predecessor, SOAPdenovo1, and is competitive with other assemblers in both assembly length and accuracy.
To help readers reproduce these findings, configured packages containing the compressed pipelines, with all of the necessary shell scripts and tools, are available from the BGI FTP server (ftp://public.genomics.org.cn/BGI/SOAPdenovo2).
The latest version of SOAPdenovo2 is available from Sourceforge: http://soapdenovo2.sourceforge.net/
These pipelines are available from our data platform as Galaxy workflows: http://galaxy.cbiit.cuhk.edu.hk/

Additional details

Read the peer-reviewed publication(s):

doi:10.5524/100044 Compiles doi:10.5524/100038
doi:10.5524/100044 IsPreviousVersionOf doi:10.5524/100148 (a more recent version of this dataset)

Additional information:

http://soap.genomics.org.cn/

http://soapdenovo2.sourceforge.net/

http://gigagalaxy.net/library/browse_libraries?id=f2db41e1fa331b3e

| Sample ID | Common Name | Scientific Name | Sample Attributes | Taxonomic ID | Genbank Name |
|---|---|---|---|---|---|
| YH | Human | Homo sapiens | | 9606 | human |

| File Name | Description | Sample ID | Data Type | File Format | Size | Release Date | MD5 checksum |
|---|---|---|---|---|---|---|---|
| README.pdf | | | Readme | PDF | 237.56 kB | 2012-12-13 | 229294a5e1034e7adf54bff8f08e9f3d |
| Assemblathon1_pipeline.tgz | | | Software | UNKNOWN | 10.51 MB | 2012-12-13 | 080cc94121f37cb94116e14957431603 |
| Bombus_impatiens_pipeline.tgz | | | Software | UNKNOWN | 5.13 MB | 2012-12-13 | a55a8b386c64d679b35145e7f4550775 |
| Rhodobacter_sphaeroides_pipeline.tgz | | | Software | UNKNOWN | 5.12 MB | 2012-12-13 | 6fe888d2446b7cbba5459c783aec516c |
| Staphylococcus_aureus_pipeline.tgz | | | Software | UNKNOWN | 4.55 MB | 2012-12-13 | 433c999a3bb60fd06e2ef8d5cd0f6405 |
| YH_pipeline.tgz | | | Software | UNKNOWN | 7.34 MB | 2012-12-13 | 4f1fee9663c6d8f30e7cf7ae8639d7a1 |
| readme.txt | | | Readme | TEXT | 300 B | 2012-12-13 | c668cb6623ba879fba2efbb4bb42a749 |
| isa-tab.zip | ISA-Tab files describing SOAP2 assembly of YH and other genomes | | ISA-Tab | TEXT | 6.36 kB | 2014-08-12 | 6c82908eaca19aae1f4dd7b2a0e42978 |
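The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted. The following is a minimal Python sketch, assuming all files have been downloaded into one local directory under their original names; the helper names `md5_of` and `verify` are hypothetical and are not part of GigaDB or any BGI tool. On most Unix-like systems the standard `md5sum` utility reports the same digests.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict, Optional

# Expected MD5 digests (hex), copied from the file table of this dataset.
EXPECTED_MD5 = {
    "README.pdf": "229294a5e1034e7adf54bff8f08e9f3d",
    "Assemblathon1_pipeline.tgz": "080cc94121f37cb94116e14957431603",
    "Bombus_impatiens_pipeline.tgz": "a55a8b386c64d679b35145e7f4550775",
    "Rhodobacter_sphaeroides_pipeline.tgz": "6fe888d2446b7cbba5459c783aec516c",
    "Staphylococcus_aureus_pipeline.tgz": "433c999a3bb60fd06e2ef8d5cd0f6405",
    "YH_pipeline.tgz": "4f1fee9663c6d8f30e7cf7ae8639d7a1",
    "readme.txt": "c668cb6623ba879fba2efbb4bb42a749",
    "isa-tab.zip": "6c82908eaca19aae1f4dd7b2a0e42978",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large archives need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> Dict[str, Optional[bool]]:
    """Map each expected file name to True/False, or None if it is missing."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = md5_of(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    # Directory to check: first command-line argument, else the current one.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, ok in verify(target).items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Saved under any name and run with a download directory as its argument, the script prints one status line per file; a MISMATCH means the local copy differs from the listed checksum and should be downloaded again.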
| Funding body | Awardee | Award ID | Comments |
|---|---|---|---|
| National Natural Science Foundation of China | | 90612019 | |
| National High Technology Research and Development Program of China-863 program | | 2012AA02A201 | |
| State Key Development Program for Basic Research of China-973 Program | | 2011CB809203 | |
| Shenzhen Municipal Government of China | | JC201005260191A | |
| Shenzhen Key Laboratory of Trans-omics Biotechnologies | | CXB201108250096A | |
| Date | Action |
|---|---|
| October 16, 2015 | File Assemblathon1_pipeline.tgz updated |
| July 9, 2018 | External Link updated: http://gigagalaxy.net/library/browse_libraries?sort=name&f-description=All&f-name=All&operation=browse&id=f2db41e1fa331b3e |
| July 9, 2018 | External Link updated: http://gigagalaxy.net/library/browse_libraries?id=f2db41e1fa331b3e |