PacBio Corrected Reads (PBcR) Pipeline

CA From Source Code

A version of the source code is available, corresponding to the one used in the publication. The code is compiled to support sequences of up to 64,786 bp for correction and assembly.
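Because of the compiled-in length limit, it can be useful to check a read set before running the pipeline. The following is a minimal sketch, not part of CA: it assumes an uncompressed FASTQ file with exactly four lines per record, takes the 64,786 bp figure from the text above, and uses hypothetical helper names (`read_lengths`, `too_long`).

```python
import io

# Length limit quoted above for this build; adjust if your build differs.
MAX_READ_LEN = 64786


def read_lengths(handle):
    """Yield (read_id, length) for each record of a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality string
        yield header[1:].split()[0], len(seq)


def too_long(handle, limit=MAX_READ_LEN):
    """Return the (read_id, length) pairs whose sequence exceeds the limit."""
    return [(rid, n) for rid, n in read_lengths(handle) if n > limit]


if __name__ == "__main__":
    # Tiny in-memory example: one short read and one read just over the limit.
    demo = io.StringIO(
        "@short\nACGT\n+\nIIII\n"
        "@long\n" + "A" * (MAX_READ_LEN + 1) + "\n+\n" + "I" * (MAX_READ_LEN + 1) + "\n"
    )
    for rid, n in too_long(demo):
        print(f"{rid}\t{n}")
```

To check a real file, pass an open file handle instead of the in-memory example, e.g. `too_long(open("reads.fastq"))`, where `reads.fastq` is a placeholder name.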

Simulation Results

Datasets

Below you will find all the datasets used for testing the assembly/correction pipeline.
E. coli K12 PacBio RS sequences were generously provided by Pacific Biosciences.
Sequencing of E. coli O157, B. tre, M. haemolytica, and S. enterica was performed by the USDA.

  1. E. coli K12 MG1655
  2. E. coli O157:H7
  3. B. tre
  4. M. haemolytica
  5. F. tularensis
  6. S. enterica

Second-gen Assemblies

Below are the 454 and Illumina assemblies. The 454 assemblies were generated by Newbler v2.8. The Illumina assemblies were generated by SPAdes v2.5.0 and MaSuRCA v1.9.5, and consensus-polished with iCORN.

  1. Validation statistics for all generated assemblies in the paper are available here
  2. E. coli K12
  3. E. coli O157
  4. B. tre
  5. M. haemolytica
  6. F. tularensis
  7. S. enterica

Corrected Sequences and Assembly

Below are the PBcR sequences and assemblies (both hybrid and second-gen alone). CA contains some randomized code, so re-running correction and assembly from the raw data will not give identical output. To reproduce the exact results in the paper, start from the corrected FASTQ sequences provided here and run only the assembly step.

  1. Validation statistics for all generated assemblies in the paper are available here
  2. E. coli K12
  3. E. coli O157
  4. B. tre
  5. M. haemolytica
  6. F. tularensis
  7. S. enterica