PacBio Corrected Reads (PBcR) Pipeline
CA From Source Code
A version of the source code corresponding to that used in the publication is available. The code is compiled to support sequences of up to 64,786 bp for correction and assembly.
- Publication source (including CA, AMOS, SMRTportal, and utility scripts pre-built for Linux-amd64 machines) is available here
- A list of the commands used for correction and assembly is available here, as well as in the distribution above.
- The latest code with updates and bug fixes can be built from the repository following the instructions here.
- For the latest usage information and announcements, please visit the PBcR wiki
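Because the pre-built code is compiled with a maximum sequence length, it can be useful to check a FASTQ file for reads above that limit before running correction. The following is a minimal sketch, not part of the distribution: the function names are hypothetical, the limit is simply the figure quoted above, and the parser assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import io

# Limit quoted on this page for the pre-built binaries (assumption: it applies
# to the raw read length as counted in bases).
MAX_READ_LEN = 64786


def fastq_lengths(handle):
    """Yield (read_name, length) for each four-line FASTQ record in handle."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            return
        header = header.strip()
        if not header:            # skip stray blank lines between records
            continue
        seq = handle.readline().strip()
        handle.readline()         # '+' separator line
        handle.readline()         # quality line
        yield header[1:].strip(), len(seq)


def reads_over_limit(handle, limit=MAX_READ_LEN):
    """Return the names of reads whose sequence is longer than limit."""
    return [name for name, length in fastq_lengths(handle) if length > limit]


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open FASTQ file instead.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n")
    print(reads_over_limit(demo, limit=5))
```

Reads reported by such a check would need to be trimmed or split, or the code rebuilt from the repository with a larger limit, before they can be used.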
Simulation Results
- An interactive KRONA chart of simulated assemblies given C1, C2, XL-C2, XL-XL, and ZL sequencing is available here.
- A list of repeat counts and maximum repeat size for all genomes is available here.
- The simulation results for the number of gaps remaining for a given coverage and chemistry type are available here.
Datasets
Below you will find all the datasets used for testing the assembly/correction pipeline.
E. coli K12 PacBio RS sequences were generously provided by Pacific Biosciences.
Sequencing of E. coli O157, B. tre, M. haemolytica, and S. enterica was performed by the
- E. coli K12 MG1655
- PacBio RS sequencing and 454 data are available at the SRA
- MiSeq Sequences are available at the Illumina scientific data website
- 200X Filtered fastq sequences direct link: fastq
- MiSeq 100X frg
- 454 50X frg
- CCS 25X frg
- E. coli O157:H7
- Raw sequences available at the SRA
- 200X Filtered fastq sequences direct link: fastq
- MiSeq 100X tar.gz
- 454 40X tar.gz
- B. tre
- Raw sequences available at the SRA
- 200X Filtered fastq sequences direct link: fastq
- MiSeq 100X tar.gz
- 454 50X frg
- CCS 25X frg
- M. haemolytica
- Raw sequences available at the SRA
- 200X Filtered fastq sequences direct link: fastq
- MiSeq 100X tar.gz
- CCS 25X frg
- F. tularensis
- Raw sequences available at the SRA
- All Filtered fastq sequences direct link: fastq
- 200X Filtered fastq sequences direct link: fastq
- MiSeq 100X tar.gz
- 454 50X frg
- S. enterica
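The coverage labels in the list above (200X, 100X, 50X and so on) express total sequenced bases divided by genome size. As a rough check on a downloaded file, the sketch below estimates coverage from a FASTQ file. It is an illustration only: the function names are hypothetical, the genome size must be supplied by the user, and the parser assumes plain four-line FASTQ records.

```python
import io


def total_bases(handle):
    """Sum the sequence lengths of all four-line FASTQ records in handle."""
    total = 0
    while True:
        header = handle.readline()
        if header == "":          # end of file
            break
        if not header.strip():    # skip stray blank lines between records
            continue
        total += len(handle.readline().strip())
        handle.readline()         # '+' separator line
        handle.readline()         # quality line
    return total


def estimate_coverage(handle, genome_size):
    """Return average depth of coverage: total bases / genome size (in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive number of bases")
    return total_bases(handle) / genome_size


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open FASTQ file and
    # the reference genome length in bp.
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n")
    print(estimate_coverage(demo, genome_size=6))
```

The estimate is an average over the whole genome and says nothing about how evenly the reads are distributed.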
Second-gen Assemblies
Below are the 454 and Illumina assemblies. 454 assemblies were generated by Newbler v2.8. Illumina assemblies were generated by SPAdes v2.5.0 and MaSuRCA v1.9.5 and consensus-polished with iCORN.
- Validation statistics on all generated assemblies in the paper are available here
- E. coli K12
- MiSeq SPAdes assembly
- MiSeq SPAdes iCORN assembly
- MiSeq MaSuRCA assembly
- MiSeq MaSuRCA iCORN assembly
- 454 Newbler assembly
- E. coli O157
- MiSeq SPAdes assembly
- MiSeq SPAdes iCORN assembly
- MiSeq MaSuRCA assembly
- MiSeq MaSuRCA iCORN assembly
- 454 Newbler assembly
- B. tre
- MiSeq SPAdes assembly
- MiSeq SPAdes iCORN assembly
- MiSeq MaSuRCA assembly
- MiSeq MaSuRCA iCORN assembly
- 454 Newbler assembly
- M. haemolytica
- MiSeq SPAdes assembly
- MiSeq SPAdes iCORN assembly
- MiSeq MaSuRCA assembly
- MiSeq MaSuRCA iCORN assembly
- F. tularensis
- MiSeq SPAdes assembly
- MiSeq SPAdes iCORN assembly
- MiSeq MaSuRCA assembly
- MiSeq MaSuRCA iCORN assembly
- 454 Newbler assembly
- S. enterica
Corrected Sequences and Assembly
Below are the PBcR sequences and assemblies (both hybrid and second-gen alone). CA contains some randomized code, so re-running correction and assembly may not give identical output. To reproduce the exact results in the paper, start from the corrected fastq sequences and assemble them.
- Validation statistics on all generated assemblies in the paper are available here
- E. coli K12
- PBcR Corrected sequences (via MiSeq)
- Assembly sequences (via MiSeq)
- PBcR Corrected sequences (via 454)
- Assembly sequences (via 454)
- PBcR Corrected sequences (via CCS)
- Assembly sequences (via CCS)
- PBcR Corrected sequences (via self-correction)
- Assembly sequences (via self-correction)
- E. coli O157
- PBcR Corrected sequences (via MiSeq)
- Assembly sequences (via MiSeq)
- PBcR Corrected sequences (via 454)
- Assembly sequences (via 454)
- PBcR Corrected sequences (via self-correction)
- Assembly sequences (via self-correction)
- B. tre
- PBcR Corrected sequences (via MiSeq)
- Assembly sequences (via MiSeq)
- PBcR Corrected sequences (via 454)
- Assembly sequences (via 454)
- PBcR Corrected sequences (via CCS)
- Assembly sequences (via CCS)
- PBcR Corrected sequences (via self-correction)
- Assembly sequences (via self-correction)
- M. haemolytica
- PBcR Corrected sequences (via MiSeq)
- Assembly sequences (via MiSeq)
- PBcR Corrected sequences (via CCS)
- Assembly sequences (via CCS)
- PBcR Corrected sequences (via self-correction)
- Assembly sequences (via self-correction)
- F. tularensis
- PBcR Corrected sequences (via MiSeq)
- Assembly sequences (via MiSeq)
- PBcR Corrected sequences (via 454)
- Assembly sequences (via 454)
- PBcR Corrected sequences (via self-correction)
- Assembly sequences (via self-correction)
- PBcR All Data Corrected sequences (via self-correction)
- Assembly All Data sequences (via self-correction)
- S. enterica