GitHub - elswob/SCUBAT: Scaffolding Contigs Using Blat And Transcripts

#Overview

SCUBAT (Scaffolding Contigs Using BLAT And Transcripts) uses any set of transcripts to identify cases where a transcript is split over multiple genome fragments and attempts to use this information to scaffold the genome.

#Software Requirements

CAP3 - to assemble scaffolds

GNU parallel - to run the CAP3 assemblies in parallel

#Data requirements

  1. A genome in FASTA format
  2. A set of transcripts in FASTA format
  3. A BLAT psl file aligning the transcripts to the genome

#Quick guide

scubat.pl -t genome.fa -q transcripts.fa -p blat_output.psl

To see all options, run scubat.pl without any arguments.

#Details

SCUBAT proceeds in six steps:

###1) Align the transcripts to the genome (not part of the script):
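
Because the alignment is not part of the script, the psl file has to exist before scubat.pl is run. The following is a minimal sketch of producing it, assuming the `blat` binary is installed and on the PATH, and reusing the file names from the quick guide. The helper names `blat_command` and `run_blat` are hypothetical and are not part of SCUBAT.

```python
import subprocess


def blat_command(genome, transcripts, psl_out):
    # BLAT takes the database (here the genome) first, then the query
    # (the transcripts), then the name of the output psl file.
    return ["blat", genome, transcripts, psl_out]


def run_blat(genome, transcripts, psl_out):
    # Raises FileNotFoundError if blat is not installed, and
    # CalledProcessError if blat exits with a non-zero status.
    subprocess.run(blat_command(genome, transcripts, psl_out), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without blat.
    print(" ".join(blat_command("genome.fa", "transcripts.fa", "blat_output.psl")))
```

The genome is the BLAT database and the transcripts are the query, which matches the `-t` (target) and `-q` (query) options shown in the quick guide.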

###2) Identify informative split transcripts:
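
To illustrate what a split transcript looks like in the psl file, here is a minimal sketch that lists transcripts with alignments to more than one genome fragment. It is not SCUBAT's own code: the function names are hypothetical, and "informative" is reduced here to the simplest possible criterion (hits on at least two different contigs), which is an assumption. The column positions follow the standard PSL layout (qName in column 10, tName in column 14, counted from 1).

```python
from collections import defaultdict


def parse_psl(lines):
    """Yield (q_name, q_start, q_end, t_name) for each PSL alignment line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional PSL header: alignment lines have 21 columns
        # and start with the integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield fields[9], int(fields[11]), int(fields[12]), fields[13]


def split_transcripts(lines):
    """Return {transcript: sorted contig names} for transcripts
    that align to more than one contig."""
    contigs = defaultdict(set)
    for q_name, _q_start, _q_end, t_name in parse_psl(lines):
        contigs[q_name].add(t_name)
    return {q: sorted(t) for q, t in contigs.items() if len(t) > 1}
```

With a real file this could be called as `split_transcripts(open("blat_output.psl"))`. A practical filter would probably also consider alignment length and identity, which this sketch ignores.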

###3) Create scaffolds:
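
One plausible way to turn a split transcript into a scaffold is to order the contigs by where they align along the transcript. The sketch below shows that idea only; this README does not describe how SCUBAT builds its scaffolds, so the function name `order_contigs` and the tuple layout of `hits` are assumptions made for illustration.

```python
def order_contigs(hits):
    """Order contigs by their position on a single transcript.

    hits: iterable of (q_start, q_end, contig, strand) tuples, all
          belonging to ONE transcript.
    Returns a list of (contig, strand) pairs in transcript order.
    """
    path = []
    # Sorting the tuples orders the hits by q_start, then q_end.
    for _q_start, _q_end, contig, strand in sorted(hits):
        # Collapse consecutive hits on the same contig (for example
        # several exons) into one entry, keeping the first strand seen.
        if not path or path[-1][0] != contig:
            path.append((contig, strand))
    return path
```

The resulting path gives the order and orientation in which the contigs would need to be joined to keep the transcript contiguous.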

###4) Cluster scaffolds into groups and assemble:
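
The grouping idea can be sketched as finding connected components: paths that share a contig end up in the same group, and each group can then be assembled independently (the requirements list CAP3 for the assembly and GNU parallel for running the assemblies in parallel). Grouping by shared contigs is an assumption here, not something this README states, and `cluster_paths` is a hypothetical name.

```python
def cluster_paths(paths):
    """Group contig paths that share at least one contig.

    paths: iterable of lists of contig names.
    Returns a sorted list of sorted contig-name lists, one per group.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for path in paths:
        for contig in path:
            find(contig)  # register every contig, including singletons
        for a, b in zip(path, path[1:]):
            parent[find(a)] = find(b)  # merge neighbouring contigs

    groups = {}
    for contig in list(parent):
        groups.setdefault(find(contig), set()).add(contig)
    return sorted(sorted(g) for g in groups.values())
```

Because groups share no contigs, each one can be handed to a separate assembly job without affecting the others.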

###5) Filter the assemblies:

###6) Create new contig set: