Omitting slash mates (/1 or /2) in read names before QNAME truncating · Issue #265 · BenLangmead/bowtie2

In printReadName, read names are truncated in the following order:

  1. Omit slash mate (/1 or /2) from the end of a read name
  2. Keep first 255 characters
  3. Ignore characters after the first whitespace
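
The order above can be sketched in Python as follows. This is only an illustration of the three steps as listed in this issue, not bowtie2's actual C++ code; the function name truncate_as_described and the max_len parameter are made up for the example.

```python
import re


def truncate_as_described(name: str, max_len: int = 255) -> str:
    """Apply the three steps in the order listed above (hypothetical helper)."""
    # Step 1: drop a slash mate only if it is the very end of the name.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    # Step 2: keep at most the first max_len characters.
    name = name[:max_len]
    # Step 3: keep only the text before the first whitespace character.
    return re.split(r"\s", name, maxsplit=1)[0]


print(truncate_as_described("read42/1"))                 # read42
print(truncate_as_described("read42/1 1:X:0:CATAGCGA"))  # read42/1
```

In the second call the name ends with the comment field rather than /1, so step 1 removes nothing and the slash mate survives step 3.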

However, read names in some FASTQ files (perhaps those following the old Illumina format) can contain both a slash mate and additional text after it. For example, these are the first records of the paired-end FASTQ files I got from ENCODE.

@BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/1 1:X:0:CATAGCGA
TAGGGTTAGGGTTAGGGTTAGGGTT
+
@@@FFADDFHHCFHJJJCGDHIJGG
@BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/2 1:X:0:CATAGCGA
TAACCCTAACCCTAACCCTAACCCT
+
CCCFFFFFHHHHHJJIJIIJJJJII

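With the current order, step 1 finds nothing to remove, because these names end with the comment field (1:X:0:CATAGCGA) rather than with /1 or /2. After the whitespace truncation, the two mates are left with names ending in /1 and /2, which no longer match. So I think the slash mates should be omitted after the other truncations, so that read names match as pairs.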
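
A minimal Python sketch of the reordering suggested here, assuming the slash mate is removed last. It is not a patch for bowtie2; the function name truncate_proposed and the max_len parameter are hypothetical and only serve the example.

```python
import re


def truncate_proposed(name: str, max_len: int = 255) -> str:
    """Truncate first, then drop the slash mate (hypothetical helper)."""
    # Keep at most the first max_len characters.
    name = name[:max_len]
    # Keep only the text before the first whitespace character.
    name = re.split(r"\s", name, maxsplit=1)[0]
    # Only now drop a trailing slash mate, which is exposed by the cut above.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


mate1 = truncate_proposed("BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/1 1:X:0:CATAGCGA")
mate2 = truncate_proposed("BI:SL-HEA:C3BFGACXX:2:2316:10503:45861/2 1:X:0:CATAGCGA")
print(mate1)           # BI:SL-HEA:C3BFGACXX:2:2316:10503:45861
print(mate1 == mate2)  # True
```

With this order, both ENCODE reads shown above end up with the same name, and names without a comment field are handled as before.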