automatic config problem with paired samples numerically named with _samplenum · Issue #1919 · bcbio/bcbio-nextgen
Not sure how to eloquently describe this problem. Imagine I have paired-end (PE) sequencing data for samples 1-12 named like this, unfortunately with no leading zeros:
sample_1_1.fastq.gz sample_1_2.fastq.gz
sample_2_1.fastq.gz sample_2_2.fastq.gz
sample_3_1.fastq.gz sample_3_2.fastq.gz
sample_4_1.fastq.gz sample_4_2.fastq.gz
sample_5_1.fastq.gz sample_5_2.fastq.gz
sample_6_1.fastq.gz sample_6_2.fastq.gz
sample_7_1.fastq.gz sample_7_2.fastq.gz
sample_8_1.fastq.gz sample_8_2.fastq.gz
sample_10_1.fastq.gz sample_10_2.fastq.gz
sample_11_1.fastq.gz sample_11_2.fastq.gz
sample_12_1.fastq.gz sample_12_2.fastq.gz
When attempting automated configuration based on a CSV file using -w template, I get warnings that bcbio is adding minimal metadata for samples _1 and _2, and looking at the yaml file created, the files: list is incorrectly created.
I imagine it's something to do with how the template generation script is looking for _1.fastq.gz and _2.fastq.gz, but is getting confused by the _1 and _2 in the sample names themselves.
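To illustrate the kind of ambiguity I suspect, here is a minimal sketch. It is not bcbio's actual code; `naive_sample`, `split_pair`, and `PAIR_RE` are hypothetical names made up for this example. It contrasts a fragile approach that cuts at the first _1 or _2 found anywhere in the name with a pattern that only treats the final _1 or _2 directly before .fastq.gz as the read number.

```python
import re

# Hypothetical pattern: the read number is the last "_1" or "_2",
# immediately before the ".fastq.gz" extension.
PAIR_RE = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def naive_sample(filename):
    """Fragile: cut the name at the first "_1" or "_2" found anywhere."""
    stem = filename[: -len(".fastq.gz")]
    for tag in ("_1", "_2"):
        idx = stem.find(tag)
        if idx != -1:
            return stem[:idx]
    return stem


def split_pair(filename):
    """Return (sample, read) using the anchored pattern, or None if no match."""
    match = PAIR_RE.match(filename)
    if match is None:
        return None
    return match.group("sample"), int(match.group("read"))


if __name__ == "__main__":
    for name in ("sample_1_2.fastq.gz", "sample_10_1.fastq.gz"):
        print(name, naive_sample(name), split_pair(name))
```

With the naive version, several differently numbered files collapse to the same sample name, which would be consistent with the odd metadata warnings, though I have not confirmed that this is what the template script does.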
In any case, my workaround was to simply rename the files, or symlink them, without the _ between "sample" and the number. This is probably not that much of an edge case, so it may be worth addressing, or at least making it more obvious what's happening. It took me a few minutes to figure out what the issue was.
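For anyone hitting the same thing, the symlink workaround can be sketched roughly as follows. The helper names (`link_name`, `symlink_renamed`) and the directory arguments are hypothetical, and the pattern assumes files named exactly like the listing above.

```python
import os
import re

# Assumes names of the form sample_<number>_<read>.fastq.gz, as in the listing above.
NAME_RE = re.compile(r"^sample_(\d+)_([12])\.fastq\.gz$")


def link_name(filename):
    """Map sample_10_1.fastq.gz to sample10_1.fastq.gz; None if it does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return f"sample{match.group(1)}_{match.group(2)}.fastq.gz"


def symlink_renamed(src_dir, dest_dir):
    """Create symlinks in dest_dir that drop the underscore before the sample number."""
    os.makedirs(dest_dir, exist_ok=True)
    created = []
    for name in sorted(os.listdir(src_dir)):
        new_name = link_name(name)
        if new_name is None:
            continue  # leave unrelated files alone
        target = os.path.join(dest_dir, new_name)
        if not os.path.lexists(target):
            os.symlink(os.path.abspath(os.path.join(src_dir, name)), target)
        created.append(new_name)
    return created
```

The originals stay untouched, and the CSV and template run would then point at the symlinked names instead.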