LCR regions still in final VCF · Issue #3609 · bcbio/bcbio-nextgen (original) (raw)

My germline WGS run completed successfully, however, I double checked the LCR regions using a bedtools intersectBed of the final vcf and the provided LCR.bed.gz, and variants were still present.

intersectBed -wa -a RF_0890-joint-gatk-haplotype-annotated.vcf.gz -b /shared/workspace/software/bcbio/genomes/Hsapiens/hg38/coverage/problem_regions/repeats/LCR.bed.gz | head

chr1	94808	.	CCTTTT	C,<NON_REF>	7.6	PASS	BaseQRankSum=0.319;ClippingRankSum=-1.282;DP=7;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;MMQ=39,23,60;MQ0=0;MQRankSum=-1.282;RAW_MQandDP=8545,7;ReadPosRankSum=1.15	GT:AD:DP:GQ:PL:SB	0/1:4,1,0:5:15:15,0,165,27,168,195:0,4,0,1
chr1	269239	.	TA	T,TAA,TAAA,<NON_REF>	78.6	PASS	BaseQRankSum=0.613;ClippingRankSum=-0.52;DP=63;ExcessHet=3.0103;MLEAC=0,1,0,0;MLEAF=0,0.5,0,0;MMQ=60,40,40,45,60;MQ0=0;MQRankSum=-4.102;RAW_MQandDP=150027,63;ReadPosRankSum=0.745	GT:AD:DP:GQ:PL:SB	0/2:40,2,8,3,0:53:86:86,191,1174,0,807,839,113,1058,923,1387,211,1076,885,1128,1126:23,17,4,9
chr1	832736	.	AGTTTT	A,<NON_REF>	887.0	PASS	DP=30;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1,0;MMQ=60,60,60;MQ0=0;RAW_MQandDP=107881,30	GT:AD:DP:GQ:PL:SB	1/1:0,20,0:20:61:901,61,0,901,61,901:0,0,11,9
chr1	1016060	.	CT	C,<NON_REF>	165.6	PASS	BaseQRankSum=-1.208;ClippingRankSum=0.083;DP=24;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;MMQ=60,60,60;MQ0=0;MQRankSum=0;RAW_MQandDP=82281,24;ReadPosRankSum=-0.207	GT:AD:DP:GQ:PL:SB	0/1:8,11,0:19:99:173,0,144,197,172,369:5,3,4,7
chr1	1028320	.	GCC	G,<NON_REF>	661.0	PASS	DP=15;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1,0;MMQ=60,60,60;MQ0=0;RAW_MQandDP=54000,15	GT:AD:DP:GQ:PL:SB	1/1:0,15,0:15:45:675,45,0,675,45,675:0,0,0,15
chr1	1048767	.	CAAAAAAAAAAA	C,<NON_REF>	388.6	PASS	BaseQRankSum=2.617;ClippingRankSum=0.083;DP=21;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;MMQ=60,60,60;MQ0=0;MQRankSum=0;RAW_MQandDP=74916,21;ReadPosRankSum=0.327	GT:AD:DP:GQ:PL:SB	0/1:8,10,1:19:99:396,0,306,420,336,756:4,4,8,3
chr1	1190111	.	CA	C,<NON_REF>	0.0	PASS	BaseQRankSum=-0.587;ClippingRankSum=1.658;DP=31;ExcessHet=3.0103;MLEAC=0,0;MLEAF=0,0;MMQ=60,60,60;MQ0=0;MQRankSum=0;RAW_MQandDP=111600,31;ReadPosRankSum=0.501	GT:AD:DP:GQ:PL:SB	0/0:23,2,0:25:25:0,25,499,68,505,548:13,10,1,1
chr1	1214845	.	CA	C,CAA,<NON_REF>	0.0	PASS	BaseQRankSum=1.363;ClippingRankSum=-0.679;DP=38;ExcessHet=3.0103;MLEAC=0,0,0;MLEAF=0,0,0;MMQ=60,60,60,60;MQ0=0;MQRankSum=0;RAW_MQandDP=136800,38;ReadPosRankSum=0	GT:AD:DP:GQ:PL:SB	0/0:27,3,2,1:33:20:0,20,636,46,554,639,93,626,641,701:13,14,3,3
chr1	1251356	.	AT	A,<NON_REF>	344.6	PASS	BaseQRankSum=-1.245;ClippingRankSum=-1.57;DP=25;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.5,0;MMQ=60,60,60;MQ0=0;MQRankSum=-0.687;RAW_MQandDP=88418,25;ReadPosRankSum=2.163	GT:AD:DP:GQ:PL:SB	0/1:5,17,1:23:56:352,0,56,367,107,475:4,1,7,11
chr1	1256888	.	GA	G,TA,<NON_REF>	1.8	PASS	BaseQRankSum=-0.18;ClippingRankSum=0.566;DP=12;ExcessHet=3.0103;MLEAC=0,0,0;MLEAF=0,0,0;MMQ=60,60,60,60;MQ0=0;MQRankSum=0;RAW_MQandDP=37354,12;ReadPosRankSum=-0.484	GT:AD:DP:GQ:PL:SB	0/1:4,2,1,0:7:9:9,0,79,29,69,286,38,91,205,202:3,1,2,1

I even see the bed file being used in the pipeline from the commands log, for example:

bcbio-nextgen-commands.log:[2021-12-03T13:28Z] bedtools subtract -nonamecheck -a /scratch/germline/RF_0890/gatk-haplotype/chr7/RF_0890-joint-chr7_152549199_159345973-regions.bed -b /shared/workspace/software/bcbio/genomes/Hsapiens/hg38/coverage/problem_regions/repeats/LCR.bed.gz > /scratch/germline/RF_0890/bcbiotx/tmpwxxyfsbq/RF_0890-joint-chr7_152549199_159345973-regions-nolcr.bed
cd $workspace && bcbio_nextgen.py \
    -w template $template_yaml \
    $metadata \
    $workspace \
    --workdir $workspace

bcbio_nextgen.py \
 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>w</mi><mi>o</mi><mi>r</mi><mi>k</mi><mi>s</mi><mi>p</mi><mi>a</mi><mi>c</mi><mi>e</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">&quot;</mi></mrow><annotation encoding="application/x-tex">workspace/&quot;</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.02691em;">w</span><span class="mord mathnormal" style="margin-right:0.02778em;">or</span><span class="mord mathnormal" style="margin-right:0.03148em;">k</span><span class="mord mathnormal">s</span><span class="mord mathnormal">p</span><span class="mord mathnormal">a</span><span class="mord mathnormal">ce</span><span class="mord">/&quot;</span></span></span></span>{base_meta%.*}"/config/"${base_meta%.*}".yaml \
    -n 16 \
    --workdir $workspace
details:
- algorithm:
    aligner: bwa
    mark_duplicates: true
    #recalibrate: gatk
    exclude_regions: [altcontigs, highdepth, lcr]
    variantcaller: gatk-haplotype
    tools_on: [gatk4, gvcf, qualimap_full]
    tools_off: [gemini]
    svcaller: cnvkit
    svprioritize: actionable/ACMG56
    effects: false
  analysis: variant2
  genome_build: hg38
upload:
  dir: final

Please attach (10MB max): bcbio-nextgen-commands.log, and bcbio-nextgen-debug.log.