VarScan - Variant Detection in Massively Parallel Sequencing Data

VarScan User's Manual

VarScan is written in Java and should be run from the command line (Terminal in Linux/UNIX/OSX, or Command Prompt in MS Windows). For variant calling, you will need a pileup file. See the How to Build A Pileup File section for details. Running VarScan with no arguments prints the usage information. Because some fields changed as of VarScan v2.2.3, updated documentation is provided for the current release. For documentation of v2.2.2 and prior, see below.

VarScan Documentation (v2.2.3 and later)

USAGE: java -jar VarScan.jar  [COMMAND] [OPTIONS]

COMMANDS:

**Single-sample Calling:**
[**pileup2snp**](#v2.3%5Fpileup2snp) [pileup file]
[**pileup2indel**](#v2.3%5Fpileup2indel) [pileup file]
[**pileup2cns**](#v2.3%5Fpileup2cns) [pileup file]

**Multi-sample Calling:**
[**mpileup2snp**](#v2.3%5Fmpileup2snp) [mpileup file]
[**mpileup2indel**](#v2.3%5Fmpileup2indel) [mpileup file]
[**mpileup2cns**](#v2.3%5Fmpileup2cns) [mpileup file]

**Tumor-normal Comparison:**
[**somatic**](#v2.3%5Fsomatic)	[normal pileup] [tumor pileup] or [normal-tumor mpileup]
[**copynumber**](#v2.3%5Fcopynumber) [normal pileup] [tumor pileup] or [normal-tumor mpileup]

**Variant Filtering:**
[**filter**](#v2.3%5Ffilter) [variants file]
[**somaticFilter**](#v2.3%5FsomaticFilter) [mutations file]

**Utility Functions:**
[**limit**](#v2.3%5Flimit) [variants file] 
[**readcounts**](#v2.3%5Freadcounts) [pileup file]
[**compare**](#v2.3%5Fcompare)	[file1] [file2]

pileup2snp

This command calls SNPs from a pileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar pileup2snp [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-reads2    Minimum supporting reads at a position to call variants [2]
    --min-avg-qual  Minimum base quality at a position to count a read [15]
    --min-var-freq  Minimum variant allele frequency threshold [0.01]
    --p-value       Default p-value threshold for calling variants [99e-02]
    
OUTPUT
Tab-delimited SNP calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref		reference allele at this position
Cons		Consensus genotype of sample in IUPAC format.
Reads1		reads supporting reference allele
Reads2		reads supporting variant allele
VarFreq		frequency of variant allele by read count
Strands1	strands on which reference allele was observed
Strands2	strands on which variant allele was observed
Qual1		average base quality of reference-supporting read bases
Qual2		average base quality of variant-supporting read bases
Pvalue		Significance of variant read count vs. expected baseline error
MapQual1	Average map quality of ref reads (only useful if in pileup)
MapQual2	Average map quality of var reads (only useful if in pileup)
Reads1Plus	Number of reference-supporting reads on + strand
Reads1Minus	Number of reference-supporting reads on - strand
Reads2Plus	Number of variant-supporting reads on + strand
Reads2Minus	Number of variant-supporting reads on - strand
VarAllele	Most frequent non-reference allele observed 
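
As an illustration, the column layout above can be read into named fields with a few lines of Python. This is a minimal sketch and not part of VarScan: the function names and the example row are invented for illustration, and it assumes the file has exactly the 19 tab-separated columns listed above, in that order. The variant frequency is recomputed from the read counts rather than parsed from the VarFreq column, because the exact text format of that column is not described here.

```python
# Column names in the order documented for pileup2snp output.
COLUMNS = [
    "Chrom", "Position", "Ref", "Cons", "Reads1", "Reads2", "VarFreq",
    "Strands1", "Strands2", "Qual1", "Qual2", "Pvalue", "MapQual1",
    "MapQual2", "Reads1Plus", "Reads1Minus", "Reads2Plus", "Reads2Minus",
    "VarAllele",
]


def parse_snp_line(line):
    """Split one tab-delimited row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    # Only the position and read counts are converted; the rest stays as text.
    for name in ("Position", "Reads1", "Reads2"):
        row[name] = int(row[name])
    return row


def variant_fraction(row):
    """Recompute the variant allele frequency from the read counts."""
    total = row["Reads1"] + row["Reads2"]
    return row["Reads2"] / total if total else 0.0


# Hypothetical row, invented for this example (not real VarScan output).
example = "\t".join([
    "chr1", "12345", "A", "R", "15", "5", "25%", "2", "2", "30", "28",
    "0.01", "60", "60", "8", "7", "3", "2", "G",
])
row = parse_snp_line(example)
print(row["Chrom"], row["Position"], variant_fraction(row))
```

The same approach should apply to pileup2indel and pileup2cns output, which list the same column names.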

pileup2indel

This command calls indels from a pileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar pileup2indel [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-reads2    Minimum supporting reads at a position to call variants [2]
    --min-avg-qual  Minimum base quality at a position to count a read [15]
    --min-var-freq  Minimum variant allele frequency threshold [0.01]
    --p-value       Default p-value threshold for calling variants [99e-02]

OUTPUT
Tab-delimited indel calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref		reference allele at this position
Cons		Consensus genotype of sample; */(var) indicates heterozygous
Reads1		reads supporting reference allele
Reads2		reads supporting variant allele
VarFreq		frequency of variant allele by read count
Strands1	strands on which reference allele was observed
Strands2	strands on which variant allele was observed
Qual1		average base quality of reference-supporting read bases
Qual2		average base quality of variant-supporting read bases
Pvalue		Significance of variant read count vs. expected baseline error
MapQual1	Average map quality of ref reads (only useful if in pileup)
MapQual2	Average map quality of var reads (only useful if in pileup)
Reads1Plus	Number of reference-supporting reads on + strand
Reads1Minus	Number of reference-supporting reads on - strand
Reads2Plus	Number of variant-supporting reads on + strand
Reads2Minus	Number of variant-supporting reads on - strand
VarAllele	Most frequent non-reference allele observed 

pileup2cns

This command makes consensus calls (SNP/Indel/Reference) from a pileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar pileup2cns [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-reads2    Minimum supporting reads at a position to call variants [2]
    --min-avg-qual  Minimum base quality at a position to count a read [15]
    --min-var-freq  Minimum variant allele frequency threshold [0.01]
    --p-value       Default p-value threshold for calling variants [99e-02]

OUTPUT
Tab-delimited consensus calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref		reference allele at this position
Cons		Consensus genotype of sample; */(var) indicates heterozygous
Reads1		reads supporting reference allele
Reads2		reads supporting variant allele
VarFreq		frequency of variant allele by read count
Strands1	strands on which reference allele was observed
Strands2	strands on which variant allele was observed
Qual1		average base quality of reference-supporting read bases
Qual2		average base quality of variant-supporting read bases
Pvalue		Significance of variant read count vs. expected baseline error
MapQual1	Average map quality of ref reads (only useful if in pileup)
MapQual2	Average map quality of var reads (only useful if in pileup)
Reads1Plus	Number of reference-supporting reads on + strand
Reads1Minus	Number of reference-supporting reads on - strand
Reads2Plus	Number of variant-supporting reads on + strand
Reads2Minus	Number of variant-supporting reads on - strand
VarAllele	Most frequent non-reference allele observed 

mpileup2snp

This command calls SNPs from an mpileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar mpileup2snp [mpileup file] OPTIONS
    mpileup file - The SAMtools mpileup file

OPTIONS:
--min-coverage	Minimum read depth at a position to make a call [8]
--min-reads2	Minimum supporting reads at a position to call variants [2]
--min-avg-qual	Minimum base quality at a position to count a read [15]
--min-var-freq	Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom	Minimum frequency to call homozygote [0.75]
--p-value	Default p-value threshold for calling variants [99e-02]
--strand-filter	Ignore variants with >90% support on one strand [1]
--output-vcf	If set to 1, outputs in VCF format
--variants	Report only variant (SNP/indel) positions (mpileup2cns only) [0]

    
OUTPUT
Tab-delimited SNP calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref			reference allele at this position
Var			variant allele observed
PoolCall	Cross-sample call using all data (Cons:Cov:Reads1:Reads2:Freq:P-value)
        Cons - consensus genotype in IUPAC format
        Cov - total depth of coverage
        Reads1 - number of reads supporting reference
        Reads2 - number of reads supporting variant
        Freq - the variant allele frequency by read count
        P-value - FET p-value of observed reads vs expected non-variant
StrandFilt	Information to look for strand bias using all reads (R1+:R1-:R2+:R2-:pval)
        R1+ = reference supporting reads on forward strand
        R1- = reference supporting reads on reverse strand
        R2+ = variant supporting reads on forward strand
        R2- = variant supporting reads on reverse strand
        pval = FET p-value for strand distribution, R1 versus R2
SamplesRef	Number of samples called reference (wildtype)
SamplesHet	Number of samples called heterozygous-variant
SamplesHom	Number of samples called homozygous-variant
SamplesNC	Number of samples not covered / not called
SampleCalls	The calls for each sample in the mpileup, space-delimited
            Each sample has six values separated by colons:
        Cons - consensus genotype in IUPAC format
        Cov - total depth of coverage
        Reads1 - number of reads supporting reference
        Reads2 - number of reads supporting variant
        Freq - the variant allele frequency by read count
        P-value - FET p-value of observed reads vs expected non-variant	
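
The SampleCalls column packs one colon-separated record per sample, so it usually needs a second parsing step. The sketch below shows one way to do that in Python. The names `SampleCall` and `parse_sample_calls` and the example string are hypothetical, and all six values are kept as text because this manual does not say how uncovered or uncalled samples are encoded.

```python
from typing import NamedTuple


class SampleCall(NamedTuple):
    cons: str     # consensus genotype in IUPAC format
    cov: str      # total depth of coverage
    reads1: str   # reads supporting reference
    reads2: str   # reads supporting variant
    freq: str     # variant allele frequency by read count
    p_value: str  # FET p-value


def parse_sample_calls(field):
    """Split a space-delimited SampleCalls field into per-sample records."""
    calls = []
    for token in field.split():
        parts = token.split(":")
        if len(parts) != 6:
            raise ValueError(f"expected 6 colon-separated values in {token!r}")
        calls.append(SampleCall(*parts))
    return calls


# Hypothetical two-sample field, invented for illustration.
calls = parse_sample_calls("R:20:15:5:25%:0.02 A:18:18:0:0%:1")
for index, call in enumerate(calls, start=1):
    print(index, call.cons, call.cov, call.reads2)
```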

mpileup2indel

This command calls indels from an mpileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar mpileup2indel [mpileup file] OPTIONS
    mpileup file - The SAMtools mpileup file

OPTIONS:
--min-coverage	Minimum read depth at a position to make a call [8]
--min-reads2	Minimum supporting reads at a position to call variants [2]
--min-avg-qual	Minimum base quality at a position to count a read [15]
--min-var-freq	Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom	Minimum frequency to call homozygote [0.75]
--p-value	Default p-value threshold for calling variants [99e-02]
--strand-filter	Ignore variants with >90% support on one strand [1]
--output-vcf	If set to 1, outputs in VCF format
--variants	Report only variant (SNP/indel) positions (mpileup2cns only) [0]

    
OUTPUT
Tab-delimited indel calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref			reference allele at this position
Var			variant allele observed
PoolCall	Cross-sample call using all data (Cons:Cov:Reads1:Reads2:Freq:P-value)
            Cons - consensus genotype in IUPAC format
            Cov - total depth of coverage
            Reads1 - number of reads supporting reference
            Reads2 - number of reads supporting variant
            Freq - the variant allele frequency by read count
            P-value - FET p-value of observed reads vs expected non-variant
StrandFilt	Information to look for strand bias using all reads, format R1+:R1-:R2+:R2-:pval
            R1+ = reference supporting reads on forward strand
            R1- = reference supporting reads on reverse strand
            R2+ = variant supporting reads on forward strand
            R2- = variant supporting reads on reverse strand
            pval = FET p-value for strand distribution, R1 versus R2
SamplesRef	Number of samples called reference (wildtype)
SamplesHet	Number of samples called heterozygous-variant
SamplesHom	Number of samples called homozygous-variant
SamplesNC	Number of samples not covered / not called
SampleCalls	The calls for each sample in the mpileup, space-delimited
            Each sample has six values separated by colons:
        Cons - consensus genotype in IUPAC format
        Cov - total depth of coverage
        Reads1 - number of reads supporting reference
        Reads2 - number of reads supporting variant
        Freq - the variant allele frequency by read count
        P-value - FET p-value of observed reads vs expected non-variant	
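
The StrandFilt column can be used to inspect strand bias by hand. The following Python sketch parses the R1+:R1-:R2+:R2-:pval layout and flags a site when more than 90% of the variant-supporting reads come from one strand. This is only one reading of the --strand-filter description; the exact rule VarScan applies internally is not spelled out in this manual, and the function names and example values are made up.

```python
def parse_strand_filt(field):
    """Parse a StrandFilt value of the form R1+:R1-:R2+:R2-:pval."""
    r1_plus, r1_minus, r2_plus, r2_minus, pval = field.split(":")
    return {
        "r1_plus": int(r1_plus),
        "r1_minus": int(r1_minus),
        "r2_plus": int(r2_plus),
        "r2_minus": int(r2_minus),
        "pval": float(pval),
    }


def variant_strand_skewed(counts, max_fraction=0.90):
    """Return True if one strand carries more than max_fraction of variant reads.

    Assumption: the >90% rule is applied to variant-supporting reads only.
    """
    total = counts["r2_plus"] + counts["r2_minus"]
    if total == 0:
        return False
    dominant = max(counts["r2_plus"], counts["r2_minus"])
    return dominant / total > max_fraction


# Hypothetical values, invented for illustration.
skewed = parse_strand_filt("10:12:19:1:0.03")
balanced = parse_strand_filt("10:12:8:2:0.5")
print(variant_strand_skewed(skewed), variant_strand_skewed(balanced))
```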

mpileup2cns

This command makes consensus calls (SNP/Indel/Reference) from an mpileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar mpileup2cns [mpileup file] OPTIONS
    mpileup file - The SAMtools mpileup file

OPTIONS:
--min-coverage	Minimum read depth at a position to make a call [8]
--min-reads2	Minimum supporting reads at a position to call variants [2]
--min-avg-qual	Minimum base quality at a position to count a read [15]
--min-var-freq	Minimum variant allele frequency threshold [0.01]
--min-freq-for-hom	Minimum frequency to call homozygote [0.75]
--p-value	Default p-value threshold for calling variants [99e-02]
--strand-filter	Ignore variants with >90% support on one strand [1]
--output-vcf	If set to 1, outputs in VCF format
--variants	Report only variant (SNP/indel) positions (mpileup2cns only) [0]

    
OUTPUT
Tab-delimited consensus calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref			reference allele at this position
Var			variant allele observed
PoolCall	Cross-sample call using all data (Cons:Cov:Reads1:Reads2:Freq:P-value)
            Cons - consensus genotype in IUPAC format
            Cov - total depth of coverage
            Reads1 - number of reads supporting reference
            Reads2 - number of reads supporting variant
            Freq - the variant allele frequency by read count
            P-value - FET p-value of observed reads vs expected non-variant
StrandFilt	Information to look for strand bias using all reads, format R1+:R1-:R2+:R2-:pval
            R1+ = reference supporting reads on forward strand
            R1- = reference supporting reads on reverse strand
            R2+ = variant supporting reads on forward strand
            R2- = variant supporting reads on reverse strand
            pval = FET p-value for strand distribution, R1 versus R2
SamplesRef	Number of samples called reference (wildtype)
SamplesHet	Number of samples called heterozygous-variant
SamplesHom	Number of samples called homozygous-variant
SamplesNC	Number of samples not covered / not called
SampleCalls	The calls for each sample in the mpileup, space-delimited
            Each sample has six values separated by colons:
        Cons - consensus genotype in IUPAC format
        Cov - total depth of coverage
        Reads1 - number of reads supporting reference
        Reads2 - number of reads supporting variant
        Freq - the variant allele frequency by read count
        P-value - FET p-value of observed reads vs expected non-variant	

somatic

This command calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.

USAGE: java -jar VarScan.jar somatic [normal_pileup] [tumor_pileup] [output] OPTIONS
    normal_pileup - The SAMtools pileup file for Normal
    tumor_pileup - The SAMtools pileup file for Tumor
    output - Output base name for SNP and indel output

You can also give it a single mpileup file with normal and tumor data.

USAGE: java -jar VarScan.jar somatic [normal-tumor.mpileup] [output] --mpileup 1 OPTIONS
    normal-tumor.mpileup - The SAMtools mpileup file with normal and then tumor
    output - Output base name for SNP and indel output

Both formats of the command share these common options:

OPTIONS:
--output-snp - Output file for SNP calls [default: output.snp]
--output-indel - Output file for indel calls [default: output.indel]
--min-coverage - Minimum coverage in normal and tumor to call variant [8]
--min-coverage-normal - Minimum coverage in normal to call somatic [8]
--min-coverage-tumor - Minimum coverage in tumor to call somatic [6]
--min-var-freq - Minimum variant frequency to call a heterozygote [0.10]
--min-freq-for-hom	Minimum frequency to call homozygote [0.75]
--normal-purity - Estimated purity (non-tumor content) of normal sample [1.00]
--tumor-purity - Estimated purity (tumor content) of tumor sample [1.00]
--p-value - P-value threshold to call a heterozygote [0.99]
--somatic-p-value - P-value threshold to call a somatic site [0.05]
--strand-filter - If set to 1, removes variants with >90% strand bias
--validation - If set to 1, outputs all compared positions even if non-variant

Note that more specific options (e.g. min-coverage-normal) will override the default or specified value of less specific options (e.g. min-coverage).

The normal and tumor purity values should be a value between 0 and 1. The default (1) implies that the normal is 100% pure with no contaminating tumor cells, and the tumor is 100% pure with no contaminating stromal or other non-malignant cells. You would change tumor-purity to something less than 1 if you have a low-purity tumor sample and thus expect lower variant allele frequencies for mutations. You would change normal-purity to something less than 1 only if it's possible that there will be some tumor content in your "normal" sample, e.g. adjacent normal tissue for a solid tumor, malignant blood cells in the skin punch normal for some liquid tumors, etc.
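
To see why a low-purity tumor leads to lower variant allele frequencies, consider a rough back-of-the-envelope model. The sketch below assumes a heterozygous mutation present in every tumor cell, in a region that is diploid in both tumor and normal cells, with no tumor contamination in the normal. These are simplifying assumptions for illustration only, and the function is not how VarScan uses the purity options internally.

```python
def expected_het_vaf(tumor_purity):
    """Expected variant allele frequency of a clonal heterozygous mutation.

    Assumes a diploid region: each tumor cell contributes one variant allele
    out of two, and each non-tumor cell contributes none.
    """
    if not 0.0 <= tumor_purity <= 1.0:
        raise ValueError("tumor_purity must be between 0 and 1")
    return 0.5 * tumor_purity


for purity in (1.0, 0.6, 0.4):
    print(purity, expected_het_vaf(purity))
```

Under these assumptions a 40% pure tumor would show heterozygous mutations at roughly 20% variant allele frequency, which is the kind of shift the tumor-purity option is meant to account for.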

There are two p-value options. The first (p-value) is the significance threshold for the first-pass algorithm that determines, for each position, whether either normal or tumor is variant at that position. The second (somatic-p-value) is more important: it is the threshold below which read count differences between tumor and normal are deemed significant enough to classify the variant as a somatic mutation or an LOH event. In the case of a shared (germline) variant, this p-value is used to determine whether the combined normal and tumor evidence differs significantly enough from the null hypothesis (no variant with same coverage) to report the variant. See the somatic mutation calling section for details.
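
The comparison of tumor and normal read counts can be illustrated with a one-tailed Fisher's exact test, which is the test named in the "FET p-value" columns elsewhere in this manual. The Python sketch below is a plain hypergeometric calculation written for illustration. It assumes a right-tailed test on the tumor variant count, and VarScan's own implementation and classification logic (which also involves the genotype calls) may differ in detail. The function names and read counts are made up.

```python
from math import comb


def fisher_right_tail(normal_ref, normal_var, tumor_ref, tumor_var):
    """P(tumor variant reads >= observed), given fixed row and column totals."""
    total = normal_ref + normal_var + tumor_ref + tumor_var
    var_total = normal_var + tumor_var
    tumor_total = tumor_ref + tumor_var
    denominator = comb(total, tumor_total)
    p = 0.0
    # Sum hypergeometric probabilities for every count at least as extreme.
    for k in range(tumor_var, min(var_total, tumor_total) + 1):
        ways = comb(var_total, k) * comb(total - var_total, tumor_total - k)
        p += ways / denominator
    return p


def below_somatic_threshold(p, somatic_p_value=0.05):
    """Compare against the documented default for --somatic-p-value."""
    return p < somatic_p_value


# Hypothetical counts: 30 ref / 0 var reads in normal, 20 ref / 10 var in tumor.
p = fisher_right_tail(30, 0, 20, 10)
print(p, below_somatic_threshold(p))
```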

OUTPUT
Two tab-delimited files (SNPs and Indels) with the following columns:
chrom					chromosome name
position				position (1-based from the pileup)
ref						reference allele at this position
var						variant allele at this position
normal_reads1			reads supporting reference allele
normal_reads2			reads supporting variant allele
normal_var_freq			frequency of variant allele by read count
normal_gt				genotype call for Normal sample
tumor_reads1			reads supporting reference allele
tumor_reads2			reads supporting variant allele
tumor_var_freq			frequency of variant allele by read count
tumor_gt				genotype call for Tumor sample
somatic_status			status of variant (Germline, Somatic, or LOH)	
variant_p_value			Significance of variant read count vs. baseline error rate
somatic_p_value			Significance of tumor read count vs. normal read count
tumor_reads1_plus       Ref-supporting reads from + strand in tumor
tumor_reads1_minus      Ref-supporting reads from - strand in tumor
tumor_reads2_plus       Var-supporting reads from + strand in tumor
tumor_reads2_minus		Var-supporting reads from - strand in tumor

copynumber

This command reports relative copy number changes (log2 ratios of tumor depth over normal depth) for contiguous regions, using pileup files from a matched tumor-normal pair.

USAGE: java -jar VarScan.jar copynumber [normal_pileup] [tumor_pileup] [output] OPTIONS
    normal_pileup - The SAMtools pileup file for Normal
    tumor_pileup - The SAMtools pileup file for Tumor
    output - Output base name for the copy number output

You can also give it a single mpileup file with normal and tumor data.

USAGE: java -jar VarScan.jar copynumber [normal-tumor.mpileup] [output] --mpileup 1 OPTIONS
    normal-tumor.mpileup - The SAMtools mpileup file with normal and then tumor
    output - Output base name for the copy number output

Both formats of the command share these common options:

OPTIONS:
--min-base-qual - Minimum base quality to count for coverage [20]
--min-map-qual - Minimum read mapping quality to count for coverage [20]
--min-coverage - Minimum coverage threshold for copynumber segments [20]
--min-segment-size - Minimum number of consecutive bases to report a segment [10]
--max-segment-size - Max size before a new segment is made [100]
--p-value - P-value threshold for significant copynumber change-point [0.01]
--data-ratio - The normal/tumor input data ratio for copynumber adjustment [1.0]

Note: The data ratio is intended to help you account for overall differences in the amount of sequencing coverage between normal and tumor, which might otherwise give the appearance of global copy number differences. If normal has more data than tumor, set this to something greater than 1. If tumor has more data than normal, adjust it to something below 1. A basic formula for data ratio might be something like ratio = normal_unique_bp / tumor_unique_bp, where unique base pairs are computed as mapped_non_dup_reads * read_length.
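
The basic formula in the note above can be written out as a short Python calculation. The function names and the read counts are hypothetical and only serve to show the arithmetic.

```python
def unique_bp(mapped_non_dup_reads, read_length):
    """Unique base pairs: mapped, non-duplicate reads times read length."""
    return mapped_non_dup_reads * read_length


def data_ratio(normal_unique_bp, tumor_unique_bp):
    """Normal/tumor input data ratio, as passed to --data-ratio."""
    if tumor_unique_bp <= 0:
        raise ValueError("tumor_unique_bp must be positive")
    return normal_unique_bp / tumor_unique_bp


# Hypothetical example: normal has twice as many 100 bp reads as tumor.
normal_bp = unique_bp(100_000_000, 100)
tumor_bp = unique_bp(50_000_000, 100)
print(data_ratio(normal_bp, tumor_bp))
```

Here normal has more data than tumor, so the ratio comes out greater than 1, matching the guidance in the note.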

OUTPUT
chrom				Chromosome name
chr_start			Region start position (1-based from the pileup)
chr_stop			Region stop position (1-based from the pileup)
num_positions		Size of the region in base pairs
normal_depth		Average normal sequence depth for the region
tumor_depth			Average tumor sequence depth for the region
log2_ratio			Log-base-2 ratio of: adjusted tumor depth over normal depth
gc_content			Estimated GC content of the region (0-100)
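
For orientation, the log2_ratio column can be reproduced approximately from the two depth columns. The Python sketch below assumes that the "adjusted tumor depth" is the tumor depth multiplied by the data ratio. That is an assumption made for this example, since the manual does not define the adjustment, and the depth values are invented.

```python
from math import log2


def log2_ratio(normal_depth, tumor_depth, data_ratio=1.0):
    """Log-base-2 ratio of adjusted tumor depth over normal depth.

    Assumption: adjusted tumor depth = tumor_depth * data_ratio.
    """
    if normal_depth <= 0 or tumor_depth <= 0:
        raise ValueError("depths must be positive")
    adjusted_tumor_depth = tumor_depth * data_ratio
    return log2(adjusted_tumor_depth / normal_depth)


# Tumor sequenced at half the depth of normal, corrected with a data ratio of 2.
print(log2_ratio(normal_depth=40.0, tumor_depth=20.0, data_ratio=2.0))
# Tumor depth doubled relative to normal at equal overall coverage.
print(log2_ratio(normal_depth=30.0, tumor_depth=60.0))
```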

The raw regions reported by VarScan are delineated by drops in coverage or changes in the tumor/normal ratio, so there are many small, nearby regions with similar copy number. It is therefore recommended that raw VarScan copynumber output be processed with circular binary segmentation (CBS) or a similar algorithm, which will generate larger segments delineated by statistically significant change points. See the copy number calling section for details.

filter

This command filters variants in a file by coverage, supporting reads, variant frequency, or average base quality. It is for use with output from pileup2snp or pileup2indel.

USAGE: java -jar VarScan.jar filter [variants file] OPTIONS
    variants file - A file of SNP or indel calls from VarScan pileup2snp or pileup2indel

OPTIONS:
--min-coverage	Minimum read depth at a position to make a call [10]
--min-reads2	Minimum supporting reads at a position to call variants [2]
--min-strands2	Minimum # of strands on which variant observed (1 or 2) [1]
--min-avg-qual	Minimum average base quality for variant-supporting reads [20]
--min-var-freq	Minimum variant allele frequency threshold [0.20]
--p-value	Default p-value threshold for calling variants [1e-01]
--indel-file	File of indels for filtering nearby SNPs, from pileup2indel command
--output-file	File to contain variants passing filters
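
To make the thresholds concrete, the Python sketch below applies the documented default values to the read counts of a single call. It is an illustration of what the options mean, not VarScan's filtering code. It assumes that coverage is the sum of reference and variant reads and that a call passes when its p-value is at or below the threshold. The function name and example numbers are hypothetical.

```python
def passes_filter(reads1, reads2, strands2, qual2, pvalue,
                  min_coverage=10, min_reads2=2, min_strands2=1,
                  min_avg_qual=20, min_var_freq=0.20, max_p_value=0.1):
    """Return True if a call meets every threshold (defaults from the list above)."""
    coverage = reads1 + reads2  # assumption: depth = ref reads + variant reads
    if coverage == 0 or coverage < min_coverage:
        return False
    var_freq = reads2 / coverage
    return (reads2 >= min_reads2
            and strands2 >= min_strands2
            and qual2 >= min_avg_qual
            and var_freq >= min_var_freq
            and pvalue <= max_p_value)


# Hypothetical calls: (Reads1, Reads2, Strands2, Qual2, Pvalue)
print(passes_filter(15, 5, 2, 30, 0.01))   # 25% variant frequency at depth 20
print(passes_filter(19, 1, 1, 30, 0.01))   # only one supporting read
```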

somaticFilter

This command filters somatic mutation calls to remove clusters of false positives and SNV calls near indels. Note: this is a basic filter. More advanced filtering strategies consider mapping quality, read mismatches, soft-trimming, and other factors when deciding whether or not to filter a variant. See the VarScan 2 publication (Koboldt et al, Genome Research, Feb 2012) for details.

USAGE: java -jar VarScan.jar somaticFilter [mutations file] OPTIONS
    mutations file - A file of SNVs from VarScan somatic

    OPTIONS:
    --min-coverage  Minimum read depth [10]
    --min-reads2    Minimum supporting reads for a variant [2]
    --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
    --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
    --min-var-freq  Minimum variant allele frequency threshold [0.20]
    --p-value       Default p-value threshold for calling variants [1e-01]
    --indel-file    File of indels for filtering nearby SNPs
    --output-file   Optional output file for filtered variants

limit

This command limits variants in a file to a set of positions or regions

USAGE: java -jar VarScan.jar limit [infile] OPTIONS
    infile - A file of chromosome-positions, tab-delimited

    OPTIONS
    --positions-file - a file of chromosome-positions, tab delimited
    --regions-file - a file of chromosome-start-stops, tab delimited
    --output-file - Output file for the matching variants
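
The effect of restricting variants to a regions file can be sketched in Python as follows. This is an illustration only: it assumes region coordinates are 1-based and inclusive, that the first two columns of the variants file are chromosome and position, and that header lines can be recognized by a non-numeric position. None of these details are stated in this manual, and the function names are made up.

```python
def load_regions(lines):
    """Read chromosome-start-stop lines into a dict of interval lists."""
    regions = {}
    for line in lines:
        chrom, start, stop = line.rstrip("\n").split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(stop)))
    return regions


def in_regions(chrom, position, regions):
    """True if the position falls inside any region on that chromosome."""
    return any(start <= position <= stop
               for start, stop in regions.get(chrom, []))


def limit_variants(variant_lines, regions):
    """Keep only the variant lines whose position lies in a region."""
    kept = []
    for line in variant_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip headers and malformed lines
        if in_regions(fields[0], int(fields[1]), regions):
            kept.append(line)
    return kept


# Hypothetical inputs, invented for illustration.
regions = load_regions(["chr1\t100\t200", "chr2\t50\t60"])
variants = ["Chrom\tPosition\tRef", "chr1\t150\tA", "chr1\t250\tC", "chr2\t50\tG"]
print(limit_variants(variants, regions))
```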

readcounts

This command reports the read counts for each base at positions in a pileup file

USAGE: java -jar VarScan.jar readcounts [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --variants-file A list of variants at which to report readcounts
    --output-file   Output file to contain the readcounts
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-base-qual Minimum base quality at a position to count a read [30]

compare

This command performs set-comparison operations on two files of variants.

USAGE: java -jar VarScan.jar compare [file1] [file2] [type] [output] OPTIONS
    file1 - A file of chromosome-positions, tab-delimited
    file2 - A file of chromosome-positions, tab-delimited
    type - Type of comparison [intersect|merge|unique1|unique2]
    output - Output file for the comparison result
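
The four comparison types can be pictured as set operations on (chromosome, position) keys. The Python sketch below shows one plausible reading of the type names. The exact semantics of each type are inferred from the names rather than stated in this manual, and the function names and inputs are hypothetical.

```python
def position_keys(lines):
    """Extract (chromosome, position) keys from tab-delimited lines."""
    keys = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            keys.append((fields[0], fields[1]))
    return keys


def compare(lines1, lines2, kind):
    """Set comparison of two position lists, preserving input order."""
    keys1, keys2 = position_keys(lines1), position_keys(lines2)
    set1, set2 = set(keys1), set(keys2)
    if kind == "intersect":   # positions present in both files
        return [k for k in keys1 if k in set2]
    if kind == "merge":       # positions present in either file
        return keys1 + [k for k in keys2 if k not in set1]
    if kind == "unique1":     # positions only in file1
        return [k for k in keys1 if k not in set2]
    if kind == "unique2":     # positions only in file2
        return [k for k in keys2 if k not in set1]
    raise ValueError(f"unknown comparison type: {kind}")


# Hypothetical inputs, invented for illustration.
file1 = ["chr1\t100", "chr1\t200"]
file2 = ["chr1\t200", "chr2\t5"]
for kind in ("intersect", "merge", "unique1", "unique2"):
    print(kind, compare(file1, file2, kind))
```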

For detailed usage information, see the VarScan JavaDoc.

VarScan Documentation (v2.2.2 and before)

USAGE: java -jar VarScan.jar  [COMMAND] [OPTIONS]

COMMANDS
[**pileup2snp**](#v2.2%5Fpileup2snp) [pileup file]
[**pileup2indel**](#v2.2%5Fpileup2indel) [pileup file]
[**pileup2cns**](#v2.2%5Fpileup2cns) [pileup file]
[**somatic**](#v2.2%5Fsomatic)	[normal pileup] [tumor pileup]
[**filter**](#v2.2%5Ffilter) [variants file]
[**somaticFilter**](#v2.2%5FsomaticFilter) [mutations file]
[**limit**](#v2.2%5Flimit) [variants file] 
[**readcounts**](#v2.2%5Freadcounts) [pileup file]
[**compare**](#v2.2%5Fcompare)	[file1] [file2]

pileup2snp

This command calls SNPs from a pileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar pileup2snp [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --min-coverage  Minimum read depth at a position to make a call [10]
    --min-reads2    Minimum supporting reads at a position to call variants [2]
    --min-avg-qual  Minimum base quality at a position to count a read [15]
    --min-var-freq  Minimum variant allele frequency threshold [0.01]
    --p-value       Default p-value threshold for calling variants [99e-02]
    
OUTPUT
Tab-delimited SNP calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref		reference allele at this position
Var		variant allele at this position
Reads1		reads supporting reference allele
Reads2		reads supporting variant allele
VarFreq		frequency of variant allele by read count
Strands1	strands on which reference allele was observed
Strands2	strands on which variant allele was observed
Qual1		average base quality of reference-supporting read bases
Qual2		average base quality of variant-supporting read bases
Pvalue		Significance of variant read count vs. expected baseline error

pileup2indel

This command calls indels from a pileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar pileup2indel [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-reads2    Minimum supporting reads at a position to call variants [2]
    --min-avg-qual  Minimum base quality at a position to count a read [15]
    --min-var-freq  Minimum variant allele frequency threshold [0.01]
    --p-value       Default p-value threshold for calling variants [99e-02]

OUTPUT
Tab-delimited indel calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref		reference allele at this position
Var		variant allele at this position
Reads1		reads supporting reference allele
Reads2		reads supporting variant allele
VarFreq		frequency of variant allele by read count
Strands1	strands on which reference allele was observed
Strands2	strands on which variant allele was observed
Qual1		average base quality of reference-supporting read bases
Qual2		average base quality of variant-supporting read bases
Pvalue		Significance of variant read count vs. expected baseline error

pileup2cns

This command makes consensus calls (SNP/Indel/Reference) from a pileup file based on user-defined parameters:

USAGE: java -jar VarScan.jar pileup2cns [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-reads2    Minimum supporting reads at a position to call variants [2]
    --min-avg-qual  Minimum base quality at a position to count a read [15]
    --min-var-freq  Minimum variant allele frequency threshold [0.01]
    --p-value       Default p-value threshold for calling variants [99e-02]

OUTPUT
Tab-delimited consensus calls with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref		reference allele at this position
Var		consensus call (reference, IUPAC SNP code, or indel)
Reads1		reads supporting reference allele
Reads2		reads supporting variant allele
VarFreq		frequency of variant allele by read count
Strands1	strands on which reference allele was observed
Strands2	strands on which variant allele was observed
Qual1		average base quality of reference-supporting read bases
Qual2		average base quality of variant-supporting read bases
Pvalue		Significance of variant read count vs. expected baseline error

somatic

This command calls variants and identifies their somatic status (Germline/LOH/Somatic) using pileup files from a matched tumor-normal pair.

USAGE: java -jar VarScan.jar somatic [normal_pileup] [tumor_pileup] [output] OPTIONS
    normal_pileup - The SAMtools pileup file for Normal
    tumor_pileup - The SAMtools pileup file for Tumor
    output - Output base name for SNP and indel output
    
OPTIONS:
--output-snp	Output file for SNP calls [output.snp]
--output-indel	Output file for indel calls [output.indel]
--min-coverage	Minimum coverage in normal and tumor to call variant [10]
--min-coverage-normal	Minimum coverage in normal to call somatic [10]
--min-coverage-tumor	Minimum coverage in tumor to call somatic [5]
--min-var-freq	Minimum variant frequency to call a heterozygote [0.20]
--p-value	P-value threshold to call a heterozygote [1.0e-01]
--somatic-p-value	P-value threshold to call a somatic site [1.0e-04]

OUTPUT
Two tab-delimited files (SNPs and Indels) with the following columns:
Chrom		chromosome name
Position	position (1-based)
Ref		reference allele at this position
Var		variant allele at this position
Normal_Reads1	reads supporting reference allele
Normal_Reads2	reads supporting variant allele
Normal_VarFreq	frequency of variant allele by read count
Normal_Gt	genotype call for Normal sample
Tumor_Reads1	reads supporting reference allele
Tumor_Reads2	reads supporting variant allele
Tumor_VarFreq	frequency of variant allele by read count
Tumor_Gt	genotype call for Tumor sample
Somatic_Status	status of variant (Germline, Somatic, or LOH)	
Pvalue		Significance of variant read count vs. expected baseline error
Somatic_Pvalue	Significance of tumor read count vs. normal read count

filter

This command filters variants in a file by coverage, supporting reads, variant frequency, or average base quality

USAGE: java -jar VarScan.jar filter [variants file] OPTIONS
    variants file - A file of SNP or indel calls from VarScan

    OPTIONS:
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-reads2    Minimum supporting reads at a position to call variants [2]
    --min-avg-qual  Minimum base quality at a position to count a read [15]
    --min-var-freq  Minimum variant allele frequency threshold [0.01]
    --p-value       Default p-value threshold for calling variants [99e-02]

somaticFilter

This command filters somatic mutation calls to remove clusters of false positives and SNV calls near indels.

USAGE: java -jar VarScan.jar somaticFilter [mutations file] OPTIONS
    mutations file - A file of SNVs from VarScan somatic

    OPTIONS:
    --min-coverage  Minimum read depth [10]
    --min-reads2    Minimum supporting reads for a variant [2]
    --min-strands2  Minimum # of strands on which variant observed (1 or 2) [1]
    --min-avg-qual  Minimum average base quality for variant-supporting reads [20]
    --min-var-freq  Minimum variant allele frequency threshold [0.20]
    --p-value       Default p-value threshold for calling variants [1e-01]
    --indel-file    File of indels for filtering nearby SNPs
    --output-file   Optional output file for filtered variants

limit

This command limits variants in a file to a set of positions or regions

USAGE: java -jar VarScan.jar limit [infile] OPTIONS
    infile - A file of chromosome-positions, tab-delimited

    OPTIONS
    --positions-file - a file of chromosome-positions, tab delimited
    --regions-file - a file of chromosome-start-stops, tab delimited
    --output-file - Output file for the matching variants

readcounts

This command reports the read counts for each base at positions in a pileup file

USAGE: java -jar VarScan.jar readcounts [pileup file] OPTIONS
    pileup file - The SAMtools pileup file

    OPTIONS:
    --variants-file A list of variants at which to report readcounts
    --output-file   Output file to contain the readcounts
    --min-coverage  Minimum read depth at a position to make a call [8]
    --min-base-qual Minimum base quality at a position to count a read [30]

compare

This command performs set-comparison operations on two files of variants.

USAGE: java -jar VarScan.jar compare [file1] [file2] [type] [output] OPTIONS
    file1 - A file of chromosome-positions, tab-delimited
    file2 - A file of chromosome-positions, tab-delimited
    type - Type of comparison [intersect|merge|unique1|unique2]
    output - Output file for the comparison result

For detailed usage information, see the VarScan JavaDoc.

How to Build a SAMtools (m)pileup File

The variant calling features of VarScan for single samples (pileup2snp, pileup2indel, pileup2cns) and multiple samples (mpileup2snp, mpileup2indel, mpileup2cns, and somatic) expect input in SAMtools pileup or mpileup format. In current versions of SAMtools, the "pileup" command has been replaced by the "mpileup" command. For a single sample, the two commands produce output in the same format; the main difference is that mpileup applies BAQ adjustments by default. When you give it multiple BAM files, however, SAMtools mpileup generates a multi-sample pileup format that must be processed with the mpileup2* commands in VarScan. To build an mpileup file, you will need SAMtools, the reference sequence in FASTA format, and one or more coordinate-sorted BAM files.

Generate an mpileup file with the following command:

samtools mpileup -f [reference sequence] [BAM file(s)] >myData.mpileup

Note, to save disk space and file I/O, you can redirect mpileup output directly to VarScan with a "pipe" command. For example:

One sample: samtools mpileup -f reference.fasta myData.bam | java -jar VarScan.v2.2.jar pileup2snp

Multiple samples: samtools mpileup -f reference.fasta sample1.bam sample2.bam | java -jar VarScan.v2.2.jar mpileup2snp
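
If you drive the pipeline from a script rather than a shell, the same pipe can be set up with Python's subprocess module. The sketch below is a hedged example: the function names, the default jar file name, and the output path are placeholders, and it assumes (as the pipe examples above imply) that VarScan reads the pileup from standard input when no file is given and writes its calls to standard output. It chooses pileup2snp for a single BAM file and mpileup2snp for several, following the rule that multi-sample pileups must be processed with the mpileup2* commands.

```python
import subprocess


def build_commands(reference, bam_files, varscan_jar="VarScan.jar"):
    """Build the samtools and VarScan argument lists for SNP calling."""
    samtools_cmd = ["samtools", "mpileup", "-f", reference, *bam_files]
    # One BAM file gives a single-sample pileup; several give a multi-sample one.
    subcommand = "pileup2snp" if len(bam_files) == 1 else "mpileup2snp"
    varscan_cmd = ["java", "-jar", varscan_jar, subcommand]
    return samtools_cmd, varscan_cmd


def run_pipeline(reference, bam_files, output_path, varscan_jar="VarScan.jar"):
    """Pipe samtools mpileup into VarScan and save the calls to output_path."""
    samtools_cmd, varscan_cmd = build_commands(reference, bam_files, varscan_jar)
    with open(output_path, "w") as out:
        producer = subprocess.Popen(samtools_cmd, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(varscan_cmd, stdin=producer.stdout, stdout=out)
        # Close our copy of the pipe so samtools stops if VarScan exits early.
        producer.stdout.close()
        consumer.communicate()
        producer.wait()
    return consumer.returncode


# Show the commands without running them (file names are placeholders).
print(build_commands("reference.fasta", ["sample1.bam", "sample2.bam"]))
```

Calling run_pipeline requires samtools and Java to be installed and on the PATH; build_commands alone has no such requirement.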