cgpCaVEManWrapper: Simple Execution of CaVEMan in Order to Detect Somatic Single Nucleotide Variants in NGS Data

David Jones et al. Curr Protoc Bioinformatics. 2016.

Abstract

CaVEMan is an expectation maximization-based somatic substitution-detection algorithm written in C. The algorithm analyzes sequence data from a test sample, such as a tumor, relative to a reference normal sample from the same patient and to the reference genome. It performs a comparative analysis of the tumor and normal samples to derive a probabilistic estimate for putative somatic substitutions. When combined with a set of validated post-hoc filters, CaVEMan generates a set of somatic substitution calls with high recall and positive predictive value. Here we provide instructions for using a wrapper script called cgpCaVEManWrapper, which runs the CaVEMan algorithm and the additional downstream post-hoc filters. We describe both a simple one-shot run of cgpCaVEManWrapper and a more in-depth implementation suited to large-scale compute farms. © 2016 by John Wiley & Sons, Inc.
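The two quality measures named in the abstract, recall and positive predictive value, have standard definitions in terms of true positives, false positives, and false negatives. The minimal sketch below shows how they could be computed for a call set; the function names and the counts are illustrative assumptions, not results or code from the unit.

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real somatic substitutions that were called."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of called substitutions that are real."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only (not from the paper).
    tp, fp, fn = 90, 10, 30
    print(recall(tp, fn))                     # 0.75
    print(positive_predictive_value(tp, fp))  # 0.9
```

Post-hoc filtering typically trades a small amount of recall for a gain in positive predictive value, which is why the two are reported together.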

Keywords: SNV; cancer; sequencing; somatic; substitution.

Figures

Figure 15.10.1

cgpCaVEManWrapper workflow. If the -p/-i options are omitted, individual components are automatically executed. If killed for any reason, the workflow automatically recovers to the last successful point on restart.
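The recovery behaviour described in this caption can be pictured as a checkpointed pipeline: each step leaves a marker when it succeeds, and a rerun skips every step whose marker already exists. The sketch below is a generic illustration of that pattern under stated assumptions; the `run_pipeline` helper, the `.done` marker files, and the step names are hypothetical and are not taken from the cgpCaVEManWrapper implementation.

```python
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

# A step is a (name, action) pair; both are hypothetical placeholders.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: Sequence[Step], progress_dir: Path) -> List[str]:
    """Run steps in order, skipping any step already marked complete.

    Returns the names of the steps executed during this invocation.
    """
    progress_dir.mkdir(parents=True, exist_ok=True)
    executed: List[str] = []
    for name, action in steps:
        marker = progress_dir / f"{name}.done"
        if marker.exists():
            continue  # finished in an earlier run, so skip it
        action()      # an exception here leaves no marker behind
        marker.touch()
        executed.append(name)
    return executed


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        steps = [
            ("step_one", lambda: print("running step_one")),
            ("step_two", lambda: print("running step_two")),
        ]
        print(run_pipeline(steps, Path(tmp)))  # both steps run
        print(run_pipeline(steps, Path(tmp)))  # nothing left to run
```

Because a marker is written only after its step returns, a job killed partway through is repeated from the interrupted step rather than from the beginning.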

Figure 15.10.2

UCSC Table Browser settings for generation of the centromeric_repeats.bed.gz file.

Figure 15.10.3

UCSC Table Browser filter settings for generation of the centromeric_repeats.bed.gz file.

References

Literature Cited

    1. Alioto TS, Buchhalter I, Derdak S, Hutter B, Eldridge MD, Hovig E, Heisler LE, Beck TA, Simpson JT, Tonon L, Sertier AS, et al. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun. 2015;6:10001. doi: 10.1038/ncomms10001.
    2. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
    3. Do CB, Batzoglou S. What is the expectation maximization algorithm? Nat Biotechnol. 2008;26:897–899. doi: 10.1038/nbt1406.
    4. Hsi-Yang Fritz M, Leinonen R, Cochrane G, Birney E. Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 2011;21:734–740. doi: 10.1101/gr.114819.110.
    5. Li H. Tabix: Fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics. 2011;27:718–719. doi: 10.1093/bioinformatics/btq671.

Internet Resources

    1. https://github.com/cancerit
      Repository for Wellcome Trust Sanger Institute Cancer Genome Project public projects.
    2. ftp://ftp.sanger.ac.uk/pub/cancer/support-files/CPIB/
      FTP site for reference and example data listed in this unit.
    3. https://genome.ucsc.edu/cgi-bin/hgTables
      UCSC Genome Browser Table Browser.
    4. http://icgc.org
      ICGC/TCGA Pancancer project site.
    5. http://vcftools.github.io/specs.html
      VCF format.
