appreci8: a pipeline for precise variant calling integrating 8 tools

Bioinformatics. 2018 Dec 15;34(24):4205-4212.

doi: 10.1093/bioinformatics/bty518.

Sarah Sandmann, Mohsen Karimi, Aniek O de Graaf, Christian Rohde, Stefanie Göllner, Julian Varghese, Jan Ernsting, Gunilla Walldin, Bert A van der Reijden, Carsten Müller-Tidow, Luca Malcovati, Eva Hellström-Lindberg, Joop H Jansen, Martin Dugas

Sarah Sandmann et al. Bioinformatics. 2018.

Abstract

Motivation: The application of next-generation sequencing in research, and particularly in clinical routine, requires valid variant calling results. However, evaluations of several commonly used tools have shown that no single tool meets this requirement. False-positive and false-negative calls necessitate additional experiments and extensive manual work. Intelligently combining and filtering the output of different tools could significantly improve the current situation.

Results: We developed appreci8, an automatic variant calling pipeline that calls single nucleotide variants and short indels by combining and filtering the output of eight open-source variant calling tools on the basis of a novel artifact score and polymorphism score. Appreci8 was trained on two data sets from patients with myelodysplastic syndrome, covering 165 Illumina samples. Its performance was then tested on five independent data sets covering 513 samples. Variation in sequencing platform, target region and disease entity was considered. All calls were validated by re-sequencing on the same platform, re-sequencing on a different platform, or expert-based review. The sensitivity of appreci8 ranged from 0.93 to 1.00, and its positive predictive value from 0.65 to 1.00. In all cases, appreci8 outperformed every alternative approach evaluated.
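The abstract does not spell out how the eight call sets are merged. As a minimal sketch, assuming each tool's calls have already been normalized to (chromosome, position, REF, ALT) keys, the union of all call sets (the "combined output of all tools" baseline in Figs. 3 and 4) can be built while recording which tools support each variant. The helpers `merge_calls` and `min_support` are hypothetical, and the k-of-n vote is a simple stand-in for illustration, not appreci8's artifact and polymorphism scoring.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """Normalized variant key: chromosome, 1-based position, REF and ALT alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def merge_calls(calls_by_tool: Dict[str, Iterable[Variant]]) -> Dict[Variant, Set[str]]:
    """Union the call sets of several tools, recording which tools reported each variant."""
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for variant in calls:
            support[variant].add(tool)
    return dict(support)


def min_support(support: Dict[Variant, Set[str]], k: int) -> List[Variant]:
    """Keep variants reported by at least k tools (hypothetical voting filter)."""
    return sorted(v for v, tools in support.items() if len(tools) >= k)


if __name__ == "__main__":
    # Made-up toy calls; coordinates are illustrative only.
    calls = {
        "GATK": [Variant("chr1", 1000, "C", "T")],
        "VarScan": [Variant("chr1", 1000, "C", "T"), Variant("chr1", 2000, "A", "G")],
        "LoFreq": [Variant("chr1", 1000, "C", "T")],
    }
    merged = merge_calls(calls)
    print(min_support(merged, 2))
```

Keying on normalized alleles matters because different callers may represent the same indel differently; the sketch assumes that normalization has already happened upstream.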

Availability and implementation: Appreci8 is freely available at https://hub.docker.com/r/wwuimi/appreci8/. Sequencing data (BAM files) of the 678 patients analyzed with appreci8 have been deposited into the NCBI Sequence Read Archive (BioProjectID: 388411; https://www.ncbi.nlm.nih.gov/bioproject/PRJNA388411).

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Overview of the analysis performed by appreci8.

Fig. 2. General principle of filtration with appreci8. Calls are classified as ‘Mutations’, ‘Polymorphism’ or ‘Artifact’ on the basis of an artifact score and a polymorphism score.
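The caption names the two scores but not how they are computed or combined. The following is only a sketch of the general idea of mapping two precomputed scores to three classes, assuming scores in [0, 1], hypothetical cutoffs of 0.5, and a hypothetical decision order (artifact checked first). None of these values or rules are taken from appreci8.

```python
from enum import Enum


class CallClass(Enum):
    MUTATION = "Mutation"
    POLYMORPHISM = "Polymorphism"
    ARTIFACT = "Artifact"


# Hypothetical cutoffs for illustration only; not appreci8's actual values.
ARTIFACT_CUTOFF = 0.5
POLYMORPHISM_CUTOFF = 0.5


def classify(artifact_score: float,
             polymorphism_score: float,
             artifact_cutoff: float = ARTIFACT_CUTOFF,
             polymorphism_cutoff: float = POLYMORPHISM_CUTOFF) -> CallClass:
    """Assign a call to one of three classes from two precomputed scores.

    Assumed convention: a score strictly above its cutoff flags the call.
    The artifact check runs first, so a call flagged by both scores is
    treated as an artifact (an assumption, not the published rule).
    """
    if artifact_score > artifact_cutoff:
        return CallClass.ARTIFACT
    if polymorphism_score > polymorphism_cutoff:
        return CallClass.POLYMORPHISM
    return CallClass.MUTATION


if __name__ == "__main__":
    for a, p in [(0.9, 0.1), (0.1, 0.9), (0.1, 0.1)]:
        print(a, p, classify(a, p).value)
```

Checking artifacts before polymorphisms is one plausible design choice: a call that looks like a technical error is discarded regardless of whether it matches a known polymorphism.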

Fig. 3. Relation between positive predictive value and sensitivity for GATK, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools, VarDict, the combined output of all eight tools, single-appreci8 and appreci8 in training sets 1 and 2.

Fig. 4. Relation between positive predictive value and sensitivity for GATK, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools, VarDict, the combined output of all eight tools, single-appreci8 and appreci8 in test sets 1–5.
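Each point in Figs. 3 and 4 pairs a sensitivity with a positive predictive value (PPV). As a minimal sketch, assuming a call set and a set of validated true variants are available as comparable keys, the two standard metrics are sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The function name `sensitivity_ppv` is hypothetical.

```python
import math
from typing import Hashable, Set, Tuple


def sensitivity_ppv(called: Set[Hashable], validated: Set[Hashable]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) of a call set against validated true variants.

    NaN is returned for a metric whose denominator is zero.
    """
    tp = len(called & validated)   # called and confirmed
    fp = len(called - validated)   # called but not confirmed
    fn = len(validated - called)   # confirmed but missed
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    ppv = tp / (tp + fp) if tp + fp else math.nan
    return sensitivity, ppv


if __name__ == "__main__":
    # Toy example: 3 true positives, 1 false positive, 2 false negatives.
    called = {"v1", "v2", "v3", "v4"}
    validated = {"v1", "v2", "v3", "v5", "v6"}
    print(sensitivity_ppv(called, validated))  # (0.6, 0.75)
```

Running this once per tool, and once for the combined and filtered outputs, would yield one (sensitivity, PPV) point per approach, which is the kind of comparison the two figures plot.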
