vcfr: a package to manipulate and visualize variant call format data in R
Brian J Knaus et al. Mol Ecol Resour. 2017 Jan.
Abstract
Software to call single-nucleotide polymorphisms or related genetic variants has converged on the variant call format (VCF) as the output format of choice. This has created a need for tools to work with VCF files. While a growing number of programs can read VCF data, many extract only the genotypes and omit the data associated with each genotype that describes its quality. We created the R package vcfr to address this issue. We implemented this VCF file exploration tool in the R language because R provides an interactive experience and an environment that is commonly used for genetic data analysis. The package implements functions to read VCF files into R and write them back out, to extract portions of the data, and to plot summary statistics of the data. vcfr further provides the ability to visualize how various parameterizations of the data affect the results. Additional tools are included to integrate sequence (FASTA) and annotation (GFF) data for visualization of genomic regions such as chromosomes. Conversion functions translate data from the vcfr data structure to formats used by other R genetics packages. Computationally intensive functions are implemented in C++ to improve performance. These tools are intended to facilitate VCF data exploration, including intuitive methods for data quality control and easy export to other R packages for further analysis. vcfr thus provides essential, novel tools currently not available in R.
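To make concrete what "the data associated with each genotype" means, the following is a minimal sketch in Python, not vcfr's own code or API. It assumes a single hand-written VCF data line in which the FORMAT column names the per-genotype fields (here GT for the genotype and GQ for its quality). The sample names, the values, the `extract_element` helper, and the `MIN_GQ` threshold are all hypothetical and exist only for illustration.

```python
# Illustrative only: a tiny, hand-written VCF header and data line (tab-separated).
# Sample names and values are made up for this sketch.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB"
RECORD = "chr1\t1042\t.\tA\tG\t50\tPASS\tDP=31\tGT:GQ:DP\t0/1:48:14\t1/1:9:3"


def extract_element(header, record, element):
    """Return {sample: value} for one FORMAT key (e.g. 'GT' or 'GQ').

    Hypothetical helper for this sketch; it is not part of any library.
    """
    names = header.lstrip("#").split("\t")
    fields = record.split("\t")
    keys = fields[8].split(":")  # the FORMAT column lists the per-genotype keys
    if element not in keys:
        return {}
    idx = keys.index(element)
    result = {}
    for sample, entry in zip(names[9:], fields[9:]):
        values = entry.split(":")
        # Trailing per-genotype fields may be omitted; treat those as missing.
        result[sample] = values[idx] if idx < len(values) else None
    return result


genotypes = extract_element(HEADER, RECORD, "GT")
qualities = extract_element(HEADER, RECORD, "GQ")

# Keep only genotypes whose quality meets an assumed threshold.
MIN_GQ = 20
kept = {s: g for s, g in genotypes.items() if int(qualities[s]) >= MIN_GQ}
print(kept)  # {'sampleA': '0/1'}
```

A reader that returned only `genotypes` would give both samples equal weight, whereas keeping `qualities` alongside them allows the low-confidence call for `sampleB` to be filtered out. Real files can also contain missing values (`.`), which this sketch does not handle.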
Keywords: data visualization; high-throughput sequencing; quality control; variant call format specification.
Published 2016. This article is a U.S. Government work and is in the public domain in the USA.