GitHub - BioGenies/tidysq: tidy processing of biological sequences in R

tidysq

Overview

tidysq contains tools for analysis and manipulation of biological sequences (including amino acid and nucleic acid – e.g. RNA, DNA – sequences). Two major features of this package are:

effective compression of sequence data, which lets larger datasets fit in memory in R,
compatibility with most of the tidyverse, especially the dplyr and vctrs packages, making analyses tidier.
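
To give a rough feel for the compression claim, the sketch below compares the in-memory size of a plain character string with the same sequence stored as an sq object. The 10,000-letter random DNA string is an assumption made up for this illustration, and the exact sizes will vary with the R and tidysq versions in use.

```r
library(tidysq)

# Hypothetical input: a random 10,000-letter DNA string built only for this sketch
set.seed(1)
dna_chr <- paste(sample(c("A", "C", "G", "T"), 10000, replace = TRUE),
                 collapse = "")

# Store the same sequence as an sq object using the basic DNA alphabet
dna_sq <- sq(dna_chr, alphabet = "dna_bsc")

# Compare the in-memory footprint of both representations
size_chr <- object.size(dna_chr)
size_sq <- object.size(dna_sq)
print(size_chr)
print(size_sq)
```

The gain should grow with sequence length, because the fixed overhead of the sq object (alphabet and class attributes) is paid only once.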

Getting started

Try our quick start vignette or our exhaustive documentation.

Installation

The easiest way to install the tidysq package is to download its latest version from the CRAN repository:

install.packages("tidysq")

Alternatively, it is possible to download the development version directly from the GitHub repository:

install.packages("devtools")

devtools::install_github("BioGenies/tidysq")

Example usage

sq_ami <- sqibble$sq
sq_ami
#> basic amino acid sequences list:
#> [1] PGGGKVQIVYKPV <13>
#> [2] NLKHQPGGGKVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVE <43>
#> [3] NLKHQPGGGKVQIVYKEVD <19>
#> [4] GKVQIVYK <8>
#> [5] VQIVYK <6>
#> [6] DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV <40>
#> [7] VPHQKLVFFAEDVGS <15>
#> [8] VHPQKLVFFAEDVGS <15>
#> [9] VHHPKLVFFAEDVGS <15>
#> [10] VHHQPLVFFAEDVGS <15>
#> printed 10 out of 421

Subsequences can be extracted with bite()

bite(sq_ami, 5:10)
#> Warning in CPP_bite(x, indices, NA_letter, on_warning): some sequences are
#> subsetted with index bigger than length - NA introduced
#> basic amino acid sequences list:
#> [1] KVQIVY <6>
#> [2] QPGGGK <6>
#> [3] QPGGGK <6>
#> [4] IVYK!! <6>
#> [5] YK!!!! <6>
#> [6] RHDSGY <6>
#> [7] KLVFFA <6>
#> [8] KLVFFA <6>
#> [9] KLVFFA <6>
#> [10] PLVFFA <6>
#> printed 10 out of 421

There are also more traditional functions

reverse(sq_ami)
#> basic amino acid sequences list:
#> [1] VPKYVIQVKGGGP <13>
#> [2] EVQGGGPKHHINGLSGCKSTVKSLDVPKYVIQVKGGGPQHKLN <43>
#> [3] DVEKYVIQVKGGGPQHKLN <19>
#> [4] KYVIQVKG <8>
#> [5] KYVIQV <6>
#> [6] VVGGVMLGIIAGKNSGVDEAFFVLKQHHVEYGSDHRFEAD <40>
#> [7] SGVDEAFFVLKQHPV <15>
#> [8] SGVDEAFFVLKQPHV <15>
#> [9] SGVDEAFFVLKPHHV <15>
#> [10] SGVDEAFFVLPQHHV <15>
#> printed 10 out of 421

find_motifs() returns a whole tibble of useful information
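
As a minimal sketch of a motif search, the example below builds a small stand-in tibble instead of the full sqibble dataset. The tibble name `toy`, the fragment names, and the motif "^VHX" are assumptions chosen for illustration; in the motif, "^" is taken to anchor the match at the start of a sequence and "X" to stand for any amino acid.

```r
library(tidysq)

# Hypothetical stand-in for sqibble: a name column plus an sq column
toy <- tibble::tibble(
  name = c("frag_1", "frag_2", "frag_3"),
  sq = sq(c("VHHQKLVFFAEDVGS", "PGGGKVQIVYKPV", "VHPQKLVFFAEDVGS"),
          alphabet = "ami_bsc")
)

# Search every sequence for the motif; each match becomes one row of the result
motifs_found <- find_motifs(toy, "^VHX")
motifs_found
```

Only the first and third fragments begin with "VH", so those are the two sequences expected to appear in the result.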

An example of dplyr integration:

library(dplyr)

tidysq integrates well with dplyr verbs
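
One way such a pipeline might look is sketched below: keep only the sequences containing a given motif, then add a column with their lengths. The tibble `toy`, its fragment names, and the motif "VFF" are illustrative assumptions rather than part of the dataset shown earlier.

```r
library(tidysq)
library(dplyr)

# Hypothetical stand-in data: a name column plus an sq column
toy <- tibble::tibble(
  name = c("frag_1", "frag_2", "frag_3"),
  sq = sq(c("VHHQKLVFFAEDVGS", "PGGGKVQIVYKPV", "GKVQIVYK"),
          alphabet = "ami_bsc")
)

# Keep sequences that contain the motif, then record the length of each one
with_vff <- toy %>%
  filter(sq %has% "VFF") %>%
  mutate(length = get_sq_lengths(sq))
with_vff
```

Because the sq column behaves like an ordinary vector inside the tibble, the usual dplyr verbs apply without any conversion step.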

Citation

For citation, type citation("tidysq") in the R console,

or use:

Michal Burdukiewicz, Dominik Rafacz, Laura Bakala, Jadwiga Slowik, Weronika Puchala, Filip Pietluch, Katarzyna Sidorczuk, Stefan Roediger and Leon Eyrich Jessen (2021). tidysq: Tidy Processing and Analysis of Biological Sequences. R package version 1.1.3.