tidyfst: Tidy

Verbs for Fast Data Manipulation (original) (raw)

downloads

download downloads downloads

ZENODO DOI JOSS DOI

Overview

tidyfst is a toolkit of tidy data manipulation verbs with_data.table_ as the backend . Combining the merits of syntax elegance from dplyr and computing performance from_data.table_, tidyfst intends to provide users with state-of-the-art data manipulation tools with least pain. This package is an extension of data.table, while enjoying a tidy syntax, it also wraps combinations of efficient functions to facilitate frequently-used data operations. Also, tidyfst would introduce more tidy data verbs from other packages, including but not limited to_tidyverse_ and data.table. If you are a dplyr_user but have to use data.table for speedy computation, or_data.table user looking for readable coding syntax,tidyfst is designed for you (and me of course). For further details and tutorials, see vignettes. BothChineseand Englishtutorials could be found there.

Till now, tidyfst has an API that might even transcend its predecessors (e.g. select_dtcould accept nearly anything for super column selection). Enjoy the efficient data operations in tidyfst !

PS: For extreme performance in tidy syntax, try _tidyfst_’s mirror package tidyft.

Features

Installation

install.packages("tidyfst")

Example

library(tidyfst)

iris %>%
  mutate_dt(group = Species,sl = Sepal.Length,sw = Sepal.Width) %>%
  select_dt(group,sl,sw) %>%
  filter_dt(sl > 5) %>%
  arrange_dt(group,sl) %>%
  distinct_dt(sl,.keep_all = T) %>%
  summarise_dt(sw = max(sw),by = group)
#>         group  sw
#>        <fctr> <num>
#> 1:     setosa 4.4
#> 2: versicolor 3.4
#> 3:  virginica 3.8

iris %>%
  count_dt(Species) %>%
  add_prop()
#>       Species     n      prop prop_label
#>        <fctr> <int>     <num>     <char>
#> 1:     setosa    50 0.3333333      33.3%
#> 2: versicolor    50 0.3333333      33.3%
#> 3:  virginica    50 0.3333333      33.3%

iris[3:8,] %>%
  mutate_when(Petal.Width == .2,
              one = 1,Sepal.Length=2)
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species one
#>          <num>       <num>        <num>       <num>  <fctr> <num>
#> 1:          2.0         3.2          1.3         0.2  setosa   1
#> 2:          2.0         3.1          1.5         0.2  setosa   1
#> 3:          2.0         3.6          1.4         0.2  setosa   1
#> 4:          5.4         3.9          1.7         0.4  setosa  NA
#> 5:          4.6         3.4          1.4         0.3  setosa  NA
#> 6:          2.0         3.4          1.5         0.2  setosa   1

Future plans

tidyfst will keep up with the updatesof data.table , in the next step would introduce more new features to improve the performance and flexibility to facilitate fast data manipulation in tidy syntax.

Vignettes

Cheat sheet

Suggested citation

Huang et al., (2020). tidyfst: Tidy Verbs for Fast Data Manipulation. Journal of Open Source Software, 5(52), 2388, https://doi.org/10.21105/joss.02388

Acknowledgement

The author of maditr,Gregory Demin and the author offst, Marcus Klik have helped me a lot in the development of this work. It is so lucky to have them (and many other selfless contributors) in the same open source community of R.