GitHub - LucyMcGowan/tidycode (original) (raw)

tidycode

The goal of tidycode is to allow users to analyze R expressions in a tidy way.

Installation

You can install tidycode from CRAN with:

install.packages("tidycode")

You can install the development version of tidycode from github with:

install.packages("remotes")

remotes::install_github("LucyMcGowan/tidycode")

Example

Read in existing code

Using the matahari package, we can read in existing code, either as a string or a file, and turn it into a matahari tibble usingmatahari::dance_recital().

code <- " library(broom) library(glue) m <- lm(mpg ~ am, data = mtcars) t <- tidy(m) glue_data(t, 'The point estimate for term {term} is {estimate}.') "

m <- matahari::dance_recital(code)

Alternatively, you may already have a matahari tibble that was recorded during an R session.

Load the tidycode library.

We can use the expressions from this matahari tibble to extract the names of the packages included. We can also create a data frame that will include all functions of the packages included.

(pkg_names <- ls_packages(m$expr)) #> [1] "broom" "glue" pkg_functions <- get_package_functions(m$expr)

Create a data frame of your expressions, splitting each into individual functions.

u <- unnest_calls(m, expr)

Merge in the package names

u <- u %>% dplyr::left_join(pkg_functions) %>% dplyr::select(func, args, line, package) #> Joining, by = "func" u #> # A tibble: 8 x 4 #> func args line package #>
#> 1 library <list [1]> 1 base
#> 2 library <list [1]> 2 base
#> 3 <- <list [2]> 3 base
#> 4 lm <named list [2]> 3 stats
#> 5 ~ <list [2]> 3 base
#> 6 <- <list [2]> 4 base
#> 7 tidy <list [1]> 4 broom
#> 8 glue_data <list [2]> 5 glue

Add in the function classifications!

u %>% dplyr::inner_join( get_classifications("crowdsource", include_duplicates = FALSE) ) #> Joining, by = "func" #> # A tibble: 8 x 5 #> func args line package classification #>
#> 1 library <list [1]> 1 base setup
#> 2 library <list [1]> 2 base setup
#> 3 <- <list [2]> 3 base data cleaning #> 4 lm <named list [2]> 3 stats modeling
#> 5 ~ <list [2]> 3 base modeling
#> 6 <- <list [2]> 4 base data cleaning #> 7 tidy <list [1]> 4 broom modeling
#> 8 glue_data <list [2]> 5 glue communication

We can also remove a list of “stopwords”. We have a function,get_stopfuncs() that lists common “stopwords”, frequently used operators, like %>% and +.

u %>% dplyr::inner_join( get_classifications("crowdsource", include_duplicates = FALSE) ) %>% dplyr::anti_join(get_stopfuncs()) %>% dplyr::select(func, classification) #> Joining, by = "func" #> Joining, by = "func" #> # A tibble: 5 x 2 #> func classification #>
#> 1 library setup
#> 2 library setup
#> 3 lm modeling
#> 4 tidy modeling
#> 5 glue_data communication