GitHub - tidyverse/dbplyr: Database (DBI) backend for dplyr (original) (raw)
dbplyr 
Overview
dbplyr is the database backend for dplyr. It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL.
To learn more about why you might use dbplyr instead of writing SQL, seevignette("sql")
. To learn more about the details of the SQL translation, see vignette("translation-verb")
andvignette("translation-function")
.
Installation
The easiest way to get dbplyr is to install the whole tidyverse:
install.packages("tidyverse")
Alternatively, install just dbplyr:
install.packages("dbplyr")
Or the development version from GitHub:
install.packages("pak")
pak::pak("tidyverse/dbplyr")
Usage
dbplyr is designed to work with database tables as if they were local data frames. To demonstrate this I’ll first create an in-memory SQLite database and copy over a dataset:
library(dplyr, warn.conflicts = FALSE)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars)
Note that you don’t actually need to load dbplyr with library(dbplyr)
; dplyr automatically loads it for you when it sees you working with a database. Database connections are coordinated by the DBI package. Learn more at https://dbi.r-dbi.org/
Now you can retrieve a table using tbl()
(see ?tbl_dbi
for more details). Printing it just retrieves the first few rows:
mtcars2 <- tbl(con, "mtcars")
mtcars2
#> # Source: table<mtcars
> [?? x 11]
#> # Database: sqlite 3.45.0 [:memory:]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ more rows
All dplyr calls are evaluated lazily, generating SQL that is only sent to the database when you request the data.
lazily generates query
summary <- mtcars2 %>% group_by(cyl) %>% summarise(mpg = mean(mpg, na.rm = TRUE)) %>% arrange(desc(mpg))
see query
summary %>% show_query()
#>
#> SELECT cyl
, AVG(mpg
) AS mpg
#> FROM mtcars
#> GROUP BY cyl
#> ORDER BY mpg
DESC
execute query and retrieve results
summary %>% collect() #> # A tibble: 3 × 2 #> cyl mpg #> #> 1 4 26.7 #> 2 6 19.7 #> 3 8 15.1
Code of Conduct
Please note that the dbplyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.