Class dm and basic operations (original) (raw)
2025-07-02
The goal of the {dm} package, and of the dm class that comes with it, is to make your life easier when you are dealing with data from several different tables.
Let’s take a look at the dm class.
Class dm
The dm class consists of a collection of tables and metadata about the tables, such as
- the names of the tables
- the names of the columns of the tables
- the primary and foreign keys of the tables to link the tables together
- the data (either as data frames or as references to database tables)
All tables in a dm must be obtained from the same data source; CSV files and spreadsheets need to be imported into data frames in R first.
Examples of dm objects
There are several options available for creating a dm object. The relevant functions for creating dm objects are:
- dm()
- as_dm()
- new_dm()
- dm_from_con()
To illustrate these options, we will now create the same dm in several different ways. We can use the tables from the well-known {nycflights13} package.
Pass the tables directly
Create a dm object directly by providing data frames to dm():
library(nycflights13)
library(dm)
dm(airlines, airports, flights, planes, weather)
Start with an empty dm
Start with an empty dm object that has been created with dm() or new_dm(), and add tables to that object:
library(nycflights13)
library(dm)
empty_dm <- dm()
empty_dm
dm(empty_dm, airlines, airports, flights, planes, weather)
Coerce a list of tables
Turn a named list of tables into a dm with as_dm():
as_dm(list(
  airlines = airlines,
  airports = airports,
  flights = flights,
  planes = planes,
  weather = weather
))
Turn tables from a src into a dm
Squeeze all (or a subset of) tables belonging to a src object into a dm using dm_from_con():
sqlite_con <- dbplyr::nycflights13_sqlite()
flights_dm <- dm_from_con(sqlite_con)
flights_dm
The function dm_from_con(con, table_names = NULL) includes all available tables on a source in the dm object. This means that you can use it, for example, on a Postgres database that you access via DBI::dbConnect(RPostgres::Postgres()) (with the appropriate arguments dbname, host, port, …) to produce a dm object with all the tables in the database.
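To restrict the dm to a subset of tables, the table_names argument can be used. The following is a minimal sketch, assuming the {DBI} and {RSQLite} packages are installed. The in-memory database and the learn_keys = FALSE setting are choices made for this illustration only.

```r
library(dm)

# Assumption: a throwaway in-memory SQLite database stands in for a real source.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "airlines", nycflights13::airlines)
DBI::dbWriteTable(con, "planes", nycflights13::planes)

# Only the tables listed in table_names end up in the dm.
# learn_keys = FALSE skips the attempt to read key constraints from the database.
airlines_only_dm <- dm_from_con(con, table_names = "airlines", learn_keys = FALSE)
subset_table_names <- names(airlines_only_dm)
subset_table_names

DBI::dbDisconnect(con)
```

The table names are stored in a variable before disconnecting, because the tables in such a dm are references to the database and need the open connection.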
Low-level construction
Another way of creating a dm object is calling new_dm() on a list of tbl objects:
base_dm <- new_dm(list(
  airlines = airlines,
  airports = airports,
  flights = flights,
  planes = planes,
  weather = weather
))
base_dm
This constructor is optimized for speed and does not perform integrity checks. Use it with caution, and validate the result with dm_validate() if necessary.
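A minimal sketch of that validation step, assuming dm_validate() returns its input invisibly when the object is valid and raises an error otherwise. The two-table dm is chosen only to keep the example small.

```r
library(dm)
library(nycflights13)

# Low-level construction: no integrity checks are run here.
small_dm <- new_dm(list(airlines = airlines, airports = airports))

# Explicit validation; an invalid object would raise an error at this point.
validated_dm <- dm_validate(small_dm)
is_dm(validated_dm)
```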
Access tables
We can get the list of tables with dm_get_tables() and the src object with dm_get_con().
In order to pull a specific table from a dm, use:
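One way to do this is sketched here, assuming that pull_tbl() and list-style access with [[ behave as in current {dm} releases. The small example_dm object is hypothetical and only serves this illustration.

```r
library(dm)
library(nycflights13)

example_dm <- dm(airlines, airports)

# pull_tbl() extracts one table by its unquoted name.
airports_tbl <- pull_tbl(example_dm, airports)

# Assumption: list-style access by name returns the same table.
airports_tbl_2 <- example_dm[["airports"]]

colnames(airports_tbl)
```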
But how can we use {dm} functions to manage the primary keys of the tables in a dm object?
Primary keys of dm objects
Some useful functions for managing primary key settings are:
- dm_add_pk()
- dm_get_all_pks()
- dm_rm_pk()
- dm_enum_pk_candidates()
If you created a dm object according to the examples in “Examples of dm objects”, your object does not yet have any primary keys set. So let’s add one.
We use the nycflights13 tables, i.e. flights_dm from above.
dm_has_pk(flights_dm, airports)
flights_dm_with_key <- dm_add_pk(flights_dm, airports, faa)
flights_dm_with_key
The dm now has a primary key:
dm_has_pk(flights_dm_with_key, airports)
To get an overview of all tables with primary keys, use dm_get_all_pks():
dm_get_all_pks(flights_dm_with_key)
Remove a primary key:
dm_rm_pk(flights_dm_with_key, airports) %>%
  dm_has_pk(airports)
If you still need to get to know your data better, and it is already available in the form of a dm object, you can use the dm_enum_pk_candidates() function to find out which columns of a table are unique keys:
dm_enum_pk_candidates(flights_dm_with_key, airports)
The flights table does not have any one-column primary key candidates:
dm_enum_pk_candidates(flights_dm_with_key, flights) %>% dplyr::count(candidate)
dm_add_pk() has a check argument. If set to TRUE, the function checks if the column of the table given by the user is unique. For performance reasons, the default is check = FALSE. See also dm_examine_constraints() for checking all constraints in a dm.
try(
  dm_add_pk(flights_dm, airports, tzone, check = TRUE)
)
Foreign keys
Useful functions for managing foreign key relations include:
- dm_add_fk()
- dm_get_all_fks()
- dm_rm_fk()
- dm_enum_fk_candidates()
Now it gets (even more) interesting: we want to define relations between different tables. With the dm_add_fk() function you can define which column of which table points to another table’s column.
This is done by choosing a foreign key from one table that will point to a primary key of another table. The primary key of the referred table must be set with dm_add_pk(). dm_add_fk() will find the primary key column of the referenced table by itself and make the indicated column of the child table point to it.
flights_dm_with_key %>% dm_add_fk(flights, origin, airports)
This will throw an error, because no primary key has been set for airports in flights_dm:
try(
  flights_dm %>% dm_add_fk(flights, origin, airports)
)
Let’s create a dm object with a foreign key relation to work with later on:
flights_dm_with_fk <- dm_add_fk(flights_dm_with_key, flights, origin, airports)
What if we tried to add another foreign key relation from flights to airports to the object? Column dest might work, since it also contains airport codes:
try(
  flights_dm_with_fk %>% dm_add_fk(flights, dest, airports, check = TRUE)
)
Checks are opt-in and executed only if check = TRUE. You can still add a foreign key with the default check = FALSE. See also dm_examine_constraints() for checking all constraints in a dm.
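As a minimal sketch of that workflow, keys can be registered without checks and then examined in one go. This assumes that dm_examine_constraints() returns one row per registered constraint with a logical is_key column. The checked_dm object is hypothetical and built only for this illustration.

```r
library(dm)
library(nycflights13)

# Register the keys without checking them (check = FALSE is the default).
checked_dm <- dm(airports, flights)
checked_dm <- dm_add_pk(checked_dm, airports, faa)
checked_dm <- dm_add_fk(checked_dm, flights, dest, airports)

# One row per registered constraint; is_key reports whether it holds in the data.
constraints <- dm_examine_constraints(checked_dm)
constraints
```

With this data, the dest relation is expected to be reported as violated, which matches the failing check = TRUE call above.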
Get an overview of all foreign key relations with dm_get_all_fks():
dm_get_all_fks(dm_nycflights13(cycle = TRUE))
Remove foreign key relations with dm_rm_fk() (parameter columns = NULL means that all relations will be removed, with a message):
# dest was never registered as a foreign key, hence the try()
try(
  flights_dm_with_fk %>%
    dm_rm_fk(table = flights, columns = dest, ref_table = airports) %>%
    dm_get_all_fks(c(flights, airports))
)
# Remove the existing relation on origin
flights_dm_with_fk %>%
  dm_rm_fk(flights, origin, airports) %>%
  dm_get_all_fks(c(flights, airports))
# columns = NULL removes all relations from flights to airports
flights_dm_with_fk %>%
  dm_rm_fk(flights, columns = NULL, airports) %>%
  dm_get_all_fks(c(flights, airports))
Since the primary keys are defined in the dm object, you do not usually need to provide the referenced column name of ref_table.
Another function for getting to know your data better (cf. dm_enum_pk_candidates() in “Primary keys of dm objects”) is dm_enum_fk_candidates(). Use it to get an overview of foreign key candidates that point from one table to another:
dm_enum_fk_candidates(flights_dm_with_key, weather, airports)