Abstract Categories

There are a number of areas where Crunch needs to represent an object as belonging to one of many categories. The simplest and most common example of this is the categories of a categorical variable. For a categorical variable, the values of the variable can be one of a limited set of categories, and those categories are specified in the Crunch API as metadata about the variable. These categoricals are similar to R’s factors but are richer because Crunch categoricals can have any number of missing values (compared to just NA for factors), as well as a numeric representation that is separate from the category ids (which is useful for things like income bins, where you might put the middle of the bin as the value).

Moving beyond just categorical variables, we need to be able to represent a number of different properties, transformations, etc. in a category-like way. One concrete example is used heavily to add subtotals and headings to representations of categorical variables. To do this, we have two families of S4 classes: AbstractCategory and AbstractCategories. Although subtotals and headings were the initial motivation for the new classes, they will allow for other types of representations and manipulations in the future.

AbstractCategories

The core classes that all other classes inherit from are AbstractCategory and AbstractCategories. The first, AbstractCategory, is designed to represent a single category, which might have a number of properties about it (what those are will be explained in more detail below). The second, AbstractCategories, is designed to hold more than one AbstractCategory together to form a coherent group. As a simple example, an AbstractCategories for binned income could have 5 AbstractCategorys: <$25,000, $25,000-$49,999, $50,000-$99,999, $100,000-$199,999, >$200,000. This could be represented in R as:

income <- AbstractCategories(AbstractCategory(name = "<$25,000"),
                             AbstractCategory(name = "$25,000-$49,999"),
                             AbstractCategory(name = "$50,000-$99,999"),
                             AbstractCategory(name = "$100,000-$199,999"),
                             AbstractCategory(name = ">$200,000"))

An alternate way to instantiate this same AbstractCategories with less typing is to send lists; the constructor takes care of calling the AbstractCategory constructor on each (as below). Each of the child classes of AbstractCategories (described in the sections below) has its own mapping of plural container to singular entity constructor in the same way, so passing Categories a list will result in a Categories object full of Category objects.

income <- AbstractCategories(list(name = "<$25,000"),
                             list(name = "$25,000-$49,999"),
                             list(name = "$50,000-$99,999"),
                             list(name = "$100,000-$199,999"),
                             list(name = ">$200,000"))

Finally, there’s a data argument for when you already have a list of AbstractCategorys (or simply named lists!) that you want to pass in. The same thing could also be accomplished with do.call:

income_list <- list(list(name = "<$25,000"),
                    list(name = "$25,000-$49,999"),
                    list(name = "$50,000-$99,999"),
                    list(name = "$100,000-$199,999"),
                    list(name = ">$200,000"))
income <- AbstractCategories(data=income_list)

Methods

Any methods that are defined for the abstract classes will function on the subclasses as well. Child classes might have special override methods defined for them, but for the most part, if a method can be used on AbstractCategories or AbstractCategory, it can be used on the child classes as well.

AbstractCategories inherits from list and AbstractCategory inherits from namedList, so many of the same methods will work with both of them. This includes using [, [[, [<-, and [[<- to get and set subsets of AbstractCategories, and $ and [[ to get the properties in an AbstractCategory.

lapply has also been defined for AbstractCategories for easily iterating over all members. modifyCats also allows for modifying one AbstractCategories object by updating it with new information from a second AbstractCategories object in the same way that modifyList works, but crucially it does not recurse into the AbstractCategory objects themselves.
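The difference between a recursive and a non-recursive update can be sketched with plain lists. This is a minimal illustration in base R only, not crunch code: the lists merely stand in for category collections, and matching members by list name is an assumption made for the example, not a statement about how modifyCats matches members.

```r
# Two plain lists standing in for category collections (not crunch objects).
old <- list(
    a = list(name = "A", id = 1),
    b = list(name = "B", id = 2)
)
new <- list(a = list(name = "A (renamed)"))

# modifyList() recurses: the nested list `a` is merged, so `id` survives.
recursive <- modifyList(old, new)

# A non-recursive update replaces the member `a` wholesale, so `id` is dropped.
shallow <- old
shallow[names(new)] <- new

str(recursive$a)
str(shallow$a)
```

In the recursive result the member keeps its id alongside the new name; in the shallow result the member contains only what the second list supplied.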

Finally, there are a few custom methods that return the values of the properties as either a vector of that property for each member (when using the plural versions against AbstractCategories) or a vector (typically of length one) for a single member (when using the singular versions against AbstractCategory).

names returns the names associated with each AbstractCategory in an AbstractCategories object, and name returns the name associated with an AbstractCategory object. ids and id follow the exact same pattern.
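As a rough mental model, a plural accessor behaves like the singular accessor mapped over every member. The sketch below uses plain base R lists rather than crunch classes; cat_name, cat_names, and cat_ids are hypothetical helpers written for this illustration, and the ids are made up.

```r
# Plain lists standing in for AbstractCategory objects (ids are illustrative).
income_list <- list(
    list(name = "<$25,000", id = 1),
    list(name = "$25,000-$49,999", id = 2)
)

# Singular accessor: one property from one member.
cat_name <- function(category) category$name

# Plural accessors: a vector holding that property for each member.
cat_names <- function(categories) vapply(categories, cat_name, character(1))
cat_ids <- function(categories) {
    vapply(categories, function(category) category$id, numeric(1))
}

cat_names(income_list)
cat_ids(income_list)
```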

Categories

Categories from a categorical variable are represented by the Categories and Category classes. They inherit directly from AbstractCategories and AbstractCategory respectively. Each Category must have a name and an id, and can optionally have numeric_value, missing, and selected properties.

Methods

Insertions

Insertions allow users to insert new categories into a variable or a CrunchCube for display purposes. This is useful when the user would like to show things like aggregates (e.g. subtotals) without manipulating the underlying data (or creating a new variable). Insertions are defined as part of the Crunch API (see the Transforms section below for an explanation about where Insertions live). The Insertions class is designed to mirror the Crunch API for insertions as closely as possible. Insertions and Insertion inherit directly from AbstractCategories and AbstractCategory respectively.

Insertions must have a name and an anchor. The name is just like Category names, and is used as the label to display. The anchor is the id of the category after which the insertion should be placed.

Since insertions can represent a number of different aggregations, they can also have function and args properties. The function property is a character describing the aggregation to use (e.g. "subtotal") and the args property is a vector of the category ids to use as operands for the function.
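To make the roles of anchor, function, and args concrete, here is a minimal base R sketch that applies a subtotal insertion to a vector of counts. The counts are made up for illustration, and the insertion is a plain named list rather than a crunch Insertion object.

```r
# Hypothetical counts, keyed by category id.
counts <- c("1" = 10, "2" = 20, "3" = 5, "4" = 8, "5" = 2)

# A plain list with the same four properties an insertion carries.
insertion <- list(
    name = "Generally Unhappy",
    anchor = 5,
    `function` = "subtotal",
    args = c(4, 5)
)

# A subtotal sums the counts of the categories listed in `args`.
subtotal <- sum(counts[as.character(insertion$args)])

# Place the new value directly after the category whose id equals `anchor`.
position <- match(as.character(insertion$anchor), names(counts))
with_subtotal <- append(
    counts,
    setNames(subtotal, insertion$name),
    after = position
)
with_subtotal
```

The underlying counts are untouched; the subtotal only appears in the displayed vector.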

The Insertion class has two child classes: Subtotal and Heading. The Insertions class can contain anything that inherits from Insertion. Therefore, an Insertions object might include Insertions, Subtotals, and Headings.

Methods

Subtotals and Headings

Subtotals and headings are both types of insertions. Because of this, the Subtotal and Heading classes inherit from Insertion rather than directly from AbstractCategory. These classes are designed to hold known types of Insertions to make it easier to work with Insertions (for example, testing which insertion to style in what way when using prettyPrint functions). Additionally, these classes have slightly more user-friendly names (e.g. after instead of anchor), and they accept either ids or names to refer to specific Categorys.

Subtotal

A Subtotal must have name, after, and categories properties. name is the same as for other abstract categories. after is similar to anchor but can be either a category id or a category name after which the subtotal should be placed. categories is either the category ids or the category names to subtotal.

Methods

The same as Insertion; however, some have customizations:

* func always returns the string "subtotal" (because by definition a Subtotal object is an Insertion with function="subtotal")
* anchor and arguments both have an option var_items, which is required if the Subtotal is using category names instead of ids in the after or categories properties. Supplying the categories is required in order to translate from category names to ids, which are required for a well-formed Insertion.
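The name-to-id translation described above can be sketched in base R. names_to_ids is a hypothetical helper written for this illustration, not a function from the crunch package, and the categories are plain lists rather than a Categories object.

```r
# Plain lists standing in for a Categories object.
feeling_cats <- list(
    list(name = "Very Happy", id = 1),
    list(name = "Somewhat Happy", id = 2),
    list(name = "Neither Happy nor Unhappy", id = 3),
    list(name = "Somewhat Unhappy", id = 4),
    list(name = "Very Unhappy", id = 5)
)

# Hypothetical helper: pass ids through, look names up in the categories.
names_to_ids <- function(x, categories) {
    if (is.numeric(x)) {
        return(x)  # already ids, nothing to translate
    }
    cat_names <- vapply(categories, function(category) category$name, character(1))
    cat_ids <- vapply(categories, function(category) category$id, numeric(1))
    matched <- match(x, cat_names)
    if (anyNA(matched)) {
        stop("Unknown category name(s): ", paste(x[is.na(matched)], collapse = ", "))
    }
    cat_ids[matched]
}

names_to_ids(c("Very Happy", "Somewhat Happy"), feeling_cats)
names_to_ids(c(4, 5), feeling_cats)
```

Without the categories there is nothing to look the names up in, which is why the categories must be supplied whenever names are used.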

Heading

A Heading must have name and after properties, both of which have the same interpretation as for Subtotal above.

Methods

The same as Subtotal for anchor. func and arguments return NA.

As a concrete example, let’s take the following categories:

feeling_cats <- Categories(
    list(name = "Very Happy", id = 1),
    list(name = "Somewhat Happy", id = 2),
    list(name = "Neither Happy nor Unhappy", id = 3),
    list(name = "Somewhat Unhappy", id = 4),
    list(name = "Very Unhappy", id = 5)
)
feeling_cats
##   id                      name value missing
## 1  1                Very Happy    NA   FALSE
## 2  2            Somewhat Happy    NA   FALSE
## 3  3 Neither Happy nor Unhappy    NA   FALSE
## 4  4          Somewhat Unhappy    NA   FALSE
## 5  5              Very Unhappy    NA   FALSE

And make some subtotals and headings to use as insertions:

feeling_subtotals <- Insertions(
    Heading(name = "How I feel about cheese", position = "top"),
    Subtotal(name = "Generally Happy", after = "Somewhat Happy", 
        categories = c("Very Happy", "Somewhat Happy")),
    Subtotal(name = "Generally Unhappy", after = 5, 
        categories = c(4, 5))
)

Notice that the “Generally Happy” subtotal is made specifying category names for after and categories:

feeling_subtotals[[2]]$after
## [1] "Somewhat Happy"
feeling_subtotals[[2]]$categories
## [1] "Very Happy"     "Somewhat Happy"

Whereas the “Generally Unhappy” subtotal uses ids:

feeling_subtotals[[3]]$after
## [1] 5
feeling_subtotals[[3]]$categories
## [1] 4 5

Converting from Subtotal/Heading to Insertion

Since the Crunch API does not have a distinction between Subtotals, Headings, and other Insertions, we sometimes need to convert from Subtotals or Headings to Insertions. This is accomplished with the method makeInsertion(). This method takes a Subtotal or Heading and returns a valid Insertion. If the Subtotal or Heading has category name references instead of ids, then you must include a Categories object as the var_items argument. In general, this is only needed before sending a heterogeneous set of Insertions to the Crunch API.

Using the examples we used before, we can see how this works:

feeling_insertions <- Insertions(data = lapply(feeling_subtotals, makeInsertion, var_items = feeling_cats))

Now, all of the Subtotals and Headings from feeling_subtotals are proper Insertions:

sapply(feeling_insertions, class)
## [1] "Insertion" "Insertion" "Insertion"

This means that the after property has been translated into anchor, and the function and args properties have been filled in appropriately:

feeling_insertions[[3]]$anchor
## [1] 5
feeling_insertions[[3]]$`function`
## [1] "subtotal"
feeling_insertions[[3]]$args
## [1] 4 5

Because Insertions are required to use category ids only, the new all-Insertions feeling_insertions has translated the “Generally Happy” subtotal’s category names to ids:

feeling_insertions[[2]]$anchor
## [1] 2
feeling_insertions[[2]]$args
## [1] 1 2

Converting from Insertion to Subtotal/Heading

Since the Crunch API does not have a distinction between Subtotals, Headings, and other Insertions, when we get data about Insertions from the API we need to change the classes for the Insertions that the crunch package knows about. To do this, we can use either subtypeInsertions to change the types of all of the members of an Insertions object, or subtypeInsertion to change the type of a single Insertion object.

These functions work by inspecting the Insertion and determining if it can be identified as one of the known child classes of Insertion (namely: Subtotal or Heading).
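One way to picture this kind of inspection is the base R sketch below. The rule it applies is an assumption for illustration only, inferred from the descriptions above (a Subtotal has function "subtotal"; a Heading has no function or arguments); it is not the crunch package's actual logic. guess_subtype is a hypothetical helper, the insertions are plain lists, and the "mean" function value is made up.

```r
# Hypothetical classifier; the rule is an assumption, not crunch's implementation.
guess_subtype <- function(insertion) {
    fn <- insertion[["function"]]
    has_function <- !is.null(fn) && !is.na(fn)
    if (has_function && identical(fn, "subtotal")) {
        "Subtotal"
    } else if (!has_function && length(insertion[["args"]]) == 0) {
        "Heading"
    } else {
        "Insertion"  # not a known subtype, so leave it as a generic insertion
    }
}

plain_insertions <- list(
    list(name = "How I feel about cheese", anchor = 0),
    list(name = "Generally Unhappy", anchor = 5,
         `function` = "subtotal", args = c(4, 5)),
    list(name = "Something else", anchor = 1,
         `function` = "mean", args = c(1, 2))
)

vapply(plain_insertions, guess_subtype, character(1))
```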

Using the same example above, we can convert back from all Insertions to the subtypes:

feeling_subtotals_again <- subtypeInsertions(feeling_insertions)
sapply(feeling_subtotals_again, class)
## [1] "Heading"  "Subtotal" "Subtotal"

Inheritance

There are two sets of inheritance: one for containers and one for members. Classes inherit from those immediately to their left.

|            | top-level classes  | 1st children | 2nd children |
|------------|--------------------|--------------|--------------|
| containers | AbstractCategories | Categories   |              |
|            | AbstractCategories | Insertions   |              |
| members    | AbstractCategory   | Category     |              |
|            | AbstractCategory   | Insertion    | Subtotal     |
|            | AbstractCategory   | Insertion    | Heading      |
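The inheritance relationships listed above can be mirrored with a few toy S4 class definitions. These are not the crunch package's definitions: the classes carry a "Toy" prefix so they do not clash with the real ones, and they have no slots or validity checks. The sketch only shows how contains chains produce the two hierarchies.

```r
library(methods)

# Containers: built on list.
setClass("ToyAbstractCategories", contains = "list")
setClass("ToyCategories", contains = "ToyAbstractCategories")
setClass("ToyInsertions", contains = "ToyAbstractCategories")

# Members: built on namedList.
setClass("ToyAbstractCategory", contains = "namedList")
setClass("ToyCategory", contains = "ToyAbstractCategory")
setClass("ToyInsertion", contains = "ToyAbstractCategory")
setClass("ToySubtotal", contains = "ToyInsertion")
setClass("ToyHeading", contains = "ToyInsertion")

# A subtotal is an insertion, and therefore also an abstract category...
extends("ToySubtotal", "ToyInsertion")
extends("ToySubtotal", "ToyAbstractCategory")
# ...but it is not a category of a categorical variable.
extends("ToySubtotal", "ToyCategory")
```

Because S4 method dispatch follows these chains, a method written for the top-level class is available to every class to its right in the table.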

Transforms

The Transforms class and set of functions is not an abstract category at all, but rather it mirrors the Crunch API’s set of transformations that are allowed on a variable or CrunchCube. One of the possible transformations is insertions (which is where Insertions are stored). Currently the crunch package doesn’t support other transformations.