When you think class(.) == *, think again! - The R Blog
Historical relict: R matrix is not an array
In a recent discussion on the R-devel mailing list, in a thread started on July 8, "head.matrix can return 1000s of columns – limit to n or add new argument?", Michael Chirico and then Gabe Becker were proposing to generalize the head() and tail() utility functions, and Gabe noted that current (pre R-4.x.y) head() would not treat array specially. I’ve replied, noting that R currently typically needs both a matrix and an array method:
Note however the following historical quirk:
sapply(setNames(,1:5),
function(K) inherits(array(7, dim=1:K), "array"))
(As I hope this will change, I explicitly put the current R 3.x.y result rather than evaluating the above R chunk:)
1 2 3 4 5
TRUE FALSE TRUE TRUE TRUE
Note that matrix objects are not arrays in that (inheritance) sense, even though (as many useRs may not be aware)
identical(
matrix(47, 2,3), # NB " n, n+1 " is slightly special
array (47, 2:3))
## [1] TRUE
all matrices can equivalently be constructed by array(.), though slightly more clumsily in the case of matrix(*, byrow=TRUE).
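The byrow=TRUE case can be sketched as follows. This is only an illustration with made-up values (1:6 and the dimensions 2 x 3 are arbitrary): the data are filled into an array with the dimensions reversed, and the result is then transposed.

```r
# Row-wise filled 2 x 3 matrix
m_byrow <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

# The same object via array(): fill column-wise with reversed dims (3 x 2),
# then permute the two dimensions (for a 2d array this is a transpose)
a_equiv <- aperm(array(1:6, dim = c(3, 2)))

same <- identical(m_byrow, a_equiv)
same
```

For two dimensions, t() could be used instead of aperm(); aperm() is shown here because it stays within the array vocabulary.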
Note that because of that, base R itself has three functions where the matrix and the array methods are identical, as I wrote in the post:
The consequence of that is that currently, “often” foo.matrix is just a copy of foo.array in the case the latter exists, with base examples of foo in {unique, duplicated, anyDuplicated}.
for(e in expression(unique, duplicated, anyDuplicated)) { # `e` is a `symbol`
    f.m <- get(paste(e, "matrix", sep="."))
    f.a <- get(paste(e, "array", sep="."))
    stopifnot(is.function(f.m),
              identical(f.m, f.a))
}
In R 4.0.0, will a matrix() be an "array"?
In that same post, I’ve also asked:
Is this something we should consider changing for R 4.0.0 – to have it TRUE also for 2d-arrays aka matrix objects ??
In the meantime, I’ve tentatively answered “yes” to my own question, and started investigating some of the consequences. From what I found in too eager (unit) tests, some even written by myself, I was reminded that I had wanted to teach more people about an underlying related issue where we’ve seen many useRs use R unsafely:
If you think class(.) == *, think again: Rather inherits(., *) …. or is(., *)
Most non-beginning R users are aware of inheritance between classes, and even more generally that R objects, at least conceptually, are of more than one “kind”. E.g., pi is both "numeric" and "double", or 1:2 is both "integer" and "numeric". They may know that time-date objects come in two forms: The ?DateTimeClasses (or ?POSIXt) help page describes POSIXct and POSIXlt and says
"POSIXct" is more convenient for including in data frames, and "POSIXlt" is closer to human-readable forms. A virtual class "POSIXt" exists from which both of the classes inherit …
and for example
class(tm <- Sys.time())
## [1] "POSIXct" "POSIXt"
shows that class(.) is of length two here, something that breaks an if(class(x) == "....") .. call.
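A minimal sketch of what goes wrong, using only base R. The comparison is computed outside of if() on purpose, since a condition of length two is at best a warning and, in recent R versions, an error:

```r
tm <- Sys.time()
cls <- class(tm)                    # two class strings for a date-time object

# The naive comparison is vectorized over class(tm), so it has length 2
naive <- cls == "POSIXct"

# inherits() always returns a single TRUE or FALSE
safe_ct <- inherits(tm, "POSIXct")
safe_t  <- inherits(tm, "POSIXt")   # the virtual parent class matches, too

length(naive)
safe_ct
```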
Formal Classes: S4
R’s formal class system, called S4 (implemented mainly in the standard R package methods), provides functionality and tools to implement rich class inheritance structures, made use of heavily in package Matrix, or in the Bioconductor project with its 1800+ R “software” packages. Bioconductor even builds on core packages providing much used S4 classes, e.g., Biostrings, S4Vectors, XVector, IRanges, and GenomicRanges. See also Common Bioconductor Methods and Classes.
Within the formal S4 class system, where extension and inheritance are important and often widely used, an expression such as
if (class(obj) == "matrix") { ..... } # *bad* - do not copy !
is particularly unhelpful, as obj could well be of a class that extends matrix, and S4-using programmeRs learn early to rather use
if (is(obj, "matrix")) { ..... } # *good* !!!
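To make this concrete, here is a small sketch with a made-up S4 class; the name "DemoMatrix" is purely hypothetical and exists only for this illustration:

```r
library(methods)  # usually attached already; loaded explicitly to be safe

# Hypothetical S4 class that extends "matrix"
setClass("DemoMatrix", contains = "matrix")
obj <- new("DemoMatrix", matrix(1:6, nrow = 2))

# The naive test only sees the class name "DemoMatrix" ...
naive_check <- any(class(obj) == "matrix")

# ... whereas is() follows the S4 inheritance to "matrix"
is_check <- is(obj, "matrix")

c(naive = naive_check, is = is_check)
```

Code guarded by the naive test would silently skip such an object even though it can be used wherever a matrix is expected.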
Note that the Bioconductor guidelines for package developers have warned about the misuse of class(.) == *, see the section R Code and Best Practices.
Informal “Classical” Classes: S3
R was created as a dialect or implementation of S, see Wikipedia’s R History, and for S, the “White Book” (Chambers & Hastie, 1992) introduced a convenient, relatively simple object orientation (OO), later coined S3 because the white book introduced S version 3 (where the blue book described S version 2, and the green book S version 4, i.e., S4).
The white book also introduced formulas, data frames, etc., and in some cases also the idea that some S objects could be particular cases of a given class, and in that sense extend that class. Examples, in R, too, have been multivariate time series ("mts") extending (simple) time series ("ts"), or multivariate or generalized linear models ("mlm" or "glm") extending normal linear models ("lm").
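These S3 extensions are easy to inspect with the standard datasets cars and mtcars. The models below are arbitrary and chosen only to produce objects of the classes mentioned:

```r
fit_lm  <- lm(dist ~ speed, data = cars)             # plain linear model
fit_glm <- glm(dist ~ speed, data = cars)            # generalized linear model
fit_mlm <- lm(cbind(mpg, hp) ~ wt, data = mtcars)    # multivariate response

class(fit_glm)   # "glm" comes first, "lm" second
class(fit_mlm)   # "mlm" comes first, "lm" second

# All three are linear models in the inheritance sense
is_lm <- vapply(list(fit_lm, fit_glm, fit_mlm), inherits, TRUE, what = "lm")

# A multivariate time series is also a time series
x <- ts(matrix(seq_len(20), ncol = 2))
is_ts <- inherits(x, "mts") && inherits(x, "ts")
```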
The “Workaround”: class(.)[1]
So, some more experienced and careful programmers have been replacing class(x) by class(x)[1] (or class(x)[1L]) in such comparisons, e.g., in a good and widely lauded useR! 2018 talk.
In some cases, this is good enough, and it is also what R’s data.class(.) function does (among other things), or the (user hidden) methods:::.class1(.).
However, programmeRs should be aware that this is just a workaround and leads to their code working incorrectly in cases where typical S3 inheritance is used: In some situations it is very natural to slightly modify or extend a function fitme() whose result is of class "fitme", typically by writing fitmeMore(), say, whose value would be of class c("fMore", "fitme") such that almost all “fitme” methods would continue to work, but the author of fitmeMore() would additionally provide a print() method, i.e., provide method function print.fMore().
But if other users work with class(.)[1] and have provided code for the case class(.)[1] == "fitme", that code would wrongly not apply to the new "fMore" objects.
The only correct solution is to work with inherits(., "fitme"), as that would apply to all objects it should.
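The scenario can be sketched in a few lines. Everything here is hypothetical: the bodies of fitme() and fitmeMore(), the component names, and the generic center() are invented only to have something to dispatch on.

```r
# Hypothetical fitting function; its result has class "fitme"
fitme <- function(x) structure(list(center = mean(x)), class = "fitme")

# Hypothetical extension: adds a component and prepends the class "fMore"
fitmeMore <- function(x) {
  res <- fitme(x)
  res$spread <- sd(x)
  class(res) <- c("fMore", class(res))
  res
}

# A made-up generic with a method for "fitme" only
center <- function(obj, ...) UseMethod("center")
center.fitme <- function(obj, ...) obj$center

# The extension author provides just a print() method
print.fMore <- function(x, ...) {
  cat("fMore fit: center =", x$center, ", spread =", x$spread, "\n")
  invisible(x)
}

fm <- fitmeMore(c(1, 2, 3, 4))

via_dispatch <- center(fm)               # S3 dispatch falls back to center.fitme
first_only   <- class(fm)[1] == "fitme"  # the workaround misses the object
by_inherits  <- inherits(fm, "fitme")    # inherits() recognizes it
```

S3 dispatch itself already behaves like inherits(): it walks the whole class vector, which is why the "fitme" method keeps working for "fMore" objects.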
In a much depended-upon CRAN package, the following line (slightly obfuscated), which should efficiently determine list entries of a certain class,
isC <- vapply(args, class, "") == "__my_class__"
was found (and notified to the package maintainer) to need correction to
isC <- vapply(args, inherits, TRUE, what = "__my_class__")
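A small sketch of the difference, with invented class names "parent" and "child" and an invented argument list. Note that the first form does not merely give a wrong answer here: vapply() requires every result to have length one, so it fails as soon as one entry has more than one class string.

```r
# Hypothetical list of arguments with mixed classes
args <- list(
  a = 1:3,
  b = structure(list(), class = c("child", "parent")),
  c = structure(list(), class = "parent")
)

# Original form: class(b) has length 2, which vapply(.., "") does not accept
naive <- tryCatch(vapply(args, class, "") == "parent",
                  error = function(e) "error")

# Corrected form: one TRUE/FALSE per entry, respecting inheritance
isC <- vapply(args, inherits, TRUE, what = "parent")
isC
```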
Summary:
Instead of class(x) == "foo", you should use inherits(x, "foo") or maybe alternatively is(x, "foo").
Corollary:
switch(class(x)[1],
       "class_1" = { ..... },
       "class_2" = { ..... },
       .......,
       .......,
       "class_10" = { ..... },
       stop(" ... invalid class:", class(x)))
may look clean, but it is almost always not good enough, as it is (typically) wrong, e.g., when class(x) is c("class_7", "class_2").
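One possible alternative, sketched under the assumption that only a handful of classes need distinguishing, is a chain of inherits() tests. The function handle() and its return strings are made up for illustration; for a larger set of classes, an S3 generic with one method per class is the more idiomatic route.

```r
# Hypothetical handler: branches on inheritance, not on class(x)[1]
handle <- function(x) {
  if (inherits(x, "class_1")) {
    "handled as class_1"
  } else if (inherits(x, "class_2")) {
    "handled as class_2"
  } else {
    stop("invalid class: ", paste(class(x), collapse = ", "))
  }
}

x <- structure(list(), class = c("class_7", "class_2"))
res <- handle(x)   # switch(class(x)[1], ...) would have hit the stop() branch

# inherits(, which = TRUE) reports where each candidate sits in class(x)
pos <- inherits(x, c("class_1", "class_2"), which = TRUE)
```

Here pos is 0 for "class_1" (no match) and 2 for "class_2" (second entry of class(x)).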
References
- R Core Team (2019). R Help pages:
  - For S3, class or inherits
  - For S4, e.g., Basic use of S4 Methods and Classes, and is.
- Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language (the blue book, introducing S version 2 (S2)); Wadsworth & Brooks/Cole.
- Chambers, J. M. and Hastie, T. J. eds (1992) Statistical Models in S (the white book, introducing S version 3 (S3)); Chapman & Hall, London.
- Chambers, John M. (1998) Programming with Data (the green book, for S4 original); Springer.
- Chambers, John M. (2008) Software for Data Analysis: Programming with R (S4 etc for R); Springer.