Segfaults: "numeric(0)" input in several fast statistical functions · Issue #101 · SebKrantz/collapse

While working on some data cleaning, a numeric(0) vector popped out after base subsetting. This led to a segfault when I tried to use the zero length vector with fsum(). Afterwards I tested all other fast statistical functions with the following results:

collapse::fsum(numeric(0))       # Segfault: address 0x55f7363efff8, cause 'memory not mapped'
collapse::fprod(numeric(0))      # Segfault: address 0x557a14cc8ff8, cause 'memory not mapped'
collapse::fmean(numeric(0))      # Segfault: address 0x559c7bdfaff8, cause 'memory not mapped'
collapse::fmedian(numeric(0))    # Output: [1] 4.668247e-310
collapse::fmode(numeric(0))      # Output: [1] 4.668247e-310
collapse::fvar(numeric(0))       # Output: [1] 0 (but the equivalent 'stats::var(numeric(0))' returns NA)
collapse::fsd(numeric(0))        # Output: [1] 0 (but the equivalent 'stats::sd(numeric(0))' returns NA)
collapse::fmin(numeric(0))       # Segfault: address 0x55ef52e99ff8, cause 'memory not mapped'
collapse::fmax(numeric(0))       # Segfault: address 0x5586c9c93ff8, cause 'memory not mapped'
collapse::fnth(numeric(0))       # Output: [1] 4.668247e-310
collapse::ffirst(numeric(0))     # Output: [1] 4.668247e-310
collapse::flast(numeric(0))      # Output: [1] 0 (inconsistent data types compared to 'ffirst()')
collapse::fNobs(numeric(0))      # Output: [1] 0
collapse::fNdistinct(numeric(0)) # Output: [1] 0

fNobs() and fNdistinct() both return the expected output. The others, as noted in the comments above, do one of the following:

  1. Cause a crash.
  2. Return output that is inconsistent with the stats package (e.g. stats::median(numeric(0)) returns NA, not 0).
  3. Return internally inconsistent output (e.g. ffirst() and flast() return different data types).
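For comparison, here is a small sketch of what the base R and stats equivalents return for a zero-length numeric vector. It uses only base functions and makes no claim about what collapse should return; the name `base_results` is just an illustrative variable.

```r
# Base R / stats results for zero-length numeric input
# (no collapse functions are called here).
x <- numeric(0)

base_results <- list(
  sum    = sum(x),                    # 0, the additive identity
  prod   = prod(x),                   # 1, the multiplicative identity
  mean   = mean(x),                   # NaN (0 / 0)
  median = stats::median(x),          # NA
  var    = stats::var(x),             # NA
  sd     = stats::sd(x),              # NA
  min    = suppressWarnings(min(x)),  # Inf (base R also emits a warning)
  max    = suppressWarnings(max(x))   # -Inf (base R also emits a warning)
)

str(base_results)
```

None of these calls crash, which is the minimum I would expect from the fast equivalents as well.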

I've attached a gdb session with the backtrace for the segfault with fsum().
debug_session.txt
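Until this is fixed, a possible workaround is to check the length before calling the fast function. The following is a minimal sketch; `guard_empty` is a hypothetical helper of my own, not part of collapse, and base `sum` and `mean` stand in for the collapse functions so the example runs without the package.

```r
# Hypothetical helper (not part of collapse): call `f` only on non-empty
# input, otherwise return a caller-chosen fallback value.
guard_empty <- function(x, f, empty = NA_real_, ...) {
  if (length(x) == 0L) {
    return(empty)
  }
  f(x, ...)
}

# Base functions stand in for e.g. collapse::fsum / collapse::fmean here.
guard_empty(numeric(0), sum, empty = 0)  # zero-length input: fallback is returned
guard_empty(c(1, 2, 3), sum, empty = 0)  # non-empty input: f is called as usual
guard_empty(numeric(0), mean)            # default fallback is NA_real_
```

The fallback is left to the caller because the appropriate value differs by statistic (for instance 0 for a sum but NA for a median).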

Session info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bullseye/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8          LC_NUMERIC=C
 [3] LC_TIME=en_IL.utf8        LC_COLLATE=en_IL.utf8
 [5] LC_MONETARY=en_IL.utf8    LC_MESSAGES=en_IL.utf8
 [7] LC_PAPER=en_IL.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IL.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.3           parallel_4.0.3           RcppArmadillo_0.10.1.2.0
[4] Rcpp_1.0.5               collapse_1.4.2

Package compiled with gcc:

gcc (Debian 10.2.0-19) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.