R6-Based Flexible Framework for Permutation Tests
Overview
This R package implements several non-parametric tests from chapters 1-5 of Higgins (2004), including tests for one sample, two samples, k samples, paired comparisons, blocked designs, trends and association. Built with Rcpp for efficiency and R6 for flexible, object-oriented design, it provides a unified framework for performing permutation tests and for creating custom ones.
Installation
Install the stable version from CRAN:
install.packages("LearnNonparam")
Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("qddyy/LearnNonparam")
Usage
- Construct a test object
  - from some R6 class directly
  - using the pmt (permutation test) wrapper
# recommended for a unified API
t <- pmt("twosample.wilcoxon", n_permu = 1e6)
- Provide it with samples
- Check the results
- Modify some settings and observe the change
t$type <- "asymp"
t$p_value
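The steps above can be sketched end to end as follows. This is a minimal sketch, assuming the package is installed. The sample data, the seed and the smaller n_permu are illustrative choices, and Wilcoxon$new() assumes the usual R6 constructor convention for the class listed under the key twosample.wilcoxon.

```r
# Sketch only: runs when LearnNonparam is installed, otherwise does nothing.
if (requireNamespace("LearnNonparam", quietly = TRUE)) {
    library(LearnNonparam)
    options(LearnNonparam.pmt_progress = FALSE)  # silence the progress bar

    # Illustrative data (an assumption, not taken from the package)
    set.seed(1)
    x <- rnorm(10, mean = 1)
    y <- rnorm(10, mean = 0)

    # Construct a test object, either from the R6 class directly
    # (assumes the standard R6 $new() constructor) ...
    t_direct <- Wilcoxon$new(n_permu = 1e4)
    # ... or through the pmt wrapper
    t <- pmt("twosample.wilcoxon", n_permu = 1e4)

    # Provide it with samples
    t$test(x, y)

    # Check the results
    p_permu <- t$p_value

    # Modify some settings and observe the change
    t$type <- "asymp"
    p_asymp <- t$p_value

    print(c(permutation = p_permu, asymptotic = p_asymp))
}
```

The two p-values are printed side by side so the effect of switching type can be compared directly. No expected values are given here because they depend on the random draws.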
See pmts() for tests implemented in this package.
key | class | test |
---|---|---|
onesample.quantile | Quantile | Quantile Test |
onesample.cdf | CDF | Inference on Cumulative Distribution Function |
twosample.difference | Difference | Two-Sample Test Based on Mean or Median |
twosample.wilcoxon | Wilcoxon | Two-Sample Wilcoxon Test |
twosample.scoresum | ScoreSum | Two-Sample Test Based on Sum of Scores |
twosample.ansari | AnsariBradley | Ansari-Bradley Test |
twosample.siegel | SiegelTukey | Siegel-Tukey Test |
twosample.rmd | RatioMeanDeviance | Ratio Mean Deviance Test |
twosample.ks | KolmogorovSmirnov | Two-Sample Kolmogorov-Smirnov Test |
ksample.oneway | OneWay | One-Way Test for Equal Means |
ksample.kw | KruskalWallis | Kruskal-Wallis Test |
ksample.jt | JonckheereTerpstra | Jonckheere-Terpstra Test |
multcomp.studentized | Studentized | Multiple Comparison Based on Studentized Statistic |
paired.sign | Sign | Two-Sample Sign Test |
paired.difference | PairedDifference | Paired Comparison Based on Differences |
rcbd.oneway | RCBDOneWay | One-Way Test for Equal Means in RCBD |
rcbd.friedman | Friedman | Friedman Test |
rcbd.page | Page | Page Test |
association.corr | Correlation | Test for Association Between Paired Samples |
table.chisq | ChiSquare | Chi-Square Test on Contingency Table |
Extending
define_pmt allows users to define new permutation tests. Take the two-sample Wilcoxon test as an example:
t_custom <- define_pmt(
    # this is a two-sample permutation test
    inherit = "twosample",
    statistic = function(x, y) {
        # (optional) pre-calculate certain constants that remain invariant during permutation
        m <- length(x)
        n <- length(y)
        # return a closure to calculate the test statistic
        function(x, y) sum(x) / m - sum(y) / n
    },
    # reject the null hypothesis when the test statistic is too large or too small
    rejection = "lr", n_permu = 1e5
)
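To make the role of the closure concrete, here is a minimal base-R sketch of what a two-sample permutation test might do with such a statistic. perm_test_sketch and mean_diff are hypothetical names and not part of the package. The two-sided p-value is a simplification and may differ from how the package handles rejection = "lr".

```r
# Hypothetical helper, not part of LearnNonparam: a plain-R permutation loop.
perm_test_sketch <- function(x, y, statistic, n_permu = 2000) {
    # Outer call: runs once, so constants such as m and n are computed once
    stat <- statistic(x, y)
    observed <- stat(x, y)

    pooled <- c(x, y)
    m <- length(x)

    # Inner closure: evaluated on every random split of the pooled data
    permuted <- replicate(n_permu, {
        idx <- sample.int(length(pooled), m)
        stat(pooled[idx], pooled[-idx])
    })

    # Simplified two-sided p-value: share of permuted statistics that are
    # at least as extreme (in absolute value) as the observed one
    list(
        statistic = observed,
        p_value = mean(abs(permuted) >= abs(observed))
    )
}

# Same shape as the statistic passed to define_pmt above
mean_diff <- function(x, y) {
    m <- length(x)
    n <- length(y)
    function(x, y) sum(x) / m - sum(y) / n
}

set.seed(42)
x <- rnorm(10, mean = 0)
y <- rnorm(10, mean = 5)
res <- perm_test_sketch(x, y, mean_diff)
print(res)
```

Splitting the statistic into an outer function and an inner closure means the sample sizes are computed once instead of once per permutation, which is the point of the optional pre-calculation step.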
The statistic can also be written in C++. Thanks to Rcpp sugar and C++14 features, only minor modifications are needed to turn the R version into valid C++.
t_cpp <- define_pmt(
    inherit = "twosample", rejection = "lr", n_permu = 1e5,
    statistic = "[](const auto& x, const auto& y) {
        auto m = x.length();
        auto n = y.length();
        return [=](const auto& x, const auto& y) {
            return sum(x) / m - sum(y) / n;
        };
    }"
)
It’s easy to check that t_custom and t_cpp are equivalent:
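One way to run such a check is sketched below. It repeats both definitions so that it stands alone, uses a smaller n_permu to keep the run short, and assumes the package is installed together with a working C++ toolchain. The data, the seeds and the use of the statistic field are illustrative assumptions.

```r
# Sketch only: runs when LearnNonparam is installed, otherwise does nothing.
if (requireNamespace("LearnNonparam", quietly = TRUE)) {
    library(LearnNonparam)
    options(LearnNonparam.pmt_progress = FALSE)

    # Same definitions as above, with fewer permutations
    t_custom <- define_pmt(
        inherit = "twosample", rejection = "lr", n_permu = 1e4,
        statistic = function(x, y) {
            m <- length(x)
            n <- length(y)
            function(x, y) sum(x) / m - sum(y) / n
        }
    )
    t_cpp <- define_pmt(
        inherit = "twosample", rejection = "lr", n_permu = 1e4,
        statistic = "[](const auto& x, const auto& y) {
            auto m = x.length();
            auto n = y.length();
            return [=](const auto& x, const auto& y) {
                return sum(x) / m - sum(y) / n;
            };
        }"
    )

    # Illustrative samples (an assumption)
    set.seed(1)
    x <- rnorm(10, mean = 0)
    y <- rnorm(10, mean = 5)

    # Reset the seed before each run so both tests start from the same state
    set.seed(0)
    t_custom$test(x, y)
    set.seed(0)
    t_cpp$test(x, y)

    # The observed statistics should agree up to floating-point tolerance
    same_statistic <- isTRUE(all.equal(
        as.numeric(t_custom$statistic), as.numeric(t_cpp$statistic)
    ))
    print(same_statistic)
    print(c(R = t_custom$p_value, cpp = t_cpp$p_value))
}
```

Resetting the seed before each call is meant to make the two runs comparable. Whether the p-values match exactly depends on how each implementation consumes random numbers, so they are printed rather than asserted.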
Performance
coin is a commonly used R package for performing permutation tests. Below is a benchmark:
library(coin)
# x and y are the two numeric sample vectors being compared; define them beforehand
data <- c(x, y)
group <- factor(c(rep("x", length(x)), rep("y", length(y))))
# disable the progress bar so that it does not distort the timings
options(LearnNonparam.pmt_progress = FALSE)
benchmark <- microbenchmark::microbenchmark(
    R = t_custom$test(x, y),
    Rcpp = t_cpp$test(x, y),
    coin = wilcox_test(data ~ group, distribution = approximate(nresample = 1e5, parallel = "no"))
)
It can be seen that C++ brings significantly better performance than pure R, even surpassing the coin package (under sequential execution). However, all tests in this package are currently written in R, and there are no plans to migrate them to C++. This is because the primary goal of this package is not to maximize performance but to offer a flexible framework for permutation tests.
References
Higgins, J. J. 2004. An Introduction to Modern Nonparametric Statistics. Duxbury Advanced Series. Brooks/Cole.