R6-Based Flexible Framework for Permutation Tests

Overview

This R package implements several non-parametric tests from chapters 1-5 of Higgins (2004), including tests for one sample, two samples, k samples, paired comparisons, blocked designs, trends and association. Built with Rcpp for efficiency and R6 for flexible, object-oriented design, it provides a unified framework for performing permutation tests and for creating custom ones.

Installation

Install the stable version from CRAN:

install.packages("LearnNonparam")

Install the development version from GitHub:

# install.packages("remotes")
remotes::install_github("qddyy/LearnNonparam")

Usage

- Construct a test object
  - from some R6 class directly
  - using the pmt (permutation test) wrapper

# recommended for a unified API
t <- pmt("twosample.wilcoxon", n_permu = 1e6)

- Provide it with samples
- Check the results
- Modify some settings and observe the change

t$type <- "asymp"
t$p_value
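Putting these steps together, a minimal end-to-end sketch might look like the following. The simulated samples, the seed and the reduced n_permu are illustrative assumptions, not values from the original; only pmt, test, type and p_value are taken from the text above.

```r
library(LearnNonparam)

# illustrative data (an assumption, not from the original)
set.seed(1)
x <- rnorm(10, mean = 1)
y <- rnorm(10, mean = 0)

# construct the test, then provide it with samples
t <- pmt("twosample.wilcoxon", n_permu = 1e4)
t$test(x, y)

# p-value read right after testing (before any setting is changed)
p_permu <- t$p_value

# switch to the asymptotic approximation and read the p-value again
t$type <- "asymp"
p_asymp <- t$p_value

c(permu = p_permu, asymp = p_asymp)
```

With samples this small the two p-values will usually differ somewhat, which is the change the last step is meant to expose.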

See pmts() for tests implemented in this package.

key	class	test
onesample.quantile	Quantile	Quantile Test
onesample.cdf	CDF	Inference on Cumulative Distribution Function
twosample.difference	Difference	Two-Sample Test Based on Mean or Median
twosample.wilcoxon	Wilcoxon	Two-Sample Wilcoxon Test
twosample.scoresum	ScoreSum	Two-Sample Test Based on Sum of Scores
twosample.ansari	AnsariBradley	Ansari-Bradley Test
twosample.siegel	SiegelTukey	Siegel-Tukey Test
twosample.rmd	RatioMeanDeviance	Ratio Mean Deviance Test
twosample.ks	KolmogorovSmirnov	Two-Sample Kolmogorov-Smirnov Test
ksample.oneway	OneWay	One-Way Test for Equal Means
ksample.kw	KruskalWallis	Kruskal-Wallis Test
ksample.jt	JonckheereTerpstra	Jonckheere-Terpstra Test
multcomp.studentized	Studentized	Multiple Comparison Based on Studentized Statistic
paired.sign	Sign	Two-Sample Sign Test
paired.difference	PairedDifference	Paired Comparison Based on Differences
rcbd.oneway	RCBDOneWay	One-Way Test for Equal Means in RCBD
rcbd.friedman	Friedman	Friedman Test
rcbd.page	Page	Page Test
association.corr	Correlation	Test for Association Between Paired Samples
table.chisq	ChiSquare	Chi-Square Test on Contingency Table

Extending

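define_pmt allows users to define new permutation tests. Take the two-sample Wilcoxon test as an example. The statistic below is the difference between the two sample means, which is equivalent to the Wilcoxon rank-sum statistic when the samples are replaced by their ranks: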

t_custom <- define_pmt(
    # this is a two-sample permutation test
    inherit = "twosample",
    statistic = function(x, y) {
        # (optional) pre-calculate certain constants that remain invariant during permutation
        m <- length(x)
        n <- length(y)
        # return a closure to calculate the test statistic
        function(x, y) sum(x) / m - sum(y) / n
    },
    # reject the null hypothesis when the test statistic is too large or too small
    rejection = "lr", n_permu = 1e5
)
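To make the role of the closure concrete, here is a minimal base-R sketch of a Monte Carlo two-sample permutation test driven by the same kind of statistic factory. It is not the package's implementation; the helper name permute_twosample, its arguments, and the two-sided p-value convention (twice the smaller tail proportion) are assumptions made for illustration.

```r
# Hypothetical helper, not part of LearnNonparam: a plain-R Monte Carlo
# two-sample permutation test driven by a statistic factory.
permute_twosample <- function(x, y, statistic, n_permu = 1e4) {
    m <- length(x)
    pooled <- c(x, y)
    # call the factory once; constants such as m and n are captured here
    stat <- statistic(x, y)
    observed <- stat(x, y)
    # re-split the pooled sample at random and recompute the statistic
    permuted <- replicate(n_permu, {
        idx <- sample.int(length(pooled), m)
        stat(pooled[idx], pooled[-idx])
    })
    # two-sided p-value: twice the smaller tail proportion, capped at 1
    # (an assumed convention for the "too large or too small" rejection rule)
    left <- mean(permuted <= observed)
    right <- mean(permuted >= observed)
    list(statistic = observed, p_value = min(1, 2 * min(left, right)))
}

# the same statistic factory as in the example above
mean_diff <- function(x, y) {
    m <- length(x)
    n <- length(y)
    function(x, y) sum(x) / m - sum(y) / n
}

# illustrative data (an assumption, not from the original)
set.seed(42)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0)
y <- c(0.3, 0.9, 1.1, 1.7, 2.2)
res <- permute_twosample(x, y, mean_diff, n_permu = 2000)
res$statistic
res$p_value
```

Calling the factory once and reusing the returned closure is what keeps length(x) and length(y) from being recomputed on every permutation.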

The statistic can also be written in C++. Thanks to Rcpp sugar and C++14 features, only minor modifications are needed to turn the R version into valid C++.

t_cpp <- define_pmt(
    inherit = "twosample", rejection = "lr", n_permu = 1e5,
    statistic = "[](const auto& x, const auto& y) {
        auto m = x.length();
        auto n = y.length();
        return [=](const auto& x, const auto& y) {
            return sum(x) / m - sum(y) / n;
        };
    }"
)

It’s easy to check that t_custom and t_cpp are equivalent by running both on the same samples with the same random seed.
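A minimal sketch of such a check follows. The two definitions are repeated so that the block runs on its own; the samples, the seed and the smaller n_permu are illustrative assumptions, and the expectation that both p-values agree relies on both tests drawing the same permutations under the same seed.

```r
library(LearnNonparam)

# same statistic, once in R and once in C++ (n_permu reduced for speed)
t_custom <- define_pmt(
    inherit = "twosample", rejection = "lr", n_permu = 1e4,
    statistic = function(x, y) {
        m <- length(x)
        n <- length(y)
        function(x, y) sum(x) / m - sum(y) / n
    }
)
t_cpp <- define_pmt(
    inherit = "twosample", rejection = "lr", n_permu = 1e4,
    statistic = "[](const auto& x, const auto& y) {
        auto m = x.length();
        auto n = y.length();
        return [=](const auto& x, const auto& y) {
            return sum(x) / m - sum(y) / n;
        };
    }"
)

# illustrative samples (an assumption, not from the original)
set.seed(0)
x <- rnorm(5)
y <- rnorm(5, mean = 1)

# reset the seed before each run so both tests see the same random stream
set.seed(0)
t_custom$test(x, y)
p_r <- t_custom$p_value

set.seed(0)
t_cpp$test(x, y)
p_cpp <- t_cpp$p_value

# compare the two p-values
all.equal(p_r, p_cpp)
```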

Performance

coin is a commonly used R package for performing permutation tests. Below is a benchmark:

library(coin)

# x and y are the two numeric sample vectors passed to the tests below
data <- c(x, y)
group <- factor(c(rep("x", length(x)), rep("y", length(y))))

options(LearnNonparam.pmt_progress = FALSE)
benchmark <- microbenchmark::microbenchmark(
    R = t_custom$test(x, y),
    Rcpp = t_cpp$test(x, y),
    coin = wilcox_test(data ~ group, distribution = approximate(nresample = 1e5, parallel = "no"))
)

The benchmark shows that the C++ statistic performs significantly better than the pure R one, and even surpasses the coin package (under sequential execution). However, all tests in this package are currently written in R, with no plans to migrate them to C++. The primary goal of this package is not to maximize performance but to offer a flexible framework for permutation tests.

References

Higgins, J. J. 2004. An Introduction to Modern Nonparametric Statistics. Duxbury Advanced Series. Brooks/Cole.