GitHub - lorismichel/drf: Distributional Random Forests (Cevid et al., 2020) (original) (raw)
Distributional Random Forests
A package for forest-based conditional distribution estimation of a possibly multivariate response. The estimated distribution is in a simple form which allows for simple and fast computation of different functionals of the conditional distributions such as, for example, conditional quantiles, conditional correlations or conditional probability statements. One can do a heterogeneity adjustment with DRF by obtaining the weighting function which describes the relevance of each training point for a given test point and which can further be used as an input to some other method.
This repository started as a fork from the grf repository, which is itself forked from ranger repository. We sincerely thank the authors of both repositories for their useful and free packages.
Installation
The latest release of the package can be installed through CRAN (soon):
The development version can be installed from github
devtools::install_github("lorismichel/drf",subdir = "r-package/drf")
Another installation possibility is to clone the repo, and then within the r-package folder run
Note that to install from source, a compiler that implements C++11 is required (clang 3.3 or higher, or g++ 4.8 or higher). If installing on Windows, the RTools toolchain is also required.
Usage Example
require(drf)
generate data
n = 1000 p = 10 d = 2 X <- matrix(rnorm(np), ncol=p) Y <- matrix(rnorm(nd), ncol=d) Y[,1] = Y[,1] + X[,1] #mean shift of Y1 based on X1 Y[,2] = Y[,2] * X[,2] #variance shift of Y2 based on X2
fit model
fit <- drf(X = X, Y = Y, num.trees = 2000, splitting.rule = "FourierMMD") #those are the default values fit #prints variable importance
#generate test data X_test <- matrix(rnorm(100*p), ncol=p)
estimated conditional distribution represented via weights
predict(fit, newdata = X_test)
many distributional functionals are implemented and do not need to be manually computed from the weights
predict(fit, newdata = X_test, functional = "mean")
covariance matrix at a fixed test point
predict(fit, newdata = rep(0, p), functional = "cov")$cov[1,,]
we can transform the response beforehand to obtain more complicated quantities
predict(fit, newdata = X_test, functional = "quantile", quantiles=c(0.1, 0.9), transformation = function(y) c(sin(y[1]), y[1]*y[2], y[2]^2))