GitHub - lorismichel/drf: Distributional Random Forests (Cevid et al., 2020)

Distributional Random Forests

A package for forest-based estimation of the conditional distribution of a possibly multivariate response. The estimated distribution is represented by weights on the training points, which makes it fast to compute functionals of the conditional distribution such as conditional quantiles, conditional correlations, or conditional probability statements. DRF can also be used for heterogeneity adjustment: the weighting function, which describes the relevance of each training point for a given test point, can be passed as input to another method.
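To make the idea of "functionals from weights" concrete, here is a minimal base R sketch. It does not call the package: the responses `y` and the weights `w` are hypothetical stand-ins for the training responses and the weights a fitted forest would assign for one test point, and `weighted_quantile` is an illustrative helper, not a package function.

```r
# Hypothetical stand-ins; in practice y and w would come from a fitted forest.
set.seed(1)
y <- rnorm(200)        # training responses (univariate for simplicity)
w <- runif(200)
w <- w / sum(w)        # non-negative weights that sum to one

# Weighted quantile: the smallest y whose cumulative weight reaches tau.
# The small tolerance guards against floating-point error in cumsum().
weighted_quantile <- function(y, w, tau) {
  ord <- order(y)
  cum_w <- cumsum(w[ord])
  sapply(tau, function(t) y[ord][which(cum_w >= t - 1e-12)[1]])
}

cond_mean <- sum(w * y)                          # conditional mean
cond_q    <- weighted_quantile(y, w, c(0.1, 0.9)) # conditional quantiles
cond_prob <- sum(w * (y > 0))                    # estimate of P(Y > 0 | X = x)
```

Each quantity is a weighted sum or a lookup on the weighted empirical distribution, which is why such functionals are cheap once the weights are available.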

This repository started as a fork of the grf repository, which is itself a fork of the ranger repository. We sincerely thank the authors of both repositories for their useful and free packages.

Installation

The latest release of the package can be installed through CRAN (soon):

install.packages("drf")

The development version can be installed from GitHub:

devtools::install_github("lorismichel/drf", subdir = "r-package/drf")

Another installation possibility is to clone the repo, and then within the r-package folder run

Note that to install from source, a compiler that implements C++11 is required (clang 3.3 or higher, or g++ 4.8 or higher). If installing on Windows, the RTools toolchain is also required.

Usage Example

require(drf)

# generate data
n <- 1000
p <- 10
d <- 2
X <- matrix(rnorm(n * p), ncol = p)
Y <- matrix(rnorm(n * d), ncol = d)
Y[, 1] <- Y[, 1] + X[, 1] # mean shift of Y1 based on X1
Y[, 2] <- Y[, 2] * X[, 2] # variance shift of Y2 based on X2

# fit model
fit <- drf(X = X, Y = Y, num.trees = 2000, splitting.rule = "FourierMMD") # these are the default values
fit # prints variable importance

# generate test data
X_test <- matrix(rnorm(100 * p), ncol = p)

# estimated conditional distribution represented via weights
predict(fit, newdata = X_test)
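As a rough illustration of using such weights as input to another method, the sketch below runs a weighted linear regression with base R's `lm`. It is self-contained and does not call the package: the weights come from a Gaussian kernel as a stand-in, and the data, the test point `x0`, and the bandwidth are all hypothetical. With a fitted forest one would instead use the weights it assigns to the training points for the chosen test point, assuming the prediction exposes one weight per training point.

```r
set.seed(2)
n  <- 500
x  <- runif(n, -2, 2)
y  <- sin(x) + rnorm(n, sd = 0.1)
x0 <- 0.5                       # hypothetical test point

# Stand-in weights from a Gaussian kernel, NOT produced by drf.
w <- dnorm(x - x0, sd = 0.3)
w <- w / sum(w)

# Weighted least squares: training points relevant to x0 dominate the fit.
local_fit <- lm(y ~ x, weights = w)
pred <- unname(predict(local_fit, newdata = data.frame(x = x0)))
```

The downstream method only needs to accept observation weights, so the same pattern applies to any estimator with a `weights` argument.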

# many distributional functionals are implemented and do not need to be manually computed from the weights
predict(fit, newdata = X_test, functional = "mean")

# covariance matrix at a fixed test point
predict(fit, newdata = rep(0, p), functional = "cov")$cov[1,,]
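For intuition about what a weighted covariance at a single test point means, here is a small base R sketch using `cov.wt`. It is not the package's internal computation: the response matrix `Y` and the weights `w` are simulated stand-ins for the training responses and the weights at one test point.

```r
set.seed(3)
n <- 300
Y <- cbind(rnorm(n), rnorm(n))
Y[, 2] <- Y[, 2] + 0.8 * Y[, 1]   # induce positive dependence
w <- rexp(n)
w <- w / sum(w)                   # hypothetical weights summing to one

# method = "ML" gives the plain weighted covariance
# sum_i w_i (y_i - m)(y_i - m)^T, where m is the weighted mean.
wc <- cov.wt(Y, wt = w, cor = TRUE, method = "ML")

wc$center   # weighted mean vector
wc$cov      # weighted covariance matrix
wc$cor      # weighted correlation matrix
```

The default `method = "unbiased"` in `cov.wt` applies a correction factor, so `"ML"` is chosen here to match the plain weighted-sum definition.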

# we can transform the response beforehand to obtain more complicated quantities
predict(fit, newdata = X_test,
        functional = "quantile",
        quantiles = c(0.1, 0.9),
        transformation = function(y) c(sin(y[1]), y[1] * y[2], y[2]^2))