utiml: Utilities for Multi-label Learning
The utiml package is an R framework for multi-label learning, similar to what Mulan offers on Weka.
The main methods available in this package are organized into the following groups:
- Classification methods
- Evaluation methods
- Pre-process utilities
- Sampling methods
- Threshold methods
Installation
The installation process is similar to other packages available on CRAN:
install.packages("utiml")
This also installs the mldr package. To run the examples in this document, you also need to install the following packages:
# Base classifiers (SVM and Random Forest)
install.packages(c("e1071", "randomForest"))
Install via GitHub (development version; requires the devtools package)
devtools::install_github("rivolli/utiml")
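Before running the examples, it can help to confirm that the packages named above are actually available. The following is a minimal sketch using only base R; the variable names are illustrative, and it only reports what is missing rather than installing anything.

```r
# Packages used by the examples in this document
pkgs <- c("utiml", "e1071", "randomForest")

# requireNamespace() returns TRUE when a package can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages needed by the examples are installed.")
}
```

Any package listed as missing can then be passed to install.packages().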
Multi-label Classification
Running Binary Relevance Method
library(utiml)
# Create two partitions (train and test) of the toyml multi-label dataset
ds <- create_holdout_partition(toyml, c(train=0.65, test=0.35))
# Create a Binary Relevance Model using e1071::svm method
brmodel <- br(ds$train, "SVM", seed=123)
# Predict
prediction <- predict(brmodel, ds$test)
# Show the predictions
head(as.bipartition(prediction))
head(as.ranking(prediction))
# Apply a threshold
newpred <- rcut_threshold(prediction, 2)
# Evaluate the predictions, with and without the threshold
result <- multilabel_evaluate(ds$test, prediction, "bipartition")
thresres <- multilabel_evaluate(ds$test, newpred, "bipartition")
# Print the result
print(round(cbind(Default=result, RCUT=thresres), 3))
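To make the rank cut step above more concrete, here is a minimal base R sketch of the general idea: for each instance, keep the k labels with the highest scores. The score matrix and the rank_cut helper are hypothetical illustrations, not utiml's implementation, and details such as tie handling may differ from rcut_threshold.

```r
# Hypothetical score matrix: 3 instances (rows) x 4 labels (columns)
scores <- matrix(
  c(0.9, 0.2, 0.6, 0.1,
    0.3, 0.8, 0.4, 0.7,
    0.5, 0.1, 0.2, 0.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("i", 1:3), paste0("y", 1:4))
)

# Illustrative helper: mark the k highest-scoring labels of each row as 1
rank_cut <- function(scores, k) {
  bip <- t(apply(scores, 1, function(row) {
    # rank(-row) gives rank 1 to the highest score; ties go to the first label
    as.integer(rank(-row, ties.method = "first") <= k)
  }))
  dimnames(bip) <- dimnames(scores)
  bip
}

top2 <- rank_cut(scores, 2)
print(top2)
```

Every row of top2 contains exactly two relevant labels, regardless of how high or low the raw scores are. This is why the rank cut results can differ from those obtained with the default predictions.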
Running Ensemble of Classifier Chains
library(utiml)
# Create three partitions (train, val, test) of the emotions dataset
partitions <- c(train = 0.6, val = 0.2, test = 0.2)
ds <- create_holdout_partition(emotions, partitions, method="iterative")
# Create an Ensemble of Classifier Chains using Random Forest (randomForest package)
eccmodel <- ecc(ds$train, "RF", m=3, cores=parallel::detectCores(), seed=123)
# Predict
val <- predict(eccmodel, ds$val, cores=parallel::detectCores())
test <- predict(eccmodel, ds$test, cores=parallel::detectCores())
# Apply a threshold
thresholds <- scut_threshold(val, ds$val, cores=parallel::detectCores())
new.val <- fixed_threshold(val, thresholds)
new.test <- fixed_threshold(test, thresholds)
# Evaluate the models
measures <- c("subset-accuracy", "F1", "hamming-loss", "macro-based")
result <- cbind(
Test = multilabel_evaluate(ds$test, test, measures),
TestWithThreshold = multilabel_evaluate(ds$test, new.test, measures),
Validation = multilabel_evaluate(ds$val, val, measures),
ValidationWithThreshold = multilabel_evaluate(ds$val, new.val, measures)
)
print(round(result, 3))
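The example above calibrates one threshold per label on the validation partition and then reuses those thresholds on the test partition. The base R sketch below illustrates that general idea with a simple grid search that maximizes F1 per label. The data, the helper names, and the F1 grid search are all assumptions made for illustration; scut_threshold may optimize a different criterion with a different search procedure.

```r
# Hypothetical validation data: true labels (0/1) and predicted scores
truth <- cbind(y1 = c(1, 0, 1, 1, 0), y2 = c(0, 0, 1, 0, 1))
scores <- cbind(y1 = c(0.82, 0.41, 0.73, 0.58, 0.22),
                y2 = c(0.31, 0.12, 0.37, 0.19, 0.44))

# F1 for a single label; defined as 0 when there are no true positives
f1_score <- function(truth, pred) {
  tp <- sum(truth == 1 & pred == 1)
  if (tp == 0) return(0)
  precision <- tp / sum(pred == 1)
  recall <- tp / sum(truth == 1)
  2 * precision * recall / (precision + recall)
}

# Illustrative helper: first grid value with the best F1 for one label
best_threshold <- function(truth, scores, grid = seq(0.05, 0.95, by = 0.05)) {
  f1 <- vapply(grid, function(th) f1_score(truth, as.integer(scores >= th)),
               numeric(1))
  grid[which.max(f1)]
}

# One threshold per label, chosen on the validation data only
thresholds <- vapply(colnames(scores), function(lbl) {
  best_threshold(truth[, lbl], scores[, lbl])
}, numeric(1))

# Apply each label's threshold to its own column of scores
bipartition <- sweep(scores, 2, thresholds, ">=") + 0
print(thresholds)
print(bipartition)
```

Choosing thresholds on validation data and applying them unchanged to the test data keeps the test evaluation free of information from the test labels, which mirrors the split used in the example above.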
More examples and details are available in the function documentation and the package vignettes.
How to cite?
@article{RJ-2018-041,
author = {Adriano Rivolli and Andre C. P. L. F. de Carvalho},
title = {{The utiml Package: Multi-label Classification in R}},
year = {2018},
journal = {{The R Journal}},
doi = {10.32614/RJ-2018-041},
url = {https://doi.org/10.32614/RJ-2018-041},
pages = {24--37},
volume = {10},
number = {2}
}