mlr3: Machine Learning in R - next generation
Efficient, object-oriented programming on the building blocks of machine learning. Successor of mlr.
Resources (for users and developers)
- We have written a book. This should be the central entry point to the package.
- The mlr-org website includes, for example, a gallery with case studies.
- Reference manual
- FAQ
- Ask questions on Stackoverflow (tag #mlr3)
- Extension Learners
  - Recommended core regression, classification, and survival learners are in mlr3learners
  - All others are in mlr3extralearners
  - Use the learner search to get a simple overview
- Cheatsheets
- Videos:
- Courses/Lectures
  - The course Introduction to Machine Learning (I2ML) is a free and open flipped-classroom course on the basics of machine learning. mlr3 is used in the demos and exercises.
- Templates/Tutorials
  - mlr3-targets: Tutorial showcasing how to use {mlr3} with targets for reproducible ML workflow automation.
- List of extension packages
- mlr-outreach contains public talks and slides resources.
- Wiki: Contains mainly information for developers.
Installation
Install the latest release from CRAN:
install.packages("mlr3")
Install the development version from GitHub:
install.packages("pak")
pak::pak("mlr-org/mlr3")
If you want to get started with mlr3, we recommend installing the mlr3verse meta-package, which installs mlr3 and some of the most important extension packages:
install.packages("mlr3verse")
Example
Constructing Learners and Tasks
library(mlr3)
# create learning task
task_penguins = as_task_classif(species ~ ., data = palmerpenguins::penguins)
task_penguins
## <TaskClassif:palmerpenguins::penguins> (344 x 8)
## * Target: species
## * Properties: multiclass
## * Features (7):
##   - int (3): body_mass_g, flipper_length_mm, year
##   - dbl (2): bill_depth_mm, bill_length_mm
##   - fct (2): island, sex
# load learner and set hyperparameter
learner = lrn("classif.rpart", cp = .01)
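Hyperparameters passed to lrn() end up in the learner's parameter set, where they can be inspected and changed later. The following is a minimal sketch rather than an excerpt from the package documentation; the choice of maxdepth as a second hyperparameter is only an illustration.

```r
library(mlr3)

learner = lrn("classif.rpart", cp = 0.01)

# value that was set at construction time
learner$param_set$values$cp

# ids of all hyperparameters the learner exposes
learner$param_set$ids()

# hyperparameters can also be changed after construction
# (maxdepth is picked here purely for illustration)
learner$param_set$values$maxdepth = 3L
learner$param_set$values
```

Because the learner is an R6 object, the assignment modifies it in place; no reassignment of `learner` is needed.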
Basic train + predict
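# train/test split
split = partition(task_penguins, ratio = 0.67)
# train the model on the training rows (partition() names its elements `train` and `test`)
learner$train(task_penguins, row_ids = split$train)
# predict on the held-out rows
prediction = learner$predict(task_penguins, row_ids = split$test)
# calculate performance (exact numbers vary with the random split)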
prediction$confusion
##            truth
## response    Adelie Chinstrap Gentoo
##   Adelie       146         5      0
##   Chinstrap      6        63      1
##   Gentoo         0         0    123
measure = msr("classif.acc")
prediction$score(measure)
## classif.acc
## 0.9651163
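To make the train/test mechanics more concrete, here is a small self-contained sketch. It assumes the built-in tsk("penguins") toy task as a stand-in for the task constructed above, and an arbitrary seed that only serves to make the random split reproducible. It checks that the two row-id sets do not overlap and that predictions are produced only for the held-out rows.

```r
library(mlr3)

set.seed(1)  # assumption: arbitrary seed, only for a reproducible split
task = tsk("penguins")  # toy task shipped with mlr3 (needs palmerpenguins)

# partition() returns row ids in the list elements `train` and `test`
split = partition(task, ratio = 0.67)
overlap = intersect(split$train, split$test)
length(overlap)  # no row is used for both training and testing

learner = lrn("classif.rpart", cp = 0.01)
learner$train(task, row_ids = split$train)
prediction = learner$predict(task, row_ids = split$test)

# one prediction per held-out row
length(prediction$row_ids) == length(split$test)

# several measures can be scored at once; accuracy and
# classification error are complementary
scores = prediction$score(msrs(c("classif.acc", "classif.ce")))
scores
```

Passing `row_ids` explicitly keeps the task itself untouched, so the same task object can be reused for other splits.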
Resample
# 3-fold cross validation
resampling = rsmp("cv", folds = 3L)
# run experiments
rr = resample(task_penguins, learner, resampling)
# access results
rr$score(measure)[, .(task_id, learner_id, iteration, classif.acc)]
##                     task_id    learner_id iteration classif.acc
## 1: palmerpenguins::penguins classif.rpart         1   0.8956522
## 2: palmerpenguins::penguins classif.rpart         2   0.9478261
## 3: palmerpenguins::penguins classif.rpart         3   0.9649123
rr$aggregate(measure)
## classif.acc
##   0.9361302
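The per-fold scores can be condensed into a single performance estimate. The sketch below is an illustration under assumptions: it uses tsk("penguins") and an arbitrary seed, and it relies on classif.acc using macro averaging, in which case the aggregate is the plain mean of the fold scores.

```r
library(mlr3)

set.seed(42)  # assumption: arbitrary seed, only to make the folds reproducible
task = tsk("penguins")
learner = lrn("classif.rpart", cp = 0.01)
measure = msr("classif.acc")

rr = resample(task, learner, rsmp("cv", folds = 3L))

# accuracy of each of the three folds
fold_acc = rr$score(measure)$classif.acc

# aggregated accuracy over all folds
agg = rr$aggregate(measure)

# how the measure aggregates across folds
measure$average

# with macro averaging, the aggregate equals the mean of the fold scores
all.equal(unname(agg), mean(fold_acc))

# alternative: pool the predictions of all folds and score them once
rr$prediction()$score(measure)
```

The pooled score can differ slightly from the macro average when the folds do not all have the same size.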
Extension Packages
Consult the wiki for short descriptions and links to the respective repositories.
For beginners, we strongly recommend installing and loading the mlr3verse package for a better user experience.
Why a rewrite?
mlr was first released to CRAN in 2013. Its core design and architecture date back even further. The addition of many features has led to feature creep, which makes mlr hard to maintain and hard to extend. We also think that while mlr was nicely extensible in some parts (learners, measures, etc.), other parts were less easy to extend from the outside. Also, many helpful R libraries did not exist at the time mlr was created, and their inclusion would result in non-trivial API changes.
Design principles
- Only the basic building blocks for machine learning are implemented in this package.
- Focus on computation here. No visualization or other stuff. That can go in extra packages.
- Overcome the limitations of R’s S3 classes with the help of R6.
- Embrace R6 for a clean OO-design, object state-changes and reference semantics. This might be less “traditional R”, but seems to fit mlr nicely.
- Embrace data.table for fast and convenient data frame computations.
- Combine data.table and R6; for this we make heavy use of list columns in data.tables.
- Defensive programming and type safety. All user input is checked with checkmate. Return types are documented, and mechanisms popular in base R which “simplify” the result unpredictably (e.g., sapply() or the drop argument in [.data.frame) are avoided.
- Be light on dependencies. mlr3 requires the following packages at runtime:
  - parallelly: Helper functions for parallelization. No extra recursive dependencies.
  - future.apply: Resampling and benchmarking is parallelized with the future abstraction interfacing many parallel backends.
  - backports: Ensures backward compatibility with older R releases. Developed by members of the mlr team. No recursive dependencies.
  - checkmate: Fast argument checks. Developed by members of the mlr team. No extra recursive dependencies.
  - mlr3misc: Miscellaneous functions used in multiple mlr3 extension packages. Developed by the mlr team.
  - paradox: Descriptions for parameters and parameter sets. Developed by the mlr team. No extra recursive dependencies.
  - R6: Reference class objects. No recursive dependencies.
  - data.table: Extension of R’s data.frame. No recursive dependencies.
  - digest (via mlr3misc): Hash digests. No recursive dependencies.
  - uuid: Create unique string identifiers. No recursive dependencies.
  - lgr: Logging facility. No extra recursive dependencies.
  - mlr3measures: Performance measures. No extra recursive dependencies.
  - mlbench: A collection of machine learning data sets. No dependencies.
  - palmerpenguins: A classification data set about penguins, used in examples and provided as a toy task. No dependencies.
- Reflections: Objects are queryable for properties and capabilities, allowing you to program on them.
- Additional functionality that comes with extra dependencies:
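Three of the principles listed above (reference semantics, reflections, and list columns) are easier to grasp with a short example. The following is a rough sketch, not an authoritative reference; it assumes tsk("penguins") and a holdout resampling purely for illustration.

```r
library(mlr3)

learner = lrn("classif.rpart", cp = 0.01)

# R6 reference semantics: plain assignment creates a second name
# for the same object, so changes are visible through both names
alias = learner
alias$param_set$values$cp = 0.1
learner$param_set$values$cp  # now 0.1 as well

# an independent copy requires an explicit deep clone
copy = learner$clone(deep = TRUE)
copy$param_set$values$cp = 0.5
learner$param_set$values$cp  # unchanged by the edit to `copy`

# reflections: objects describe their own capabilities ...
task = tsk("penguins")
task$properties
learner$properties

# ... so code can branch on them
supports_task = "multiclass" %in% task$properties &&
  "multiclass" %in% learner$properties

# list columns: a ResampleResult converts to a data.table
# whose cells hold R6 objects such as the fitted learners
rr = resample(task, learner, rsmp("holdout"))
tab = data.table::as.data.table(rr)
class(tab$learner[[1]])
```

The deep clone matters whenever a configured learner should serve as a template: without it, every "copy" would share one parameter set.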
Contributing to mlr3
This R package is licensed under the LGPL-3. If you encounter problems using this software (lack of documentation, misleading or wrong documentation, unexpected behavior, bugs, …) or just want to suggest features, please open an issue in the issue tracker. Pull requests are welcome and will be included at the discretion of the maintainers.
Please consult the wiki for a style guide, a roxygen guide and a pull request guide.
Citing mlr3
If you use mlr3, please cite our JOSS article:
@Article{mlr3,
title = {{mlr3}: A modern object-oriented machine learning framework in {R}},
author = {Michel Lang and Martin Binder and Jakob Richter and Patrick Schratz and Florian Pfisterer and Stefan Coors and Quay Au and Giuseppe Casalicchio and Lars Kotthoff and Bernd Bischl},
journal = {Journal of Open Source Software},
year = {2019},
month = {dec},
doi = {10.21105/joss.01903},
url = {https://joss.theoj.org/papers/10.21105/joss.01903},
}