[R] Introductory vignette is outdated · Issue #10746 · dmlc/xgboost

The R vignette "Understand your dataset with XGBoost" is quite outdated given all the changes XGBoost has made since it was written.

For example, it says:

> XGBoost manages only numeric vectors.

> What to do when you have categorical data?
> To answer the question above we will convert categorical variables to numeric one.

But categorical features are now supported.

> The method we are going to see is usually called one-hot encoding.

(which makes sense given that it refers to a discretized version of a numeric variable)

But XGBoost can already do one-hot encoding internally, controlled through parameters such as max_cat_to_onehot.
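For illustration, here is a minimal sketch of what an updated vignette could show instead of manual one-hot encoding. It assumes a recent release of the R package in which `xgb.DMatrix()` accepts a `data.frame` and treats `factor` columns as categorical; older releases may need a numeric matrix plus explicit feature types. The toy data, the column names, and the threshold value of 4 are made up for the example.

```r
library(xgboost)

# Toy data (hypothetical): one numeric feature and one categorical feature.
set.seed(1)
n <- 200
df <- data.frame(
  age = runif(n, 20, 80),
  treatment = factor(sample(c("Placebo", "Treated"), n, replace = TRUE))
)
y <- as.numeric(df$age > 50 | df$treatment == "Treated")

# Assumption: this xgboost version accepts a data.frame here and
# interprets factor columns as categorical features.
dtrain <- xgb.DMatrix(data = df, label = y)

params <- list(
  objective = "binary:logistic",
  tree_method = "hist",
  # Categorical features with fewer categories than this threshold use
  # one-hot style splits; others are partitioned into groups of categories.
  max_cat_to_onehot = 4
)

model <- xgb.train(params = params, data = dtrain, nrounds = 10)
pred <- predict(model, dtrain)
```

No call to `sparse.model.matrix()` or similar is needed here: the factor column is passed as-is, and the choice between one-hot style and partition-based splits is left to the library.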