lightgbm.cv — LightGBM 4.6.0.99 documentation
lightgbm.cv(params, train_set, num_boost_round=100, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, feval=None, init_model=None, fpreproc=None, seed=0, callbacks=None, eval_train_metric=False, return_cvbooster=False)
Perform cross-validation with the given parameters.
Parameters:
- params (dict) – Parameters for training. Values passed through params take precedence over those supplied via arguments.
- train_set (Dataset) – Data to be trained on.
- num_boost_round (int, optional (default=100)) – Number of boosting iterations.
- folds (generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)) – If generator or iterator, it should yield the train and test indices for each fold. If object, it should be one of the scikit-learn splitter classes (https://scikit-learn.org/stable/modules/classes.html#splitter-classes) and have a split method. This argument takes priority over the other data split arguments.
- nfold (int, optional (default=5)) – Number of folds in CV.
- stratified (bool, optional (default=True)) – Whether to perform stratified sampling.
- shuffle (bool, optional (default=True)) – Whether to shuffle before splitting data.
- metrics (str, list of str, or None, optional (default=None)) – Evaluation metrics to be monitored during CV. If not None, the metric in params will be overridden.
- feval (callable, list of callable, or None, optional (default=None)) –
  Customized evaluation function. Each evaluation function should accept two parameters: preds, eval_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples.

  preds : numpy 1-D array or numpy 2-D array (for multi-class task)
      The predicted values. For multi-class task, preds are numpy 2-D array of shape = [n_samples, n_classes]. If a custom objective function is used, predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task in this case.
  eval_data : Dataset
      A Dataset to evaluate.
  eval_name : str
      The name of evaluation function (without whitespace).
  eval_result : float
      The eval result.
  is_higher_better : bool
      Whether a higher eval result is better, e.g. AUC is is_higher_better.

  To ignore the default metric corresponding to the used objective, set metrics to the string "None".
- init_model (str, pathlib.Path, Booster or None, optional (default=None)) – Filename of LightGBM model or Booster instance used to continue training.
- fpreproc (callable or None, optional (default=None)) – Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those.
- seed (int, optional (default=0)) – Seed used to generate the folds (passed to numpy.random.seed).
- callbacks (list of callable, or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
- eval_train_metric (bool, optional (default=False)) – Whether to display the train metric in progress. The score of the metric is calculated again after each training step, so there is some impact on performance.
- return_cvbooster (bool, optional (default=False)) – Whether to return Booster models trained on each fold through CVBooster.
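The feval contract described in the parameter list can be illustrated with a small sketch. The function name binary_accuracy and the 0.5 threshold are illustrative choices, not part of LightGBM. The sketch assumes a binary task trained with a built-in objective, so that preds holds probabilities rather than raw margins.

```python
import numpy as np


def binary_accuracy(preds, eval_data):
    """Custom metric returning (eval_name, eval_result, is_higher_better).

    Assumption: binary task with a built-in objective, so `preds` contains
    probabilities of the positive class. With a custom objective, `preds`
    would be raw margins and the 0.5 threshold below would not apply.
    """
    labels = eval_data.get_label()  # true labels of the fold being evaluated
    predicted_class = (np.asarray(preds) > 0.5).astype(int)
    accuracy = float(np.mean(predicted_class == labels))
    # The name contains no whitespace; a higher accuracy is better.
    return "binary_accuracy", accuracy, True
```

A function of this shape would be passed as lightgbm.cv(params, train_set, feval=binary_accuracy). Setting metrics to the string "None" would then leave it as the only metric reported.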
Note
A custom objective function can be provided for the objective parameter. It should accept two parameters: preds, train_data and return (grad, hess).

  preds : numpy 1-D array or numpy 2-D array (for multi-class task)
      The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
  train_data : Dataset
      The training dataset.
  grad : numpy 1-D array or numpy 2-D array (for multi-class task)
      The value of the first order derivative (gradient) of the loss with respect to the elements of preds for each sample point.
  hess : numpy 1-D array or numpy 2-D array (for multi-class task)
      The value of the second order derivative (Hessian) of the loss with respect to the elements of preds for each sample point.

For multi-class task, preds are numpy 2-D array of shape = [n_samples, n_classes], and grad and hess should be returned in the same format.
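As a minimal sketch of the (grad, hess) contract in the note above, the following writes the binary log loss as a custom objective. The function name logistic_objective is hypothetical. The code assumes a binary task with 0/1 labels and treats preds as raw margins, as the note states.

```python
import numpy as np


def logistic_objective(preds, train_data):
    """Binary log loss on raw margins, returned as (grad, hess).

    Assumption: labels are 0/1 and `preds` are untransformed scores.
    """
    labels = train_data.get_label()
    prob = 1.0 / (1.0 + np.exp(-np.asarray(preds)))  # sigmoid of the raw margin
    grad = prob - labels           # first derivative of the loss w.r.t. preds
    hess = prob * (1.0 - prob)     # second derivative of the loss w.r.t. preds
    return grad, hess
```

Such a function would be supplied through the objective entry of params, for example params = {"objective": logistic_objective}. Any feval used alongside it then receives raw margins rather than probabilities.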
Returns:
eval_results – History of evaluation results of each metric. The dictionary has the following format: {'valid metric1-mean': [values], 'valid metric1-stdv': [values], 'valid metric2-mean': [values], 'valid metric2-stdv': [values], …}. If return_cvbooster=True, also returns trained boosters wrapped in a CVBooster object via the cvbooster key. If eval_train_metric=True, also returns the train metric history. In this case, the dictionary has the following format: {'train metric1-mean': [values], 'valid metric1-mean': [values], 'train metric2-mean': [values], 'valid metric2-mean': [values], …}.
Return type:
dict
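The returned dictionary holds one list per key, with one entry per boosting iteration. The sketch below shows one way the best iteration could be read from it. The helper best_iteration is hypothetical, and example_results is a hand-written stand-in with made-up numbers, not real output. The key names assume the metric is called l2 and follow the 'valid metric-mean' / 'valid metric-stdv' pattern described above.

```python
def best_iteration(eval_results, metric, higher_is_better=False):
    """Return (1-based iteration, mean, stdv) of the best validation score."""
    means = eval_results[f"valid {metric}-mean"]
    stdvs = eval_results[f"valid {metric}-stdv"]
    pick = max if higher_is_better else min
    # Index of the best mean score across boosting iterations.
    best_index = pick(range(len(means)), key=means.__getitem__)
    return best_index + 1, means[best_index], stdvs[best_index]


# Hand-written stand-in for a dictionary returned by lightgbm.cv (made-up numbers).
example_results = {
    "valid l2-mean": [0.90, 0.70, 0.65, 0.68],
    "valid l2-stdv": [0.05, 0.04, 0.03, 0.04],
}
print(best_iteration(example_results, "l2"))  # prints (3, 0.65, 0.03)
```

For metrics where larger is better, such as AUC, higher_is_better=True selects the maximum instead.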