lightgbm.cv — LightGBM 4.6.0.99 documentation

lightgbm.cv(params, train_set, num_boost_round=100, folds=None, nfold=5, stratified=True, shuffle=True, metrics=None, feval=None, init_model=None, fpreproc=None, seed=0, callbacks=None, eval_train_metric=False, return_cvbooster=False)

Perform cross-validation with the given parameters.

Parameters:

params (dict) – Parameters for training. Values passed through params take precedence over those supplied via arguments.
train_set (Dataset) – Data to be trained on.
num_boost_round (int, optional (default=100)) – Number of boosting iterations.
folds (generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)) – If a generator or iterator, it should yield the train and test indices for each fold. If an object, it should be one of the scikit-learn splitter classes (https://scikit-learn.org/stable/modules/classes.html#splitter-classes) and have a split method. This argument takes priority over the other data-split arguments.
nfold (int, optional (default=5)) – Number of folds in CV.
stratified (bool, optional (default=True)) – Whether to perform stratified sampling.
shuffle (bool, optional (default=True)) – Whether to shuffle before splitting data.
metrics (str, list of str, or None, optional (default=None)) – Evaluation metrics to monitor during CV. If not None, the metric in params is overridden.
feval (callable, list of callable, or None, optional (default=None)) –
Customized evaluation function. Each evaluation function should accept two parameters, preds and eval_data, and return (eval_name, eval_result, is_higher_better) or a list of such tuples.

preds : numpy 1-D array or numpy 2-D array (for multi-class task)

The predicted values. For a multi-class task, preds is a numpy 2-D array of shape [n_samples, n_classes]. If a custom objective function is used, the predicted values are passed before any transformation, e.g. as raw margins rather than the probability of the positive class for a binary task.

eval_data : Dataset

A Dataset to evaluate.

eval_name : str

The name of the evaluation function (without whitespace).

eval_result : float

The eval result.

is_higher_better : bool

Whether a higher eval result is better, e.g. True for AUC.

To ignore the default metric corresponding to the used objective, set metrics to the string "None".
init_model (str, pathlib.Path, Booster or None, optional (default=None)) – Filename of a LightGBM model or a Booster instance used to continue training.
fpreproc (callable or None, optional (default=None)) – Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those.
seed (int, optional (default=0)) – Seed used to generate the folds (passed to numpy.random.seed).
callbacks (list of callable, or None, optional (default=None)) – List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
eval_train_metric (bool, optional (default=False)) – Whether to display the train metric in progress. The score of the metric is calculated again after each training step, so there is some impact on performance.
return_cvbooster (bool, optional (default=False)) – Whether to return Booster models trained on each fold through CVBooster.
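
As a minimal sketch of the feval contract described above, the function below computes a thresholded error rate. The names binary_error_at_half and _FakeDataset are hypothetical and not part of LightGBM; the stand-in class only mimics Dataset.get_label() so the snippet runs on its own. It assumes a built-in binary objective, so preds are probabilities rather than raw margins.

```python
import numpy as np


def binary_error_at_half(preds, eval_data):
    """Custom metric: fraction of rows misclassified at a 0.5 threshold.

    Assumes preds are probabilities (built-in binary objective).
    """
    labels = eval_data.get_label()
    predicted_class = (preds > 0.5).astype(labels.dtype)
    error_rate = float(np.mean(predicted_class != labels))
    # Return (eval_name, eval_result, is_higher_better).
    return "error_at_half", error_rate, False


class _FakeDataset:
    """Hypothetical stand-in exposing only get_label(), for illustration."""

    def __init__(self, label):
        self._label = np.asarray(label, dtype=np.float32)

    def get_label(self):
        return self._label


name, value, higher_better = binary_error_at_half(
    np.array([0.9, 0.2, 0.6, 0.4]), _FakeDataset([1, 0, 0, 1])
)
```

In real use the function would be passed as feval=binary_error_at_half, and eval_data would be the Dataset of the validation fold.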

Note

A custom objective function can be provided for the objective parameter. It should accept two parameters, preds and train_data, and return (grad, hess).

preds : numpy 1-D array or numpy 2-D array (for multi-class task)

The predicted values. Predicted values are passed before any transformation, e.g. as raw margins rather than the probability of the positive class for a binary task.

train_data : Dataset

The training dataset.

grad : numpy 1-D array or numpy 2-D array (for multi-class task)

The value of the first order derivative (gradient) of the loss with respect to the elements of preds for each sample point.

hess : numpy 1-D array or numpy 2-D array (for multi-class task)

The value of the second order derivative (Hessian) of the loss with respect to the elements of preds for each sample point.

For a multi-class task, preds is a numpy 2-D array of shape [n_samples, n_classes], and grad and hess should be returned in the same format.
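
The (grad, hess) contract in this note can be illustrated with binary logistic loss, chosen here only as a familiar example. The names logistic_objective and _FakeDataset are hypothetical; the stand-in class mimics Dataset.get_label() so the sketch runs without LightGBM. Because preds are raw margins, the sigmoid is applied inside the function.

```python
import numpy as np


def logistic_objective(preds, train_data):
    """Gradient and Hessian of binary log loss with respect to raw margins."""
    labels = train_data.get_label()
    prob = 1.0 / (1.0 + np.exp(-preds))  # sigmoid of the raw margin
    grad = prob - labels                  # first derivative per sample
    hess = prob * (1.0 - prob)            # second derivative per sample
    return grad, hess


class _FakeDataset:
    """Hypothetical stand-in exposing only get_label(), for illustration."""

    def __init__(self, label):
        self._label = np.asarray(label, dtype=np.float64)

    def get_label(self):
        return self._label


grad, hess = logistic_objective(np.array([0.0, 0.0]), _FakeDataset([1, 0]))
```

Following the note, such a function would be supplied as the objective entry of params. Any feval used alongside it then receives raw margins as well.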

Returns:

eval_results – History of evaluation results of each metric. The dictionary has the following format: {'valid metric1-mean': [values], 'valid metric1-stdv': [values], 'valid metric2-mean': [values], 'valid metric2-stdv': [values], …}. If return_cvbooster=True, the trained boosters are also returned, wrapped in a CVBooster object under the cvbooster key. If eval_train_metric=True, the train metric history is also returned. In this case, the dictionary has the following format: {'train metric1-mean': [values], 'valid metric1-mean': [values], 'train metric2-mean': [values], 'valid metric2-mean': [values], …}.

Return type:

dict