CatBoost Parameters and Hyperparameters

Last Updated : 3 Nov, 2025

CatBoost (Categorical Boosting) is a gradient boosting algorithm that handles both numerical and categorical features efficiently. It builds a sequence of decision trees, where each new tree corrects the errors left by the previous ones. This step-by-step refinement lets CatBoost reach high accuracy on many kinds of prediction tasks.

Let's understand CatBoost parameters and Hyperparameter Tuning.

1. CatBoost Parameters

Model parameters are internal configurations that the model learns during training. They define how the trees split and how leaf values are adjusted.
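To make the distinction concrete, here is a toy sketch of one boosting step with a single-split tree (a "stump"). It is only an illustration of the idea and not CatBoost's actual tree-building procedure; the function `fit_stump` and the sample data are made up for this example. The split threshold and the two leaf values are learned from the data, while `learning_rate` is fixed by hand beforehand.

```python
def fit_stump(xs, residuals):
    """Learn one split: return (threshold, left_value, right_value)."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for x, r in pairs if x <= threshold]
        right = [r for x, r in pairs if x > threshold]
        # Each leaf value is the mean residual of the rows that land in it
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


# Made-up data: one feature, two clearly separated groups
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

learning_rate = 0.5  # set by hand before training (not learned)

# Start from a constant prediction and fit the stump to what is left over
baseline = sum(ys) / len(ys)
residuals = [y - baseline for y in ys]

# These three numbers are the learned model parameters
threshold, left_value, right_value = fit_stump(xs, residuals)

# One boosting step: move each prediction part of the way towards its leaf value
predictions = [
    baseline + learning_rate * (left_value if x <= threshold else right_value)
    for x in xs
]
print(threshold, left_value, right_value)
print(predictions)
```

Repeating this step on the new residuals is what "each new tree corrects the previous ones" means in practice.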

**Important Parameters:**

2. CatBoost Hyperparameters

Hyperparameters are defined before training and govern how the algorithm behaves. Choosing the right combination of these directly affects model performance, generalization and training time.

**Categories of Hyperparameters:**

**Important Hyperparameters:**

3. Hyperparameter Tuning

Hyperparameter tuning is the process of finding the combination of hyperparameter values that gives the best score on a chosen validation metric, for example the highest accuracy or the lowest loss.

**Steps for Tuning:**

  1. **Define Search Space:** Specify possible ranges for hyperparameters, such as learning_rate ∈ [0.01, 0.3] or depth ∈ [4, 10].
  2. **Set Objective Function:** Choose a performance metric to optimize, like accuracy, AUC or RMSE.
  3. **Choose Search Strategy:** Use techniques such as grid search, random search or Bayesian optimization (for example, with a library like Optuna).
  4. **Run and Evaluate:** Train multiple model configurations, compare their performance on validation data and select the best-performing combination.
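The four steps can be sketched as a small random search in plain Python. This is a sketch under assumptions: `random_search`, `sample_config` and `placeholder_objective` are names invented for this example, and the objective is a synthetic stand-in so the snippet runs without training anything. In real use, the objective would train a CatBoost model with the sampled values and return its validation score.

```python
import random

# Step 1: search space (continuous learning_rate, integer depth)
SEARCH_SPACE = {
    "learning_rate": (0.01, 0.3),
    "depth": (4, 10),
}


def sample_config(rng):
    """Draw one hyperparameter configuration from the search space."""
    lr_low, lr_high = SEARCH_SPACE["learning_rate"]
    depth_low, depth_high = SEARCH_SPACE["depth"]
    return {
        "learning_rate": rng.uniform(lr_low, lr_high),
        "depth": rng.randint(depth_low, depth_high),  # inclusive on both ends
    }


def placeholder_objective(config):
    """Step 2 (hypothetical stand-in): higher is better.

    A real objective would train a model and return e.g. validation accuracy.
    """
    return -abs(config["learning_rate"] - 0.1) - 0.01 * abs(config["depth"] - 6)


def random_search(evaluate, n_trials=20, seed=42):
    """Steps 3 and 4: sample configurations, score them, keep the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(placeholder_objective)
print(best_config, best_score)
```

Passing the objective in as a function keeps the search loop independent of the model, so the same loop works whether the score comes from a single validation split or from cross-validation.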

Implementation

Step 1: Installation

```python
# Run in a notebook cell; in a terminal, drop the leading "!"
!pip install catboost
```

Step 2: Importing Libraries

We will import the necessary libraries: pandas, scikit-learn and CatBoost.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier, Pool, cv
from sklearn.metrics import accuracy_score
```

Step 3: Load and Prepare Data

We will use the Iris dataset here, read from a local file named iris.csv whose label column is called class.

You can download the dataset from here.

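```python
# Load the Iris dataset from a local CSV file
data = pd.read_csv("iris.csv")

# Separate the features from the target column
X = data.drop('class', axis=1)
y = data['class']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```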
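If iris.csv is not at hand, one possible alternative is the copy of Iris bundled with scikit-learn. The sketch below renames the target column to class so the rest of the walkthrough can stay unchanged. This is an assumption about the CSV layout: the label strings in a downloaded file may be spelled differently (for example with an "Iris-" prefix).

```python
from sklearn.datasets import load_iris

# Load Iris as a DataFrame: four feature columns plus an integer "target"
iris = load_iris(as_frame=True)

# Rename the label column and replace the integer codes with species names
data = iris.frame.rename(columns={"target": "class"})
data["class"] = data["class"].map(dict(enumerate(iris.target_names)))

X = data.drop("class", axis=1)
y = data["class"]

print(data.shape)
print(y.value_counts())
```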

Step 4: Train CatBoost Model

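We will train a multiclass CatBoostClassifier on the training split.

```python
# Multiclass classifier: 500 trees, learning rate 0.1, tree depth 6
model = CatBoostClassifier(iterations=500, learning_rate=0.1, depth=6,
                           loss_function='MultiClass', random_state=42,
                           verbose=0)
model.fit(X_train, y_train)
```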

Step 5: Evaluation

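We will evaluate the model on the held-out test set using accuracy.

```python
# Predict class labels for the test rows and compare them with the true labels
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%")
```

**Output:**

Accuracy: 100.00%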
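A perfect score on a small test set should be read with some caution. With test_size=0.2 on the 150-row Iris dataset the test set has only 30 rows, so each prediction is worth about 3.3 percentage points. The plain-Python sketch below uses hypothetical labels (not the model's real predictions) to show how a single mistake would move the number.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical 30-row test set (150 rows * 0.2), not the real split
y_true = ["setosa"] * 10 + ["versicolor"] * 10 + ["virginica"] * 10

# Copy the true labels, then flip one prediction to simulate a single error
y_pred = list(y_true)
y_pred[15] = "virginica"

print(f"Accuracy with one error: {accuracy(y_true, y_pred) * 100:.2f}%")
```

Averaging over several different splits gives a steadier estimate than one small hold-out set.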

Step 6: Cross-Validation for Robust Evaluation

```python
# Wrap the full dataset in a CatBoost Pool
pool = Pool(X, label=y)
params = {'iterations': 1000, 'learning_rate': 0.01, 'depth': 3,
          'loss_function': 'MultiClass', 'random_seed': 42}

# 5-fold cross-validation; return_models=True also returns the per-fold models
cv_results, cv_models = cv(pool=pool, params=params, fold_count=5,
                           verbose=False, return_models=True)

print(cv_results.head())
```

**Output:**

Step 7: Mean Loss

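We will check the mean test loss at the final iteration. The MultiClass loss is a log loss (cross-entropy), so it is reported as a plain number rather than a percentage.

```python
# Mean test loss across the 5 folds at the final boosting iteration
mean_loss = cv_results['test-MultiClass-mean'].iloc[-1]
print(f"Mean Loss: {mean_loss:.4f}")
```

**Output:**

Mean Loss: 0.1460

A mean test loss of about 0.146 is far below the roughly 1.10 (ln 3) that uniform guessing over three classes would score, so the model generalizes well across the folds.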
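To help interpret the loss value, here is a minimal plain-Python sketch of a multiclass log loss: the average negative log of the probability the model assigned to the true class. The probabilities are made up for illustration, and `multiclass_log_loss` is a helper written for this example, not a CatBoost function.

```python
import math


def multiclass_log_loss(true_labels, predicted_probs):
    """Average negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        total -= math.log(probs[label])
    return total / len(true_labels)


# Hypothetical predictions for three samples over three classes
true_labels = [0, 1, 2]
predicted_probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.02, 0.03, 0.95],
]

loss = multiclass_log_loss(true_labels, predicted_probs)
print(f"Log loss: {loss:.4f}")

# exp(-loss) is the geometric mean probability given to the true class
print(f"Typical true-class probability: {math.exp(-loss):.3f}")
```

Read this way, a loss near 0.146 corresponds to a typical true-class probability of about exp(-0.146) ≈ 0.864. Lower loss means the model is both more often right and more confident when it is right.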

Advantages

Limitations