Tuning Guide — Extension for Scikit-learn* 2025.4 documentation


The performance of some algorithms depends on the parameters they are run with. This section details such cases.

Refer to Supported Algorithms to see the full list of algorithms, parameters, and data formats supported in Extension for Scikit-learn*.

TSNE

The TSNE algorithm consists of two components: KNN and Gradient Descent. The overall acceleration of TSNE depends on the acceleration of each of these components.

To get better performance, use parameters supported by both components.
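As a sketch of the usual workflow, the snippet below enables the accelerated implementations via `patch_sklearn()` (the documented entry point of Extension for Scikit-learn*) and then runs TSNE through the standard scikit-learn API. The specific parameter values here are illustrative; consult Supported Algorithms for the parameters actually supported by both the KNN and Gradient Descent components.

```python
# Apply the patch before importing the estimator so the accelerated
# implementation is picked up.
from sklearnex import patch_sklearn
patch_sklearn()

import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).rand(200, 10)

# Illustrative parameters; check Supported Algorithms for values
# supported by both TSNE components.
embedding = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)
print(embedding.shape)
```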

Random Forest

Random Forest models accelerated with Extension for Scikit-learn* that use the hist splitting method discretize the training data by building a histogram with a configurable number of bins. The following keyword arguments influence how this histogram is created.

| Keyword argument | Possible values | Default value | Description |
|---|---|---|---|
| maxBins | [0, inf) | 256 | Number of bins in the histogram with the discretized training data. The value 0 disables data discretization. |
| minBinSize | [1, inf) | 5 | Minimum number of training data points in each bin after discretization. |
| binningStrategy | quantiles, averages | quantiles | Selects the algorithm used to calculate bin edges. quantiles results in bins with a similar number of training data points. averages divides the range of values observed in the training data set into equal-width bins of size (max - min) / maxBins. |
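To make the difference between the two binningStrategy options concrete, the plain-Python sketch below computes bin edges both ways for a small data set with one outlier. The function names are invented for illustration; they mirror the documented strategies, not the library's internal code.

```python
def equal_width_edges(data, max_bins):
    """'averages' strategy: equal-width bins of size (max - min) / max_bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / max_bins
    return [lo + i * width for i in range(max_bins + 1)]

def quantile_edges(data, max_bins):
    """'quantiles' strategy: edges at evenly spaced ranks, so each bin
    holds a similar number of training data points."""
    s = sorted(data)
    n = len(s)
    return [s[round(i * (n - 1) / max_bins)] for i in range(max_bins + 1)]

data = [0.1, 0.2, 0.25, 0.3, 9.0]  # one outlier stretches the value range

# Equal-width edges span the whole range, leaving most bins nearly empty.
print(equal_width_edges(data, 4))
# Quantile edges follow the data density instead.
print(quantile_edges(data, 4))
```

With skewed data, the quantiles strategy keeps resolution where the points actually are, while averages spends most bins on the empty stretch toward the outlier.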

Note that training on discretized data can greatly reduce model training time, especially for larger data sets. However, because of the reduced fidelity of the data, the resulting model may show worse quality metrics than a model trained on the original data. In such cases, increase the number of bins with the maxBins parameter, or disable binning entirely by setting maxBins=0.