estimate_bandwidth
sklearn.cluster.estimate_bandwidth(X, *, quantile=0.3, n_samples=None, random_state=0, n_jobs=None)
Estimate the bandwidth to use with the mean-shift algorithm.
This function takes time at least quadratic in n_samples. For large datasets, it is wise to subsample by setting n_samples. Alternatively, the parameter bandwidth can be set to a small value without estimating it.
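As a rough sketch of the subsampling advice above, the following compares an estimate computed on a full dataset with one computed on a random subset. The synthetic data, the subset size of 500, and the variable names are illustrative assumptions, not part of the original documentation.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth

# Hypothetical synthetic data: 2000 points drawn from a 2-D Gaussian.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))

# Estimate using every point (cost grows at least quadratically with the count).
bw_full = estimate_bandwidth(X, quantile=0.3)

# Estimate using a random subset of 500 points; a fixed random_state
# makes the choice of subset reproducible.
bw_sub = estimate_bandwidth(X, quantile=0.3, n_samples=500, random_state=0)

print(f"full: {bw_full:.3f}  subsampled: {bw_sub:.3f}")
```

On data like this the two estimates should usually be close, which is why subsampling tends to be a reasonable trade-off on large inputs.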
Parameters:
X : array-like of shape (n_samples, n_features)
Input points.
quantile : float, default=0.3
Should be in the range [0, 1]. 0.5 means that the median of all pairwise distances is used.
n_samples : int, default=None
The number of samples to use. If not given, all samples are used.
random_state : int, RandomState instance, default=0
The generator used to randomly select the samples from input points for bandwidth estimation. Use an int to make the randomness deterministic. See Glossary.
n_jobs : int, default=None
The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
Returns:
bandwidth : float
The bandwidth parameter.
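To illustrate how the quantile parameter affects the returned value, the sketch below sweeps a few quantiles over the same data. The two-blob dataset and the chosen quantile values are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth

# Hypothetical data: two well-separated Gaussian blobs of 100 points each.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(100, 2)),
])

# Estimate the bandwidth at several quantiles of the pairwise distances.
quantiles = [0.1, 0.3, 0.5, 0.8]
bandwidths = [estimate_bandwidth(X, quantile=q) for q in quantiles]

for q, bw in zip(quantiles, bandwidths):
    print(f"quantile={q:.1f} -> bandwidth={bw:.3f}")
```

A larger quantile yields a larger bandwidth, which in mean-shift generally leads to fewer, broader clusters.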
Examples
>>> import numpy as np
>>> from sklearn.cluster import estimate_bandwidth
>>> X = np.array([[1, 1], [2, 1], [1, 0],
...               [4, 7], [3, 5], [3, 6]])
>>> estimate_bandwidth(X, quantile=0.5)
np.float64(1.61...)
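Since the estimate is intended for the mean-shift algorithm, a minimal sketch of passing it to sklearn.cluster.MeanShift might look like the following. It reuses the six points from the example; the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])

# Estimate the bandwidth from the data, then use it as the kernel
# bandwidth for mean-shift clustering.
bandwidth = estimate_bandwidth(X, quantile=0.5)
ms = MeanShift(bandwidth=bandwidth).fit(X)

labels = ms.labels_            # cluster index assigned to each input point
centers = ms.cluster_centers_  # coordinates of the cluster centers found
print(labels)
print(centers)
```

With this data the first three points and the last three points should fall into separate clusters.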