OutputCodeClassifier

class sklearn.multiclass.OutputCodeClassifier(estimator, *, code_size=1.5, random_state=None, n_jobs=None)

(Error-Correcting) Output-Code multiclass strategy.

Output-code strategies represent each class with a binary code (an array of 0s and 1s). At fit time, one binary classifier is fitted per bit in the code book. At prediction time, the classifiers project new points into the class space, and the class closest to each point is chosen. The main advantage of these strategies is that the user controls the number of classifiers, either to compress the model (0 < code_size < 1) or to make the model more robust to errors (code_size > 1). See the documentation for more details.
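The encode, fit, and decode steps described above can be sketched by hand. This is an illustration only, not the library's implementation: the 4-bit code book is hand-written and hypothetical, the iris data and LogisticRegression are arbitrary choices, and Euclidean distance is assumed as the measure of closeness.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # labels are the integers 0, 1, 2
classes = np.unique(y)

# Hypothetical code book: one row per class, one column per bit.
# Every column contains both 0 and 1, so each bit defines a binary problem.
code_book = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Fit: one binary classifier per bit, trained on that bit of each sample's code.
bit_classifiers = [
    LogisticRegression(max_iter=1000).fit(X, code_book[y, bit])
    for bit in range(code_book.shape[1])
]

# Predict: project samples into code space, then pick the nearest code word.
projected = np.column_stack(
    [clf.predict_proba(X)[:, 1] for clf in bit_classifiers]
)
distances = np.linalg.norm(
    projected[:, None, :] - code_book[None, :, :], axis=2
)
y_pred = classes[distances.argmin(axis=1)]
print("training accuracy:", np.mean(y_pred == y))
```

OutputCodeClassifier generates its code book randomly instead of taking a hand-written one, so its predictions will generally differ from this sketch.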

Read more in the User Guide.

Parameters:

estimator : estimator object

An estimator object implementing fit and one of decision_function or predict_proba.

code_size : float, default=1.5

Multiplier applied to the number of classes to set the length of the code book: int(n_classes * code_size) binary classifiers are fitted. A value between 0 and 1 requires fewer classifiers than one-vs-the-rest. A value greater than 1 requires more classifiers than one-vs-the-rest.
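A short sketch of how code_size changes the number of fitted binary classifiers. The iris data (3 classes) and LogisticRegression are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)  # 3 classes

n_estimators = {}
for code_size in (0.5, 1.0, 2.0):
    ecoc = OutputCodeClassifier(
        LogisticRegression(max_iter=1000),
        code_size=code_size,
        random_state=0,
    ).fit(X, y)
    # One fitted binary estimator per bit of the code book.
    n_estimators[code_size] = len(ecoc.estimators_)

print(n_estimators)
```

With very short codes, two classes may receive the same code word and become indistinguishable, so small code_size values trade accuracy for model size.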

random_state : int, RandomState instance, default=None

The generator used to initialize the codebook. Pass an int for reproducible output across multiple function calls. See Glossary.
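Because the code book is drawn at random, fixing random_state is what makes it repeatable. A minimal check, assuming iris data and a LogisticRegression base estimator; fitted_code_book is a hypothetical helper defined only for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)


def fitted_code_book(seed):
    """Fit a fresh classifier with the given seed and return its code book."""
    ecoc = OutputCodeClassifier(
        LogisticRegression(max_iter=1000), code_size=2.0, random_state=seed
    )
    return ecoc.fit(X, y).code_book_


# Two independent fits with the same integer seed yield the same code book.
same_seed_matches = np.array_equal(fitted_code_book(0), fitted_code_book(0))
print(same_seed_matches)
```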

n_jobs : int, default=None

The number of jobs to use for the computation: the multiclass problems are computed in parallel.

None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

Attributes:

estimators_ : list of int(n_classes * code_size) estimators

Estimators used for predictions.

classes_ : ndarray of shape (n_classes,)

Array containing labels.

code_book_ : ndarray of shape (n_classes, len(estimators_))

Binary array containing the code of each class.
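The fitted code book can be inspected row by row, one code word per entry of classes_. The sketch below assumes iris data and a LogisticRegression base estimator. It prints the codes rather than assuming which two values encode a bit, since that may depend on the base estimator and library version.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)

ecoc = OutputCodeClassifier(
    LogisticRegression(max_iter=1000), code_size=2.0, random_state=0
).fit(X, y)

# Row i of code_book_ is the code word assigned to classes_[i].
for label, code in zip(ecoc.classes_, ecoc.code_book_):
    print(label, code)

print(ecoc.code_book_.shape)
```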

n_features_in_ : int

Number of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Only defined if the underlying estimator exposes such an attribute when fit.

Added in version 1.0.

References

[1] “Solving multiclass learning problems via error-correcting output codes”, Dietterich T., Bakiri G., Journal of Artificial Intelligence Research 2, 1995.

[2] “The error coding method and PICTs”, James G., Hastie T., Journal of Computational and Graphical Statistics 7, 1998.

[3] “The Elements of Statistical Learning”, Hastie T., Tibshirani R., Friedman J., page 606 (second edition), 2008.

Examples

>>> from sklearn.multiclass import OutputCodeClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=100, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = OutputCodeClassifier(
...     estimator=RandomForestClassifier(random_state=0),
...     random_state=0).fit(X, y)
>>> clf.predict([[0, 0, 0, 0]])
array([1])

fit(X, y, **fit_params)

Fit underlying estimators.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Data.

y : array-like of shape (n_samples,)

Multi-class targets.

**fit_params : dict

Parameters passed to the estimator.fit method of each sub-estimator.

Added in version 1.4: Only available if enable_metadata_routing=True. See the Metadata Routing User Guide for more details.

Returns:

self : object

Returns a fitted instance of self.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Added in version 1.4.

Returns:

routing : MetadataRouter

A MetadataRouter encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.
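A small sketch of the difference between deep=False and deep=True. The LogisticRegression base estimator and its C value are arbitrary choices for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

ecoc = OutputCodeClassifier(LogisticRegression(C=0.5), code_size=2.0)

# deep=False returns only the constructor parameters of the meta-estimator.
shallow = ecoc.get_params(deep=False)

# deep=True also exposes the wrapped estimator's parameters under the
# "estimator__" prefix.
deep = ecoc.get_params(deep=True)

print(sorted(shallow))
print(deep["estimator__C"])
```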

predict(X)

Predict multi-class targets using underlying estimators.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Data.

Returns:

y : ndarray of shape (n_samples,)

Predicted multi-class targets.
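Predictions are returned in the original label space, not as code words. The sketch below assumes iris data relabelled with string class names and a LogisticRegression base estimator.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]  # string labels instead of integers

ecoc = OutputCodeClassifier(
    LogisticRegression(max_iter=1000), code_size=2.0, random_state=0
).fit(X, y)

# One predicted label per input row, drawn from ecoc.classes_.
y_pred = ecoc.predict(X[:5])
print(y_pred)
```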

score(X, y, sample_weight=None)

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires the entire label set of each sample to be predicted correctly.

Parameters:

X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) w.r.t. y.
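As a sanity check, the score can be compared with an accuracy computed by hand, with and without sample weights. The data split, base estimator, and weighting scheme below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ecoc = OutputCodeClassifier(
    LogisticRegression(max_iter=1000), code_size=2.0, random_state=0
).fit(X_train, y_train)

# Unweighted: fraction of test samples predicted correctly.
correct = ecoc.predict(X_test) == y_test
accuracy = ecoc.score(X_test, y_test)
manual = np.mean(correct)

# Weighted: samples of class 2 count twice as much as the others.
weights = np.where(y_test == 2, 2.0, 1.0)
weighted = ecoc.score(X_test, y_test, sample_weight=weights)
manual_weighted = np.average(correct, weights=weights)

print(accuracy, weighted)
```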

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.
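A brief sketch of updating both a top-level parameter and a parameter of the wrapped estimator through the <component>__<parameter> form. The LogisticRegression base estimator is an arbitrary choice for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

ecoc = OutputCodeClassifier(LogisticRegression(C=1.0), code_size=1.5)

# code_size belongs to the meta-estimator; estimator__C is forwarded to
# the wrapped LogisticRegression.
returned = ecoc.set_params(code_size=2.0, estimator__C=10.0)

print(ecoc.code_size, ecoc.estimator.C)
```

Because set_params returns the estimator itself, calls can be chained, and the same parameter names can be used as keys in a grid search.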

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → OutputCodeClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:

self : object

The updated object.
