MultiLabelBinarizer
class sklearn.preprocessing.MultiLabelBinarizer(*, classes=None, sparse_output=False)
Transform between an iterable of iterables and a multilabel format.
Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.
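To make the target format concrete, here is a minimal from-scratch sketch of the conversion using plain Python and NumPy. The helper `binarize_label_sets` is hypothetical and not part of scikit-learn. It only mimics the default behavior described on this page (sorted classes, dense 0/1 matrix) and is not the library's implementation.

```python
import numpy as np


def binarize_label_sets(label_sets):
    """Hypothetical helper: turn an iterable of label collections into a 0/1 matrix."""
    label_sets = [set(labels) for labels in label_sets]
    # Column order: the sorted set of all labels seen, as when `classes` is not given.
    classes = sorted(set().union(*label_sets))
    column = {label: j for j, label in enumerate(classes)}
    indicator = np.zeros((len(label_sets), len(classes)), dtype=int)
    for i, labels in enumerate(label_sets):
        for label in labels:
            indicator[i, column[label]] = 1  # sample i carries this label
    return indicator, classes


matrix, classes = binarize_label_sets([(1, 2), (3,)])
print(classes)  # [1, 2, 3]
print(matrix)
```

The rows correspond to samples and the columns to classes, which is the (samples x classes) layout the transformer produces.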
Parameters:
classes : array-like of shape (n_classes,), default=None
Indicates an ordering for the class labels. All entries should be unique (cannot contain duplicate classes).
sparse_output : bool, default=False
Set to True if the output binary array is desired in CSR sparse format.
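A short sketch of both parameters together: `classes` fixes the column order and `sparse_output=True` returns a SciPy CSR matrix instead of a dense array. The label values are illustrative only.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

# Fix the column order explicitly and request CSR output instead of a dense array.
mlb = MultiLabelBinarizer(classes=["thriller", "comedy", "sci-fi"], sparse_output=True)
y_sparse = mlb.fit_transform([{"sci-fi", "thriller"}, {"comedy"}])

print(sparse.issparse(y_sparse), y_sparse.format)  # True csr
# Columns follow the order given in `classes`: thriller, comedy, sci-fi.
print(y_sparse.toarray())
```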
Attributes:
classes_ : ndarray of shape (n_classes,)
A copy of the classes parameter when provided. Otherwise it corresponds to the sorted set of classes found when fitting.
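The two ways classes_ can be populated are contrasted below. This is a small illustrative sketch; note that a class listed in `classes` but never seen in y still gets its own (all-zero) column.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [["b", "a"], ["c"]]

# Without `classes`, classes_ is the sorted set of labels found in y.
default = MultiLabelBinarizer().fit(y)
print(list(default.classes_))  # ['a', 'b', 'c']

# With `classes`, classes_ copies that order, including labels absent from y ("z").
custom = MultiLabelBinarizer(classes=["c", "b", "a", "z"]).fit(y)
print(list(custom.classes_))  # ['c', 'b', 'a', 'z']
print(custom.transform(y))
```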
See also
OneHotEncoder : Encode categorical features using a one-hot aka one-of-K scheme.
Examples
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])

>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']
A common mistake is to pass in a list, which leads to the following issue:
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
MultiLabelBinarizer()
>>> mlb.classes_
array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
       'y'], dtype=object)
To correct this, the list of labels should be passed in as:
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
MultiLabelBinarizer()
>>> mlb.classes_
array(['comedy', 'sci-fi', 'thriller'], dtype=object)
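A related case: when a flat list actually holds one label per sample, wrapping each label in its own list gives one row per sample instead of a single row holding all three labels. This is a small sketch of that variant, using the same illustrative labels.

```python
from sklearn.preprocessing import MultiLabelBinarizer

labels = ["sci-fi", "thriller", "comedy"]  # assumed: one label per sample

# Wrap each string so it is treated as one label, not as a sequence of characters.
mlb = MultiLabelBinarizer()
y_indicator = mlb.fit_transform([[label] for label in labels])

print(list(mlb.classes_))  # ['comedy', 'sci-fi', 'thriller']
print(y_indicator)  # one row per sample, exactly one 1 per row
```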
fit(y)
Fit the label sets binarizer, storing classes_.
Parameters:
y : iterable of iterables
A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.
Returns:
self : object
Fitted estimator.
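The note that y is not iterated when classes is set can be observed with a one-shot generator. In this sketch (the generator is illustrative), fit takes classes_ straight from the parameter and the generator still yields its first item afterwards. Treat it as a demonstration of the documented behavior, not a test of library internals.

```python
from sklearn.preprocessing import MultiLabelBinarizer


def label_stream():
    """A one-shot generator of label sets (illustrative)."""
    yield ["a"]
    yield ["b", "c"]


stream = label_stream()

# With `classes` set, fit reads the classes from the parameter and leaves y untouched.
mlb = MultiLabelBinarizer(classes=["c", "b", "a"]).fit(stream)
print(list(mlb.classes_))  # ['c', 'b', 'a']

first = next(stream)
print(first)  # ['a']: fit did not consume the generator
```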
fit_transform(y)
Fit the label sets binarizer and transform the given label sets.
Parameters:
y : iterable of iterables
A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.
Returns:
y_indicator : {ndarray, sparse matrix} of shape (n_samples, n_classes)
A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise. Sparse matrix will be of CSR format.
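As a quick sanity check, fit_transform should agree with calling fit and then transform on the same data. The sketch below compares the two routes and includes an empty label set, which becomes a row of zeros.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

y = [{"sci-fi", "thriller"}, {"comedy"}, set()]

one_step = MultiLabelBinarizer().fit_transform(y)
two_step = MultiLabelBinarizer().fit(y).transform(y)

# Both routes give the same indicator matrix; the empty set maps to all zeros.
print(np.array_equal(one_step, two_step))  # True
print(one_step)
```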
get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
Returns:
routing : MetadataRequest
A MetadataRequest encapsulating routing information.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : dict
Parameter names mapped to their values.
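For this transformer the returned dictionary simply mirrors the two constructor arguments, as this minimal sketch shows.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(sparse_output=True)
params = mlb.get_params()

# Keys are the constructor parameters; values are the current settings.
print(params)  # {'classes': None, 'sparse_output': True}
```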
inverse_transform(yt)
Transform the given indicator matrix into label sets.
Parameters:
yt : {ndarray, sparse matrix} of shape (n_samples, n_classes)
A matrix containing only 1s and 0s.
Returns:
y : list of tuples
The set of labels for each sample such that y[i] consists of classes_[j] for each yt[i, j] == 1.
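A round-trip sketch: each row of the indicator matrix maps back to a tuple of the classes_ entries whose column is 1. The second call shows that a value other than 0 or 1 is rejected with a ValueError. The returned tuples may hold NumPy scalars rather than plain Python ints, so no printed form is given for them.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
y_indicator = mlb.fit_transform([(1, 2), (3,)])

# Round trip: row i becomes a tuple of the classes_ whose column holds a 1.
recovered = mlb.inverse_transform(y_indicator)
print(recovered)

# Entries other than 0 and 1 are not a valid indicator matrix.
rejected = False
try:
    mlb.inverse_transform(np.array([[2, 0, 0]]))
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)
```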
set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
Parameters:
transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
Returns:
self : estimator instance
Estimator instance.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters:
**params : dict
Estimator parameters.
Returns:
self : estimator instance
Estimator instance.
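Because set_params returns the estimator itself, it can be chained directly into fitting. This sketch switches a default binarizer to sparse output; the nested <component>__<parameter> form is not exercised here.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

# set_params returns the estimator, so the call chains into fit_transform.
y_indicator = mlb.set_params(sparse_output=True).fit_transform([(1, 2), (3,)])

print(mlb.get_params()["sparse_output"])  # True
print(sparse.issparse(y_indicator))  # True
```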
transform(y)
Transform the given label sets.
Parameters:
y : iterable of iterables
A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.
Returns:
y_indicator : array or CSR matrix, shape (n_samples, n_classes)
A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise.
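Since the output has one column per entry of classes_, a label that was not seen during fit has no column to land in. In recent scikit-learn releases such labels are dropped and a UserWarning is emitted; treat the warning as observed behavior rather than a guarantee stated on this page. The sketch below records warnings so the effect is visible.

```python
import warnings

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer().fit([{"comedy"}, {"sci-fi", "thriller"}])

# "horror" was not seen during fit, so there is no column for it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    y_indicator = mlb.transform([{"comedy", "horror"}, {"thriller"}])

print(list(mlb.classes_))  # ['comedy', 'sci-fi', 'thriller']
print(y_indicator)
print([str(w.message) for w in caught])
```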