MissingIndicator

class sklearn.impute.MissingIndicator(*, missing_values=nan, features='missing-only', sparse='auto', error_on_new=True)

Binary indicators for missing values.

Note that this component typically should not be used in a vanilla Pipeline consisting of transformers and a classifier, but rather could be added using a FeatureUnion or ColumnTransformer.
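As a minimal sketch of the note above, the indicator columns can be appended to imputed data with a FeatureUnion. The data and the choice of SimpleImputer(strategy="mean") are illustrative assumptions, not part of the original page.

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import make_union

# Illustrative data: features 0 and 2 contain missing values.
X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# Concatenate the imputed features with the missing-value indicators.
union = make_union(SimpleImputer(strategy="mean"), MissingIndicator())
X_aug = union.fit_transform(X)

# 3 imputed columns followed by 2 indicator columns.
print(X_aug.shape)
```

The combined transformer can then be used as a single step ahead of a classifier, so the model sees both the filled-in values and where they were filled in.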

Read more in the User Guide.

Added in version 0.20.

Parameters:

missing_values : int, float, str, np.nan or None, default=np.nan

The placeholder for the missing values. All occurrences of missing_values will be flagged as missing in the indicator mask. For pandas DataFrames with nullable integer dtypes that contain missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
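A short sketch of a non-default placeholder, assuming an integer array in which -1 encodes "missing" (the encoding and the data are hypothetical):

```python
import numpy as np
from sklearn.impute import MissingIndicator

# Hypothetical encoding: -1 stands for a missing entry.
X = np.array([[-1, 1, 3],
              [4, 0, -1],
              [8, 1, 0]])

indicator = MissingIndicator(missing_values=-1)
mask = indicator.fit_transform(X)

# One boolean column per feature that contained -1 during fit.
print(mask)
```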

features : {'missing-only', 'all'}, default='missing-only'

Whether the imputer mask should represent all or a subset of features.

If 'missing-only' (default), the imputer mask will only represent features containing missing values during fit time.
If 'all', the imputer mask will represent all features.
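The two options can be compared on the same input. This is an illustrative sketch with made-up data:

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# Only the features that had missing values at fit time (columns 0 and 2).
mask_subset = MissingIndicator(features="missing-only").fit_transform(X)

# One indicator column per input feature, including the complete column 1.
mask_all = MissingIndicator(features="all").fit_transform(X)

print(mask_subset.shape, mask_all.shape)
```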

sparse : bool or 'auto', default='auto'

Whether the imputer mask format should be sparse or dense.

If 'auto' (default), the imputer mask will be of same type as input.
If True, the imputer mask will be a sparse matrix.
If False, the imputer mask will be a numpy array.
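A small sketch of how the sparse setting interacts with the input type. The data is illustrative, and scipy.sparse is used only to build and inspect the matrices:

```python
import numpy as np
from scipy import sparse as sp
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

# 'auto' follows the input: dense in, dense out.
mask_auto = MissingIndicator(sparse="auto").fit_transform(X)

# True forces a sparse mask even for dense input.
mask_sparse = MissingIndicator(sparse=True).fit_transform(X)

# False forces a NumPy array even for sparse input.
mask_dense = MissingIndicator(sparse=False).fit_transform(sp.csc_matrix(X))

print(type(mask_auto), sp.issparse(mask_sparse), type(mask_dense))
```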

error_on_new : bool, default=True

If True, transform will raise an error when there are features with missing values that have no missing values in fit. This is applicable only when features='missing-only'.
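The following sketch illustrates both settings on made-up data in which feature 1 is complete during fit but has a missing value at transform time:

```python
import numpy as np
from sklearn.impute import MissingIndicator

X_train = np.array([[np.nan, 1.0],
                    [4.0, 0.0]])
X_new = np.array([[5.0, np.nan],   # feature 1 is newly missing here
                  [np.nan, 2.0]])

# Default behaviour: a newly missing feature is reported as an error.
strict = MissingIndicator(error_on_new=True).fit(X_train)
try:
    strict.transform(X_new)
    raised = False
except ValueError:
    raised = True

# With error_on_new=False the new missing values are silently ignored
# and only the features selected during fit are returned.
lenient = MissingIndicator(error_on_new=False).fit(X_train)
mask = lenient.transform(X_new)

print(raised, mask.shape)
```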

Attributes:

features_ : ndarray of shape (n_missing_features,) or (n_features,)

The indices of the features that will be returned when calling transform. They are computed during fit. If features='all', features_ is equal to range(n_features).
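As an illustrative sketch, features_ can be used to map each column of the mask back to its column in the input (the data is made up):

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1.0, 3.0],
              [4.0, 0.0, np.nan],
              [8.0, 1.0, 0.0]])

indicator = MissingIndicator().fit(X)
mask = indicator.transform(X)

# Mask column j corresponds to input column indicator.features_[j].
for mask_col, input_col in enumerate(indicator.features_):
    print(f"mask[:, {mask_col}] flags missing values in X[:, {input_col}]")

# With features='all', every input column is represented.
all_indicator = MissingIndicator(features="all").fit(X)
print(all_indicator.features_)
```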

n_features_in_ : int

Number of features seen during fit.

Added in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

Added in version 1.0.

Examples

>>> import numpy as np
>>> from sklearn.impute import MissingIndicator
>>> X1 = np.array([[np.nan, 1, 3],
...                [4, 0, np.nan],
...                [8, 1, 0]])
>>> X2 = np.array([[5, 1, np.nan],
...                [np.nan, 2, 3],
...                [2, 4, 0]])
>>> indicator = MissingIndicator()
>>> indicator.fit(X1)
MissingIndicator()
>>> X2_tr = indicator.transform(X2)
>>> X2_tr
array([[False,  True],
       [ True, False],
       [False, False]])

fit(X, y=None)

Fit the transformer on X.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

Input data, where n_samples is the number of samples and n_features is the number of features.

y : Ignored

Not used, present for API consistency by convention.

Returns:

self : object

Fitted estimator.

fit_transform(X, y=None)

Generate missing values indicator for X.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data for which to generate the missing-value indicator.

y : Ignored

Not used, present for API consistency by convention.

Returns:

Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)

The missing indicator for input data. The data type of Xt will be boolean.

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters:

input_features : array-like of str or None, default=None

Input features.

If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns:

feature_names_out : ndarray of str objects

Transformed feature names.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for details on how the routing mechanism works.

Returns:

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform : {"default", "pandas", "polars"}, default=None

Configure output of transform and fit_transform.

"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged

Added in version 1.4: "polars" option was added.

Returns:

self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.
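The sketch below shows set_params on a standalone estimator and on a nested object. The FeatureUnion built with make_union is an illustrative choice; its step names are the lowercased class names.

```python
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import make_union

# Simple estimator: parameters are addressed by name.
indicator = MissingIndicator()
indicator.set_params(features="all", error_on_new=False)
params = indicator.get_params()

# Nested object: use <component>__<parameter>.
union = make_union(SimpleImputer(), MissingIndicator())
union.set_params(missingindicator__features="all")
nested_value = union.get_params()["missingindicator__features"]

print(params["features"], nested_value)
```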

transform(X)

Generate missing values indicator for X.

Parameters:

X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input data for which to generate the missing-value indicator.

Returns:

Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)

The missing indicator for input data. The data type of Xt will be boolean.