MissingIndicator
class sklearn.impute.MissingIndicator(*, missing_values=nan, features='missing-only', sparse='auto', error_on_new=True)
Binary indicators for missing values.
Note that this component typically should not be used in a vanilla Pipeline consisting of transformers and a classifier, because its output contains only the indicator columns and not the original features. Rather, it could be added using a FeatureUnion or ColumnTransformer.
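A minimal sketch of the composition suggested above, using illustrative toy data: a FeatureUnion runs SimpleImputer and MissingIndicator on the same input and concatenates their outputs, so a downstream model receives both the imputed values and the missingness flags. The strategy="mean" choice is arbitrary.

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import FeatureUnion

# Illustrative data: column 0 and column 2 each contain one NaN.
X = np.array([[np.nan, 1, 3],
              [4, 0, np.nan],
              [8, 1, 0]])

# Imputed values and binary missingness flags end up side by side.
union = FeatureUnion([
    ("imputer", SimpleImputer(strategy="mean")),
    ("indicator", MissingIndicator()),
])
Xt = union.fit_transform(X)
print(Xt.shape)  # (3, 5): 3 imputed columns + 2 indicator columns
print(Xt)
```

The indicator block only has two columns here because, with the default features='missing-only', column 1 had no missing values during fit.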
Read more in the User Guide.
Added in version 0.20.
Parameters:
missing_values : int, float, str, np.nan or None, default=np.nan
The placeholder for the missing values. All occurrences of missing_values will be flagged in the indicator mask. For pandas' dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
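A short sketch of a non-NaN placeholder, assuming a dataset that encodes missing entries as the integer -1 (an illustrative convention, not a default of this class):

```python
import numpy as np
from sklearn.impute import MissingIndicator

# Hypothetical integer data where -1 marks a missing entry.
X = np.array([[-1, 2, 3],
              [4, -1, 6],
              [7, 8, 9]])

indicator = MissingIndicator(missing_values=-1)
mask = indicator.fit_transform(X)
print(indicator.features_)  # [0 1]: only these columns contained -1 during fit
print(mask)
```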
features : {'missing-only', 'all'}, default='missing-only'
Whether the imputer mask should represent all or a subset of features.
- If 'missing-only' (default), the imputer mask will only represent features containing missing values during fit time.
- If 'all', the imputer mask will represent all features.
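The difference between the two settings shows up in the number of output columns. A small sketch on illustrative data where only columns 0 and 2 contain NaN:

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1, 3],
              [4, 0, np.nan],
              [8, 1, 0]])

only_missing = MissingIndicator(features="missing-only").fit_transform(X)
all_features = MissingIndicator(features="all").fit_transform(X)

print(only_missing.shape)  # (3, 2): columns 0 and 2 had NaNs during fit
print(all_features.shape)  # (3, 3): one column per input feature
```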
sparse : bool or 'auto', default='auto'
Whether the imputer mask format should be sparse or dense.
- If 'auto' (default), the imputer mask will be of same type as input.
- If True, the imputer mask will be a sparse matrix.
- If False, the imputer mask will be a numpy array.
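A sketch of how the output container follows the sparse setting, assuming SciPy's CSR format for the sparse input (the NaN entries are stored explicitly, since NaN is non-zero):

```python
import numpy as np
from scipy import sparse
from sklearn.impute import MissingIndicator

X_dense = np.array([[np.nan, 1, 3],
                    [4, 0, np.nan],
                    [8, 1, 0]])
X_sparse = sparse.csr_matrix(X_dense)

auto_dense = MissingIndicator().fit_transform(X_dense)     # sparse="auto", dense in
auto_sparse = MissingIndicator().fit_transform(X_sparse)   # sparse="auto", sparse in
forced = MissingIndicator(sparse=True).fit_transform(X_dense)

print(type(auto_dense))              # numpy.ndarray
print(sparse.issparse(auto_sparse))  # True
print(sparse.issparse(forced))       # True
```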
error_on_new : bool, default=True
If True, transform will raise an error when the data passed to transform has missing values in features that had no missing values during fit. This is applicable only when features='missing-only'.
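A sketch of both behaviours on illustrative data: column 1 has no NaN during fit but does at transform time.

```python
import numpy as np
from sklearn.impute import MissingIndicator

X_fit = np.array([[np.nan, 1, 3],
                  [4, 0, np.nan],
                  [8, 1, 0]])
# Column 1 contains a NaN here, but it had no missing values during fit.
X_new = np.array([[5, np.nan, 3],
                  [1, 2, 0]])

strict = MissingIndicator().fit(X_fit)  # error_on_new=True by default
raised = False
try:
    strict.transform(X_new)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With error_on_new=False the new missing column is silently ignored:
# only the columns in features_ (0 and 2) are reported.
lenient = MissingIndicator(error_on_new=False).fit(X_fit)
mask = lenient.transform(X_new)
print(mask)
```

Note that the lenient indicator reports no missing values at all for X_new, even though it contains a NaN; this is the trade-off of disabling the check.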
Attributes:
features_ : ndarray of shape (n_missing_features,) or (n_features,)
The indices of the features that will be returned when calling transform. They are computed during fit. If features='all', features_ is equal to range(n_features).
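Because the output mask may drop columns, features_ is what maps mask columns back to input features. A sketch on illustrative data:

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1, 3],
              [4, 0, np.nan],
              [8, 1, 0]])

indicator = MissingIndicator().fit(X)
mask = indicator.transform(X)

# Column j of the mask corresponds to input feature indicator.features_[j].
for j, feature_idx in enumerate(indicator.features_):
    print(f"mask column {j} -> input feature {feature_idx}: {mask[:, j].tolist()}")

all_indicator = MissingIndicator(features="all").fit(X)
print(all_indicator.features_)  # [0 1 2]
```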
n_features_in_ : int
Number of features seen during fit.
Added in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Added in version 1.0.
Examples
>>> import numpy as np
>>> from sklearn.impute import MissingIndicator
>>> X1 = np.array([[np.nan, 1, 3],
...                [4, 0, np.nan],
...                [8, 1, 0]])
>>> X2 = np.array([[5, 1, np.nan],
...                [np.nan, 2, 3],
...                [2, 4, 0]])
>>> indicator = MissingIndicator()
>>> indicator.fit(X1)
MissingIndicator()
>>> X2_tr = indicator.transform(X2)
>>> X2_tr
array([[False,  True],
       [ True, False],
       [False, False]])
fit(X, y=None)
Fit the transformer on X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Input data, where n_samples is the number of samples and n_features is the number of features.
y : Ignored
Not used, present for API consistency by convention.
Returns:
self : object
Fitted estimator.
fit_transform(X, y=None)
Generate the missing-value indicator for X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input data for which to compute the indicator.
y : Ignored
Not used, present for API consistency by convention.
Returns:
Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)
The missing indicator for input data. The data type of Xt will be boolean.
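A sketch checking, on illustrative data, that fit_transform gives the same boolean mask as fitting and then transforming the same data. Since fit returns the estimator itself, the two-step form can be chained.

```python
import numpy as np
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1, 3],
              [4, 0, np.nan],
              [8, 1, 0]])

Xt_one_step = MissingIndicator().fit_transform(X)
# fit() returns the estimator itself, so transform() can be chained onto it.
Xt_two_steps = MissingIndicator().fit(X).transform(X)

print(Xt_one_step.dtype)                          # bool
print(np.array_equal(Xt_one_step, Xt_two_steps))  # True
```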
get_feature_names_out(input_features=None)
Get output feature names for transformation.
Parameters:
input_features : array-like of str or None, default=None
Input features.
- If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
- If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
Returns:
feature_names_out : ndarray of str objects
Transformed feature names.
get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
Returns:
routing : MetadataRequest
A MetadataRequest encapsulating routing information.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : dict
Parameter names mapped to their values.
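A quick sketch of the returned mapping; the constructor arguments are illustrative:

```python
from sklearn.impute import MissingIndicator

indicator = MissingIndicator(features="all", sparse=False)
params = indicator.get_params()

print(sorted(params))
# ['error_on_new', 'features', 'missing_values', 'sparse']
print(params["features"], params["sparse"])  # all False
```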
set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
Parameters:
transform : {"default", "pandas", "polars"}, default=None
Configure output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- "polars": Polars output
- None: Transform configuration is unchanged
Added in version 1.4: "polars" option was added.
Returns:
self : estimator instance
Estimator instance.
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters:
**params : dict
Estimator parameters.
Returns:
self : estimator instance
Estimator instance.
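A sketch of both forms on illustrative data: a direct call on MissingIndicator, and the <component>__<parameter> form on a ColumnTransformer, where "indicator" is a step name chosen for this example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import MissingIndicator

X = np.array([[np.nan, 1, 3],
              [4, 0, np.nan],
              [8, 1, 0]])

# Simple estimator: set_params returns the estimator itself.
simple = MissingIndicator().set_params(features="all", error_on_new=False)

# Nested object: "indicator" is the step name given in the list below.
ct = ColumnTransformer([("indicator", MissingIndicator(), [0, 1, 2])])
before = ct.fit_transform(X).shape  # (3, 2) with features="missing-only"

ct.set_params(indicator__features="all")
after = ct.fit_transform(X).shape   # (3, 3) with features="all"
print(before, after)
```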
transform(X)
Generate the missing-value indicator for X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input data for which to compute the indicator.
Returns:
Xt : {ndarray, sparse matrix} of shape (n_samples, n_features) or (n_samples, n_features_with_missing)
The missing indicator for input data. The data type of Xt will be boolean.