make_column_selector

class sklearn.compose.make_column_selector(pattern=None, *, dtype_include=None, dtype_exclude=None)

Create a callable to select columns to be used with ColumnTransformer.

make_column_selector can select columns based on data type or on column name with a regex. When multiple selection criteria are given, all criteria must match for a column to be selected.
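The "all criteria must match" rule can be illustrated with a minimal sketch. The DataFrame and its column names below are made up for illustration; the selectors are called directly on the frame to show which names they return.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical data: two columns start with "price", two columns are numeric.
X = pd.DataFrame({
    "price_usd": [1.5, 2.0],
    "price_label": ["low", "high"],
    "quantity": [3, 4],
})

by_name = make_column_selector(pattern="^price")
by_dtype = make_column_selector(dtype_include=np.number)
both = make_column_selector(pattern="^price", dtype_include=np.number)

name_cols = by_name(X)    # ['price_usd', 'price_label']
dtype_cols = by_dtype(X)  # ['price_usd', 'quantity']
both_cols = both(X)       # ['price_usd'], the only column meeting both criteria
```

Only price_usd satisfies the name pattern and the dtype filter at the same time, so it is the only column the combined selector returns.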

For an example of how to use make_column_selector within a ColumnTransformer to select columns based on data type (i.e. dtype), refer to Column Transformer with Mixed Types.

Parameters:

pattern : str, default=None

Columns whose names contain this regex pattern will be included. If None, columns are not filtered by name.
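Because a column is included when its name contains the pattern, an unanchored regex behaves like a substring search. The sketch below, using made-up column names, contrasts an unanchored pattern with one anchored by ^ and $.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical data: "rating" appears in two of the three column names.
X = pd.DataFrame({
    "rating": [5, 3],
    "user_rating": [4, 2],
    "city": ["London", "Paris"],
})

# Unanchored: matches any column name that contains "rating".
unanchored = make_column_selector(pattern="rating")(X)  # ['rating', 'user_rating']

# Anchored: matches only the column named exactly "rating".
anchored = make_column_selector(pattern="^rating$")(X)  # ['rating']
```

Anchoring is worth considering whenever one column name is a substring of another.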

dtype_include : column dtype or list of column dtypes, default=None

A selection of dtypes to include. For more details, see pandas.DataFrame.select_dtypes.

dtype_exclude : column dtype or list of column dtypes, default=None

A selection of dtypes to exclude. For more details, see pandas.DataFrame.select_dtypes.
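As a rough illustration of the two dtype parameters, the sketch below applies one include filter, one exclude filter, and one specific dtype string to a made-up DataFrame. The dtype arguments follow the conventions of pandas.DataFrame.select_dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical data: one text column, one integer column, one float column.
X = pd.DataFrame({
    "city": ["London", "Paris"],
    "rating": [5, 3],
    "score": [0.5, 0.7],
})

# Keep every numeric column (integers and floats).
numeric = make_column_selector(dtype_include=np.number)(X)      # ['rating', 'score']

# Keep everything that is not numeric.
non_numeric = make_column_selector(dtype_exclude=np.number)(X)  # ['city']

# Keep one specific dtype, given as a string.
floats = make_column_selector(dtype_include="float64")(X)       # ['score']
```

Using dtype_exclude=np.number rather than naming the text dtype explicitly avoids depending on how a given pandas version stores strings.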

Returns:

selector : callable

Callable for column selection to be used by a ColumnTransformer.

See also

ColumnTransformer

Class that allows combining the outputs of multiple transformer objects used on column subsets of the data into a single feature space.

Examples

>>> from sklearn.preprocessing import StandardScaler, OneHotEncoder
>>> from sklearn.compose import make_column_transformer
>>> from sklearn.compose import make_column_selector
>>> import numpy as np
>>> import pandas as pd
>>> X = pd.DataFrame({'city': ['London', 'London', 'Paris', 'Sallisaw'],
...                   'rating': [5, 3, 4, 5]})
>>> ct = make_column_transformer(
...       (StandardScaler(),
...        make_column_selector(dtype_include=np.number)),  # rating
...       (OneHotEncoder(),
...        make_column_selector(dtype_include=object)))  # city
>>> ct.fit_transform(X)
array([[ 0.90453403,  1.        ,  0.        ,  0.        ],
       [-1.50755672,  1.        ,  0.        ,  0.        ],
       [-0.30151134,  0.        ,  1.        ,  0.        ],
       [ 0.90453403,  0.        ,  0.        ,  1.        ]])

__call__(df)

Callable for column selection to be used by a ColumnTransformer.

Parameters:

df : dataframe of shape (n_samples, n_features)

DataFrame to select columns from.
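The selector can also be called directly, outside a ColumnTransformer, which is a convenient way to check what it will pick. The sketch below uses made-up data and assumes the behaviour of current scikit-learn releases as I understand it: the call returns a list of column names, and a non-DataFrame input raises ValueError.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical data for illustration.
X = pd.DataFrame({
    "city": ["London", "Paris"],
    "rating": [5, 3],
    "votes": [10, 20],
})

selector = make_column_selector(pattern="^(rating|votes)$")

selected = selector(X)  # ['rating', 'votes'], in the DataFrame's column order
subset = X[selected]    # the returned names can index the DataFrame directly

# Assumption: inputs that are not pandas DataFrames are rejected with ValueError.
try:
    selector(X.to_numpy())
    raised = False
except ValueError:
    raised = True
```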