make_column_selector
class sklearn.compose.make_column_selector(pattern=None, *, dtype_include=None, dtype_exclude=None)
Create a callable to select columns to be used with ColumnTransformer.
make_column_selector can select columns based on data type or on column name with a regex. When using multiple selection criteria, all criteria must match for a column to be selected.
For an example of how to use make_column_selector within a ColumnTransformer to select columns based on data type (i.e. dtype), refer to Column Transformer with Mixed Types.
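As a minimal sketch of the "all criteria must match" rule, the snippet below combines a name pattern with a dtype filter. The DataFrame and its column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({
    "price_usd": [1.5, 2.0],         # numeric, name starts with "price"
    "price_label": ["low", "high"],  # non-numeric, name starts with "price"
    "weight_kg": [10, 12],           # numeric, name does not start with "price"
})

# A column is kept only if its name matches the regex AND its dtype is numeric.
selector = make_column_selector(pattern="^price", dtype_include=np.number)
selected = selector(df)
print(selected)
```

Only "price_usd" satisfies both conditions here: "price_label" fails the dtype filter and "weight_kg" fails the name pattern.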
Parameters:
pattern : str, default=None
Columns whose names contain this regex pattern will be included. If None, columns will not be selected based on pattern.
dtype_include : column dtype or list of column dtypes, default=None
A selection of dtypes to include. For more details, see pandas.DataFrame.select_dtypes.
dtype_exclude : column dtype or list of column dtypes, default=None
A selection of dtypes to exclude. For more details, see pandas.DataFrame.select_dtypes.
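The dtype parameters accept either a single dtype or a list of dtypes. The following sketch, using a hypothetical DataFrame, shows one way to pick non-numeric columns with dtype_exclude and float columns with a list passed to dtype_include.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    "city": ["London", "Paris"],
    "rating": [5, 3],
    "score": [0.1, 0.9],
})

# Exclude every numeric dtype, leaving only the non-numeric column.
non_numeric = make_column_selector(dtype_exclude=np.number)(df)

# A list of dtypes is also accepted; here only float64 columns are kept.
floats_only = make_column_selector(dtype_include=["float64"])(df)

print(non_numeric)
print(floats_only)
```

Excluding np.number rather than including a specific string dtype avoids depending on how a given pandas version stores text columns.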
Returns:
selector : callable
Callable for column selection to be used by a ColumnTransformer.
See also
ColumnTransformer
Class that allows combining the outputs of multiple transformer objects used on column subsets of the data into a single feature space.
Examples
>>> from sklearn.preprocessing import StandardScaler, OneHotEncoder
>>> from sklearn.compose import make_column_transformer
>>> from sklearn.compose import make_column_selector
>>> import numpy as np
>>> import pandas as pd
>>> X = pd.DataFrame({'city': ['London', 'London', 'Paris', 'Sallisaw'],
...                   'rating': [5, 3, 4, 5]})
>>> ct = make_column_transformer(
...     (StandardScaler(),
...      make_column_selector(dtype_include=np.number)),  # rating
...     (OneHotEncoder(),
...      make_column_selector(dtype_include=object)))  # city
>>> ct.fit_transform(X)
array([[ 0.90453403,  1.        ,  0.        ,  0.        ],
       [-1.50755672,  1.        ,  0.        ,  0.        ],
       [-0.30151134,  0.        ,  1.        ,  0.        ],
       [ 0.90453403,  0.        ,  0.        ,  1.        ]])
__call__(df)
Callable for column selection to be used by a ColumnTransformer.
Parameters:
df : dataframe of shape (n_samples, n_features)
DataFrame to select columns from.
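The selector can also be called directly on a DataFrame, which is a convenient way to check which columns it would pick before wiring it into a ColumnTransformer. The sketch below uses a hypothetical DataFrame and illustrates that pattern is matched anywhere in the column name unless the regex is anchored.

```python
import pandas as pd
from sklearn.compose import make_column_selector

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    "user_id": [1, 2],
    "width": [3.0, 4.0],
    "height": [5.0, 6.0],
})

# Unanchored pattern: any column whose name contains "id" is selected,
# which includes "width" as well as "user_id".
loose = make_column_selector(pattern="id")(df)

# Anchored pattern: only names ending in "_id" are selected.
strict = make_column_selector(pattern="_id$")(df)

print(loose)
print(strict)
```

The returned value is a plain list of column names in the DataFrame's column order.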