pandas.get_dummies — pandas 2.2.3 documentation (original) (raw)
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)[source]#
Convert categorical variable into dummy/indicator variables.
Each variable is converted in as many 0/1 variables as there are different values. Columns in the output are each named after a value; if the input is a DataFrame, the name of the original variable is prepended to the value.
Parameters:
dataarray-like, Series, or DataFrame
Data of which to get dummy indicators.
prefixstr, list of str, or dict of str, default None
String to append DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, prefixcan be a dictionary mapping column names to prefixes.
prefix_sepstr, default ‘_’
If appending prefix, separator/delimiter to use. Or pass a list or dictionary as with prefix.
dummy_nabool, default False
Add a column to indicate NaNs, if False NaNs are ignored.
columnslist-like, default None
Column names in the DataFrame to be encoded. If columns is None then all the columns withobject, string, or category dtype will be converted.
sparsebool, default False
Whether the dummy-encoded columns should be backed by a SparseArray
(True) or a regular NumPy array (False).
drop_firstbool, default False
Whether to get k-1 dummies out of k categorical levels by removing the first level.
dtypedtype, default bool
Data type for new columns. Only a single dtype is allowed.
Returns:
DataFrame
Dummy-coded data. If data contains other columns than the dummy-coded one(s), these will be prepended, unaltered, to the result.
Notes
Reference the user guide for more examples.
Examples
s = pd.Series(list('abca'))
pd.get_dummies(s) a b c 0 True False False 1 False True False 2 False False True 3 True False False
s1 = ['a', 'b', np.nan]
pd.get_dummies(s1) a b 0 True False 1 False True 2 False False
pd.get_dummies(s1, dummy_na=True) a b NaN 0 True False False 1 False True False 2 False False True
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], ... 'C': [1, 2, 3]})
pd.get_dummies(df, prefix=['col1', 'col2']) C col1_a col1_b col2_a col2_b col2_c 0 1 True False False True False 1 2 False True True False False 2 3 True False False False True
pd.get_dummies(pd.Series(list('abcaa'))) a b c 0 True False False 1 False True False 2 False False True 3 True False False 4 True False False
pd.get_dummies(pd.Series(list('abcaa')), drop_first=True) b c 0 False False 1 True False 2 False True 3 False False 4 False False
pd.get_dummies(pd.Series(list('abc')), dtype=float) a b c 0 1.0 0.0 0.0 1 0.0 1.0 0.0 2 0.0 0.0 1.0