pandas.Series.convert_dtypes — pandas 3.0.1 documentation

Series.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy_nullable')

Convert columns from numpy dtypes to the best dtypes that support pd.NA.

Parameters:

infer_objects : bool, default True

Whether object dtypes should be converted to the best possible types.

convert_string : bool, default True

Whether object dtypes should be converted to StringDtype().

convert_integer : bool, default True

Whether, if possible, conversion can be done to integer extension types.

convert_boolean : bool, default True

Whether object dtypes should be converted to BooleanDtype().

convert_floating : bool, default True

Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be given to integer dtypes if the floats can be faithfully cast to integers.

dtype_backend : {‘numpy_nullable’, ‘pyarrow’}, default ‘numpy_nullable’

Back-end data type applied to the resultant DataFrame or Series (still experimental). Behaviour is as follows:

"numpy_nullable": returns nullable-dtype-backed DataFrame or Series.
"pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame or Series.

Added in version 2.0.

Returns:

Series or DataFrame

Copy of input object with new dtype.

Notes

By default, convert_dtypes will attempt to convert a Series (or each Series in a DataFrame) to dtypes that support pd.NA. By using the options convert_string, convert_integer, convert_boolean and convert_floating, it is possible to turn off individual conversions to StringDtype, the integer extension types, BooleanDtype or floating extension types, respectively.
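As a minimal sketch of turning off individual conversions, the snippet below converts a float Series that holds only whole numbers under different flag combinations. The Series contents and variable names are illustrative assumptions, not part of the API.

```python
import numpy as np
import pandas as pd

# float64 Series whose non-missing values are all whole numbers
s = pd.Series([1.0, 2.0, np.nan])

# Default: integer conversion is preferred when the floats cast cleanly
default_result = s.convert_dtypes()
print(default_result.dtype)  # Int64

# Integer conversion switched off: falls back to the floating extension type
float_result = s.convert_dtypes(convert_integer=False)
print(float_result.dtype)  # Float64

# Both numeric conversions switched off: the numpy dtype is kept
unchanged_result = s.convert_dtypes(convert_integer=False, convert_floating=False)
print(unchanged_result.dtype)  # float64
```

In each case the missing value stays missing; only the dtype that represents it changes.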

For object-dtyped columns, if infer_objects is True, the same inference rules are applied as during normal Series/DataFrame construction. Then, if possible, the column is converted to StringDtype, BooleanDtype or an appropriate integer or floating extension type; otherwise it is left as object.
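The object-dtype rule can be illustrated with a short sketch. The three Series below are made-up inputs: two hold a single kind of value plus a missing entry, and one mixes integers, strings and floats.

```python
import numpy as np
import pandas as pd

# Object-dtyped inputs (hypothetical sample data)
bools = pd.Series([True, False, np.nan], dtype=object)
text = pd.Series(["h", "i", np.nan], dtype=object)
mixed = pd.Series([1, "a", 2.5], dtype=object)

bools_converted = bools.convert_dtypes()
text_converted = text.convert_dtypes()
mixed_converted = mixed.convert_dtypes()

print(bools_converted.dtype)  # boolean
print(text_converted.dtype)   # string
print(mixed_converted.dtype)  # object (no single extension type fits)
```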

If the dtype is integer, convert to an appropriate integer extension type.

If the dtype is numeric, and consists of all integers, convert to an appropriate integer extension type. Otherwise, convert to an appropriate floating extension type.
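A small sketch of the two numeric rules above, using illustrative data that mirrors the columns in the Examples section: a numpy integer dtype keeps its width, whole-number floats become integers, and fractional floats become a floating extension type.

```python
import numpy as np
import pandas as pd

small_ints = pd.Series([1, 2, 3], dtype="int32")
whole_floats = pd.Series([10.0, np.nan, 20.0])
fractional = pd.Series([np.nan, 100.5, 200.0])

ints_converted = small_ints.convert_dtypes()
whole_converted = whole_floats.convert_dtypes()
fractional_converted = fractional.convert_dtypes()

print(ints_converted.dtype)        # Int32
print(whole_converted.dtype)       # Int64
print(fractional_converted.dtype)  # Float64
```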

In the future, as new dtypes are added that support pd.NA, the results of this method will change to support those new dtypes.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
...         "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
...         "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
...         "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
...         "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
...         "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
...     }
... )

Start with a DataFrame with default dtypes.

>>> df
   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0

>>> df.dtypes
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

Convert the DataFrame to use best possible dtypes.

>>> dfn = df.convert_dtypes()
>>> dfn
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0

>>> dfn.dtypes
a      Int32
b     string
c    boolean
d     string
e      Int64
f    Float64
dtype: object

Start with a Series of strings and missing data represented by np.nan.

>>> s = pd.Series(["a", "b", np.nan])
>>> s
0      a
1      b
2    NaN
dtype: str

Obtain a Series with dtype StringDtype.

>>> s.convert_dtypes()
0       a
1       b
2    <NA>
dtype: string
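
The dtype_backend parameter can be sketched in the same way. The snippet below is illustrative only and assumes that the optional pyarrow package may or may not be installed, so the pyarrow call is guarded; the variable names are hypothetical.

```python
import importlib.util

import pandas as pd

# The missing value makes this a float64 Series at construction time
s = pd.Series([1, 2, None])

# Default backend: numpy-backed nullable extension dtype
nullable_result = s.convert_dtypes(dtype_backend="numpy_nullable")
print(nullable_result.dtype)  # Int64

# pyarrow backend: only attempted when pyarrow is importable (assumption:
# pyarrow is an optional dependency in this environment)
arrow_result = None
if importlib.util.find_spec("pyarrow") is not None:
    arrow_result = s.convert_dtypes(dtype_backend="pyarrow")
    print(arrow_result.dtype)
```

With pyarrow available, the resulting dtype is an ArrowDtype rather than one of the numpy-backed nullable dtypes.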