Have `pd.array` infer new extension types · Issue #29791 · pandas-dev/pandas
Currently `pd.array` sometimes requires an explicit `dtype=...` to get one of our extension arrays (we'll infer for Period, Datetime, and Interval).
This proposal is to have it infer the extension type for
- strings -> StringArray
- boolean -> BooleanArray
- integer -> IntegerArray
All of these currently return PandasArray.
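A minimal sketch of the proposed mapping, in plain Python so it runs without pandas. The helper name `proposed_array_type`, and the idea of feeding it a label like the ones `lib.infer_dtype` returns, are assumptions for illustration, not pandas API. It only records which extension array each inferred kind would map to. Everything else falls back to `PandasArray` in this sketch; Period, Datetime and Interval, which `pd.array` already infers, are deliberately not modelled.

```python
# Hypothetical helper (not part of pandas): maps an infer_dtype-style label
# to the array class this proposal would have pd.array return.
PROPOSED = {
    "string": "StringArray",
    "boolean": "BooleanArray",
    "integer": "IntegerArray",
}


def proposed_array_type(label: str) -> str:
    """Return the array class name pd.array would pick under the proposal."""
    # A label such as "integer-na" marks that NA values were present; the
    # target is the same array, since these extension arrays can hold NA.
    base = label[: -len("-na")] if label.endswith("-na") else label
    # Kinds outside the proposal fall back to PandasArray in this sketch.
    return PROPOSED.get(base, "PandasArray")
```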
Concretely, we'll need to teach `infer_dtype` how not to infer `mixed` for a mix of strings / booleans and NA values, similar to how it handles `integer-na`:
```python
In [27]: lib.infer_dtype([True, None], skipna=False)
Out[27]: 'mixed'

In [28]: lib.infer_dtype(['a', None], skipna=False)
Out[28]: 'mixed'

In [29]: lib.infer_dtype([0, np.nan], skipna=False)
Out[29]: 'integer-na'
```
and then handle those in `array`.
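The `infer_dtype` change can be sketched in pure Python: set NA values aside first, classify what remains, and append an `-na` suffix when anything was set aside, the way `integer-na` already works. The function `infer_kind`, its NA test (only `None` and float NaN), and the `boolean-na` / `string-na` labels are hypothetical; the real `lib.infer_dtype` is implemented in Cython and recognizes many more kinds and NA values.

```python
import math


def _is_na(value) -> bool:
    # Hypothetical NA test: None or a float NaN (pandas checks more cases).
    return value is None or (isinstance(value, float) and math.isnan(value))


def infer_kind(values) -> str:
    """Classify plain Python values, keeping NA separate instead of 'mixed'."""
    values = list(values)
    present = [v for v in values if not _is_na(v)]
    has_na = len(present) != len(values)

    if not present:
        # All-NA input: nothing to classify, so this sketch reports 'mixed'.
        return "mixed"

    # bool is checked before int because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in present):
        kind = "boolean"
    elif all(isinstance(v, int) and not isinstance(v, bool) for v in present):
        kind = "integer"
    elif all(isinstance(v, str) for v in present):
        kind = "string"
    else:
        return "mixed"

    # Mirror the existing 'integer-na' convention for every kind.
    return kind + "-na" if has_na else kind
```

Separating NA first is what turns `[True, None]` and `['a', None]` from `mixed` into kinds that `array` can then dispatch on.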