ENH: add BooleanArray extension array by jorisvandenbossche · Pull Request #29555 · pandas-dev/pandas
Other question: what do we want pd.array([..]) to return by default (when not specifying a dtype)?
Currently, you get this behaviour:
In [9]: pd.array([True, False])
Out[9]:
<PandasArray>
[True, False]
Length: 2, dtype: bool
In [10]: pd.array([True, False, np.nan])
Out[10]:
<PandasArray>
[1.0, 0.0, nan]
Length: 3, dtype: float64
In [12]: pd.array([True, False, pd.NA])
Out[12]:
<PandasArray>
[True, False, NA]
Length: 3, dtype: object
Do we also want to return a BooleanArray when there are no missing values (this case currently returns a boolean PandasArray)?
E.g. for integers, we also do not return an IntegerArray. I think the "rule" now is basically to have the same default behaviour as pandas, so we only infer the extension dtypes that are used by default (datetimetz, period, interval, etc). So if we want to be consistent with that, we should not return BooleanArray.
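For contrast with the default inference discussed here, a minimal sketch of the explicit route: requesting the new dtype by name, so the result does not depend on what gets inferred. This assumes a pandas version that includes this PR's BooleanArray and that the dtype is registered under the string "boolean"; None is used as the missing value.

```python
import pandas as pd

# Explicitly ask for the nullable boolean dtype instead of relying on
# inference (assumes the dtype string "boolean" is registered).
arr = pd.array([True, False, None], dtype="boolean")

print(type(arr).__name__)  # class of the returned array
print(arr.dtype)           # dtype of the returned array
print(arr.isna())          # mask marking the missing entry
```

With an explicit dtype, the question above only concerns what happens when no dtype is passed.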
When a pd.NA is present, that's of course something new, and we could in principle decide to already infer the new dtype in that case (although the inferred dtype would then depend on whether a missing value is present ..). I am not sure this is technically easy to do, though, as infer_dtype does not distinguish different types of NA/NaN at the moment (and this is out of scope for this PR; it could be a future enhancement).
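To make the option above concrete, here is a rough pure-Python sketch of what "infer the new dtype only when pd.NA is present" could look like. It is not pandas code: `_NAType`, `NA`, `infer_bool_kind` and the `na_aware` flag are all hypothetical names for illustration, and the returned strings are just labels for the outcomes shown in the session above.

```python
import math


class _NAType:
    """Hypothetical stand-in for pd.NA (not the pandas object)."""

    def __repr__(self):
        return "NA"


NA = _NAType()


def _is_nan(value):
    # True only for float NaN; bools and the NA sentinel are not floats.
    return isinstance(value, float) and math.isnan(value)


def infer_bool_kind(values, na_aware=False):
    """Return a label for the dtype a list of bools/missing values would get.

    na_aware=False mirrors the behaviour shown in the session above;
    na_aware=True is the hypothetical rule that maps bools + NA to "boolean".
    """
    has_na = any(v is NA for v in values)
    has_nan = any(_is_nan(v) for v in values)
    rest = [v for v in values if v is not NA and not _is_nan(v)]

    if not all(isinstance(v, bool) for v in rest):
        return "object"  # not a boolean-like input at all
    if has_na:
        # This branch is where the different NA kinds must be told apart.
        return "boolean" if na_aware else "object"
    if has_nan:
        return "float64"  # NaN forces a cast to float
    return "bool"  # no missing values: plain boolean array


print(infer_bool_kind([True, False]))
print(infer_bool_kind([True, False, float("nan")]))
print(infer_bool_kind([True, False, NA]))
print(infer_bool_kind([True, False, NA], na_aware=True))
```

The sketch also shows the inconsistency mentioned above: with na_aware=True, the result for otherwise identical boolean data switches between "bool" and "boolean" depending only on whether a missing value is present.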