API: take interface for (Extension)Array-likes · Issue #20640 · pandas-dev/pandas
Triggered by #20582, I was looking at the take implementation in ExtensionArray and Categorical (which is already an ExtensionArray subclass) and in the rest of pandas:
- ExtensionArray.take currently uses the "internal pandas"-like behaviour for take: -1 is an indicator for a missing value (the behaviour we need for reindexing etc.)
- Series.take actually uses the numpy behaviour, where negative values (including -1) start counting from the end of the array-like.
To illustrate the difference with a small example:
In [9]: pd.Categorical(['a', 'b', 'c']).take([0, -1])
Out[9]:
[a, NaN]
Categories (3, object): [a, b, c]
In [10]: pd.Series(['a', 'b', 'c']).take([0, -1])
Out[10]:
0 a
1 c
dtype: object
This difference is a bit unfortunate IMO. If ExtensionArray.take is a public method (which it is right now), it would be nice if it had consistent behaviour with Series.take.
If we agree on that, I was thinking about the following options:
- make ExtensionArray.take private for now (e.g. require a _take method for the interface) and keep the "internal pandas"-like behaviour
- make the default behaviour of ExtensionArray.take consistent with Series.take, but still have the allow_fill / fill_value arguments so that when they are specified it has the "internal pandas"-like behaviour (so that internal code that expects this behaviour, and already passes those keywords, keeps working)
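
To make the second option more concrete, here is a rough sketch of how the two behaviours could sit behind one signature. The function name take_sketch is hypothetical and this is plain numpy on an object array, not the ExtensionArray interface; the allow_fill / fill_value keywords are the ones mentioned above, but their defaults here are an assumption.

```python
import numpy as np


def take_sketch(values, indices, allow_fill=False, fill_value=None):
    """Hypothetical helper (not pandas API) showing both take semantics."""
    values = np.asarray(values, dtype=object)
    indices = np.asarray(indices, dtype=np.intp)

    if not allow_fill:
        # numpy / Series.take behaviour: negative indices count from the end
        return values.take(indices)

    # "internal pandas"-like behaviour: -1 marks a missing value.
    # Rejecting other negative indices is an assumption of this sketch.
    if (indices < -1).any():
        raise ValueError("with allow_fill=True, -1 is the only negative index allowed")

    result = values.take(indices)       # -1 temporarily picks the last element
    result[indices == -1] = fill_value  # overwrite those positions with the fill
    return result
```

With this sketch, take_sketch(['a', 'b', 'c'], [0, -1]) gives 'a' and 'c' like Series.take, while passing allow_fill=True gives 'a' followed by the fill value, like the current ExtensionArray.take.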