API: numeric_only in Series ops behavior · Issue #47500 · pandas-dev/pandas
Part of #46560
For Series and SeriesGroupBy ops (and perhaps others like resample, rolling, window, expanding, ewm), there is an inconsistency when numeric_only is passed to ops that have it as an argument:
import pandas as pd

ser = pd.Series([1, 2, 3])
print(ser.mean(numeric_only=True))
# Raises `NotImplementedError: Series.mean does not implement numeric_only.`
print(ser.mean(numeric_only=False))
# 2.0
print(ser.groupby([0, 0, 1]).mean(numeric_only=True))
# 0 1.5
# 1 3.0
# dtype: float64
print(ser.groupby([0, 0, 1]).mean(numeric_only=False))
# 0 1.5
# 1 3.0
# dtype: float64
I see three possible options:
1. Ops raise NotImplementedError if numeric_only is truthy (the Series behavior)
2. Ops ignore numeric_only when the dtype is numeric (the SeriesGroupBy behavior)
3. Ops raise if anything is passed (I don't know of any ops that act like this currently)
I am in favor of option 1. I view passing in numeric_only=True as erroneous; it doesn't make sense to request this from a Series, even if it is a no-op. On the other hand, I view passing in numeric_only=False as "requesting nothing", though perhaps it is slightly odd. Option 1 would also allow the default to be False across all ops (in pandas 2.0).
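To make the three options concrete, here is a minimal sketch in plain Python, with no pandas involved. The names check_option_1, check_option_2, check_option_3, the _UNSET sentinel, and the stand-in mean function are all hypothetical and only model the argument handling on numeric data; they are not how pandas implements any of this, and the option 3 error message is made up.

```python
from statistics import fmean

# Hypothetical sentinel: lets option 3 tell "not passed" apart from an explicit False.
_UNSET = object()


def check_option_1(op_name, numeric_only=False):
    """Option 1: raise only when numeric_only is truthy (the Series behavior)."""
    if numeric_only:
        raise NotImplementedError(
            f"Series.{op_name} does not implement numeric_only."
        )


def check_option_2(op_name, numeric_only=False):
    """Option 2: accept the argument and ignore it (the SeriesGroupBy behavior)."""


def check_option_3(op_name, numeric_only=_UNSET):
    """Option 3: raise if the argument is passed at all, even as False."""
    if numeric_only is not _UNSET:
        raise NotImplementedError(
            f"Series.{op_name} does not accept numeric_only."
        )


def mean(values, check, **kwargs):
    """Stand-in for a Series op: validate the arguments, then compute the mean."""
    check("mean", **kwargs)
    return fmean(values)


data = [1, 2, 3]
print(mean(data, check_option_1, numeric_only=False))  # allowed under option 1
print(mean(data, check_option_2, numeric_only=True))   # silently ignored under option 2
try:
    mean(data, check_option_3, numeric_only=False)     # rejected under option 3
except NotImplementedError as err:
    print(err)
```

Under this sketch, only option 1 lets numeric_only=False pass through while still rejecting numeric_only=True, which is what would allow False to serve as a uniform default.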