API: numeric_only in Series ops behavior · Issue #47500 · pandas-dev/pandas

Part of #46560

For Series and SeriesGroupBy ops (and perhaps others like resample, rolling, window, expanding, ewm), there is an inconsistency when numeric_only is passed to ops that have it as an argument:

import pandas as pd

ser = pd.Series([1, 2, 3])

print(ser.mean(numeric_only=True))
# Raises `NotImplementedError: Series.mean does not implement numeric_only.`

print(ser.mean(numeric_only=False))
# 2.0

print(ser.groupby([0, 0, 1]).mean(numeric_only=True))
# 0    1.5
# 1    3.0
# dtype: float64

print(ser.groupby([0, 0, 1]).mean(numeric_only=False))
# 0    1.5
# 1    3.0
# dtype: float64

I see three possible options:

  1. Ops raise NotImplementedError if numeric_only is truthy (the Series behavior)
  2. Ops ignore numeric_only when the dtype is numeric (the SeriesGroupBy behavior)
  3. Ops raise if numeric_only is passed at all, whether True or False (I don't know of any ops that act like this currently)

I am in favor of option 1: I view passing in numeric_only=True as erroneous; it doesn't make sense to request this from a Series even if it is a no-op. On the other hand, I view passing in numeric_only=False as "requesting nothing", though perhaps it is slightly odd. Option 1 would also allow the default to be False across all ops (in pandas 2.0).
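To make the three options concrete, here is a rough sketch of each as a standalone argument check. This is not pandas code: the function names, the `_NO_DEFAULT` sentinel, and the `dtype_is_numeric` flag are all hypothetical, and the option 2 behavior for a non-numeric dtype (raising `TypeError`) is my assumption, since the list above only describes the numeric case.

```python
# Hypothetical sentinel meaning "the caller did not pass numeric_only".
_NO_DEFAULT = object()


def check_option_1(op_name, numeric_only=False):
    """Option 1: raise only when numeric_only is truthy."""
    if numeric_only:
        raise NotImplementedError(
            f"Series.{op_name} does not implement numeric_only."
        )


def check_option_2(op_name, numeric_only=False, dtype_is_numeric=True):
    """Option 2: ignore numeric_only when the dtype is numeric."""
    if dtype_is_numeric:
        return  # numeric_only is a no-op, whatever its value
    if numeric_only:
        # Assumption: the non-numeric case is not specified in the issue.
        raise TypeError(
            f"Series.{op_name} got numeric_only=True with a non-numeric dtype."
        )


def check_option_3(op_name, numeric_only=_NO_DEFAULT):
    """Option 3: raise if numeric_only is passed at all."""
    if numeric_only is not _NO_DEFAULT:
        raise TypeError(f"Series.{op_name} does not accept numeric_only.")
```

Under option 1, `check_option_1("mean", numeric_only=False)` returns silently while `numeric_only=True` raises, which matches the "requesting nothing" reading of False. Option 3 is the only one that needs a sentinel, because it has to distinguish an explicit False from the argument being omitted.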