ENH: make Series.ptp() handle missing values · Issue #11163 · pandas-dev/pandas
Currently (in master), `Series.ptp()` is just implemented using `np.ptp()`, and so the method will return `nan` for any Series that has one or more missing values:
```python
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([5, 0, np.nan, -3, 2])
>>> s.ptp()
nan
```
It is simple to write `s.max() - s.min()` instead, but the `ptp()` result is surprising because most pandas methods are designed to handle missing data gracefully. I think most users would expect the `ptp()` method to ignore `NaN`.
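A minimal sketch of the proposed Series behaviour, written as a standalone function rather than as a change to pandas itself. `nan_ptp` is a hypothetical helper (not a pandas API); it computes `max - min` using the `skipna` flag that `Series.max()` and `Series.min()` already accept, assuming that skipping NaN should be the default as it is for the other reductions.

```python
import numpy as np
import pandas as pd


def nan_ptp(s: pd.Series, skipna: bool = True) -> float:
    """Peak-to-peak (max - min) of a Series.

    Hypothetical helper, not part of pandas: with skipna=True, missing
    values are ignored, mirroring Series.max() and Series.min().
    """
    return s.max(skipna=skipna) - s.min(skipna=skipna)


s = pd.Series([5, 0, np.nan, -3, 2])

print(np.ptp(s.to_numpy()))      # nan: NumPy propagates the missing value
print(nan_ptp(s))                # 8.0: NaN ignored, 5 - (-3)
print(nan_ptp(s, skipna=False))  # nan: opt back into NaN propagation
```

Keeping the `skipna` flag would let callers recover the current `np.ptp()`-style result when they want missing values to propagate.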
If there is agreement that `ptp()` should be changed, I would like to work on a pull request!
Extending the idea, it might be useful to have both `DataFrame.ptp()` and `groupby.ptp()` methods.
For this example DataFrame...
```python
df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})
```
...I would expect the following behaviour:
```python
>>> df.ptp()
a     1.0
b    69.0
c    21.6
dtype: float64
```
```python
>>> df.ptp(axis=1)
0     2.0
1     9.0
2    70.0
3    45.0
4    39.7
dtype: float64
```
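The per-column and per-row results can be computed with existing methods. The sketch below assumes the proposed method would drop non-numeric columns such as `'d'` and skip NaN; `frame_ptp` is a hypothetical name used only for illustration, not a pandas API.

```python
import numpy as np
import pandas as pd


def frame_ptp(df: pd.DataFrame, axis: int = 0) -> pd.Series:
    """Hypothetical DataFrame peak-to-peak helper (not a pandas API).

    Non-numeric columns such as 'd' are dropped first, and NaN is
    skipped because DataFrame.max/min default to skipna=True.
    """
    numeric = df.select_dtypes(include="number")
    return numeric.max(axis=axis) - numeric.min(axis=axis)


df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

print(frame_ptp(df))          # per column: a 1.0, b 69.0, c 21.6
print(frame_ptp(df, axis=1))  # per row: 2.0, 9.0, 70.0, 45.0, 39.7
```

For column `c` the NaN is skipped, so the range is 13.9 - (-7.7); for row 3 it is 46 - 1.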
```python
>>> df.groupby('a').ptp()
    b    c
a
1  43  8.9
2  61  7.2
```
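Likewise, the grouped result can be sketched as the difference of per-group maxima and minima. This is an illustration using the existing `GroupBy.max()` and `GroupBy.min()`, not necessarily how a `groupby.ptp()` method would be implemented; selecting numeric columns first is an assumption about how the string column `'d'` would be handled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

# Keep only numeric columns so the string column 'd' cannot break max - min;
# 'a' becomes the group key (the result's index), not a result column.
grouped = df.select_dtypes(include="number").groupby("a")

# GroupBy.max/min skip NaN within each group, so group a=1 uses c = 1.2 and -7.7.
result = grouped.max() - grouped.min()
print(result)
```

Computing `max()` and `min()` separately keeps both steps on pandas' built-in group aggregations instead of calling a Python `lambda` once per group.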
Again, if the community agrees that these additional methods should be added, I'd be happy to work on the pull request.