ENH: make Series.ptp() handle missing values (original) (raw)

Currently (in master), Series.ptp() is just implemented using np.ptp() and so the method will return nan for any Series that has one or more missing values:

>>> s = pd.Series([5, 0, np.nan, -3, 2])
>>> s.ptp()
nan

It is simple to write s.max() - s.min() instead, but the ptp() result is surprising as most pandas methods are designed to handle missing data gracefully. I think most users would expect the ptp() method to ignore NaN.

If there is any agreement as to whether ptp() should be changed, I would like to work on a pull request!

Extending the idea, it might be useful to have both DataFrame.ptp() and groupby.ptp() methods.

For this example DataFrame...

df = pd.DataFrame({'a': [1, 2, 2, 1, 1],
                   'b': [3, 11, 72, 46, 32],
                   'c': [1.2, 6.7, 13.9, np.nan, -7.7],
                   'd': ['v', 'w', 'x', 'y', 'z']})

...I would expect the following behaviour:

>>> df.ptp()
a      1
b     69
c   12.7
dtype: float64

>>> df.ptp(axis=1)
0     2.0
1     9.0
2    70.0
3    45.0
4    39.7
dtype: float64

>>> df.groupby('a').ptp()
    b    c
a         
1  43  8.9
2  61  7.2

Again, if there is any consensus from the community on whether these additional methods should be added, I'd be happy to work on the pull request.