API: boolean dtype upsets cumsum · Issue #4170 · pandas-dev/pandas
cumsum on a boolean Series seems to require skipna=False; otherwise it raises a ValueError. (I haven't investigated which other methods are affected, but cumprod is.)
In [10]: b = pd.Series([False, False, False, True, True, False, False])
In [11]: b
Out[11]:
0 False
1 False
2 False
3 True
4 True
5 False
6 False
dtype: bool
In [12]: b.cumsum()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-f3f684a93525> in <module>()
----> 1 b.cumsum()
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.pyc in cumsum(self, axis, dtype, out, skipna)
1626
1627 if do_mask:
-> 1628 np.putmask(result, mask, pa.NA)
1629
1630 return Series(result, index=self.index)
ValueError: cannot convert float NaN to integer
In [13]: b.cumsum(skipna=False)
Out[13]:
0 0
1 0
2 0
3 1
4 2
5 2
6 2
dtype: int64
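For reference, a minimal sketch of two ways to get the running count without hitting the masking step, assuming the same Series as above. The variable names are illustrative, and the int cast is simply one way to keep cumsum away from the bool dtype.

```python
import pandas as pd

b = pd.Series([False, False, False, True, True, False, False])

# Workaround from the report: with skipna=False no NaN mask is written back.
running_skipna_off = b.cumsum(skipna=False)

# Alternative: cast to int first, so cumsum operates on an integer Series.
running_as_int = b.astype(int).cumsum()

print(running_skipna_off.tolist())
print(running_as_int.tolist())
```

Both calls count the True values seen so far, so the two lists should match.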
If it has NaNs, or you cast to int or object, it works as expected:
In [21]: b.astype(int).cumsum()
In [22]: b.astype(object).cumsum() # False at the beginning is expected
In [23]: b.astype(int).astype(object).cumsum()
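The "False at the beginning" remark can be illustrated in plain Python. This is only a sketch, under the assumption that the object-dtype cumsum combines elements with Python's `+`; the report does not show the implementation.

```python
from itertools import accumulate

values = [False, False, False, True, True, False, False]

# accumulate yields the first element unchanged, then running sums.
# False + False is the int 0, so only the first entry stays a bool.
running = list(accumulate(values))

print(running)                      # [False, 0, 0, 1, 2, 2, 2]
print([type(v).__name__ for v in running])
```

The first element is passed through untouched, which is why a bool survives there while every later entry is an int.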
Also, if you try to insert a NaN, it neither works nor raises (!):
In [31]: b.loc[0] = np.nan
In [32]: b
Out[32]:
0 True
1 False
2 False
3 True
4 True
5 False
6 False
dtype: bool
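One plausible explanation for the silent True, not confirmed in the report, is that NaN is truthy and gets coerced on its way into the bool array. The sketch below shows that coercion in NumPy, plus an assumed workaround of casting to float before assigning; the names are illustrative.

```python
import numpy as np
import pandas as pd

# NaN is non-zero, so storing it in a bool array yields True.
flags = np.array([False, False, False])
flags[0] = np.nan
print(flags[0])       # True
print(bool(np.nan))   # True

# Assumed workaround: move to a dtype that can hold NaN before assigning.
b = pd.Series([False, False, False, True, True, False, False]).astype(float)
b.loc[0] = np.nan
print(b.isna().tolist())
```

After the float cast, only position 0 is reported as missing, and the remaining values are 0.0 or 1.0.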