API: boolean dtype upsets cumsum · Issue #4170 · pandas-dev/pandas

On a boolean Series, cumsum seems to require skipna=False; otherwise it raises, as shown here. (I have not investigated which other methods are affected, but cumprod is.)

In [10]: b = pd.Series([False, False, False, True, True, False, False])

In [11]: b
Out[11]:
0    False
1    False
2    False
3     True
4     True
5    False
6    False
dtype: bool

In [12]: b.cumsum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-f3f684a93525> in <module>()
----> 1 b.cumsum()

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.pyc in cumsum(self, axis, dtype, out, skipna)
   1626
   1627         if do_mask:
-> 1628             np.putmask(result, mask, pa.NA)
   1629
   1630         return Series(result, index=self.index)

ValueError: cannot convert float NaN to integer

In [13]: b.cumsum(skipna=False)
Out[13]:
0    0
1    0
2    0
3    1
4    2
5    2
6    2
dtype: int64

If the Series has NaNs, or you cast it to int or object first, it works as expected:

In [21]: b.astype(int).cumsum()
In [22]: b.astype(object).cumsum()  # False at the beginning is expected
In [23]: b.astype(int).astype(object).cumsum()
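
As a stopgap, the observations above suggest casting to int before accumulating. A minimal sketch, assuming only that `astype(int)` and `cumsum` behave as in the session above; the helper name `bool_cumsum` is hypothetical and not part of pandas:

```python
import pandas as pd


def bool_cumsum(s):
    """Running count of True values in a boolean Series.

    Hypothetical helper, not part of pandas: casting to int first
    sidesteps the NaN-masking step that fails for bool dtype.
    """
    return s.astype(int).cumsum()


b = pd.Series([False, False, False, True, True, False, False])
counts = bool_cumsum(b)
print(counts.tolist())  # [0, 0, 0, 1, 2, 2, 2]
```

This matches the values from `b.cumsum(skipna=False)` for a Series without missing values, so either spelling should do until the bool path is fixed.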

Also, if you try to insert a NaN, it neither works nor raises (!); the value is silently stored as True:

In [31]: b.loc[0] = np.nan

In [32]: b
Out[32]:
0     True
1    False
2    False
3     True
4     True
5    False
6    False
dtype: bool
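
The silent True is presumably because NaN is truthy when coerced to bool; that is my guess at the cause, not something confirmed in this report. If the Series really needs to hold a missing value, one possible workaround is to cast to object before assigning. A sketch under that assumption (the variable names are only for illustration, and assignment behaviour on the bool Series itself may differ between pandas versions):

```python
import numpy as np
import pandas as pd

# NaN is truthy, which would explain the True in row 0 above.
nan_as_bool = bool(np.nan)

b = pd.Series([False, False, False, True, True, False, False])

# Workaround sketch: an object-dtype Series can hold NaN next to booleans.
c = b.astype(object)
c.loc[0] = np.nan

print(nan_as_bool)
print(c.isnull().tolist())
```

The original `b` is left untouched here, since `astype` returns a new Series.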