How to stop a rolling window at nan values and continue after it? · Issue #35596 · pandas-dev/pandas

I posted the same question on Stack Overflow. A user there said I should open an issue here on GitHub, since it is a bug.

I have the following dataframe:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                       [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                       [0, 2, 2, np.nan, 2, np.nan, 1]])

With output:

       0  1  2    3    4    5    6
    0  0  1  2  4.0  NaN  NaN  NaN
    1  0  1  2  NaN  NaN  NaN  NaN
    2  0  2  2  NaN  2.0  NaN  1.0

The dtypes (from `df.dtypes`) are:

    0      int64
    1      int64
    2      int64
    3    float64
    4    float64
    5    float64
    6    float64
    dtype: object

Then I apply the following rolling sum:

    df.rolling(window=7, min_periods=1, axis='columns').sum()

And the output is as follows:

         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  4.0  4.0  4.0  4.0
    1  0.0  1.0  3.0  NaN  NaN  NaN  NaN
    2  0.0  2.0  4.0  NaN  2.0  2.0  3.0

I notice that the rolling window stops and starts again wherever the dtype changes from one column to the next (here, between the int64 columns 0 to 2 and the float64 columns 3 to 6).

In my actual data, however, all columns have the same object dtype:

    df = df.astype('object')

Applying the same rolling sum to it gives:

         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  7.0  7.0  7.0
    1  0.0  1.0  3.0  3.0  3.0  3.0  3.0
    2  0.0  2.0  4.0  4.0  6.0  6.0  7.0
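
As a minimal sketch of the uniform-dtype case, the snippet below casts everything to `float64` (an assumption; the post uses `object`) and rolls over the transposed frame instead of passing `axis='columns'`, which newer pandas releases have deprecated. The name `result` is only illustrative. With a single dtype the window runs across all seven columns without restarting, which reproduces the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])

# Cast to one float dtype so the frame is not split by dtype,
# then transpose so the default row-wise rolling walks along the original columns.
result = (
    df.astype("float64")
      .T
      .rolling(window=7, min_periods=1)
      .sum()
      .T
)

print(result)
```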

My desired output, however, stops at a NaN value and starts again after it. It would look like this:


         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  NaN  NaN  NaN
    1  0.0  1.0  3.0  NaN  NaN  NaN  NaN
    2  0.0  2.0  4.0  NaN  2.0  NaN  3.0

I figured there must be a way to have the rolling window skip NaN values without filling their positions with sums carried over from the window.
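
One possible reading of this can be sketched as follows. It is not a built-in pandas option: the helper `rolling_sum_reset_on_nan` is a hypothetical name, and the rule it assumes is that every NaN stays NaN in the output and the window may only reach back to the first value after the most recent NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])


def rolling_sum_reset_on_nan(row, window):
    """Rolling sum over one row that restarts after every NaN (hypothetical helper)."""
    out = np.full(len(row), np.nan)
    start = 0  # first position of the current NaN-free run
    for i, value in enumerate(row):
        if np.isnan(value):
            start = i + 1  # the next run begins after this NaN
            continue       # the NaN position itself stays NaN
        lo = max(start, i - window + 1)  # never reach back past the last NaN
        out[i] = row[lo:i + 1].sum()
    return out


values = df.to_numpy(dtype="float64")
restarted = pd.DataFrame(
    np.apply_along_axis(rolling_sum_reset_on_nan, 1, values, 7),
    index=df.index,
    columns=df.columns,
)

print(restarted)
```

Rows 0 and 1 match the desired table. Under this strict restart rule the last value of row 2 comes out as 1.0, because the run restarts after the NaN in column 5, whereas the desired table lists 3.0, so the rule actually wanted there may differ from this sketch's assumption.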

Anything would help!