pd.expanding is incorrectly calculating window size when axis=1 · Issue #13753 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [0, 1, 2, np.nan, 4], 
                   'B': [0, 1, 2, np.nan, 4], 
                   'C': [0, 1, 2, np.nan, 4], 
                   'D': [0, 1, 2, np.nan, 4], 
                   'E': [0, 1, 2, np.nan, 4], 
                   'F': [0, 1, 2, np.nan, 4]})

print(df.expanding(axis=1).sum())

Actual Output

     A    B     C     D     E     F
0  0.0  0.0   0.0   0.0   0.0   0.0
1  1.0  2.0   3.0   4.0   5.0   5.0
2  2.0  4.0   6.0   8.0  10.0  10.0
3  NaN  NaN   NaN   NaN   NaN   NaN
4  4.0  8.0  12.0  16.0  20.0  20.0

However, the correct result should be:

     A    B     C     D     E     F
0  0.0  0.0   0.0   0.0   0.0   0.0
1  1.0  2.0   3.0   4.0   5.0   6.0
2  2.0  4.0   6.0   8.0  10.0  12.0
3  NaN  NaN   NaN   NaN   NaN   NaN
4  4.0  8.0  12.0  16.0  20.0  24.0

Notice that the last column, F, is different. I tracked this down and found that the _get_window function (for expanding) fails to return the correct window size when both of the following conditions are met:

  1. axis=1 is used instead of axis=0 (default)
  2. The number of rows in the dataframe is less than the number of columns

This happens because the window size is determined from len(obj), which always counts rows regardless of axis. It should use obj.shape[self.axis] instead.
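The mismatch between the two size calculations can be illustrated without touching pandas internals. The sketch below is not the library's code: it assumes the expanding window behaves like a fixed-size window of the computed length, and it uses a transpose plus the default axis=0 path to avoid passing axis=1. The names window_from_len, window_from_shape, capped, and correct are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: 5 rows, 6 columns.
df = pd.DataFrame({name: [0, 1, 2, np.nan, 4] for name in "ABCDEF"})

axis = 1
window_from_len = len(df)           # 5: len() of a DataFrame is its row count
window_from_shape = df.shape[axis]  # 6: length along the axis being expanded

# Assumption: emulate a window capped at len(df) with a fixed-size rolling
# window over the transposed frame. Column F then covers only 5 values.
capped = df.T.rolling(window=window_from_len, min_periods=1).sum().T

# A true expanding sum along the columns, computed on the transpose so that
# the default axis=0 code path is used.
correct = df.T.expanding().sum().T

print(capped)
print(correct)
```

Under these assumptions, capped differs from correct only in column F, which matches the symptom described above. Transposing before calling expanding() may also serve as a workaround on affected versions.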

output of pd.show_versions()


commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.18.1+237.ge357ea1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: 0.7.0
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: None
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: 0.9
apiclient: 1.4.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.0