pd.expanding is incorrectly calculating window size when axis=1 · Issue #13753 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np

# Six identical columns, five rows: more columns than rows triggers the bug.
df = pd.DataFrame({'A': [0, 1, 2, np.nan, 4],
                   'B': [0, 1, 2, np.nan, 4],
                   'C': [0, 1, 2, np.nan, 4],
                   'D': [0, 1, 2, np.nan, 4],
                   'E': [0, 1, 2, np.nan, 4],
                   'F': [0, 1, 2, np.nan, 4]})
print(df.expanding(axis=1).sum())
Actual Output
     A    B     C     D     E     F
0  0.0  0.0   0.0   0.0   0.0   0.0
1  1.0  2.0   3.0   4.0   5.0   5.0
2  2.0  4.0   6.0   8.0  10.0  10.0
3  NaN  NaN   NaN   NaN   NaN   NaN
4  4.0  8.0  12.0  16.0  20.0  20.0
However, the correct result should be:
     A    B     C     D     E     F
0  0.0  0.0   0.0   0.0   0.0   0.0
1  1.0  2.0   3.0   4.0   5.0   6.0
2  2.0  4.0   6.0   8.0  10.0  12.0
3  NaN  NaN   NaN   NaN   NaN   NaN
4  4.0  8.0  12.0  16.0  20.0  24.0
Notice that the last column, F, is different. I've tracked this down and found that the _get_window function (for expanding) fails to return the correct window size when both of the following conditions are met:
- axis=1 is used instead of axis=0 (the default)
- The number of rows in the DataFrame is less than the number of columns
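This is caused by the fact that the object is using len(obj) in determining the window size. For a DataFrame, len(obj) is the number of rows (5 here), so the window is capped at 5 even though there are 6 columns, and column F only sums columns B through F. Instead, it should be using obj.shape[self.axis].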
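The difference between the two sizes can be sketched without touching pandas internals. This is only an illustration, assuming the window is simply capped at the value returned by len(obj): the helper expanding_sum_across_columns is hypothetical, and it emulates an expanding sum across columns by transposing the frame and applying a rolling window with min_periods=1.

```python
import numpy as np
import pandas as pd

# Same frame as in the report: 5 rows, 6 identical columns A..F.
df = pd.DataFrame({col: [0, 1, 2, np.nan, 4] for col in "ABCDEF"})

# len() of a DataFrame counts rows, whatever axis is being windowed.
rows_len = len(df)          # 5
axis = 1
axis_len = df.shape[axis]   # 6


def expanding_sum_across_columns(frame, window):
    """Hypothetical helper (not a pandas API): emulate an expanding sum
    along the columns with a rolling window of the given size."""
    return frame.T.rolling(window=window, min_periods=1).sum().T


# Window sized from len(obj): too small once columns outnumber rows.
buggy = expanding_sum_across_columns(df, rows_len)
# Window sized from obj.shape[axis]: covers every column.
fixed = expanding_sum_across_columns(df, axis_len)

print(buggy.loc[1, "F"], fixed.loc[1, "F"])
```

With the row-based window, column F stops accumulating after five columns, which matches the repeated values in the reported output; with the axis-based window it matches the correct result.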
output of pd.show_versions()
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.18.1+237.ge357ea1
nose: 1.3.7
pip: 8.1.2
setuptools: 20.1.1
Cython: 0.23.4
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: 0.7.0
IPython: 4.0.3
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.4.1
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: None
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: 0.9
apiclient: 1.4.0
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.0