Rolling sum (closed='left') with duplicate Timestamp indices · Issue #20712 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
```python
import pandas as pd
import numpy as np

n = 5
t1 = pd.date_range('1/1/2011', periods=n, freq='H')
v1 = range(n)
t2 = pd.date_range('1/1/2011', periods=n, freq='H')
v2 = range(n, 2*n)

# Interleave the two hourly series so every timestamp appears twice.
index = np.sort(np.hstack([t1.values, t2.values]))
data = np.sort(np.hstack([v1, v2]))
s = pd.Series(data=data, index=index)
print(s)
```
```
2011-01-01 00:00:00    0
2011-01-01 00:00:00    1
2011-01-01 01:00:00    2
2011-01-01 01:00:00    3
2011-01-01 02:00:00    4
2011-01-01 02:00:00    5
2011-01-01 03:00:00    6
2011-01-01 03:00:00    7
2011-01-01 04:00:00    8
2011-01-01 04:00:00    9
```
Unexpected behavior: `rolling` treats duplicate times as consecutive:

```python
four_hours = pd.Timedelta('4h')
g = s.rolling(four_hours, closed='left').sum()
print(g)
```
```
2011-01-01 00:00:00     NaN
2011-01-01 00:00:00     0.0
2011-01-01 01:00:00     1.0
2011-01-01 01:00:00     3.0
2011-01-01 02:00:00     6.0
2011-01-01 02:00:00    10.0
2011-01-01 03:00:00    15.0
2011-01-01 03:00:00    21.0
2011-01-01 04:00:00    28.0
2011-01-01 04:00:00    36.0
```
Problem description
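With `closed='left'`, `rolling` treats rows that share a timestamp as consecutive observations rather than as equal times. A left-closed window ending at time t covers [t - 4h, t), so it should exclude every row stamped t. Instead, the second row at each timestamp picks up the first row's value: at 00:00, for example, the first row gives NaN but the second gives 0.0. Rows with the same timestamp should get identical results.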
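To pin down the expected semantics, here is a small NumPy reference computation. It is a sketch assuming the times are already sorted ascending; `left_closed_window_sum` is a hypothetical helper, not a pandas API. It finds each window's bounds by time with `np.searchsorted` instead of by row position, so rows sharing a timestamp always see the same window. Empty windows return NaN, matching the 00:00 rows of the expected output.

```python
import numpy as np


def left_closed_window_sum(times, values, window):
    """For each row, sum the values whose time lies in [t - window, t).

    Hypothetical helper (not a pandas API). Assumes `times` is sorted
    ascending. Window bounds are located by time with searchsorted, not by
    row position, so rows sharing a timestamp get identical results.
    """
    times = np.asarray(times)
    values = np.asarray(values, dtype=float)
    # prefix[i] is the sum of the first i values, so rows [a, b) sum to
    # prefix[b] - prefix[a].
    prefix = np.concatenate([[0.0], np.cumsum(values)])
    # First row with time >= t - window: the inclusive left edge.
    start = np.searchsorted(times, times - window, side='left')
    # First row with time >= t: the exclusive right edge (closed='left'),
    # which drops every row stamped exactly t, duplicates included.
    end = np.searchsorted(times, times, side='left')
    out = prefix[end] - prefix[start]
    # Empty windows become NaN.
    out[end == start] = np.nan
    return out


# The issue's data: each hour from 2011-01-01 00:00 to 04:00 appears twice.
times = np.datetime64('2011-01-01T00:00') + np.repeat(np.arange(5), 2) * np.timedelta64(1, 'h')
values = np.arange(10)
window = np.timedelta64(4, 'h')

result = left_closed_window_sum(times, values, window)
print(result)  # values: nan, nan, 1, 1, 6, 6, 15, 15, 28, 28
```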
Expected Output
```python
gg = s.groupby(s.index).sum().rolling(four_hours, closed='left').sum()
ggg = pd.concat([gg, gg]).sort_index()
print(ggg)
```
```
2011-01-01 00:00:00     NaN
2011-01-01 00:00:00     NaN
2011-01-01 01:00:00     1.0
2011-01-01 01:00:00     1.0
2011-01-01 02:00:00     6.0
2011-01-01 02:00:00     6.0
2011-01-01 03:00:00    15.0
2011-01-01 03:00:00    15.0
2011-01-01 04:00:00    28.0
2011-01-01 04:00:00    28.0
```
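The `pd.concat([gg, gg])` step in the expected-output snippet only lines up because every timestamp appears exactly twice. A possibly more general form of the same workaround, assuming a sorted index where a timestamp may repeat any number of times, reindexes the per-timestamp result back onto the original index. This is a sketch of a workaround, not a fix for `rolling` itself; the names `per_time` and `expected` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the issue's series: each hour from 2011-01-01 00:00 appears twice.
index = pd.Timestamp('2011-01-01') + pd.to_timedelta(np.repeat(np.arange(5), 2), unit='h')
s = pd.Series(np.arange(10), index=index)

four_hours = pd.Timedelta('4h')

# Collapse duplicate timestamps first, so each window sees one row per time.
per_time = s.groupby(level=0).sum().rolling(four_hours, closed='left').sum()

# Broadcast the per-timestamp result back onto every original row.
# reindex accepts duplicate target labels as long as per_time's index is unique.
expected = per_time.reindex(s.index)
print(expected)
```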
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.4.0
pip: 9.0.3
setuptools: 36.6.0
Cython: None
numpy: 1.13.3
scipy: 0.19.0
xarray: 0.9.5
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: 0.1.0
pandas_gbq: None
pandas_datareader: None