Shifting GroupBy Object with `freq` inserts unwanted index level · Issue #23918 · pandas-dev/pandas

IMO the behavior of `shift` after a `groupby` is somewhat unexpected and depends on the arguments provided:

In [26]: ser = pd.Series(range(3), index=pd.date_range('1/1/18', periods=3))

In [28]: ser.groupby([1,1,1]).shift()
Out[28]:
2018-01-01    NaN
2018-01-02    0.0
2018-01-03    1.0
Freq: D, dtype: float64

A zero-period shift with the `freq` keyword is a no-op:

In [34]: ser.groupby([1,1,1]).shift(0, freq='D')
Out[34]:
2018-01-01    0
2018-01-02    1
2018-01-03    2
Freq: D, Name: 1, dtype: int64

With a non-zero shift and `freq`, the grouper is suddenly inserted as the first level of a MultiIndex?

In [30]: ser.groupby([1,1,1]).shift(freq='D')
Out[30]:
1  2018-01-02    0
   2018-01-03    1
   2018-01-04    2
Name: 1, dtype: int64

This arguably conflicts with #22053, which asks for the behavior of the last example to be applied consistently. I think the last example is the one that is actually incorrect out of all of them, since it causes misalignment with the index of the original caller.
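For anyone hitting this in the meantime, here is a minimal sketch of how the extra level could be detected and stripped so the result has the same number of index levels as the caller. `drop_group_level` is a hypothetical helper written for this illustration, not part of pandas, and it assumes any added grouper levels sit at the front of the index, as in the output above. It only normalizes the index levels; it does not change which dates or values `shift` returns.

```python
import pandas as pd


def drop_group_level(result: pd.Series, original: pd.Series) -> pd.Series:
    """Hypothetical helper (not part of pandas): strip leading index levels
    that the groupby added on top of the original index."""
    extra = result.index.nlevels - original.index.nlevels
    if extra > 0:
        # Assumption: grouper levels are prepended, so drop from the front.
        result = result.droplevel(list(range(extra)))
    return result


ser = pd.Series(range(3), index=pd.date_range("1/1/18", periods=3))

shifted = ser.groupby([1, 1, 1]).shift(freq="D")
# The build in this report shows a two-level index here;
# other pandas versions may already return a single level.
print(shifted.index.nlevels)

flat = drop_group_level(shifted, ser)
print(flat)
```

Because the helper compares `nlevels` rather than assuming a MultiIndex, it should be a no-op on versions where the grouper level is not inserted.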

cc @SimonAlecks who originally found this in #21235

Expected Output

In [30]: ser.groupby([1,1,1]).shift(freq='D')
Out[30]:
2018-01-01    NaN
2018-01-02    0.0
2018-01-03    1.0
Freq: D, dtype: float64

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: 1f1f705
python: 3.6.7.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+1123.g1f1f7053d.dirty
pytest: 4.0.0
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.1.1
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.6
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.2.0