BUG: surprising and possibly erroneous behavior of GroupBy.apply with an indexed series (index winds up duplicated)


Code Sample, a copy-pastable example

import pandas as pd

data = [{"label": l, "x": x, "y": x + 1} for l in ("foo", "bar") for x in range(5)]
df = pd.DataFrame(data)
df = df.set_index(["label", "x"])
series = df["y"]
series2 = series.groupby(["label"]).apply(lambda s: s[2:])
print(series2.index)

Output:

MultiIndex([('bar', 'bar', 2),
            ('bar', 'bar', 3),
            ('bar', 'bar', 4),
            ('foo', 'foo', 2),
            ('foo', 'foo', 3),
            ('foo', 'foo', 4)],
           names=['label', 'label', 'x'])

Problem description

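The "label" level appears twice in the index of the result: once as the group key added by apply, and once as the level already present in the original index.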

Expected Output

I expect the index after the apply to have the same levels as before, i.e. to contain "label" only once.
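For comparison, here is a minimal sketch of one way to get an index with a single "label" level. It is a possible workaround rather than a statement about the intended behavior of apply. It assumes that passing group_keys=False to groupby stops apply from prepending the group key, and it uses level="label" and .iloc[2:] in place of the list key and s[2:] from the sample above to make the grouping and the positional slice explicit.

```python
import pandas as pd

data = [{"label": l, "x": x, "y": x + 1} for l in ("foo", "bar") for x in range(5)]
series = pd.DataFrame(data).set_index(["label", "x"])["y"]

# group_keys=False asks apply not to prepend the group key to the result index,
# so each returned slice keeps its original ("label", "x") index.
trimmed = series.groupby(level="label", group_keys=False).apply(lambda s: s.iloc[2:])

print(trimmed.index.names)
print(trimmed.index.nlevels)
```

With this call the result keeps the two original levels, which matches the expectation stated above.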

Output of pd.show_versions()


INSTALLED VERSIONS

commit : d9fff27
python : 3.8.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.1
Cython : 0.29.15
pytest : 5.4.0
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : 0.8.6
xarray : None
xlrd : None
xlwt : None
numba : None