BUG: number of keys for grouping equals axis length causing incorrect grouping · Issue #16843 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame([['foo', 'two', 0.371118], ['foo', 'two', 0.765483]], columns=['first', 'second', 'one'])
df.set_index(['first', 'second'], inplace=True)
df.groupby(['first','second']).size()
df.reset_index().groupby(['first','second']).size()
Problem description
The first groupby size call gives me an output of:
Expected Output
rather than the expected output I get after I reset the index:
The problem appears when the number of levels of the MultiIndex equals the number of rows of the grouped DataFrame (tested for 2 and 3 rows = levels). If the number of rows is smaller or larger than the number of levels of the MultiIndex, the problem does not occur.
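To make the row-count dependence easier to check, here is a minimal sketch that builds the same kind of frame with a varying number of rows and compares the three ways of grouping. The helper names `make_frame` and `group_sizes` are made up for this illustration, and the `level=` variant is only a possible workaround that I assume sidesteps the ambiguity; it is not confirmed as the fix. On a pandas version without the bug, all comparisons should agree for every row count.

```python
import pandas as pd


def make_frame(n_rows):
    """Hypothetical helper: n_rows rows sharing one ('foo', 'two') key pair,
    indexed by a two-level MultiIndex as in the report."""
    rows = [["foo", "two", 0.1 * i] for i in range(n_rows)]
    df = pd.DataFrame(rows, columns=["first", "second", "one"])
    return df.set_index(["first", "second"])


def group_sizes(n_rows):
    """Return group sizes computed three ways for a frame with n_rows rows."""
    df = make_frame(n_rows)
    keys = ["first", "second"]
    # Grouping by index level names passed as plain keys (the reported case).
    by_index_names = df.groupby(keys).size()
    # Grouping by ordinary columns after resetting the index (expected result).
    by_columns = df.reset_index().groupby(keys).size()
    # Grouping explicitly by level (assumed workaround, not verified on 0.20.1).
    by_level = df.groupby(level=keys).size()
    return by_index_names, by_columns, by_level


# The report says the mismatch shows up only when the row count equals
# the number of index levels (2 here), so try one row fewer and one more.
for n in (1, 2, 3):
    by_index_names, by_columns, by_level = group_sizes(n)
    print(
        "rows:", n,
        "| keys match columns:", by_index_names.equals(by_columns),
        "| level match columns:", by_level.equals(by_columns),
    )
```

If the first comparison prints False only for the row count that equals the number of levels, that matches the behaviour described above.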
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None