BUG: inconsistent behaviour of Groupby with empty bins when grouping by single/multiple keys · Issue #8138 · pandas-dev/pandas
Hello everyone,
I just stumbled upon inconsistent behaviour in the groupby function, and it is causing me a lot of trouble. When grouping on bins, I expect empty bins to be kept as NA values, so that results keep the same dimensions when one wants to aggregate and compare data.
This is indeed the case when grouping on a single key, but the empty bins are dropped as soon as a second key is added to the groupby call.
import pandas as pd
import numpy as np
d = {'Col 1': [3, 3, 4, 5], 'Col 2': [1, 2, 3, 4], 'Col 3': [10, 100, 200, 34]}
test = pd.DataFrame(d)
values = pd.cut(test['Col 1'], [1, 2, 3, 6])
# Grouping on a single column
groups_single_key = test.groupby(values)
# Grouping on two columns
groups_double_key = test.groupby([values, 'Col 2'])
# The empty group is kept as NA, which is the behaviour I was expecting
groups_single_key.describe()
# The empty groups are dropped
groups_double_key.describe()
# This is not just an artifact of the describe() method: the empty group really
# does exist and is taken into account when performing aggregation
print(groups_single_key.agg('mean'))
print(groups_double_key.agg('mean'))
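One possible workaround is to reindex the two-key result against every (bin, `Col 2`) combination, so the empty bin comes back as NA rows. This is a minimal sketch, not a fix for the underlying behaviour. It assumes a pandas version where `pd.cut` on a Series returns a categorical Series with a `.cat` accessor (which may not hold for the 0.14.1 install reported here), and the names `bin_labels`, `bin_key`, `observed_means`, `full_index` and `padded_means` are purely illustrative.

```python
import pandas as pd

test = pd.DataFrame({'Col 1': [3, 3, 4, 5],
                     'Col 2': [1, 2, 3, 4],
                     'Col 3': [10, 100, 200, 34]})
values = pd.cut(test['Col 1'], [1, 2, 3, 6])

# String label for every bin, including the empty first one.
bin_labels = [str(interval) for interval in values.cat.categories]
# Plain string key, so groupby treats the bins like any ordinary column.
bin_key = values.astype(object).map(str)

# Only the (bin, 'Col 2') pairs that occur in the data show up here.
observed_means = test.groupby([bin_key, 'Col 2'])['Col 3'].mean()

# Every bin paired with every distinct 'Col 2' value.
full_index = pd.MultiIndex.from_product(
    [bin_labels, sorted(test['Col 2'].unique())],
    names=['Col 1', 'Col 2'],
)

# Reindexing fills the missing combinations with NaN.
padded_means = observed_means.reindex(full_index)
print(padded_means)
```

Converting the bins to strings sidesteps any special handling of categorical keys, at the cost of losing the interval objects in the resulting index.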
pandas: 0.14.1
nose: 1.3.1
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.2
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
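For comparison with the versions listed above: to my knowledge, later pandas releases (0.23 and newer) accept an `observed` argument on `groupby`, which controls whether unobserved categories are kept when grouping on categorical keys. The sketch below assumes such a release and does not run on 0.14.1. The names `single` and `double` are illustrative.

```python
import pandas as pd

test = pd.DataFrame({'Col 1': [3, 3, 4, 5],
                     'Col 2': [1, 2, 3, 4],
                     'Col 3': [10, 100, 200, 34]})
values = pd.cut(test['Col 1'], [1, 2, 3, 6])

# observed=False asks groupby to keep bins that contain no rows.
single = test.groupby(values, observed=False)['Col 3'].mean()

# With a second key, the result spans every bin crossed with every
# distinct 'Col 2' value; combinations without data hold NaN.
double = test.groupby([values, 'Col 2'], observed=False)['Col 3'].mean()

print(single)
print(double)
```

Passing `observed` explicitly avoids depending on its default, which has differed between pandas releases.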