BUG: pivot_table with margins=True fails for categorical dtype, #10989 by jakevdp · Pull Request #10993 · pandas-dev/pandas
So there is something more going on here; this bug report is a symptom of a different issue: namely,
that we allow a Categorical as a level of a MultiIndex. We should instead convert such levels directly to object dtype when creating the MultiIndex in the first place; the two are de facto the same.
In [4]: data = pd.DataFrame({'x': np.arange(8), 'y': pd.Series(np.arange(8) // 4).astype('category'), 'z': pd.Series(np.arange(8) % 2).astype('category')})
In [5]: data
Out[5]:
   x  y  z
0  0  0  0
1  1  0  1
2  2  0  0
3  3  0  1
4  4  1  0
5  5  1  1
6  6  1  0
7  7  1  1
In [6]: data.dtypes
Out[6]:
x       int64
y    category
z    category
dtype: object
In [7]: data.groupby(['y','z']).agg('mean')
Out[7]:
     x
y z
0 0  1
  1  2
1 0  5
  1  6
In [8]: data.groupby(['y','z']).agg('mean').index.levels[0]
Out[8]: CategoricalIndex([0, 1], categories=[0, 1], ordered=False, name=u'y', dtype='category')
In [9]: data.groupby(['y','z']).agg('mean').index.levels[1]
Out[9]: CategoricalIndex([0, 1], categories=[0, 1], ordered=False, name=u'z', dtype='category')
In [10]: data.groupby(['y','z']).agg('mean').index.values
Out[10]: array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=object)
So this is the same as an index built directly from the tuples in Out[10]:
In [15]: idx = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0), (1, 1)],names=['y','z'])
In [16]: idx
Out[16]:
MultiIndex(levels=[[0, 1], [0, 1]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'y', u'z'])
So I think that MultiIndex creation should coerce Categoricals on construction. This can be done in MultiIndex.__init__ (keep all of the existing tests; we probably need a couple more).
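To make the suggestion concrete, here is a rough sketch of the coercion done outside the constructor. The helper name coerce_categorical_levels is hypothetical and not part of pandas, and the example assumes a recent pandas where the constructor argument is called codes (the newer name for the labels shown in Out[16]). It is only meant to illustrate the idea, not how the fix should actually be implemented.

```python
import numpy as np
import pandas as pd


def coerce_categorical_levels(mi):
    """Hypothetical helper (not pandas API): rebuild a MultiIndex so that
    categorical levels become plain, non-categorical Index objects."""
    new_levels = []
    for level in mi.levels:
        if isinstance(level.dtype, pd.CategoricalDtype):
            # np.asarray drops the categorical wrapper but keeps the
            # category values in the same order, so the codes stay valid.
            level = pd.Index(np.asarray(level), name=level.name)
        new_levels.append(level)
    return pd.MultiIndex(levels=new_levels, codes=mi.codes, names=mi.names)


# A MultiIndex whose levels are categorical, similar to the groupby result.
cat_idx = pd.MultiIndex.from_arrays(
    [pd.Categorical([0, 0, 1, 1]), pd.Categorical([0, 1, 0, 1])],
    names=["y", "z"],
)
plain_idx = coerce_categorical_levels(cat_idx)
```

After the call, plain_idx holds the same tuples and names as cat_idx, but none of its levels is a CategoricalIndex, which is the state the comment above argues the constructor should produce directly.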