BUG: pivot_table with margins=True fails for categorical dtype, #10989 by jakevdp · Pull Request #10993 · pandas-dev/pandas

So there is something more going on here; this bug report is a symptom of a different issue. Namely, we allow a Categorical as a level of a MultiIndex. In fact we should simply convert Categoricals directly to an object dtype when creating the MultiIndex in the first place; the two are de facto the same.

In [4]: data = pd.DataFrame({'x': np.arange(8),
   ...:                      'y': pd.Series(np.arange(8) // 4).astype('category'),
   ...:                      'z': pd.Series(np.arange(8) % 2).astype('category')})

In [5]: data
Out[5]: 
   x  y  z
0  0  0  0
1  1  0  1
2  2  0  0
3  3  0  1
4  4  1  0
5  5  1  1
6  6  1  0
7  7  1  1

In [6]: data.dtypes
Out[6]: 
x       int64
y    category
z    category
dtype: object

In [7]: data.groupby(['y','z']).agg('mean')
Out[7]: 
     x
y z   
0 0  1
  1  2
1 0  5
  1  6

In [8]: data.groupby(['y','z']).agg('mean').index.levels[0]
Out[8]: CategoricalIndex([0, 1], categories=[0, 1], ordered=False, name=u'y', dtype='category')

In [9]: data.groupby(['y','z']).agg('mean').index.levels[1]
Out[9]: CategoricalIndex([0, 1], categories=[0, 1], ordered=False, name=u'z', dtype='category')

In [10]: data.groupby(['y','z']).agg('mean').index.values   
Out[10]: array([(0, 0), (0, 1), (1, 0), (1, 1)], dtype=object)

So the values in Out[10] are the same as those of a plain MultiIndex built directly from the tuples:

In [15]: idx = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0), (1, 1)],names=['y','z'])

In [16]: idx
Out[16]: 
MultiIndex(levels=[[0, 1], [0, 1]],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'y', u'z'])

So I think that MultiIndex creation should coerce Categoricals on construction. This can be done in MultiIndex.__init__ (keep all of the existing tests); we probably need a couple more tests as well.
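
To make the proposed coercion concrete, here is a minimal sketch of what it would do to an index that already exists. The helper name `coerce_categorical_levels` is hypothetical and not part of pandas. It rebuilds the index after the fact, whereas the suggestion above is to do the conversion inside the constructor.

```python
import pandas as pd


def coerce_categorical_levels(index):
    """Return a MultiIndex with the same tuples but no categorical levels.

    Hypothetical helper for illustration only; not a pandas API.
    """
    arrays = []
    for i in range(index.nlevels):
        values = index.get_level_values(i)
        # Convert categorical level values to object dtype, as proposed above.
        if isinstance(values.dtype, pd.CategoricalDtype):
            values = values.astype(object)
        arrays.append(values)
    return pd.MultiIndex.from_arrays(arrays, names=index.names)


# Build a MultiIndex from two Categoricals, mirroring the 'y' and 'z' columns.
y = pd.Categorical([0, 0, 1, 1])
z = pd.Categorical([0, 1, 0, 1])
cat_idx = pd.MultiIndex.from_arrays([y, z], names=["y", "z"])

plain_idx = coerce_categorical_levels(cat_idx)
print([level.dtype for level in plain_idx.levels])
print(list(plain_idx))
```

The tuples and names are unchanged; only the dtype of the levels differs, which is the sense in which the two indexes are de facto the same.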