TST: Fix test for datetime categorical by sinhrks · Pull Request #10501 · pandas-dev/pandas

Related to #10465, but addresses a different part.

The current test_groupby_datetime_categorical in test_groupby.py is incorrect: the actual result has a CategoricalIndex as level 0, whereas the expected result uses a DatetimeIndex as level 0. Changed the test so both use the same dtype, and added an explicit comparison.

Actual Result (current test case)

import numpy as np
import pandas as pd

levels = pd.date_range('2014-01-01', periods=4)
codes = np.random.randint(0, 4, size=100)
cats = pd.Categorical.from_codes(codes, levels, name='myfactor', ordered=True)
data = pd.DataFrame(np.random.randn(100, 4))
grouped = data.groupby(cats)
desc_result = grouped.describe()
desc_result.index.get_level_values(0)
# CategoricalIndex([2014-01-01T09:00:00.000000000+0900,
#                  ...
#                   2014-01-04T09:00:00.000000000+0900],
#                  categories=[2014-01-01 00:00:00, 2014-01-02 00:00:00, 2014-01-03 00:00:00, 2014-01-04 00:00:00],
# ordered=True, name=u'myfactor', dtype='category')

Expected Result (current test case)

Level 0 of the expected result must be a CategoricalIndex, but the current test builds a DatetimeIndex:

idx = cats.codes.argsort()
ord_labels = np.asarray(cats).take(idx)
ord_data = data.take(idx)
expected = ord_data.groupby(ord_labels, sort=False).describe()
expected.index.names = ['myfactor', None]
expected.index.get_level_values(0)
# DatetimeIndex(['2014-01-01', '2014-01-01', '2014-01-01', '2014-01-01',
#               ...
#               '2014-01-04', '2014-01-04', '2014-01-04', '2014-01-04'],
#               dtype='datetime64[ns]', name=u'myfactor', freq=None, tz=None)
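
The mismatch and the kind of fix described at the top can be sketched as follows. This is only an illustration, not the actual test change. It assumes a recent pandas, so it omits the name argument to Categorical.from_codes, passes observed=False, uses fixed codes instead of random ones, and aggregates with mean() rather than describe() to keep the result shape simple.

```python
import numpy as np
import pandas as pd

levels = pd.date_range("2014-01-01", periods=4)
# Fixed codes (an assumption for reproducibility) so every category occurs.
codes = np.array([3, 0, 2, 1, 0, 3, 1, 2])
cats = pd.Categorical.from_codes(codes, levels, ordered=True)
data = pd.DataFrame(np.arange(32, dtype=float).reshape(8, 4))

# Grouping by the Categorical: the group keys keep the categorical dtype.
result = data.groupby(cats, observed=False).mean()

# Grouping by the plain datetime labels: the group keys become a DatetimeIndex.
idx = cats.codes.argsort(kind="stable")
ord_labels = np.asarray(cats).take(idx)
ord_data = data.take(idx)
expected = ord_data.groupby(ord_labels, sort=False).mean()

result_is_categorical = isinstance(result.index, pd.CategoricalIndex)
expected_is_datetime = isinstance(expected.index, pd.DatetimeIndex)

# Rebuild the expected keys with the same categorical dtype,
# then compare the two indexes explicitly.
expected.index = pd.CategoricalIndex(
    expected.index, categories=levels, ordered=True
)
pd.testing.assert_index_equal(result.index, expected.index)
```

Comparing the indexes directly makes a dtype mismatch fail loudly instead of passing unnoticed in a looser value comparison.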