BUG: Fixes GH9311 groupby on datetime64 by chrisbyboston · Pull Request #9345 · pandas-dev/pandas

@shoyer

Alright, I found it. Here's the relevant function:

def _try_coerce_result(self, result):
    """ reverse of try_coerce_args """
    if isinstance(result, np.ndarray):
        if result.dtype == 'i8':
            result = tslib.array_to_datetime(
                result.astype(object).ravel()).reshape(result.shape)
        elif result.dtype.kind in ['i', 'f', 'O']:
            result = result.astype('M8[ns]', casting='safe')
    elif isinstance(result, (np.integer, np.datetime64)):
        result = lib.Timestamp(result)
    return result

We aren't even hitting the casting='safe' branch anymore, because the new Cython functions keep the datetime64 values as i8 when they come into this function. The slowdown comes from the line that runs under the if result.dtype == 'i8' condition. Additionally, casting='safe' is useless in the elif block, because numpy cannot safely cast any of the dtypes we check for there to 'M8[ns]'. Here's the proof:

In [5]: np.can_cast(np.int8, 'M8[ns]')
Out[5]: False

In [6]: np.can_cast(np.int16, 'M8[ns]')
Out[6]: False

In [7]: np.can_cast(np.int32, 'M8[ns]')
Out[7]: False

In [8]: np.can_cast(np.int64, 'M8[ns]')
Out[8]: False

In [9]: np.can_cast(np.float32, 'M8[ns]')
Out[9]: False

In [10]: np.can_cast(np.float64, 'M8[ns]')
Out[10]: False

In [11]: np.can_cast('O', 'M8[ns]')
Out[11]: False
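To make the consequence concrete, here is a minimal sketch using only numpy. It assumes the i8 values are nanoseconds since the epoch, and it uses a zero-copy view as one possible fast alternative. That is an illustration, not necessarily what the cleaned-up function will do. The names i8_values, safe_cast_raised and as_datetimes are made up for the example.

```python
import numpy as np

# Hypothetical input: nanoseconds since the epoch stored as int64,
# similar to what the Cython groupby functions hand back.
i8_values = np.array([0, 86_400_000_000_000], dtype="i8")

# The 'safe' casting rule rejects int64 -> datetime64[ns],
# which agrees with the np.can_cast results above.
try:
    i8_values.astype("M8[ns]", casting="safe")
    safe_cast_raised = False
except TypeError:
    safe_cast_raised = True

# Assumption for illustration: reinterpret the same 8-byte buffer as
# datetime64[ns] without copying or going through Python objects.
as_datetimes = i8_values.view("M8[ns]")
```

The view avoids the astype(object) round trip entirely, which is where the per-element Python overhead comes from.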

What's more, this block...

        if result.dtype == 'i8':
            result = tslib.array_to_datetime(
                result.astype(object).ravel()).reshape(result.shape)

...I don't believe is necessary anymore; I think it was only there to work around this error in numpy 1.6.

I'm going to clean this function up and make sure all the tests are passing and that vbench looks better, and I'll push up my changes.