Use align_method in comp_method_FRAME by jbrockmendel · Pull Request #22880 · pandas-dev/pandas

Did you check the performance to be sure there is no impact?

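There is a performance hit in one case: comparing against a 2D ndarray row (df == arr[[0], :]) goes from 4.42 ms to 87.5 ms. Timings for the other cases are roughly unchanged. However, (a) there is room for subsequent optimization, and (b) the bugs this avoids are much more important than the performance cost.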

In [3]: arr = np.arange(600).reshape(30, 20)
In [4]: df = pd.DataFrame(arr)
In [5]: %timeit arr == arr[[0], :]    # calibrate based on numpy timings
10000 loops, best of 3: 82.9 µs per loop   <-- master
10000 loops, best of 3: 84 µs per loop      <-- PR
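For reference, the calibration line above is plain NumPy broadcasting. A minimal sketch of what it computes: arr[[0], :] keeps a 2D shape of (1, 20), while arr[0, :] drops to a 1D shape of (20,), and both broadcast against the (30, 20) array to the same boolean result.

```python
import numpy as np

# Same array as in the session: 30 rows x 20 columns of 0..599.
arr = np.arange(600).reshape(30, 20)

# arr[[0], :] keeps a 2D shape (1, 20); arr[0, :] is 1D with shape (20,).
row_2d = arr[[0], :]
row_1d = arr[0, :]

# Broadcasting stretches the missing/length-1 axis to 30 rows, so both
# comparisons give a (30, 20) boolean array where only row 0 is all True
# (every later row holds values >= 20, which never match 0..19).
result_2d = arr == row_2d
result_1d = arr == row_1d
```

Because every other row holds larger values, only the first row of either result is True.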

In [6]: ser = pd.Series(arr[0, :])
In [7]: l = list(ser)

In [8]: %timeit df == arr[[0], :]
100 loops, best of 3: 4.42 ms per loop   <-- master
10 loops, best of 3: 87.5 ms per loop    <-- PR

In [9]: %timeit df == ser
10 loops, best of 3: 83 ms per loop       <-- master
10 loops, best of 3: 86.9 ms per loop    <-- PR

In [10]: %timeit df == l
ValueError: Invalid broadcasting comparison       <-- master
10 loops, best of 3: 82.4 ms per loop    <-- PR
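The master/PR differences above all concern how a non-pandas right-hand side (a 2D row, a Series, a list) is lined up with the frame before comparing. As a rough illustration only, the hypothetical align_for_comparison helper below (not pandas' internal align_method, whose details may differ) turns a list, a 1D row, or a (1, ncols) row into a Series indexed by the frame's columns, so all three behave like df == ser. Incidentally, the PR timing for df == arr[[0], :] (87.5 ms) sits close to those for df == ser (83 to 86.9 ms), which is consistent with the 2D row now taking a Series-like path; that is an inference from the numbers, not something the PR states.

```python
import numpy as np
import pandas as pd


def align_for_comparison(df, other):
    """Hypothetical helper (not pandas' internal align_method): coerce a
    list or ndarray into a labeled object that lines up with ``df``."""
    if isinstance(other, list):
        other = np.asarray(other)
    if not isinstance(other, np.ndarray):
        return other  # scalars, Series and DataFrames pass through unchanged
    if other.ndim == 2 and other.shape == (1, df.shape[1]):
        other = other[0]  # a single 2D row is treated as a 1D row
    if other.ndim == 1 and len(other) == df.shape[1]:
        # A 1D row gets the column labels, so it compares like a Series.
        return pd.Series(other, index=df.columns)
    if other.shape == df.shape:
        # A full-size array gets the frame's row and column labels.
        return pd.DataFrame(other, index=df.index, columns=df.columns)
    raise ValueError(f"cannot align shape {other.shape} with {df.shape}")


arr = np.arange(600).reshape(30, 20)
df = pd.DataFrame(arr)
ser = pd.Series(arr[0, :])

# The 2D row, the 1D row and the list all end up as the same comparison.
results = [df == align_for_comparison(df, other)
           for other in (arr[[0], :], arr[0, :], list(ser))]
```

Routing every input through one alignment step is what makes the list case stop raising: once it is labeled like a Series, it goes through the same comparison as df == ser.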

In [11]: df[1] = df[1].astype('f8')
    ...: df[3] = df[3].astype('uint8')
    ...: df[5] = df[5].view('m8[ns]')
    ...: df[7] = df[7].view('M8[ns]')
    ...: df[9] = df[9].astype('f4')
    ...:
In [12]: %timeit df == arr[[0], :]
ValueError: cannot broadcast shape [(30, 15)] with block values [(1, 20)]   <-- master, BUG
TypeError: cannot compare a TimedeltaIndex with type int64   <-- PR, BUG (in TimedeltaIndex.__eq__; just opened #23063)

In [13]: df[5] = df[5].view('i8').astype('i4')
In [14]: %timeit df == arr[[0], :]
ValueError: cannot broadcast shape [(30, 15)] with block values [(1, 20)]  <-- master, BUG
10 loops, best of 3: 86.7 ms per loop  <-- PR
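The master traceback for the mixed-dtype frame suggests the full 20-wide row was broadcast against a block holding only 15 of the columns. One way to sidestep that, sketched below as a hypothetical compare_columnwise helper (not pandas code), is to compare each column with its matching entry of the row, so each column keeps its own dtype. The dtypes loosely follow In [11]; the timedelta/datetime columns are left out and uint16 stands in for uint8 so that no values overflow.

```python
import numpy as np
import pandas as pd

arr = np.arange(600).reshape(30, 20)
df = pd.DataFrame(arr)
# Mixed numeric dtypes (assumption: a simplified version of In [11]).
df[1] = df[1].astype("f8")
df[3] = df[3].astype("uint16")
df[9] = df[9].astype("f4")


def compare_columnwise(df, row):
    """Hypothetical helper: compare each column with the matching entry
    of ``row``, so no block-shaped broadcasting is involved."""
    row = np.asarray(row).ravel()  # accept (1, ncols) or (ncols,) input
    if len(row) != df.shape[1]:
        raise ValueError("row length must equal the number of columns")
    return pd.DataFrame(
        {col: df[col] == row[i] for i, col in enumerate(df.columns)},
        index=df.index,
    )


result = compare_columnwise(df, arr[[0], :])
```

The per-column loop trades speed for correctness, which mirrors the trade-off discussed in the reply above.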

In [15]: %timeit df == arr[0, :]
ValueError: Invalid broadcasting comparison [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])] with block values  <-- master
10 loops, best of 3: 80.9 ms per loop  <-- PR
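To rerun these measurements as a script rather than in IPython, here is a minimal sketch using the standard-library timeit module. best_of is a hypothetical helper that mimics %timeit's "best of 3" by taking the fastest of three repeats. Absolute numbers will differ from the ones above depending on the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

arr = np.arange(600).reshape(30, 20)
df = pd.DataFrame(arr)
ser = pd.Series(arr[0, :])


def best_of(stmt, repeat=3, number=10):
    """Best per-call time in seconds, like %timeit's 'best of 3'."""
    times = timeit.repeat(stmt, repeat=repeat, number=number)
    return min(times) / number


timings = {
    "numpy baseline": best_of(lambda: arr == arr[[0], :]),
    "df == arr[[0], :]": best_of(lambda: df == arr[[0], :]),
    "df == ser": best_of(lambda: df == ser),
}
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1e3:.3f} ms per loop")
```

Keeping the NumPy baseline in the same run serves the same purpose as In [5]: it calibrates for machine speed when comparing master against the PR.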