Use align_method in comp_method_FRAME by jbrockmendel · Pull Request #22880 · pandas-dev/pandas
Did you check the performance to make sure there is no regression?
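There does appear to be a performance hit in the relevant case (comparing a DataFrame against a single-row 2D ndarray goes from 4.42 ms on master to 87.5 ms with this PR in the timings below), but a) there is room for subsequent optimization and b) the bugs this avoids are much more important than the performance impact.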
In [3]: arr = np.arange(600).reshape(30, 20)
In [4]: df = pd.DataFrame(arr)
In [5]: %timeit arr == arr[[0], :] # calibrate based on numpy timings
10000 loops, best of 3: 82.9 µs per loop <-- master
10000 loops, best of 3: 84 µs per loop <-- PR
In [6]: ser = pd.Series(arr[0, :])
In [7]: l = list(ser)
In [8]: %timeit df == arr[[0], :]
100 loops, best of 3: 4.42 ms per loop # <-- master
10 loops, best of 3: 87.5 ms per loop # <-- PR
In [9]: %timeit df == ser
10 loops, best of 3: 83 ms per loop <-- master
10 loops, best of 3: 86.9 ms per loop <-- PR
In [10]: %timeit df == l
ValueError: Invalid broadcasting comparison <-- master
10 loops, best of 3: 82.4 ms per loop <-- PR
In [11]: df[1] = df[1].astype('f8')
...: df[3] = df[3].astype('uint8')
...: df[5] = df[5].view('m8[ns]')
...: df[7] = df[7].view('M8[ns]')
...: df[9] = df[9].astype('f4')
...:
In [12]: %timeit df == arr[[0], :]
ValueError: cannot broadcast shape [(30, 15)] with block values [(1, 20)] <-- master, BUG
TypeError: cannot compare a TimedeltaIndex with type int64 <-- PR, BUG (in TimedeltaIndex.__eq__; just opened #23063)
In [13]: df[5] = df[5].view('i8').astype('i4')
In [14]: %timeit df == arr[[0], :]
ValueError: cannot broadcast shape [(30, 15)] with block values [(1, 20)] <-- master, BUG
10 loops, best of 3: 86.7 ms per loop <-- PR
In [15]: %timeit df == arr[0, :]
ValueError: Invalid broadcasting comparison [array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])] with block values <-- master
10 loops, best of 3: 80.9 ms per loop <-- PR
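The session above compares the same DataFrame against one row of data in four forms: a single-row 2D array, a 1D array, a Series, and a list. As a minimal sketch, assuming a pandas version that includes this change, the check below confirms that all four forms give the same elementwise result as plain NumPy broadcasting. The names `row_2d`, `row_1d`, `lst`, `results`, and `expected` are illustrative and do not appear in the PR.

```python
import numpy as np
import pandas as pd

# Same setup as the session above: 30 rows, 20 columns, homogeneous int dtype.
arr = np.arange(600).reshape(30, 20)
df = pd.DataFrame(arr)

# Four ways of supplying "one row" as the right-hand operand.
row_2d = arr[[0], :]       # shape (1, 20)
row_1d = arr[0, :]         # shape (20,)
ser = pd.Series(row_1d)    # index 0..19 matches df.columns
lst = list(ser)            # plain Python list of length 20

# Reference result: NumPy broadcasts the single row across all 30 rows.
expected = arr == row_2d

results = {
    "2D row array": df == row_2d,
    "1D array": df == row_1d,
    "Series": df == ser,
    "list": df == lst,
}

# Each comparison should match the NumPy reference elementwise.
for name, result in results.items():
    assert np.array_equal(result.to_numpy(), expected), name
```

On master at the time of the PR, the list and 1D array forms raised `ValueError` as shown in the session, so this check is only expected to pass with the PR applied or on a later release.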