PERF: implement scalar ops blockwise by jbrockmendel · Pull Request #29853 · pandas-dev/pandas (original) (raw)

Resolved the issue with test_expressions behaving unexpectedly.

Added an asv that times operations on a homogeneous-dtype DataFrame (rows=20k, cols=100) with a scalar. Not sure how many variants of these to do; could be easy to go overboard.

       before           after         ratio
     [0cd388fd]       [1fc1e3ec]
     <cy30>           <back-to-arith>
-         110±4ms         59.6±1ms     0.54  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function floordiv>)
-         115±3ms         59.5±1ms     0.52  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function floordiv>)
-        88.2±1ms       44.1±0.7ms     0.50  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function floordiv>)
-        91.0±3ms       44.7±0.9ms     0.49  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function floordiv>)
-        94.8±3ms         46.5±1ms     0.49  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function floordiv>)
-        92.1±2ms       44.9±0.4ms     0.49  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function floordiv>)
-        93.9±4ms       42.2±0.2ms     0.45  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function floordiv>)
-        94.6±2ms         41.7±1ms     0.44  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function floordiv>)
-        78.0±4ms       28.6±0.4ms     0.37  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function pow>)
-        66.9±1ms       22.5±0.6ms     0.34  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function mod>)
-        67.8±2ms         22.1±1ms     0.33  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function mod>)
-        70.0±3ms       22.5±0.5ms     0.32  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function pow>)
-        67.3±2ms       20.8±0.6ms     0.31  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function mod>)
-        66.7±1ms         20.3±1ms     0.30  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function mod>)
-        64.9±1ms       19.3±0.5ms     0.30  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function mod>)
-      65.5±0.7ms       19.1±0.8ms     0.29  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function mod>)
-        73.2±1ms         18.2±1ms     0.25  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function mod>)
-        74.7±3ms         18.3±1ms     0.25  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function mod>)
-        60.3±1ms       9.87±0.1ms     0.16  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function pow>)
-        60.0±2ms       9.80±0.3ms     0.16  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function pow>)
-        59.4±3ms       9.65±0.4ms     0.16  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function pow>)
-        58.5±2ms       9.09±0.3ms     0.16  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function pow>)
-      42.6±0.9ms      3.96±0.09ms     0.09  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function xor>)
-      51.3±0.4ms       4.70±0.1ms     0.09  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function pow>)
-        52.7±2ms       4.68±0.1ms     0.09  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function pow>)
-      42.3±0.8ms       3.54±0.1ms     0.08  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function or_>)
-        43.4±2ms       3.36±0.2ms     0.08  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function and_>)
-        43.0±1ms       3.23±0.2ms     0.08  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function xor>)
-        44.9±1ms       3.36±0.2ms     0.07  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function or_>)
-      45.6±0.4ms       3.41±0.1ms     0.07  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function and_>)
-        49.8±1ms       3.24±0.1ms     0.07  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function mul>)
-      28.0±0.5ms       1.81±0.2ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function eq>)
-      48.8±0.7ms      3.15±0.07ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function truediv>)
-      50.1±0.5ms      3.15±0.09ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function mul>)
-      50.1±0.8ms       3.14±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function truediv>)
-        51.4±1ms       3.21±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function truediv>)
-      49.6±0.3ms      3.10±0.05ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function sub>)
-      49.5±0.9ms      3.08±0.06ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function mul>)
-        51.3±1ms       3.17±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function add>)
-        50.0±1ms      3.07±0.09ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function add>)
-      28.7±0.8ms       1.77±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function gt>)
-      50.4±0.8ms       3.06±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function add>)
-        29.1±1ms      1.76±0.03ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function ne>)
-        50.1±1ms       3.02±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function truediv>)
-      49.8±0.7ms      2.99±0.07ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function truediv>)
-        50.4±1ms       3.02±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function truediv>)
-        50.9±1ms       3.04±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function mul>)
-        50.8±1ms      3.03±0.09ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function truediv>)
-        50.8±1ms       3.03±0.3ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function sub>)
-      52.2±0.9ms      3.08±0.09ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function mul>)
-        54.0±1ms      3.17±0.09ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function add>)
-      51.1±0.7ms       2.98±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function add>)
-        50.8±1ms      2.96±0.06ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function sub>)
-        53.7±1ms      3.12±0.04ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function sub>)
-        52.7±2ms      3.05±0.08ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function truediv>)
-        51.2±1ms       2.96±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function sub>)
-        28.6±1ms      1.65±0.07ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function le>)
-      29.9±0.6ms       1.72±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function ge>)
-        29.7±2ms      1.69±0.04ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function lt>)
-      54.0±0.7ms       3.06±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function sub>)
-      55.1±0.9ms       3.12±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function add>)
-      50.6±0.8ms       2.86±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function add>)
-        59.1±2ms       3.31±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function sub>)
-        29.4±1ms      1.65±0.08ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function ge>)
-        53.5±1ms       2.97±0.1ms     0.06  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function mul>)
-        30.9±2ms       1.69±0.1ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function lt>)
-        29.2±2ms      1.59±0.09ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function eq>)
-      54.9±0.9ms       2.98±0.2ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function mul>)
-        54.2±1ms       2.93±0.1ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function add>)
-        54.5±1ms      2.92±0.08ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function sub>)
-        29.8±2ms       1.58±0.1ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function gt>)
-        58.2±3ms       3.08±0.2ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function mul>)
-        31.2±1ms       1.61±0.1ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function le>)
-        31.0±2ms       1.59±0.1ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 3.0, <built-in function ne>)
-        27.8±1ms      1.29±0.06ms     0.05  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function ge>)
-      27.4±0.8ms       1.22±0.1ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function le>)
-        28.0±1ms      1.17±0.04ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function le>)
-      27.8±0.6ms       1.16±0.1ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function lt>)
-      27.6±0.9ms      1.15±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function lt>)
-      27.8±0.4ms      1.16±0.07ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function le>)
-        29.8±2ms      1.24±0.06ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function ne>)
-      27.7±0.8ms      1.15±0.07ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function lt>)
-      27.4±0.6ms      1.14±0.08ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function le>)
-      28.5±0.8ms      1.18±0.05ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function ne>)
-      27.8±0.7ms      1.15±0.01ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function le>)
-      28.5±0.6ms      1.16±0.06ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function gt>)
-      27.1±0.8ms      1.11±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function lt>)
-      27.5±0.6ms      1.12±0.05ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function gt>)
-      27.4±0.4ms      1.11±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function ne>)
-      27.1±0.9ms      1.10±0.05ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function ge>)
-      28.6±0.4ms      1.14±0.06ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function eq>)
-      27.4±0.6ms      1.09±0.02ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function eq>)
-      28.2±0.7ms      1.12±0.05ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function ne>)
-      27.3±0.6ms      1.08±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function ge>)
-        28.1±1ms      1.12±0.02ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function gt>)
-        28.3±1ms      1.12±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function gt>)
-      28.1±0.8ms      1.11±0.05ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function lt>)
-      27.1±0.8ms      1.07±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function le>)
-        27.9±1ms      1.10±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function ge>)
-      28.2±0.5ms      1.09±0.06ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function lt>)
-      29.0±0.5ms      1.11±0.04ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function ne>)
-        29.0±2ms      1.11±0.04ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function gt>)
-      29.2±0.2ms      1.11±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function eq>)
-        28.2±2ms      1.07±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 4, <built-in function ge>)
-        29.1±1ms      1.10±0.04ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function gt>)
-      29.5±0.2ms      1.11±0.05ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 2, <built-in function eq>)
-        30.2±2ms      1.13±0.04ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 2, <built-in function ge>)
-      28.4±0.2ms      1.05±0.03ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 4, <built-in function eq>)
-        29.9±1ms      1.09±0.05ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function ne>)
-      29.1±0.6ms      1.03±0.02ms     0.04  binary_ops.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 5.0, <built-in function eq>)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.