PERF: O(n) speedup in any/all by re-enabling short-circuiting for bool case by qwhelan · Pull Request #25070 · pandas-dev/pandas

I'm going to see if I can simplify further; I was able to remove the "copy=" parameter from _get_values().

That said, allowing mask=None requires more involved modifications, but those changes also yield decent performance gains. The most recent commit extends the speedups across all nanops and to all integer dtypes. In particular, on the FrameOps benchmarks we're now getting up to a 7-8x speedup in nansum() and a 6-7x speedup in nanmean() for int64 data. The nanprod(), nanvar(), and nanstd() functions see a ~3x speedup:

$ asv compare upstream/master HEAD -s --sort ratio --only-changed
       before           after         ratio
     [68b1da7f]       [d96b086c]
     <any_all_fix~3>       <any_all_fix>
-     2.06±0.01ms         1.85±0ms     0.89  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
-      52.3±0.5ms       46.5±0.7ms     0.89  reshape.Cut.time_cut_int(1000)
-         162±3ms        141±0.8ms     0.87  groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'transformation')
-         238±2ms          208±2ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'direct')
-         240±3ms          208±2ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'transformation')
-         374±2ms          324±3ms     0.87  groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'transformation')
-         373±3ms          321±2ms     0.86  groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'direct')
-      24.6±0.4ms       21.1±0.2ms     0.86  groupby.Nth.time_groupby_nth_all('float64')
-     4.67±0.07ms      3.98±0.08ms     0.85  reshape.SimpleReshape.time_stack
-      25.2±0.4ms       21.4±0.4ms     0.85  groupby.Nth.time_groupby_nth_all('datetime')
-        971±10μs         826±10μs     0.85  stat_ops.SeriesOps.time_op('median', 'int', False)
-         511±7μs         433±10μs     0.85  groupby.GroupByMethods.time_dtype_as_field('object', 'any', 'transformation')
-         166±4ms          140±2ms     0.85  groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'direct')
-        983±10μs         821±30μs     0.84  stat_ops.SeriesOps.time_op('median', 'int', True)
-     6.33±0.02ms      5.27±0.04ms     0.83  timeseries.AsOf.time_asof('DataFrame')
-     16.9±0.07ms       14.1±0.1ms     0.83  indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
-         115±7μs         91.2±6μs     0.80  series_methods.NanOps.time_std(1000, 'int64')
-      4.23±0.1ms       3.36±0.1ms     0.79  indexing.NumericSeriesIndexing.time_ix_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
-     6.42±0.06ms      5.03±0.03ms     0.78  timeseries.AsOf.time_asof_nan('DataFrame')
-        218±10μs          170±4μs     0.78  series_methods.NanOps.time_sem(1000, 'int64')
-      18.5±0.4ms       14.0±0.3ms     0.76  algorithms.Hashing.time_frame
-        70.4±2μs         53.2±4μs     0.76  series_methods.NanOps.time_sum(1000, 'int64')
-       39.3±10ms       29.7±0.5ms     0.75  series_methods.NanOps.time_sem(1000000, 'int64')
-     2.79±0.05ms      2.05±0.05ms     0.73  timeseries.AsOf.time_asof_single('DataFrame')
-      26.1±0.2ms         19.1±2ms     0.73  stat_ops.FrameOps.time_op('sem', 'int', 1, False)
-      26.2±0.2ms         19.2±2ms     0.73  stat_ops.FrameOps.time_op('sem', 'int', 1, True)
-        21.7±2ms         15.6±2ms     0.72  stat_ops.FrameOps.time_op('sem', 'int', 0, False)
-        21.8±2ms         15.6±2ms     0.72  stat_ops.FrameOps.time_op('sem', 'int', 0, True)
-         258±4μs          184±4μs     0.71  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int16Engine'>, <class 'numpy.int16'>), 'monotonic_incr')
-         248±3μs          176±2μs     0.71  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
-        91.8±3μs         64.7±6μs     0.71  series_methods.NanOps.time_var(1000, 'int64')
-         129±3ms       90.5±0.4ms     0.70  frame_methods.Describe.time_dataframe_describe
-         312±3μs          217±1μs     0.69  indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.UInt32Engine'>, <class 'numpy.uint32'>), 'monotonic_incr')
-        78.1±4μs         53.8±3μs     0.69  series_methods.NanOps.time_mean(1000, 'int64')
-      41.0±0.2ms       28.2±0.2ms     0.69  frame_methods.Describe.time_series_describe
-        89.0±6μs         61.0±4μs     0.69  series_methods.NanOps.time_skew(1000, 'int64')
-        92.7±4μs         62.0±2μs     0.67  series_methods.NanOps.time_kurt(1000, 'int64')
-        35.9±4ms         23.6±4ms     0.66  series_methods.NanOps.time_skew(1000000, 'int64')
-     2.92±0.02ms      1.91±0.03ms     0.66  timeseries.AsOf.time_asof_nan_single('DataFrame')
-      27.3±0.8ms       17.8±0.3ms     0.65  stat_ops.FrameOps.time_op('skew', 'int', 1, True)
-        59.2±2μs         38.6±3μs     0.65  series_methods.NanOps.time_argmax(1000, 'int64')
-     1.55±0.01ms      1.01±0.06ms     0.65  stat_ops.SeriesOps.time_op('sem', 'int', False)
-      9.85±0.3ms       6.36±0.1ms     0.65  groupby.TransformBools.time_transform_mean
-        36.1±4ms         23.2±4ms     0.64  series_methods.NanOps.time_kurt(1000000, 'int64')
-        51.2±1μs         32.7±3μs     0.64  series_methods.NanOps.time_prod(1000, 'int64')
-     1.58±0.04ms         999±60μs     0.63  stat_ops.SeriesOps.time_op('sem', 'int', True)
-         208±1ms        128±0.6ms     0.62  frame_methods.Dropna.time_dropna('all', 1)
-         207±1ms        125±0.6ms     0.60  frame_methods.Dropna.time_dropna('all', 0)
-        6.96±4ms      4.18±0.02ms     0.60  algorithms.Duplicated.time_duplicated(False, 'int')
-      24.9±0.1ms       14.9±0.2ms     0.60  stat_ops.FrameOps.time_op('kurt', 'int', 1, False)
-         253±2ms          151±4ms     0.60  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 1)
-        53.3±2μs         31.6±3μs     0.59  series_methods.All.time_all(1000, 'slow')
-      25.2±0.9ms       14.9±0.2ms     0.59  stat_ops.FrameOps.time_op('kurt', 'int', 1, True)
-        62.9±3μs         36.4±3μs     0.58  series_methods.NanOps.time_max(1000, 'int64')
-     5.05±0.04ms      2.93±0.01ms     0.58  stat_ops.FrameOps.time_op('median', 'int', 0, False)
-     5.07±0.03ms      2.93±0.03ms     0.58  stat_ops.FrameOps.time_op('median', 'int', 0, True)
-      16.2±0.1ms         9.29±2ms     0.57  stat_ops.FrameOps.time_op('skew', 'int', 0, False)
-      16.1±0.6ms         9.22±1ms     0.57  stat_ops.FrameOps.time_op('kurt', 'int', 0, True)
-      16.1±0.2ms         9.21±2ms     0.57  stat_ops.FrameOps.time_op('kurt', 'int', 0, False)
-        16.3±1ms         9.28±2ms     0.57  stat_ops.FrameOps.time_op('skew', 'int', 0, True)
-        63.9±4μs         35.4±1μs     0.55  series_methods.NanOps.time_min(1000, 'int64')
-        54.6±3μs         30.0±2μs     0.55  series_methods.All.time_all(1000, 'fast')
-        54.7±3μs         28.7±2μs     0.53  series_methods.Any.time_any(1000, 'fast')
-         264±1ms        136±0.3ms     0.51  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 0)
-        56.8±4μs         28.3±1μs     0.50  series_methods.Any.time_any(1000, 'slow')
-        852±20μs         388±10μs     0.46  stat_ops.SeriesOps.time_op('skew', 'int', False)
-        852±10μs          376±6μs     0.44  stat_ops.SeriesOps.time_op('skew', 'int', True)
-         852±9μs          376±6μs     0.44  stat_ops.SeriesOps.time_op('kurt', 'int', True)
-         292±3μs        123±0.9μs     0.42  stat_ops.SeriesOps.time_op('prod', 'int', True)
-         288±5μs          120±3μs     0.42  stat_ops.SeriesOps.time_op('prod', 'int', False)
-     1.23±0.01ms          490±4μs     0.40  stat_ops.FrameOps.time_op('prod', 'int', 0, True)
-     1.24±0.02ms          490±5μs     0.40  stat_ops.FrameOps.time_op('prod', 'int', 0, False)
-       931±100μs          369±6μs     0.40  stat_ops.SeriesOps.time_op('kurt', 'int', False)
-       158±0.6ms       62.5±0.4ms     0.40  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 1)
-         322±7μs          123±4μs     0.38  stat_ops.SeriesOps.time_op('mean', 'int', True)
-        708±10μs          261±8μs     0.37  stat_ops.SeriesOps.time_op('std', 'int', False)
-         328±3μs          120±2μs     0.37  stat_ops.SeriesOps.time_op('mean', 'int', False)
-         681±9μs         249±10μs     0.37  stat_ops.SeriesOps.time_op('var', 'int', False)
-      3.03±0.9ms      1.10±0.01ms     0.36  series_methods.NanOps.time_argmax(1000000, 'int64')
-        701±10μs          251±4μs     0.36  stat_ops.SeriesOps.time_op('std', 'int', True)
-     2.70±0.01ms          966±9μs     0.36  series_methods.NanOps.time_prod(1000000, 'int64')
-         690±6μs          243±3μs     0.35  stat_ops.SeriesOps.time_op('var', 'int', True)
-         305±7μs          103±3μs     0.34  stat_ops.SeriesOps.time_op('sum', 'int', False)
-        311±10μs         99.1±3μs     0.32  stat_ops.SeriesOps.time_op('sum', 'int', True)
-         121±1ms         38.4±1ms     0.32  frame_methods.Dropna.time_dropna('any', 0)
-       114±0.3ms       35.5±0.5ms     0.31  frame_methods.Dropna.time_dropna('any', 1)
-      8.80±0.2ms      2.64±0.07ms     0.30  stat_ops.FrameOps.time_op('std', 'int', 1, True)
-        17.5±4ms         5.20±4ms     0.30  series_methods.NanOps.time_std(1000000, 'int64')
-      8.83±0.2ms       2.61±0.2ms     0.30  stat_ops.FrameOps.time_op('std', 'int', 1, False)
-        8.61±1ms       2.41±0.3ms     0.28  stat_ops.FrameOps.time_op('var', 'int', 1, False)
-     1.37±0.01ms         382±10μs     0.28  stat_ops.FrameOps.time_op('prod', 'int', 1, True)
-     8.59±0.06ms      2.38±0.08ms     0.28  stat_ops.FrameOps.time_op('var', 'int', 1, True)
-     1.40±0.03ms         385±10μs     0.27  stat_ops.FrameOps.time_op('prod', 'int', 1, False)
-        17.8±4ms         4.75±4ms     0.27  series_methods.NanOps.time_var(1000000, 'int64')
-        3.70±1ms         963±10μs     0.26  series_methods.NanOps.time_mean(1000000, 'int64')
-      7.49±0.1ms      1.92±0.07ms     0.26  stat_ops.FrameOps.time_op('var', 'int', 0, True)
-      7.48±0.1ms      1.89±0.04ms     0.25  stat_ops.FrameOps.time_op('std', 'int', 0, False)
-      7.42±0.1ms      1.86±0.06ms     0.25  stat_ops.FrameOps.time_op('var', 'int', 0, False)
-      7.47±0.1ms      1.85±0.09ms     0.25  stat_ops.FrameOps.time_op('std', 'int', 0, True)
-         169±1ms       41.0±0.4ms     0.24  frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 0)
-        3.39±1ms         723±10μs     0.21  series_methods.NanOps.time_min(1000000, 'int64')
-        3.67±1ms         722±10μs     0.20  series_methods.NanOps.time_max(1000000, 'int64')
-        3.40±1ms          649±9μs     0.19  series_methods.NanOps.time_sum(1000000, 'int64')
-     3.93±0.04ms          646±9μs     0.16  stat_ops.FrameOps.time_op('mean', 'int', 1, True)
-     3.91±0.03ms         641±10μs     0.16  stat_ops.FrameOps.time_op('mean', 'int', 1, False)
-     3.28±0.03ms         493±10μs     0.15  stat_ops.FrameOps.time_op('mean', 'int', 0, False)
-     3.27±0.03ms          488±8μs     0.15  stat_ops.FrameOps.time_op('mean', 'int', 0, True)
-     2.69±0.04ms         381±10μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 0, False)
-     2.70±0.02ms          375±5μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 0, True)
-     2.95±0.01ms         400±10μs     0.14  stat_ops.FrameOps.time_op('sum', 'int', 1, True)
-     2.94±0.02ms          383±8μs     0.13  stat_ops.FrameOps.time_op('sum', 'int', 1, False)
-      6.29±0.1ms       70.3±0.8μs     0.01  series_methods.Any.time_any(1000000, 'slow')
-      6.48±0.2ms         68.0±4μs     0.01  series_methods.All.time_all(1000000, 'slow')
-      6.50±0.2ms       37.2±0.9μs     0.01  series_methods.Any.time_any(1000000, 'fast')
-     6.37±0.07ms         35.5±1μs     0.01  series_methods.All.time_all(1000000, 'fast')
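
For readers who want to see why mask=None helps, here is a minimal sketch of the idea in plain NumPy. It is not the code from this PR: maybe_mask(), nansum_sketch(), and nanany_sketch() are hypothetical names made up for illustration. It assumes that boolean and integer arrays can never hold NaN, so the O(n) mask computation can be skipped and the reduction can run directly on the original array.

```python
import numpy as np


def maybe_mask(values, mask=None):
    """Return a NaN mask, or None when no mask is needed.

    Hypothetical helper for illustration only; not the pandas implementation.
    """
    if mask is not None:
        return mask
    # Boolean, signed and unsigned integer dtypes cannot represent NaN,
    # so the O(n) isnan pass and the mask allocation can be skipped.
    if values.dtype.kind in "biu":
        return None
    return np.isnan(values)


def nansum_sketch(values, mask=None):
    """Sum that ignores NaN and only pays for masking when it is required."""
    mask = maybe_mask(values, mask)
    if mask is None:
        # Fast path: no mask, no copy, just the raw reduction.
        return values.sum()
    # Slow path: replace masked slots with 0 (this allocates a new array).
    return np.where(mask, 0, values).sum()


def nanany_sketch(values, mask=None):
    """any() that reduces the original array directly when no mask is needed."""
    mask = maybe_mask(values, mask)
    if mask is None:
        # Calling any() on the untouched array leaves NumPy free to stop
        # early, instead of first spending O(n) on a mask and a copy.
        return bool(values.any())
    return bool(np.where(mask, False, values).any())


if __name__ == "__main__":
    ints = np.arange(1_000_000, dtype=np.int64)
    bools = np.ones(1_000_000, dtype=bool)
    floats = np.array([1.0, np.nan, 2.0])

    print(maybe_mask(ints) is None)   # no mask is built for int64
    print(nansum_sketch(floats))      # NaN is ignored via the mask
    print(nanany_sketch(bools))       # reduces the bool array directly
```

The design choice in this sketch is that None means "no value is missing", so each reduction has to branch on it. That is extra plumbing, which matches the "more involved modifications" mentioned above.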