PERF: O(n) speedup in any/all by re-enabling short-circuiting for bool case by qwhelan · Pull Request #25070 · pandas-dev/pandas
I'm going to see if I can simplify further - I was able to remove the `copy=` parameter from `_get_values()`.

That being said, allowing `mask=None` requires some more involved modifications that also allow for some decent performance gains. The most recent commit extends the speedups across all `nanops` and to all integer dtypes. In particular, we're now getting a 7-8x speedup in `nansum()` and `nanmean()` for `int64` data. The `nanprod()`, `nanvar()`, and `nanstd()` functions see a ~3x speedup:
$ asv compare upstream/master HEAD -s --sort ratio --only-changed
before after ratio
[68b1da7f] [d96b086c]
<any_all_fix~3> <any_all_fix>
- 2.06±0.01ms 1.85±0ms 0.89 indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>, 'nonunique_monotonic_inc')
- 52.3±0.5ms 46.5±0.7ms 0.89 reshape.Cut.time_cut_int(1000)
- 162±3ms 141±0.8ms 0.87 groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'transformation')
- 238±2ms 208±2ms 0.87 groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'direct')
- 240±3ms 208±2ms 0.87 groupby.GroupByMethods.time_dtype_as_group('int', 'skew', 'transformation')
- 374±2ms 324±3ms 0.87 groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'transformation')
- 373±3ms 321±2ms 0.86 groupby.GroupByMethods.time_dtype_as_group('float', 'skew', 'direct')
- 24.6±0.4ms 21.1±0.2ms 0.86 groupby.Nth.time_groupby_nth_all('float64')
- 4.67±0.07ms 3.98±0.08ms 0.85 reshape.SimpleReshape.time_stack
- 25.2±0.4ms 21.4±0.4ms 0.85 groupby.Nth.time_groupby_nth_all('datetime')
- 971±10μs 826±10μs 0.85 stat_ops.SeriesOps.time_op('median', 'int', False)
- 511±7μs 433±10μs 0.85 groupby.GroupByMethods.time_dtype_as_field('object', 'any', 'transformation')
- 166±4ms 140±2ms 0.85 groupby.GroupByMethods.time_dtype_as_field('int', 'skew', 'direct')
- 983±10μs 821±30μs 0.84 stat_ops.SeriesOps.time_op('median', 'int', True)
- 6.33±0.02ms 5.27±0.04ms 0.83 timeseries.AsOf.time_asof('DataFrame')
- 16.9±0.07ms 14.1±0.1ms 0.83 indexing.NumericSeriesIndexing.time_getitem_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
- 115±7μs 91.2±6μs 0.80 series_methods.NanOps.time_std(1000, 'int64')
- 4.23±0.1ms 3.36±0.1ms 0.79 indexing.NumericSeriesIndexing.time_ix_array(<class 'pandas.core.indexes.numeric.Int64Index'>, 'unique_monotonic_inc')
- 6.42±0.06ms 5.03±0.03ms 0.78 timeseries.AsOf.time_asof_nan('DataFrame')
- 218±10μs 170±4μs 0.78 series_methods.NanOps.time_sem(1000, 'int64')
- 18.5±0.4ms 14.0±0.3ms 0.76 algorithms.Hashing.time_frame
- 70.4±2μs 53.2±4μs 0.76 series_methods.NanOps.time_sum(1000, 'int64')
- 39.3±10ms 29.7±0.5ms 0.75 series_methods.NanOps.time_sem(1000000, 'int64')
- 2.79±0.05ms 2.05±0.05ms 0.73 timeseries.AsOf.time_asof_single('DataFrame')
- 26.1±0.2ms 19.1±2ms 0.73 stat_ops.FrameOps.time_op('sem', 'int', 1, False)
- 26.2±0.2ms 19.2±2ms 0.73 stat_ops.FrameOps.time_op('sem', 'int', 1, True)
- 21.7±2ms 15.6±2ms 0.72 stat_ops.FrameOps.time_op('sem', 'int', 0, False)
- 21.8±2ms 15.6±2ms 0.72 stat_ops.FrameOps.time_op('sem', 'int', 0, True)
- 258±4μs 184±4μs 0.71 indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int16Engine'>, <class 'numpy.int16'>), 'monotonic_incr')
- 248±3μs 176±2μs 0.71 indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.Int8Engine'>, <class 'numpy.int8'>), 'monotonic_incr')
- 91.8±3μs 64.7±6μs 0.71 series_methods.NanOps.time_var(1000, 'int64')
- 129±3ms 90.5±0.4ms 0.70 frame_methods.Describe.time_dataframe_describe
- 312±3μs 217±1μs 0.69 indexing_engines.NumericEngineIndexing.time_get_loc((<class 'pandas._libs.index.UInt32Engine'>, <class 'numpy.uint32'>), 'monotonic_incr')
- 78.1±4μs 53.8±3μs 0.69 series_methods.NanOps.time_mean(1000, 'int64')
- 41.0±0.2ms 28.2±0.2ms 0.69 frame_methods.Describe.time_series_describe
- 89.0±6μs 61.0±4μs 0.69 series_methods.NanOps.time_skew(1000, 'int64')
- 92.7±4μs 62.0±2μs 0.67 series_methods.NanOps.time_kurt(1000, 'int64')
- 35.9±4ms 23.6±4ms 0.66 series_methods.NanOps.time_skew(1000000, 'int64')
- 2.92±0.02ms 1.91±0.03ms 0.66 timeseries.AsOf.time_asof_nan_single('DataFrame')
- 27.3±0.8ms 17.8±0.3ms 0.65 stat_ops.FrameOps.time_op('skew', 'int', 1, True)
- 59.2±2μs 38.6±3μs 0.65 series_methods.NanOps.time_argmax(1000, 'int64')
- 1.55±0.01ms 1.01±0.06ms 0.65 stat_ops.SeriesOps.time_op('sem', 'int', False)
- 9.85±0.3ms 6.36±0.1ms 0.65 groupby.TransformBools.time_transform_mean
- 36.1±4ms 23.2±4ms 0.64 series_methods.NanOps.time_kurt(1000000, 'int64')
- 51.2±1μs 32.7±3μs 0.64 series_methods.NanOps.time_prod(1000, 'int64')
- 1.58±0.04ms 999±60μs 0.63 stat_ops.SeriesOps.time_op('sem', 'int', True)
- 208±1ms 128±0.6ms 0.62 frame_methods.Dropna.time_dropna('all', 1)
- 207±1ms 125±0.6ms 0.60 frame_methods.Dropna.time_dropna('all', 0)
- 6.96±4ms 4.18±0.02ms 0.60 algorithms.Duplicated.time_duplicated(False, 'int')
- 24.9±0.1ms 14.9±0.2ms 0.60 stat_ops.FrameOps.time_op('kurt', 'int', 1, False)
- 253±2ms 151±4ms 0.60 frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 1)
- 53.3±2μs 31.6±3μs 0.59 series_methods.All.time_all(1000, 'slow')
- 25.2±0.9ms 14.9±0.2ms 0.59 stat_ops.FrameOps.time_op('kurt', 'int', 1, True)
- 62.9±3μs 36.4±3μs 0.58 series_methods.NanOps.time_max(1000, 'int64')
- 5.05±0.04ms 2.93±0.01ms 0.58 stat_ops.FrameOps.time_op('median', 'int', 0, False)
- 5.07±0.03ms 2.93±0.03ms 0.58 stat_ops.FrameOps.time_op('median', 'int', 0, True)
- 16.2±0.1ms 9.29±2ms 0.57 stat_ops.FrameOps.time_op('skew', 'int', 0, False)
- 16.1±0.6ms 9.22±1ms 0.57 stat_ops.FrameOps.time_op('kurt', 'int', 0, True)
- 16.1±0.2ms 9.21±2ms 0.57 stat_ops.FrameOps.time_op('kurt', 'int', 0, False)
- 16.3±1ms 9.28±2ms 0.57 stat_ops.FrameOps.time_op('skew', 'int', 0, True)
- 63.9±4μs 35.4±1μs 0.55 series_methods.NanOps.time_min(1000, 'int64')
- 54.6±3μs 30.0±2μs 0.55 series_methods.All.time_all(1000, 'fast')
- 54.7±3μs 28.7±2μs 0.53 series_methods.Any.time_any(1000, 'fast')
- 264±1ms 136±0.3ms 0.51 frame_methods.Dropna.time_dropna_axis_mixed_dtypes('all', 0)
- 56.8±4μs 28.3±1μs 0.50 series_methods.Any.time_any(1000, 'slow')
- 852±20μs 388±10μs 0.46 stat_ops.SeriesOps.time_op('skew', 'int', False)
- 852±10μs 376±6μs 0.44 stat_ops.SeriesOps.time_op('skew', 'int', True)
- 852±9μs 376±6μs 0.44 stat_ops.SeriesOps.time_op('kurt', 'int', True)
- 292±3μs 123±0.9μs 0.42 stat_ops.SeriesOps.time_op('prod', 'int', True)
- 288±5μs 120±3μs 0.42 stat_ops.SeriesOps.time_op('prod', 'int', False)
- 1.23±0.01ms 490±4μs 0.40 stat_ops.FrameOps.time_op('prod', 'int', 0, True)
- 1.24±0.02ms 490±5μs 0.40 stat_ops.FrameOps.time_op('prod', 'int', 0, False)
- 931±100μs 369±6μs 0.40 stat_ops.SeriesOps.time_op('kurt', 'int', False)
- 158±0.6ms 62.5±0.4ms 0.40 frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 1)
- 322±7μs 123±4μs 0.38 stat_ops.SeriesOps.time_op('mean', 'int', True)
- 708±10μs 261±8μs 0.37 stat_ops.SeriesOps.time_op('std', 'int', False)
- 328±3μs 120±2μs 0.37 stat_ops.SeriesOps.time_op('mean', 'int', False)
- 681±9μs 249±10μs 0.37 stat_ops.SeriesOps.time_op('var', 'int', False)
- 3.03±0.9ms 1.10±0.01ms 0.36 series_methods.NanOps.time_argmax(1000000, 'int64')
- 701±10μs 251±4μs 0.36 stat_ops.SeriesOps.time_op('std', 'int', True)
- 2.70±0.01ms 966±9μs 0.36 series_methods.NanOps.time_prod(1000000, 'int64')
- 690±6μs 243±3μs 0.35 stat_ops.SeriesOps.time_op('var', 'int', True)
- 305±7μs 103±3μs 0.34 stat_ops.SeriesOps.time_op('sum', 'int', False)
- 311±10μs 99.1±3μs 0.32 stat_ops.SeriesOps.time_op('sum', 'int', True)
- 121±1ms 38.4±1ms 0.32 frame_methods.Dropna.time_dropna('any', 0)
- 114±0.3ms 35.5±0.5ms 0.31 frame_methods.Dropna.time_dropna('any', 1)
- 8.80±0.2ms 2.64±0.07ms 0.30 stat_ops.FrameOps.time_op('std', 'int', 1, True)
- 17.5±4ms 5.20±4ms 0.30 series_methods.NanOps.time_std(1000000, 'int64')
- 8.83±0.2ms 2.61±0.2ms 0.30 stat_ops.FrameOps.time_op('std', 'int', 1, False)
- 8.61±1ms 2.41±0.3ms 0.28 stat_ops.FrameOps.time_op('var', 'int', 1, False)
- 1.37±0.01ms 382±10μs 0.28 stat_ops.FrameOps.time_op('prod', 'int', 1, True)
- 8.59±0.06ms 2.38±0.08ms 0.28 stat_ops.FrameOps.time_op('var', 'int', 1, True)
- 1.40±0.03ms 385±10μs 0.27 stat_ops.FrameOps.time_op('prod', 'int', 1, False)
- 17.8±4ms 4.75±4ms 0.27 series_methods.NanOps.time_var(1000000, 'int64')
- 3.70±1ms 963±10μs 0.26 series_methods.NanOps.time_mean(1000000, 'int64')
- 7.49±0.1ms 1.92±0.07ms 0.26 stat_ops.FrameOps.time_op('var', 'int', 0, True)
- 7.48±0.1ms 1.89±0.04ms 0.25 stat_ops.FrameOps.time_op('std', 'int', 0, False)
- 7.42±0.1ms 1.86±0.06ms 0.25 stat_ops.FrameOps.time_op('var', 'int', 0, False)
- 7.47±0.1ms 1.85±0.09ms 0.25 stat_ops.FrameOps.time_op('std', 'int', 0, True)
- 169±1ms 41.0±0.4ms 0.24 frame_methods.Dropna.time_dropna_axis_mixed_dtypes('any', 0)
- 3.39±1ms 723±10μs 0.21 series_methods.NanOps.time_min(1000000, 'int64')
- 3.67±1ms 722±10μs 0.20 series_methods.NanOps.time_max(1000000, 'int64')
- 3.40±1ms 649±9μs 0.19 series_methods.NanOps.time_sum(1000000, 'int64')
- 3.93±0.04ms 646±9μs 0.16 stat_ops.FrameOps.time_op('mean', 'int', 1, True)
- 3.91±0.03ms 641±10μs 0.16 stat_ops.FrameOps.time_op('mean', 'int', 1, False)
- 3.28±0.03ms 493±10μs 0.15 stat_ops.FrameOps.time_op('mean', 'int', 0, False)
- 3.27±0.03ms 488±8μs 0.15 stat_ops.FrameOps.time_op('mean', 'int', 0, True)
- 2.69±0.04ms 381±10μs 0.14 stat_ops.FrameOps.time_op('sum', 'int', 0, False)
- 2.70±0.02ms 375±5μs 0.14 stat_ops.FrameOps.time_op('sum', 'int', 0, True)
- 2.95±0.01ms 400±10μs 0.14 stat_ops.FrameOps.time_op('sum', 'int', 1, True)
- 2.94±0.02ms 383±8μs 0.13 stat_ops.FrameOps.time_op('sum', 'int', 1, False)
- 6.29±0.1ms 70.3±0.8μs 0.01 series_methods.Any.time_any(1000000, 'slow')
- 6.48±0.2ms 68.0±4μs 0.01 series_methods.All.time_all(1000000, 'slow')
- 6.50±0.2ms 37.2±0.9μs 0.01 series_methods.Any.time_any(1000000, 'fast')
- 6.37±0.07ms 35.5±1μs 0.01 series_methods.All.time_all(1000000, 'fast')