PERF: DataFrame.transpose with dt64tz by jbrockmendel · Pull Request #40149 · pandas-dev/pandas
Does what it says on the tin: DatetimeBlock.values is always a DatetimeArray, and dt64tzblock.shape == dt64tzblock.values.shape in all cases. Similarly, TimedeltaBlock.values is always a TimedeltaArray.
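A minimal sketch of the operation in the title, using only public API. Block internals such as DatetimeBlock are private, so this checks observable behaviour rather than the block invariant itself: a frame whose columns all share one tz-aware dtype keeps that dtype through a transpose and round-trips. The frame is loosely modelled on reshape.ReshapeExtensionDtype, but the sizes, the freq, and the "America/Los_Angeles" spelling (the canonical name behind the "US/Pacific" alias) are assumptions, not the benchmark's exact setup. Absolute timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Assumed setup, loosely modelled on reshape.ReshapeExtensionDtype:
# 10 x 1000 tz-aware timestamps on a two-level MultiIndex, unstacked into a
# wide frame whose columns all share a single dt64tz dtype.
mi = pd.MultiIndex.from_product(
    [list("ABCDEFGHIJ"), range(1000)], names=["foo", "bar"]
)
# "America/Los_Angeles" is the canonical name for the "US/Pacific" alias.
dti = pd.date_range(
    "2016-01-01", periods=len(mi), freq="min", tz="America/Los_Angeles"
)
ser = pd.Series(dti, index=mi)
df = ser.unstack("bar")  # shape (10, 1000)

result = df.T  # same as df.transpose()

# The tz-aware dtype survives the transpose, and transposing back round-trips.
assert all(dtype == dti.dtype for dtype in result.dtypes)
pd.testing.assert_frame_equal(result.T, df)

# Rough per-call timing of the transpose itself.
n = 20
per_call = timeit.timeit(lambda: df.T, number=n) / n
print(f"df.T on a {df.shape} dt64tz frame: {per_call * 1e3:.3f} ms per call")
```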
ASVs, run repeatedly (vs master from yesterday) with --record-samples --append-samples, so I'm pretty confident these are stable (though they still include some nonsense, xref #40066):
before after ratio
[f4b67b5e] [65792836]
<master> <ref-hybrid-3>
+ 10.1±3ms 13.9±3ms 1.38 eval.Eval.time_add('python', 'all')
+ 2.06±0.02ms 2.40±0.06ms 1.16 hash_functions.NumericSeriesIndexingShuffled.time_loc_slice(<class 'pandas.core.indexes.numeric.Int64Index'>, 1000000)
+ 227±2μs 263±2μs 1.15 groupby.GroupByMethods.time_dtype_as_field('datetime', 'head', 'transformation')
+ 228±2μs 261±2μs 1.15 groupby.GroupByMethods.time_dtype_as_field('datetime', 'head', 'direct')
+ 238±2μs 272±2μs 1.14 groupby.GroupByMethods.time_dtype_as_field('datetime', 'tail', 'transformation')
+ 248±6μs 282±5μs 1.14 groupby.GroupByMethods.time_dtype_as_field('datetime', 'tail', 'direct')
+ 3.92±0.03ms 4.37±0.01ms 1.11 rolling.Engine.time_rolling_apply('DataFrame', 'float', <function Engine.<lambda> at 0x7fb1c0b40670>, 'cython', 'median')
+ 2.83±0.02ms 3.14±0.06ms 1.11 io.hdf.HDFStoreDataFrame.time_store_info
- 275±4μs 248±4μs 0.90 groupby.GroupByMethods.time_dtype_as_field('datetime', 'shift', 'direct')
- 1.41±0.05ms 1.27±0.01ms 0.90 stat_ops.FrameOps.time_op('sum', 'int', 1)
- 1.13±0.06ms 1.02±0.07ms 0.90 arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.int64'>, 5.0, <built-in function ne>)
- 271±2μs 242±2μs 0.89 groupby.GroupByMethods.time_dtype_as_field('datetime', 'shift', 'transformation')
- 188±3μs 167±1μs 0.89 algos.isin.IsIn.time_isin_empty('datetime64[ns]')
- 192±2μs 170±2μs 0.89 algos.isin.IsIn.time_isin_mismatched_dtype('datetime64[ns]')
- 227±2μs 200±2μs 0.88 groupby.GroupByMethods.time_dtype_as_field('datetime', 'any', 'direct')
- 226±2μs 199±1μs 0.88 groupby.GroupByMethods.time_dtype_as_field('datetime', 'all', 'transformation')
- 227±2μs 199±1μs 0.88 groupby.GroupByMethods.time_dtype_as_field('datetime', 'any', 'transformation')
- 895±60μs 785±80μs 0.88 arithmetic.IntFrameWithScalar.time_frame_op_with_scalar(<class 'numpy.float64'>, 3.0, <built-in function ge>)
- 10.2±0.3ms 8.93±0.7ms 0.88 algos.isin.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.int64'>, 19, 'inside')
- 235±4μs 204±4μs 0.87 groupby.GroupByMethods.time_dtype_as_field('datetime', 'all', 'direct')
- 3.26±0.03μs 2.83±0.03μs 0.87 frame_methods.ToNumpy.time_to_numpy_tall
- 3.28±0.03μs 2.82±0.02μs 0.86 frame_methods.ToNumpy.time_to_numpy_wide
- 9.77±0.2ms 8.40±0.2ms 0.86 indexing.NumericSeriesIndexing.time_loc_slice(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
- 2.94±0.05μs 2.52±0.02μs 0.86 frame_methods.ToNumpy.time_values_tall
- 2.95±0.03μs 2.52±0.02μs 0.85 frame_methods.ToNumpy.time_values_wide
- 2.09±0.02ms 1.77±0.01ms 0.85 groupby.FillNA.time_df_ffill
- 2.09±0.02ms 1.77±0.01ms 0.85 groupby.FillNA.time_df_bfill
- 204±3μs 168±2μs 0.82 arithmetic.OffsetArrayArithmetic.time_add_series_offset(<Day>)
- 170±2μs 137±3μs 0.81 groupby.GroupByMethods.time_dtype_as_field('datetime', 'count', 'direct')
- 170±2μs 137±4μs 0.80 groupby.GroupByMethods.time_dtype_as_field('datetime', 'count', 'transformation')
- 29.1±3ms 22.9±0.4ms 0.79 algos.isin.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.uint64'>, 20, 'outside')
- 26.0±1ms 19.2±2ms 0.74 algos.isin.IsinAlmostFullWithRandomInt.time_isin(<class 'numpy.int64'>, 20, 'inside')
- 26.2±0.2ms 18.0±0.07ms 0.69 index_object.SetOperations.time_operation('date_string', 'symmetric_difference')
- 11.6±0.1ms 7.32±0.08ms 0.63 reshape.ReshapeExtensionDtype.time_stack('datetime64[ns, US/Pacific]')
- 40.2±0.5μs 25.0±0.3μs 0.62 ctors.SeriesDtypesConstructors.time_dtindex_from_index_with_series
- 3.77±0.03ms 2.08±0.03ms 0.55 reshape.ReshapeExtensionDtype.time_unstack_slow('datetime64[ns, US/Pacific]')
- 32.1±0.5μs 17.0±0.2μs 0.53 ctors.SeriesDtypesConstructors.time_dtindex_from_series
- 1.11±0.03ms 408±7μs 0.37 categoricals.Constructor.time_datetimes
- 14.1±0.1μs 1.26±0.02μs 0.09 attrs_caching.SeriesArrayAttribute.time_extract_array_numpy('datetime64')
- 13.7±0.1μs 1.04±0.03μs 0.08 attrs_caching.SeriesArrayAttribute.time_extract_array('datetime64')
- 13.0±0.2μs 455±10ns 0.04 attrs_caching.SeriesArrayAttribute.time_array('datetime64')
- 73.8±1ms 1.66±0.03ms 0.02 reshape.ReshapeExtensionDtype.time_unstack_fast('datetime64[ns, US/Pacific]')
- 64.3±0.9ms 258±2μs 0.00 reshape.ReshapeExtensionDtype.time_transpose('datetime64[ns, US/Pacific]')
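Reading the table: each row gives the timing (with its spread after "±") before and after the change, and ratio is after divided by before. The 0.00 in the last row is therefore a rounded ratio of about 0.004 (roughly a 250x speedup for time_transpose), not a zero timing. A minimal sketch that recomputes the ratio from the printed cells follows; the helper names are hypothetical and not part of asv, and the +/- rule assumes asv's default factor of 1.1 while ignoring the statistical-significance check asv also applies.

```python
import re

# Hypothetical helpers (not part of asv): parse an asv timing cell such as
# "64.3±0.9ms" into seconds, dropping the "±" spread. Both the Greek mu and
# the micro sign are accepted for microseconds.
UNITS = {"ns": 1e-9, "μs": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0}
TIMING = re.compile(r"([\d.]+)(?:±[\d.]+)?(ns|μs|µs|ms|s)")


def to_seconds(cell: str) -> float:
    match = TIMING.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognised timing: {cell!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


def ratio(before: str, after: str) -> float:
    """The "ratio" column: after / before, so below 1.0 means faster."""
    return to_seconds(after) / to_seconds(before)


def marker(r: float, factor: float = 1.1) -> str:
    """Simplified +/- rule; real asv also applies a significance check."""
    if r > factor:
        return "+"  # slower by more than the factor
    if r < 1 / factor:
        return "-"  # faster by more than the factor
    return " "


# time_transpose row: 258 μs / 64.3 ms is about 0.004, printed as 0.00.
r = ratio("64.3±0.9ms", "258±2μs")
print(f"{marker(r)} {r:.2f}")  # - 0.00
```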
IIRC the groupby.GroupByMethods.time_dtype_as_field benchmarks were heavily influenced by constructor overhead, which motivated #40054. I still need to try out @jorisvandenbossche's suggestion of a non-Cython optimization there.
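For context on what those groupby rows roughly measure, a sketch of a datetime "count" benchmark in its direct and transformation forms follows. The setup (sizes, key distribution, freq) is an assumption loosely modelled on groupby.GroupByMethods.time_dtype_as_field, not its exact definition. With frames this small, fixed per-call costs make up a large share of each timing, which is consistent with the note above that these benchmarks were heavily influenced by constructor overhead.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup, loosely modelled on
# groupby.GroupByMethods.time_dtype_as_field("datetime", "count", ...):
# one datetime column grouped by a random integer key.
rng = np.random.default_rng(0)
ngroups = 1000
size = 2 * ngroups
df = pd.DataFrame(
    {
        "key": rng.integers(0, ngroups, size=size),
        "values": pd.date_range("2011-01-01", periods=size, freq="min"),
    }
)
grouped = df.groupby("key")["values"]

direct = grouped.count()  # roughly application="direct": one row per group
transformed = grouped.transform("count")  # roughly "transformation": one row per input row

# Rough per-call timings; small inputs mean fixed overhead dominates.
n = 50
for label, func in [
    ("direct", grouped.count),
    ("transformation", lambda: grouped.transform("count")),
]:
    per_call = timeit.timeit(func, number=n) / n
    print(f"count ({label}): {per_call * 1e6:.1f} us per call")
```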