Fixed HDFSTore.groups() performance. by spott · Pull Request #21543 · pandas-dev/pandas
Thanks.
I ran the io.hdf.* benchmarks and got the following:
       before        after  ratio
   [49188296]   [38b70fcf]
+ 7.02±0.07ms     25.1±1ms   3.57  io.hdf.HDFStoreDataFrame.time_query_store_table
+  28.9±0.4ms       50.7ms   1.75  io.hdf.HDFStoreDataFrame.time_read_store_mixed
+  16.3±0.4ms       28.3ms   1.74  io.hdf.HDFStoreDataFrame.time_query_store_table_wide
+ 25.5±0.01ms       38.0ms   1.49  io.hdf.HDFStoreDataFrame.time_read_store
+    68.1±2ms       98.1ms   1.44  io.hdf.HDFStoreDataFrame.time_read_store_table_mixed
+    41.8±1ms     57.5±7ms   1.37  io.hdf.HDFStoreDataFrame.time_write_store_mixed
-      22.5ms       19.5ms   0.86  io.hdf.HDFStoreDataFrame.time_read_store_table_wide
-      29.8ms     25.1±1ms   0.84  io.hdf.HDFStoreDataFrame.time_write_store
-      10.4μs   8.63±0.9μs   0.83  io.hdf.HDFStoreDataFrame.time_store_repr
-  6.04±0.1ms   4.61±0.2ms   0.76  io.hdf.HDFStoreDataFrame.time_store_info
Which is obviously an issue.
Because this doesn't make sense (some of those tests don't even appear to be touching the part of the code that I changed...), I ran it again:
       before        after  ratio
   [49188296]   [38b70fcf]
+  28.2±0.3ms     41.0±1ms   1.45  io.hdf.HDFStoreDataFrame.time_write_store_mixed
+     200±3ms        284ms   1.42  io.hdf.HDFStoreDataFrame.time_write_store_table_dc
+    44.5±2ms     60.7±2ms   1.36  io.hdf.HDFStorePanel.time_read_store_table_panel
+ 5.84±0.03μs   7.64±0.3μs   1.31  io.hdf.HDFStoreDataFrame.time_store_repr
+    49.3±3ms       58.0ms   1.18  io.hdf.HDFStoreDataFrame.time_write_store_table
+  70.6±0.4ms     80.8±2ms   1.15  io.hdf.HDFStoreDataFrame.time_write_store_table_mixed
-  28.4±0.2ms   24.4±0.2ms   0.86  io.hdf.HDFStoreDataFrame.time_read_store
-     159±2ms     136±10ms   0.85  io.hdf.HDF.time_read_hdf('table')
-     152±4ms      128±5ms   0.84  io.hdf.HDF.time_write_hdf('table')
-  16.6±0.9ms   13.1±0.2ms   0.79  io.hdf.HDFStoreDataFrame.time_query_store_table_wide
-  5.10±0.4ms  3.04±0.08ms   0.60  io.hdf.HDFStoreDataFrame.time_store_info
And this time what looks like a completely different set of benchmarks shows up as changed.
A third time gives me:
       before        after  ratio
   [49188296]   [38b70fcf]
+  27.6±0.1ms   36.1±0.7ms   1.31  io.hdf.HDFStoreDataFrame.time_read_store_mixed
+  19.1±0.3ms   22.5±0.7ms   1.18  io.hdf.HDFStoreDataFrame.time_read_store_table_wide
-  71.7±0.7ms   56.6±0.1ms   0.79  io.hdf.HDFStoreDataFrame.time_write_store_table_mixed
- 26.0±0.07ms   19.0±0.2ms   0.73  io.hdf.HDFStoreDataFrame.time_write_store
-    60.6±3ms     41.4±1ms   0.68  io.hdf.HDFStorePanel.time_read_store_table_panel
-    42.7±2ms   26.9±0.1ms   0.63  io.hdf.HDFStoreDataFrame.time_write_store_mixed
Which is at least a little better. I think these benchmarks are fairly sensitive to whatever else is running on my computer. To get a more accurate result, I think I'll need to do a clean boot and run them with nothing else running.
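As a rough way to quantify how noisy the three runs are, the sketch below groups the reported ratios by benchmark and checks whether each one stays on the same side of 1.0 across runs. The ratios are copied from the output above; the helper names `classify` and `summarize` are made up for illustration and are not part of asv or pandas. Benchmarks that appear in only one run are skipped, since a single data point says nothing about consistency.

```python
# Ratios (after / before) copied from the three runs above.
# Names are shortened by dropping the "io.hdf." prefix.
runs = [
    {  # first run
        "HDFStoreDataFrame.time_query_store_table": 3.57,
        "HDFStoreDataFrame.time_read_store_mixed": 1.75,
        "HDFStoreDataFrame.time_query_store_table_wide": 1.74,
        "HDFStoreDataFrame.time_read_store": 1.49,
        "HDFStoreDataFrame.time_read_store_table_mixed": 1.44,
        "HDFStoreDataFrame.time_write_store_mixed": 1.37,
        "HDFStoreDataFrame.time_read_store_table_wide": 0.86,
        "HDFStoreDataFrame.time_write_store": 0.84,
        "HDFStoreDataFrame.time_store_repr": 0.83,
        "HDFStoreDataFrame.time_store_info": 0.76,
    },
    {  # second run
        "HDFStoreDataFrame.time_write_store_mixed": 1.45,
        "HDFStoreDataFrame.time_write_store_table_dc": 1.42,
        "HDFStorePanel.time_read_store_table_panel": 1.36,
        "HDFStoreDataFrame.time_store_repr": 1.31,
        "HDFStoreDataFrame.time_write_store_table": 1.18,
        "HDFStoreDataFrame.time_write_store_table_mixed": 1.15,
        "HDFStoreDataFrame.time_read_store": 0.86,
        "HDF.time_read_hdf('table')": 0.85,
        "HDF.time_write_hdf('table')": 0.84,
        "HDFStoreDataFrame.time_query_store_table_wide": 0.79,
        "HDFStoreDataFrame.time_store_info": 0.60,
    },
    {  # third run
        "HDFStoreDataFrame.time_read_store_mixed": 1.31,
        "HDFStoreDataFrame.time_read_store_table_wide": 1.18,
        "HDFStoreDataFrame.time_write_store_table_mixed": 0.79,
        "HDFStoreDataFrame.time_write_store": 0.73,
        "HDFStorePanel.time_read_store_table_panel": 0.68,
        "HDFStoreDataFrame.time_write_store_mixed": 0.63,
    },
]


def classify(ratios):
    """Label a benchmark by whether its ratio stays on one side of 1.0."""
    if all(r > 1 for r in ratios):
        return "slower in every run"
    if all(r < 1 for r in ratios):
        return "faster in every run"
    return "flips direction"


def summarize(runs):
    """Group ratios by benchmark; classify those reported in more than one run."""
    by_name = {}
    for run in runs:
        for name, ratio in run.items():
            by_name.setdefault(name, []).append(ratio)
    return {name: classify(r) for name, r in by_name.items() if len(r) > 1}


summary = summarize(runs)
for name, label in sorted(summary.items()):
    print(f"{name}: {label}")
```

On these numbers, 7 of the 10 benchmarks that show up in more than one run flip between slower and faster, which is consistent with the differences being mostly noise rather than an effect of the change.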