Fixed HDFSTore.groups() performance. by spott · Pull Request #21543 · pandas-dev/pandas

Thanks.

I ran the io.hdf.* benchmarks and got the following:

       before           after         ratio
     [49188296]       [38b70fcf]
+     7.02±0.07ms         25.1±1ms     3.57  io.hdf.HDFStoreDataFrame.time_query_store_table
+      28.9±0.4ms           50.7ms     1.75  io.hdf.HDFStoreDataFrame.time_read_store_mixed
+      16.3±0.4ms           28.3ms     1.74  io.hdf.HDFStoreDataFrame.time_query_store_table_wide
+     25.5±0.01ms           38.0ms     1.49  io.hdf.HDFStoreDataFrame.time_read_store
+        68.1±2ms           98.1ms     1.44  io.hdf.HDFStoreDataFrame.time_read_store_table_mixed
+        41.8±1ms         57.5±7ms     1.37  io.hdf.HDFStoreDataFrame.time_write_store_mixed
-          22.5ms           19.5ms     0.86  io.hdf.HDFStoreDataFrame.time_read_store_table_wide
-          29.8ms         25.1±1ms     0.84  io.hdf.HDFStoreDataFrame.time_write_store
-          10.4μs       8.63±0.9μs     0.83  io.hdf.HDFStoreDataFrame.time_store_repr
-      6.04±0.1ms       4.61±0.2ms     0.76  io.hdf.HDFStoreDataFrame.time_store_info

Several benchmarks come out slower, one of them by a factor of 3.57, which is obviously an issue.

Because this doesn't make sense (some of those benchmarks don't even appear to touch the part of the code that I changed...), I ran it again:

       before           after         ratio
     [49188296]       [38b70fcf]
+      28.2±0.3ms         41.0±1ms     1.45  io.hdf.HDFStoreDataFrame.time_write_store_mixed
+         200±3ms            284ms     1.42  io.hdf.HDFStoreDataFrame.time_write_store_table_dc
+        44.5±2ms         60.7±2ms     1.36  io.hdf.HDFStorePanel.time_read_store_table_panel
+     5.84±0.03μs       7.64±0.3μs     1.31  io.hdf.HDFStoreDataFrame.time_store_repr
+        49.3±3ms           58.0ms     1.18  io.hdf.HDFStoreDataFrame.time_write_store_table
+      70.6±0.4ms         80.8±2ms     1.15  io.hdf.HDFStoreDataFrame.time_write_store_table_mixed
-      28.4±0.2ms       24.4±0.2ms     0.86  io.hdf.HDFStoreDataFrame.time_read_store
-         159±2ms         136±10ms     0.85  io.hdf.HDF.time_read_hdf('table')
-         152±4ms          128±5ms     0.84  io.hdf.HDF.time_write_hdf('table')
-      16.6±0.9ms       13.1±0.2ms     0.79  io.hdf.HDFStoreDataFrame.time_query_store_table_wide
-      5.10±0.4ms      3.04±0.08ms     0.60  io.hdf.HDFStoreDataFrame.time_store_info

And this time a completely different set of benchmarks shows up as changed.

A third time gives me:

       before           after         ratio
     [49188296]       [38b70fcf]
+      27.6±0.1ms       36.1±0.7ms     1.31  io.hdf.HDFStoreDataFrame.time_read_store_mixed
+      19.1±0.3ms       22.5±0.7ms     1.18  io.hdf.HDFStoreDataFrame.time_read_store_table_wide
-      71.7±0.7ms       56.6±0.1ms     0.79  io.hdf.HDFStoreDataFrame.time_write_store_table_mixed
-     26.0±0.07ms       19.0±0.2ms     0.73  io.hdf.HDFStoreDataFrame.time_write_store
-        60.6±3ms         41.4±1ms     0.68  io.hdf.HDFStorePanel.time_read_store_table_panel
-        42.7±2ms       26.9±0.1ms     0.63  io.hdf.HDFStoreDataFrame.time_write_store_mixed

Which is at least a little better. I think these benchmarks are fairly sensitive to whatever else is running on my computer. To get a more accurate result, I think I'll need to do a clean boot and run them with nothing else running.
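One quick way to check the noise theory is to see which benchmarks change direction between runs. The sketch below is only an illustration: the ratios are copied by hand from the three outputs above, the names are shortened to the method name, and `classify` is a hypothetical helper, not part of asv or pandas. It also assumes a benchmark missing from a run simply stayed under the reporting threshold, so only benchmarks listed in two or more runs are compared.

```python
from collections import defaultdict

# Ratios (after / before) copied by hand from the three asv runs above.
# Names are shortened to the method name; this is an illustrative sketch.
RUNS = [
    {  # first run
        "time_query_store_table": 3.57,
        "time_read_store_mixed": 1.75,
        "time_query_store_table_wide": 1.74,
        "time_read_store": 1.49,
        "time_read_store_table_mixed": 1.44,
        "time_write_store_mixed": 1.37,
        "time_read_store_table_wide": 0.86,
        "time_write_store": 0.84,
        "time_store_repr": 0.83,
        "time_store_info": 0.76,
    },
    {  # second run
        "time_write_store_mixed": 1.45,
        "time_write_store_table_dc": 1.42,
        "time_read_store_table_panel": 1.36,
        "time_store_repr": 1.31,
        "time_write_store_table": 1.18,
        "time_write_store_table_mixed": 1.15,
        "time_read_store": 0.86,
        "time_read_hdf": 0.85,
        "time_write_hdf": 0.84,
        "time_query_store_table_wide": 0.79,
        "time_store_info": 0.60,
    },
    {  # third run
        "time_read_store_mixed": 1.31,
        "time_read_store_table_wide": 1.18,
        "time_write_store_table_mixed": 0.79,
        "time_write_store": 0.73,
        "time_read_store_table_panel": 0.68,
        "time_write_store_mixed": 0.63,
    },
]


def classify(runs):
    """Group benchmarks seen in 2+ runs by whether their direction is stable."""
    seen = defaultdict(list)
    for run in runs:
        for name, ratio in run.items():
            seen[name].append(ratio)

    result = {"slower": [], "faster": [], "flipped": []}
    for name, ratios in sorted(seen.items()):
        if len(ratios) < 2:
            continue  # reported only once, nothing to compare against
        if all(r > 1 for r in ratios):
            result["slower"].append(name)
        elif all(r < 1 for r in ratios):
            result["faster"].append(name)
        else:
            result["flipped"].append(name)
    return result


if __name__ == "__main__":
    for label, names in classify(RUNS).items():
        print(f"{label}: {', '.join(names) or '-'}")
```

On these numbers, 7 of the 10 benchmarks that show up more than once flip between slower and faster, which fits the idea that most of the differences are machine noise. Only time_read_store_mixed is slower every time it is listed, and time_store_info and time_write_store are faster every time they are listed.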