ENH: Faster merge_asof() through a single-pass algo · Issue #13902 · pandas-dev/pandas (original) (raw)

Out of curiosity, I took a crack at a single-pass merge_asof(). My sample passes the existing regression tests but is "wrong" in that it works only for a single object-type "by" parameter. I use PyObjectHashTable while scanning through the right DataFrame to cache the most recently found row for each "by" object.

I could add a little type differentiation if there is interest. I see that Tempita is getting some use in pandas. The main question is whether I can use multiple columns in the "by" parameter, which would be useful for matching things like ['ticker', 'exchange']. Still investigating.

$ asv continuous master -b "join_merge.merge_asof_*"
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing into conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks)
[  0.00%] · For pandas commit hash c4302949:
[  0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[  0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 25.00%] ··· Running join_merge.merge_asof_by.time_merge_asof_by                                               41.07ms
[ 50.00%] ··· Running join_merge.merge_asof_noby.time_merge_asof_noby                                           12.90ms
[ 50.00%] · For pandas commit hash 97de42ab:
[ 50.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 50.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 75.00%] ··· Running join_merge.merge_asof_by.time_merge_asof_by                                              608.08ms
[100.00%] ··· Running join_merge.merge_asof_noby.time_merge_asof_noby                                           81.03ms
   before     after       ratio
  [97de42ab] [c4302949]
-   81.03ms    12.90ms      0.16  join_merge.merge_asof_noby.time_merge_asof_noby
-  608.08ms    41.07ms      0.07  join_merge.merge_asof_by.time_merge_asof_by

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.