ENH: Faster merge_asof() through a single-pass algo · Issue #13902 · pandas-dev/pandas
Out of curiosity, I took a crack at a single-pass merge_asof(). My sample passes the existing regression tests but is "wrong" in that it works only for a single object-type "by" parameter. I use PyObjectHashTable while scanning through the right DataFrame to cache the most recently found row for each "by" object.
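For readers who want the idea in concrete form, here is a minimal pure-Python sketch of the single-pass approach described above. It is not the actual Cython patch: the function name asof_join_backward and the row layout are hypothetical, and a plain dict stands in for PyObjectHashTable. It assumes both inputs are already sorted by time and that exact time matches are allowed.

```python
def asof_join_backward(left, right):
    """Single-pass as-of join (illustrative sketch, not the pandas code).

    left:  list of (time, key) rows, sorted by time
    right: list of (time, key, value) rows, sorted by time
    Returns, for each left row, the value of the most recent right row
    with the same key and time <= the left row's time, or None.
    """
    last_seen = {}  # key -> value of the most recent right row for that key
    result = []
    r = 0  # cursor into right; it only ever moves forward

    for l_time, l_key in left:
        # Consume every right row that is not later than this left row,
        # remembering the latest value seen for each key.
        while r < len(right) and right[r][0] <= l_time:
            _, r_key, r_val = right[r]
            last_seen[r_key] = r_val
            r += 1
        result.append(last_seen.get(l_key))

    return result


left = [(1, "A"), (2, "B"), (5, "A"), (7, "C")]
right = [(0, "B", 10.0), (1, "A", 20.0), (3, "A", 21.0), (6, "B", 11.0)]
print(asof_join_backward(left, right))
```

Each row of both inputs is visited once, so the cost is linear in the combined length rather than requiring a separate search per "by" group. In this sketch any hashable key works, including a tuple; whether the Cython hash table can do the same is a separate question.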
I could add a little type differentiation if there is interest. I see that Tempita is getting some use in pandas. The main question is whether I can use multiple columns in the "by" parameter, which would be useful for matching things like ['ticker', 'exchange']. Still investigating.
$ asv continuous master -b "join_merge.merge_asof_*"
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
·· Installing into conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks)
[ 0.00%] · For pandas commit hash c4302949:
[ 0.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 0.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 25.00%] ··· Running join_merge.merge_asof_by.time_merge_asof_by 41.07ms
[ 50.00%] ··· Running join_merge.merge_asof_noby.time_merge_asof_noby 12.90ms
[ 50.00%] · For pandas commit hash 97de42ab:
[ 50.00%] ·· Building for conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 50.00%] ·· Benchmarking conda-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 75.00%] ··· Running join_merge.merge_asof_by.time_merge_asof_by 608.08ms
[100.00%] ··· Running join_merge.merge_asof_noby.time_merge_asof_noby 81.03ms
    before     after       ratio
  [97de42ab] [c4302949]
-    81.03ms    12.90ms      0.16  join_merge.merge_asof_noby.time_merge_asof_noby
-   608.08ms    41.07ms      0.07  join_merge.merge_asof_by.time_merge_asof_by
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.