BUG/PERF: merge_asof raising TypeError for various "by" column dtypes by lukemanley · Pull Request #55678 · pandas-dev/pandas

`merge_asof` currently raises a `TypeError` if the dtype of the `by` column is anything other than int64, uint64, or object. This PR removes that limitation.
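As a rough illustration of the kind of call this affects, here is a minimal sketch that groups by a column with pandas' extension `string` dtype, which is outside the three dtypes listed above. The frames, column names, and values are made up for the example and do not come from the PR; the sketch assumes a pandas version that includes this change, and on an older version the same call may raise.

```python
import pandas as pd

# Hypothetical data: "ticker" uses the extension string dtype rather than
# int64, uint64, or object.
left = pd.DataFrame(
    {
        "time": [1, 3, 5],
        "ticker": pd.array(["A", "B", "A"], dtype="string"),
        "qty": [10, 20, 30],
    }
)
right = pd.DataFrame(
    {
        "time": [0, 2, 4],
        "ticker": pd.array(["A", "B", "A"], dtype="string"),
        "price": [1.0, 2.0, 3.0],
    }
)

# Default direction is "backward": for each left row, take the most recent
# right row with the same ticker whose time is <= the left row's time.
# Both frames must already be sorted by the "on" column.
result = pd.merge_asof(left, right, on="time", by="ticker")
print(result)
```

Each left row picks up the latest earlier price for its own ticker, so the `by` column acts as an exact-match key while `on` is matched approximately.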

| Change   | Before [e48df1cf] <main>   | After [a53a5533] <merge-asof-by-dtypes>   |   Ratio | Benchmark (Parameter)                                |
|----------|----------------------------|-------------------------------------------|---------|------------------------------------------------------|
| -        | 299±30ms                   | 199±20ms                                  |    0.67 | join_merge.MergeAsof.time_multiby('backward', 5)     |
| -        | 311±30ms                   | 202±20ms                                  |    0.65 | join_merge.MergeAsof.time_multiby('backward', None)  |
| -        | 292±20ms                   | 158±20ms                                  |    0.54 | join_merge.MergeAsof.time_by_object('forward', None) |
| -        | 302±30ms                   | 157±10ms                                  |    0.52 | join_merge.MergeAsof.time_by_object('forward', 5)    |
| -        | 411±10ms                   | 200±5ms                                   |    0.49 | join_merge.MergeAsof.time_multiby('forward', 5)      |
| -        | 420±30ms                   | 202±20ms                                  |    0.48 | join_merge.MergeAsof.time_by_object('nearest', None) |
| -        | 457±20ms                   | 215±8ms                                   |    0.47 | join_merge.MergeAsof.time_multiby('forward', None)   |
| -        | 515±10ms                   | 241±6ms                                   |    0.47 | join_merge.MergeAsof.time_multiby('nearest', 5)      |
| -        | 519±10ms                   | 242±4ms                                   |    0.47 | join_merge.MergeAsof.time_multiby('nearest', None)   |
| -        | 419±40ms                   | 185±20ms                                  |    0.44 | join_merge.MergeAsof.time_by_object('nearest', 5)    |
| -        | 159±20ms                   | 64.5±8ms                                  |    0.41 | join_merge.MergeAsof.time_by_int('backward', 5)      |
| -        | 154±10ms                   | 63.5±6ms                                  |    0.41 | join_merge.MergeAsof.time_by_int('backward', None)   |
| -        | 437±20ms                   | 123±20ms                                  |    0.28 | join_merge.MergeAsof.time_by_int('nearest', 5)       |
| -        | 328±30ms                   | 83.9±3ms                                  |    0.26 | join_merge.MergeAsof.time_by_int('forward', None)    |
| -        | 470±30ms                   | 124±10ms                                  |    0.26 | join_merge.MergeAsof.time_by_int('nearest', None)    |
| -        | 324±30ms                   | 81.7±3ms                                  |    0.25 | join_merge.MergeAsof.time_by_int('forward', 5)       |
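The Ratio column appears to be the After time divided by the Before time, so values below 1 mean the branch is faster. A small sketch that recomputes it for a few rows, using only the central timings from the table and ignoring the ± spread (the dictionary layout is just for illustration):

```python
# (before_ms, after_ms) central values copied from the table above.
timings = {
    "time_multiby('backward', 5)": (299, 199),
    "time_by_object('forward', None)": (292, 158),
    "time_by_int('backward', 5)": (159, 64.5),
    "time_by_int('forward', 5)": (324, 81.7),
}

# Ratio = after / before, rounded to two decimals as in the table.
ratios = {name: round(after / before, 2) for name, (before, after) in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Read this way, the rows above range from roughly a third faster (0.67) to about four times faster (0.25).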