pd.merge_asof() matches out of tolerance when timestamps are duplicated · Issue #13709 · pandas-dev/pandas

This is a continuation of #13695.

Starting with the original DataFrames from that issue:

import pandas as pd

df1 = pd.DataFrame({'time':pd.to_datetime(['2016-07-15 13:30:00.030']), 'username':['bob']})
df2 = pd.DataFrame({'time':pd.to_datetime(['2016-07-15 13:30:00.000', '2016-07-15 13:30:00.030']), 'version':[1, 2]})

I now get the expected NaN for version:

In [82]: pd.merge_asof(df1, df2, on='time', allow_exact_matches=False, tolerance=pd.Timedelta('10ms'))
Out[82]:
                     time username  version
0 2016-07-15 13:30:00.030      bob      NaN
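
For reference, here is a rough sketch of why NaN looks like the right answer for this row. It only does the gap arithmetic by hand and is not how merge_asof is implemented. The names left_time, right, and checks are mine, and treating the tolerance bound as inclusive (<=) is an assumption that does not affect this example.

```python
import pandas as pd

# The single left-hand timestamp and the right-hand frame from the example above.
left_time = pd.Timestamp('2016-07-15 13:30:00.030')
right = pd.DataFrame({
    'time': pd.to_datetime(['2016-07-15 13:30:00.000',
                            '2016-07-15 13:30:00.030']),
    'version': [1, 2],
})
tolerance = pd.Timedelta('10ms')

# How far each right-hand row lies behind the left-hand timestamp.
gap = left_time - right['time']

checks = right.assign(
    gap=gap,
    # allow_exact_matches=False: the right row must be strictly earlier.
    strictly_earlier=gap > pd.Timedelta(0),
    # tolerance: the right row must not be too far back (inclusive bound assumed).
    within_tolerance=gap <= tolerance,
)
checks['eligible'] = checks['strictly_earlier'] & checks['within_tolerance']
print(checks)
```

Version 1 is 30ms back, so it fails the tolerance check, and version 2 is an exact match, so it fails the strictness check. Neither row is eligible, which is consistent with the NaN above.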

However, if I change the first DataFrame to have duplicate timestamps:

df1 = pd.DataFrame({'time':pd.to_datetime(['2016-07-15 13:30:00.030', '2016-07-15 13:30:00.030']), 'username':['bob', 'charlie']})

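then the bug reappears: both rows are matched to version 1, whose timestamp is 30ms earlier and therefore outside the 10ms tolerance: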

In [85]: pd.merge_asof(df1, df2, on='time', allow_exact_matches=False, tolerance=pd.Timedelta('10ms'))
Out[85]:
                     time username  version
0 2016-07-15 13:30:00.030      bob        1
1 2016-07-15 13:30:00.030  charlie        1

This is in pandas version 0.18.0+418.gc46dcfa.
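
A self-contained script for the duplicate-timestamp case may be useful for checking other versions. This is only a sketch: the variable names left, right, result, and unexpected are mine, and the expectation that both rows should come back with NaN is my reading of how allow_exact_matches=False and tolerance should combine, based on the single-row result above.

```python
import pandas as pd

# Left frame with a duplicated timestamp, as in the failing case above.
left = pd.DataFrame({
    'time': pd.to_datetime(['2016-07-15 13:30:00.030',
                            '2016-07-15 13:30:00.030']),
    'username': ['bob', 'charlie'],
})
right = pd.DataFrame({
    'time': pd.to_datetime(['2016-07-15 13:30:00.000',
                            '2016-07-15 13:30:00.030']),
    'version': [1, 2],
})

result = pd.merge_asof(left, right, on='time',
                       allow_exact_matches=False,
                       tolerance=pd.Timedelta('10ms'))

# Rows that received a version even though no right-hand row is both
# strictly earlier and within 10ms of the left-hand timestamp.
unexpected = result[result['version'].notna()]

print(result)
print('rows matched outside tolerance:', len(unexpected))
```

On the version reported here, I would expect this to report 2 rows matched outside tolerance; a build without the bug should report 0.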