pd.merge_asof() matches out of tolerance when timestamps are duplicated · Issue #13709 · pandas-dev/pandas
This is a continuation of #13695.
Starting with the original DataFrames from that issue:
df1 = pd.DataFrame({'time':pd.to_datetime(['2016-07-15 13:30:00.030']), 'username':['bob']})
df2 = pd.DataFrame({'time':pd.to_datetime(['2016-07-15 13:30:00.000', '2016-07-15 13:30:00.030']), 'version':[1, 2]})
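I now get the expected null, since the only non-exact candidate (13:30:00.000) is 30 ms away and outside the 10 ms tolerance: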
In [82]: pd.merge_asof(df1, df2, on='time', allow_exact_matches=False, tolerance=pd.Timedelta('10ms'))
Out[82]:
                     time username  version
0 2016-07-15 13:30:00.030      bob      NaN
However, if I change the first DataFrame to have duplicate timestamps:
df1 = pd.DataFrame({'time':pd.to_datetime(['2016-07-15 13:30:00.030', '2016-07-15 13:30:00.030']), 'username':['bob', 'charlie']})
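then the bug reappears: both rows are matched to version 1, which is 30 ms earlier and therefore outside the 10 ms tolerance: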
In [85]: pd.merge_asof(df1, df2, on='time', allow_exact_matches=False, tolerance=pd.Timedelta('10ms'))
Out[85]:
                     time username  version
0 2016-07-15 13:30:00.030      bob        1
1 2016-07-15 13:30:00.030  charlie        1
This is in pandas version 0.18.0+418.gc46dcfa.
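For reference, here is a minimal plain-Python sketch of the semantics I would expect from a backward as-of match with `allow_exact_matches=False` and a tolerance. It is not pandas code: `asof_backward` is a hypothetical helper written only for this illustration, and it assumes the tolerance bound is inclusive. Under those assumptions, duplicating the left timestamp should not change the outcome, and every row should come back null.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def asof_backward(left_times, right_times, right_values, tolerance,
                  allow_exact_matches=True):
    """Hypothetical reference helper (not part of pandas).

    For each left timestamp, pick the value of the last right row at or
    before it (strictly before when allow_exact_matches is False), or None
    when no such row lies within `tolerance`. `right_times` must be sorted.
    """
    result = []
    for t in left_times:
        # bisect_right counts right rows with time <= t,
        # bisect_left counts right rows with time < t.
        if allow_exact_matches:
            pos = bisect_right(right_times, t)
        else:
            pos = bisect_left(right_times, t)
        if pos == 0:
            # No earlier right row at all.
            result.append(None)
            continue
        candidate = pos - 1
        # Assumption: a gap exactly equal to the tolerance still matches.
        if t - right_times[candidate] > tolerance:
            result.append(None)
        else:
            result.append(right_values[candidate])
    return result


# Same data as in the report above.
t0 = datetime(2016, 7, 15, 13, 30, 0, 0)
t30 = datetime(2016, 7, 15, 13, 30, 0, 30000)  # 30 ms later
right_times = [t0, t30]
versions = [1, 2]
tol = timedelta(milliseconds=10)

single = asof_backward([t30], right_times, versions, tol,
                       allow_exact_matches=False)
duplicated = asof_backward([t30, t30], right_times, versions, tol,
                           allow_exact_matches=False)
print(single)
print(duplicated)
```

Each left row is evaluated independently here, so the duplicated case cannot differ from the single-row case. That is the behaviour I would expect `pd.merge_asof` to reproduce.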