Non-monotonic-increasing DatetimeIndex claims not to contain duplicate entries · Issue #9512 · pandas-dev/pandas
This was fun to debug.
```python
In [1]: import pandas as pd

In [2]: 0 in pd.Int64Index([0, 0, 1])
Out[2]: True

In [3]: 0 in pd.Int64Index([0, 1, 0])
Out[3]: True

In [4]: 0 in pd.Int64Index([0, 0, -1])
Out[4]: True

In [5]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, -1])
Out[5]: True

In [6]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, 0])
Out[6]: False  # BAD

In [7]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, 1])
Out[7]: True

In [8]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, -1])
Out[8]: False  # BAD
```
`TimedeltaIndex` is also broken.

The problem is in `DatetimeIndexOpsMixin.__contains__`, which inspects the type of the value returned by `idx.get_loc(key)` to decide whether the key was found in the index. If the index contains duplicate entries and is not monotonic increasing (for some reason, monotonic decreasing doesn't cut it), `get_loc` eventually falls back to `Int64Engine._maybe_get_bool_indexer`, which returns a boolean ndarray when the key is duplicated. Since `__contains__` only treats a scalar or a slice as a hit, it reports that the duplicated entry is not present.
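The failure mode can be reproduced without pandas in a minimal pure-Python sketch. It assumes a simplified lookup that mimics the three kinds of result described above: an integer position, a slice, or a boolean mask. `get_loc_sketch`, `contains_buggy`, and `contains_fixed` are hypothetical names, not pandas internals, and a plain list of bools stands in for the ndarray.

```python
from typing import List, Union

# A location is an int position, a slice, or a boolean mask (a list of bools here).
Loc = Union[int, slice, List[bool]]


def get_loc_sketch(values: List[int], key: int) -> Loc:
    """Hypothetical, simplified stand-in for get_loc's three return kinds."""
    positions = [i for i, v in enumerate(values) if v == key]
    if not positions:
        raise KeyError(key)
    if len(positions) == 1:
        return positions[0]  # key occurs once: integer position
    if all(a <= b for a, b in zip(values, values[1:])):
        # duplicates in a monotonic increasing index are contiguous: slice
        return slice(positions[0], positions[-1] + 1)
    # duplicates anywhere else (including monotonic decreasing): boolean mask
    return [v == key for v in values]


def contains_buggy(values: List[int], key: int) -> bool:
    """Mirrors the reported check: only a scalar or a slice counts as found."""
    try:
        res = get_loc_sketch(values, key)
    except KeyError:
        return False
    return isinstance(res, (int, slice))  # a mask falls through to False


def contains_fixed(values: List[int], key: int) -> bool:
    """Also accepts a boolean mask that has at least one True entry."""
    try:
        res = get_loc_sketch(values, key)
    except KeyError:
        return False
    if isinstance(res, (int, slice)):
        return True
    return any(res)


for vals in ([0, 1, -1], [0, 1, 0], [0, 0, 1], [0, 0, -1]):
    print(vals, contains_buggy(vals, 0), contains_fixed(vals, 0))
# [0, 1, -1] True True
# [0, 1, 0] False True
# [0, 0, 1] True True
# [0, 0, -1] False True
```

The buggy variant reproduces the two `# BAD` results from the session: `[0, 1, 0]` and `[0, 0, -1]` both produce a mask, which the scalar-or-slice check rejects. The fixed variant treats a mask with any `True` entry as a hit, so membership no longer depends on whether the index is monotonic increasing.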