Non-monotonic-increasing DatetimeIndex claims not to contain duplicate entries · Issue #9512 · pandas-dev/pandas

This was fun to debug.

In [1]: import pandas as pd

In [2]: 0 in pd.Int64Index([0, 0, 1])
Out[2]: True

In [3]: 0 in pd.Int64Index([0, 1, 0])
Out[3]: True

In [4]: 0 in pd.Int64Index([0, 0, -1])
Out[4]: True

In [5]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, -1])
Out[5]: True

In [6]: pd.Timestamp(0) in pd.DatetimeIndex([0, 1, 0])
Out[6]: False  # BAD

In [7]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, 1])
Out[7]: True

In [8]: pd.Timestamp(0) in pd.DatetimeIndex([0, 0, -1])
Out[8]: False  # BAD

TimedeltaIndex is also broken.

The problem is in DatetimeIndexOpsMixin.__contains__, which checks the type of idx.get_loc(key) to determine whether the key was found in the index. If the index contains duplicate entries and is not monotonic increasing (for some reason, monotonic decreasing doesn't cut it), get_loc eventually falls back to Int64Engine._maybe_get_bool_indexer, which returns an ndarray of bools if the key is duplicated. Since the original __contains__ method is looking for scalars or slices, it reports that the duplicated entry is not present.
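To make the failure mode concrete, here is a rough sketch of the logic described above. It is not the actual pandas source: naive_contains and robust_contains are hypothetical helper names made up for illustration. The first mimics a membership test that only accepts scalar or slice results from get_loc; the second treats any non-raising get_loc call as a hit, which is one possible way to avoid the problem.

```python
import numpy as np
import pandas as pd


def naive_contains(index, key):
    """Hypothetical helper: count the key as found only if get_loc
    returns a scalar position or a slice, as described in the report."""
    try:
        loc = index.get_loc(key)
    except KeyError:
        return False
    # A boolean mask (duplicated key, index not monotonic increasing)
    # falls through this check and is reported as "not found".
    return isinstance(loc, (int, np.integer, slice))


def robust_contains(index, key):
    """Hypothetical helper: any successful get_loc means the key exists,
    whatever the type of the returned location."""
    try:
        index.get_loc(key)
    except KeyError:
        return False
    return True


key = pd.Timestamp(0)
sorted_dups = pd.DatetimeIndex([0, 0, 1])    # duplicates, monotonic increasing
unsorted_dups = pd.DatetimeIndex([0, 1, 0])  # duplicates, not monotonic

# get_loc gives a slice for the sorted case and a boolean mask otherwise.
print(type(sorted_dups.get_loc(key)).__name__)
print(type(unsorted_dups.get_loc(key)).__name__)

print(naive_contains(sorted_dups, key), naive_contains(unsorted_dups, key))
print(robust_contains(sorted_dups, key), robust_contains(unsorted_dups, key))
```

The helpers call get_loc directly rather than using the in operator, so the sketch shows the three return shapes (scalar, slice, mask) regardless of whether the installed pandas version still has the bug.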