BUG: Index.difference not always returning a unique set of values by lukemanley · Pull Request #55113 · pandas-dev/pandas

Index.difference has two short-circuit paths whose behavior currently differs from the main path: when the calling index contains duplicates, they do not return a unique set of values:

In [1]: import pandas as pd

In [2]: idx = pd.Index([1, 1])

In [3]: idx.difference([2])
Out[3]: Index([1], dtype='int64')      <- main path: unique

In [4]: idx.difference([])
Out[4]: Index([1, 1], dtype='int64')   <- short circuit path when right is empty: non-unique

In [5]: idx.difference([True])
Out[5]: Index([1, 1], dtype='int64')   <- short circuit path when right is considered "non-comparable": non-unique
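
A caller that needs unique values on every path, whichever pandas version is installed, can deduplicate the result explicitly. The following is a minimal sketch, not the change made in this PR: the helper name `unique_difference` is hypothetical and not part of pandas, while `Index.difference` and `Index.unique` are existing pandas methods.

```python
import pandas as pd


def unique_difference(idx: pd.Index, other) -> pd.Index:
    """Hypothetical helper: set difference with duplicates always removed."""
    # The main path already returns unique values, so .unique() changes
    # nothing there; on the short-circuit paths it drops any duplicates
    # carried over from the calling index.
    return idx.difference(other).unique()


idx = pd.Index([1, 1])

# The same three inputs as in the session above: main path, empty right
# side, and a right side considered "non-comparable".
results = [unique_difference(idx, other) for other in ([2], [], [True])]
```

With this wrapper, all three calls should yield an index holding the single value 1, which is the behavior the main path already shows.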