BUG: Index.difference not always returning a unique set of values by lukemanley · Pull Request #55113 · pandas-dev/pandas
Index.difference has two short-circuit paths that currently behave differently from the main path: when the index contains duplicates, they do not return a unique set of values.
```python
In [1]: import pandas as pd
In [2]: idx = pd.Index([1, 1])
In [3]: idx.difference([2])
Out[3]: Index([1], dtype='int64') <- main path: unique
In [4]: idx.difference([])
Out[4]: Index([1, 1], dtype='int64') <- short circuit path when right is empty: non-unique
In [5]: idx.difference([True])
Out[5]: Index([1, 1], dtype='int64') <- short circuit path when right is considered "non-comparable": non-unique
```
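To make the expected semantics concrete, here is a minimal pure-Python sketch (not the pandas implementation) of a list-based difference in which both short-circuit paths de-duplicate, just as the main path does. The helper names `difference`, `_unique`, and `_comparable` are hypothetical. The bool-vs-non-bool rule is a simplified assumption standing in for pandas' own comparability check, and sorting is ignored: results keep first-occurrence order.

```python
from typing import Hashable, Iterable, List


def _unique(values: Iterable[Hashable]) -> List[Hashable]:
    # Keep the first occurrence of each value, preserving order.
    seen = set()
    out = []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


def _comparable(left: List[Hashable], right: List[Hashable]) -> bool:
    # Hypothetical, simplified stand-in for pandas' comparability check:
    # all-bool values are treated as non-comparable with non-bool values,
    # so True is never matched against 1 even though True == 1 in Python.
    left_bool = all(isinstance(v, bool) for v in left)
    right_bool = all(isinstance(v, bool) for v in right)
    return left_bool == right_bool


def difference(left: Iterable[Hashable], right: Iterable[Hashable]) -> List[Hashable]:
    left = list(left)
    right = list(right)
    # Short circuit 1: right is empty, so nothing is removed,
    # but the result is still de-duplicated.
    if len(right) == 0:
        return _unique(left)
    # Short circuit 2: values can never match, so nothing is removed,
    # but the result is still de-duplicated.
    if not _comparable(left, right):
        return _unique(left)
    # Main path: unique left values that do not appear in right.
    exclude = set(right)
    return [v for v in _unique(left) if v not in exclude]


print(difference([1, 1], [2]))     # main path
print(difference([1, 1], []))      # empty right
print(difference([1, 1], [True]))  # non-comparable right
```

All three calls return `[1]`, mirroring the unique result of the main path in `Out[3]`. The comparability check matters because `True == 1` in plain Python: without it, the main path would drop `1` when `right` is `[True]`.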