BUG: CategoricalIndex allowed reindexing duplicate sources by batterseapower · Pull Request #28257 · pandas-dev/pandas

Had to fix a couple of incidental Pandas bugs that were surfaced by the main fix.

  1. pd.Index([1, 1, 0, 2, 2]).union(pd.Index([0, 2, -1])) would fail with an error about duplicates in the index. We now return pd.Index([-1, 0, 1, 1, 2, 2]) (or pd.Index([1, 1, 0, 2, 2, -1]) if sort=False), which is more consistent with the behaviour of Index.intersection.
  2. pd.Index(['A', 'B']).get_indexer_non_unique(pd.Index([0])) would fail with a TypeError: '<' not supported between instances of 'str' and 'int'. This was caused by over-aggressive use of searchsorted, and is solved by falling back to a linear search when the values cannot be compared.
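
To illustrate the idea behind the second fix, here is a minimal pure-Python sketch of "try binary search, fall back to a linear scan". It is not the pandas implementation: the helper names (positions_sorted, positions_linear, get_indexer_non_unique_sketch) are hypothetical, the standard-library bisect module stands in for searchsorted, and the input values are assumed to be already sorted.

```python
from bisect import bisect_left, bisect_right


def positions_sorted(sorted_values, target):
    """Binary search for every position equal to target.

    Raises TypeError if target cannot be ordered against the values,
    e.g. comparing a str with an int.
    """
    lo = bisect_left(sorted_values, target)
    hi = bisect_right(sorted_values, target)
    return list(range(lo, hi))


def positions_linear(values, target):
    """Linear scan: only needs ==, so mixed types are safe."""
    return [i for i, v in enumerate(values) if v == target]


def get_indexer_non_unique_sketch(sorted_values, targets):
    """Hypothetical helper, loosely modelled on get_indexer_non_unique.

    Returns (indexer, missing): indexer holds every matching position
    for each target (or -1 if there is none), and missing holds the
    positions in targets that had no match.
    """
    indexer, missing = [], []
    for t_pos, target in enumerate(targets):
        try:
            hits = positions_sorted(sorted_values, target)
        except TypeError:
            # Incomparable types: fall back to the linear search.
            hits = positions_linear(sorted_values, target)
        if hits:
            indexer.extend(hits)
        else:
            indexer.append(-1)
            missing.append(t_pos)
    return indexer, missing


if __name__ == "__main__":
    print(get_indexer_non_unique_sketch(["A", "B"], [0]))
    print(get_indexer_non_unique_sketch([1, 1, 2], [1, 3]))
```

The fallback trades speed for robustness: the linear scan is O(n) per target, but it only runs when the ordering comparison itself is impossible.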