BUG: regression, is_unique is incorrect since pandas 2.1.0 · Issue #57911 · pandas-dev/pandas
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
```python
import pandas as pd
print("pandas version", pd.__version__)

values = [1, 1, 2, 3, 4]
index = pd.Index(values)
print("===========")
print(index)
print("is_unique=", index.is_unique)  # evaluates uniqueness of the full (non-unique) index

filtered_index = index[2:].copy()
print("===========")
print(filtered_index)
print("is_unique=", filtered_index.is_unique)

# Same slice, taken from a fresh Index whose is_unique was never evaluated
index = pd.Index(values)
filtered_index = index[2:].copy()
print("===========")
print(filtered_index)
print("is_unique=", filtered_index.is_unique)
```
Issue Description
Hello,
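We found a regression: since pandas 2.1.0, `index.is_unique` can be wrong for a slice of an index. In the repro, `index[2:]` holds `[2, 3, 4]`, yet its `is_unique` is `False` when `is_unique` was already evaluated on the original index; taking the same slice from a freshly built index gives the correct `True`.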
I looked for open issues but did not find any fix or existing discussion.
Looking at the changelog, 2.1.0 contained many changes introducing copy-on-write optimizations for Index.
I suspect the issue is related to that. My best guess is that `index[2:]` inherits something cached on the original `index` that is no longer correct for the slice.
A simple repro is attached above; it's very easy to reproduce. :)
Thank you.
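The guess about the slice inheriting cached state from the original index can be illustrated with a toy model. This is a hypothetical sketch in plain Python, not pandas' actual Index or engine code: `ToyIndex`, `slice_buggy`, and `slice_safe` are invented names. It shows how copying a cached "not unique" flag into a slice produces a stale answer, why a slice of a freshly built index (the third repro block) is unaffected, and one safe rule: a slice may inherit a cached `True`, since any slice of a unique sequence is unique, but a cached `False` says nothing about the slice and must be discarded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyIndex:
    """Hypothetical index with a lazily cached uniqueness flag (NOT pandas internals)."""

    values: list
    _unique_cache: Optional[bool] = field(default=None, repr=False)

    @property
    def is_unique(self) -> bool:
        # Compute once, then reuse the cached answer on later calls.
        if self._unique_cache is None:
            self._unique_cache = len(set(self.values)) == len(self.values)
        return self._unique_cache

    def slice_buggy(self, start, stop=None) -> "ToyIndex":
        # Copies the parent's cached flag verbatim: wrong whenever the parent
        # was non-unique but the slice dropped the duplicates.
        return ToyIndex(self.values[start:stop], self._unique_cache)

    def slice_safe(self, start, stop=None) -> "ToyIndex":
        # Only a cached True is safe to inherit; otherwise recompute lazily.
        inherited = True if self._unique_cache is True else None
        return ToyIndex(self.values[start:stop], inherited)


idx = ToyIndex([1, 1, 2, 3, 4])
print(idx.is_unique)                    # False (now cached on idx)
print(idx.slice_buggy(2).is_unique)     # False: stale flag inherited for [2, 3, 4]
print(idx.slice_safe(2).is_unique)      # True: recomputed for [2, 3, 4]
fresh = ToyIndex([1, 1, 2, 3, 4])       # nothing cached yet
print(fresh.slice_buggy(2).is_unique)   # True: nothing stale to inherit
```

The last line mirrors the third repro block: when uniqueness was never evaluated on the parent, even the naive slice computes the right answer.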
Expected Behavior
Output with pandas 1.5.3 (expected behavior):

```
pandas version 1.5.3
===========
Int64Index([1, 1, 2, 3, 4], dtype='int64')
is_unique= False
===========
Int64Index([2, 3, 4], dtype='int64')
is_unique= True
===========
Int64Index([2, 3, 4], dtype='int64')
is_unique= True
```

Output with pandas 2.2.1 (actual behavior):

```
pandas version 2.2.1
===========
Index([1, 1, 2, 3, 4], dtype='int64')
is_unique= False
===========
Index([2, 3, 4], dtype='int64')
is_unique= False # <---------------- INCORRECT
===========
Index([2, 3, 4], dtype='int64')
is_unique= True
```
Installed Versions
Tested on:
- pandas 1.5.3: PASS
- pandas 2.0.0: PASS
- pandas 2.0.3: PASS
- pandas 2.1.0: INCORRECT
- pandas 2.1.4: INCORRECT
- pandas 2.2.1 (latest): INCORRECT
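For code that has to run on the affected versions listed above, one possible workaround is to avoid relying on `Index.is_unique` and recompute uniqueness from the values. This is a sketch only: `is_unique_recomputed` is a hypothetical helper, not a pandas API, and it trades the cached lookup for a full hash-based pass over the values on every call.

```python
import pandas as pd


def is_unique_recomputed(idx: pd.Index) -> bool:
    # Ignore any state cached on the Index: copy the raw values into a new
    # Series and check for duplicates there (NaNs count as duplicates, as in is_unique).
    return not pd.Series(idx.to_numpy()).duplicated().any()


index = pd.Index([1, 1, 2, 3, 4])
print(index.is_unique)                       # False
filtered_index = index[2:].copy()
print(is_unique_recomputed(filtered_index))  # True, regardless of pandas version
```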