BUG: has_duplicates misbehaves when multiindex has a NaN · Issue #5873 · pandas-dev/pandas
When (at least) one element in a MultiIndex contains a NaN, has_duplicates starts to behave strangely:
    >>> idx = pd.MultiIndex.from_arrays([[101, 102], [3.5, np.nan]])
    >>> idx
    MultiIndex [(101, 3.5), (102, nan)]
    >>> idx.has_duplicates
    True
    >>> idx.get_duplicates()
    []
I would expect has_duplicates to return False here, because 102 is not the same as 101.
I would also expect it to return False for the MultiIndex
MultiIndex [(101, 3.5), (101, nan)]
since 3.5 != NaN, but this case is more debatable.
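To make the expectation above concrete, here is a minimal plain-Python sketch of the duplicate semantics I have in mind. It is not how pandas implements has_duplicates; the helper names (`has_duplicate_rows`, `_key`) and the `nan_equal` switch are made up for illustration. The switch covers the debatable part: whether two NaNs in the same position count as equal.

```python
import math

# Shared sentinel used when NaNs are treated as equal to each other.
_NAN = object()


def _key(value, nan_equal):
    # Hypothetical helper: replace NaN with something hashable and predictable.
    # nan_equal=True  -> every NaN maps to the same sentinel, so NaNs match.
    # nan_equal=False -> every NaN maps to a fresh object, so NaNs never match.
    if isinstance(value, float) and math.isnan(value):
        return _NAN if nan_equal else object()
    return value


def has_duplicate_rows(rows, nan_equal=True):
    # Return True if any tuple in `rows` occurs more than once.
    seen = set()
    for row in rows:
        key = tuple(_key(v, nan_equal) for v in row)
        if key in seen:
            return True
        seen.add(key)
    return False


nan = float("nan")
print(has_duplicate_rows([(101, 3.5), (102, nan)]))                   # False
print(has_duplicate_rows([(101, 3.5), (101, nan)]))                   # False
print(has_duplicate_rows([(101, nan), (101, nan)]))                   # True
print(has_duplicate_rows([(101, nan), (101, nan)], nan_equal=False))  # False
```

Under either setting of `nan_equal`, both indexes from this report come out duplicate-free; the setting only matters when the same NaN-containing tuple appears twice.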
This is important because you can't call .unstack() on a series with a MultiIndex for which has_duplicates is True, even if the MultiIndex is of high dimension and the dimensions containing the NaN(s) are not involved in the operation.
This is with pandas 0.12.0.