BUG: has_duplicates misbehaves when multiindex has a NaN · Issue #5873 · pandas-dev/pandas

When (at least) one element in a MultiIndex contains a NaN, has_duplicates starts to behave strangely:

    >>> idx = pd.MultiIndex.from_arrays([[101, 102], [3.5, np.nan]])
    >>> idx
    MultiIndex
    [(101, 3.5), (102, nan)]
    >>> idx.has_duplicates
    True
    >>> idx.get_duplicates()
    []

I would expect has_duplicates to return False here, because 102 is not the same as 101.

I would also expect it to return False for the MultiIndex

    MultiIndex
    [(101, 3.5), (101, nan)]

since 3.5 != NaN, but this case is more debatable.
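To make the expected semantics concrete, here is a minimal pure-Python sketch of the check I have in mind. It is not how pandas implements has_duplicates; the helper name `expected_has_duplicates` and its `nan_equal` flag are hypothetical and exist only for illustration. The flag covers the debatable part: whether two NaNs in the same position should count as equal.

```python
import math

# Shared sentinel used when all NaNs are treated as equal to each other.
_NAN = object()


def expected_has_duplicates(rows, nan_equal=True):
    """Return True if any two tuples in `rows` are equal.

    Hypothetical reference helper, not a pandas API.
    nan_equal=True  -> NaN matches NaN in the same position.
    nan_equal=False -> NaN never matches anything, not even another NaN.
    """
    seen = set()
    for row in rows:
        key = tuple(
            # Replace NaN with the shared sentinel, or with a fresh
            # unique object so it can never compare equal to anything.
            (_NAN if nan_equal else object())
            if isinstance(value, float) and math.isnan(value)
            else value
            for value in row
        )
        if key in seen:
            return True
        seen.add(key)
    return False


nan = float("nan")
print(expected_has_duplicates([(101, 3.5), (102, nan)]))   # False
print(expected_has_duplicates([(101, 3.5), (101, nan)]))   # False
print(expected_has_duplicates([(101, nan), (101, nan)]))   # True
print(expected_has_duplicates([(101, nan), (101, nan)], nan_equal=False))  # False
```

Under either setting of the flag, both MultiIndexes shown above contain no duplicates; the flag only matters when the same NaN-containing tuple appears twice.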

This is important because you can't call .unstack() on a Series whose MultiIndex reports has_duplicates as True, even if the MultiIndex has many levels and the levels containing the NaN(s) are not involved in the operation.

This is with pandas 0.12.0.