MultiIndex.dropna() does not always drop NANs · Issue #19387 · pandas-dev/pandas
A MultiIndex "label" of -1 means NaN (though I have found no documentation of this). For example:
>>> pd.MultiIndex.from_arrays([['a', 'a'], ['x', np.nan]])
MultiIndex(levels=[['a'], ['x']], labels=[[0, 0], [0, -1]])
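To make the -1 convention easier to check, here is a minimal sketch that reads the integer labels of the second level directly. It assumes that newer pandas releases expose the same data under the attribute name `codes` instead of `labels`, so it tries `codes` first and falls back to `labels`; the variable names are just for illustration.

```python
import numpy as np
import pandas as pd

# Build the same index as above: the second entry has a missing value in level 1.
mi = pd.MultiIndex.from_arrays([['a', 'a'], ['x', np.nan]])

# Assumption: newer pandas versions call this attribute `codes`;
# the version used in this report calls it `labels`.
labels = mi.codes if hasattr(mi, "codes") else mi.labels

# Integer labels for the second level; the missing entry is encoded as -1.
second_level_labels = list(labels[1])
print(second_level_labels)  # [0, -1]
```

The missing entry does not appear in `levels` at all; it only shows up as the -1 label.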
A MultiIndex can also be constructed with NaN values in the levels themselves:
>>> pd.MultiIndex(levels=[['a'], ['x', np.nan]], labels=[[0, 0], [0, 1]])
MultiIndex(levels=[['a'], ['x', nan]], labels=[[0, 0], [0, 1]])
MultiIndex.dropna() works for the first case, but does nothing for the second:
>>> pd.MultiIndex(levels=[['a'], ['x', np.nan]], labels=[[0, 0], [0, 1]]).dropna()
MultiIndex(levels=[['a'], ['x', nan]], labels=[[0, 0], [0, 1]])
It appears that MultiIndex.dropna() only drops rows whose label is -1, not rows whose label points to a level value that is itself NaN. It should drop both types of rows, so the result should be:
MultiIndex(levels=[['a'], ['x']], labels=[[0], [0]])
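As a possible workaround, the rows can be filtered by value rather than by label, which should not depend on how the NaN is encoded. The sketch below is not what `dropna()` does internally; `drop_missing_by_value` is a hypothetical helper name, and the `codes`/`labels` keyword switch is an assumption about the constructor argument being renamed in newer pandas releases.

```python
import numpy as np
import pandas as pd


def drop_missing_by_value(mi):
    """Hypothetical helper: keep only rows with no missing value in any level."""
    keep = np.ones(len(mi), dtype=bool)
    for level in range(mi.nlevels):
        # get_level_values materialises the actual values, so a NaN is seen
        # whether it comes from a -1 label or from a NaN stored in the level.
        keep &= ~pd.isnull(mi.get_level_values(level))
    return mi[keep]


# Case 1: NaN encoded as label -1.
mi_label = pd.MultiIndex.from_arrays([['a', 'a'], ['x', np.nan]])

# Case 2: NaN stored in the levels.
# Assumption: newer pandas versions name the keyword `codes` instead of `labels`.
kwarg = "codes" if hasattr(pd.MultiIndex, "codes") else "labels"
mi_level = pd.MultiIndex(levels=[['a'], ['x', np.nan]],
                         **{kwarg: [[0, 0], [0, 1]]})

result_label = drop_missing_by_value(mi_label)
result_level = drop_missing_by_value(mi_level)
```

Both results should contain only the row ('a', 'x'). Note that boolean indexing does not by itself remove the unused NaN entry from `levels`, so the levels may differ from the expected output above even though the rows match.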
I am using Pandas 0.20.3, NumPy 1.13.1, and Python 3.5.