MultiIndex.dropna() does not always drop NANs · Issue #19387 · pandas-dev/pandas

In a MultiIndex, a "label" of -1 means NaN (though I have found no documentation of this). For example:

>>> pd.MultiIndex.from_arrays([['a', 'a'], ['x', np.nan]])
MultiIndex(levels=[['a'], ['x']], labels=[[0, 0], [0, -1]])
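
To make the encoding concrete, here is a minimal pure-Python sketch of how labels index into levels, with -1 treated as a missing entry. The `decode` helper is hypothetical and written only for illustration; it is not part of pandas and does not claim to mirror its internals.

```python
import math

def decode(levels, labels):
    """Expand levels/labels into row tuples (hypothetical helper, not pandas API).

    Each label is a position in the matching level; -1 is read as NaN.
    """
    n_rows = len(labels[0])
    return [
        tuple(
            float("nan") if lab[i] == -1 else lev[lab[i]]
            for lev, lab in zip(levels, labels)
        )
        for i in range(n_rows)
    ]

# The levels and labels shown in the example above.
rows = decode([['a'], ['x']], [[0, 0], [0, -1]])
print(rows)
```

The second row comes back with NaN in its second position even though NaN never appears in the levels themselves.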

A MultiIndex can also be constructed with NaN values stored directly in its levels:

>>> pd.MultiIndex(levels=[['a'], ['x', np.nan]], labels=[[0, 0], [0, 1]])
MultiIndex(levels=[['a'], ['x', nan]], labels=[[0, 0], [0, 1]])

MultiIndex.dropna() works for the first case, but does nothing for the second:

>>> pd.MultiIndex(levels=[['a'], ['x', np.nan]], labels=[[0, 0], [0, 1]]).dropna()
MultiIndex(levels=[['a'], ['x', nan]], labels=[[0, 0], [0, 1]])

It appears that MultiIndex.dropna() only drops rows whose label is -1, and not rows whose label points to a level value that is itself NaN. It should drop both kinds of rows, so the result should be:

MultiIndex(levels=[['a'], ['x']], labels=[[0], [0]])
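
The expected behaviour can be sketched in plain Python as a per-row check that treats both encodings as missing. `keep_mask` and `is_nan` are hypothetical names used only for this illustration; this is an assumption about what dropna() should do, not a description of how pandas implements it.

```python
import math

def is_nan(value):
    """True only for a float NaN."""
    return isinstance(value, float) and math.isnan(value)

def keep_mask(levels, labels):
    """Return one boolean per row: True if the row has no missing entry.

    A row entry counts as missing if its label is -1 OR if the level
    value the label points to is NaN (hypothetical helper, not pandas API).
    """
    n_rows = len(labels[0])
    mask = []
    for i in range(n_rows):
        missing = any(
            lab[i] == -1 or is_nan(lev[lab[i]])
            for lev, lab in zip(levels, labels)
        )
        mask.append(not missing)
    return mask

# Case 1: NaN encoded as label -1.
mask_1 = keep_mask([['a'], ['x']], [[0, 0], [0, -1]])
# Case 2: NaN stored in the level and referenced by label 1.
mask_2 = keep_mask([['a'], ['x', float('nan')]], [[0, 0], [0, 1]])
print(mask_1, mask_2)
```

Under this check both cases keep only the first row, which matches the result given above.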

I am using Pandas 0.20.3, NumPy 1.13.1, and Python 3.5.