MultiIndex.get_level_values() replaces NA by another value · Issue #5074 · pandas-dev/pandas

Test case:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: index = pd.MultiIndex.from_arrays([
    ['a', 'b', 'b'],
    [1, np.nan, 2]
])

In [4]: index.get_level_values(1)
Out[4]: Float64Index([1.0, 2.0, 2.0], dtype=object)

The expected output is:
Float64Index([1.0, nan, 2.0], dtype=object)

This happens because NA values are not stored in the MultiIndex levels; instead, the corresponding label is set to -1. When get_level_values() then uses the labels as positional indexes into the level's values, that -1 is treated as an ordinary negative index and picks the last (non-null) value.
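The mechanism can be reproduced with plain NumPy, without touching pandas internals. In this minimal sketch, `level_values` and `labels` are hypothetical stand-ins for what the MultiIndex stores for level 1 in the test case above; they are assumptions for illustration, not the actual attributes.

```python
import numpy as np

# Hypothetical stand-ins for the internals of level 1 in the test case:
# the level holds only the non-NA values, and the labels point into it.
level_values = np.array([1.0, 2.0])
labels = np.array([0, -1, 1])  # -1 marks the missing entry

# Plain positional indexing treats -1 as "last element",
# so the missing entry silently becomes 2.0.
result = level_values[labels]
print(result)  # [1. 2. 2.]
```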

I tried to fix this by appending an NA to the values when -1 is present in the labels.
https://github.com/goyodiaz/pandas/commit/f028513ad96a
It still needs to be improved so that it returns the proper NA value (NaN, None, maybe NaT?) depending on the index type. Does this approach make sense?
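For discussion, the idea can be sketched with plain NumPy as follows. This is not the code from the linked commit; `take_with_na` and its `na_value` parameter are hypothetical names used only to illustrate the approach, and choosing the right `na_value` per index type is left open, as noted above.

```python
import numpy as np

def take_with_na(level_values, labels, na_value=np.nan):
    """Index level_values by labels, mapping -1 to na_value (hypothetical helper)."""
    level_values = np.asarray(level_values)
    labels = np.asarray(labels)
    if (labels == -1).any():
        # Append the NA marker so that index -1 (the last element) is NA
        # instead of the last real value.
        level_values = np.append(level_values, na_value)
    return level_values[labels]

result = take_with_na([1.0, 2.0], [0, -1, 1])
print(result)  # [ 1. nan  2.]
```

When no label is -1 the values are left untouched, so the common case only pays for the check.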