API/ERR/ENH: Allow MultiIndex.from_tuples to handle NaNs · Issue #23578 · pandas-dev/pandas

This is the origin of #23558 and #23677, but IMO worth fixing in its own right:

Trying to create a MultiIndex from a list of tuples containing NaNs (like Index.str.partition does if there are NaNs present) yields:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), np.nan, ('d', '', '')])
[...]
TypeError: object of type 'float' has no len()

However, it works easily when passing a tuple of NaNs:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,) * 3, ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
           labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])

In fact, the length of the tuple is irrelevant:

>>> pd.MultiIndex.from_tuples([('a', 'b', 'c'), (np.nan,), ('d', '', '')])
MultiIndex(levels=[['a', 'd'], ['', 'b'], ['', 'c']],
           labels=[[0, -1, 1], [1, -1, 0], [1, -1, 0]])

Aside from the inconvenience, replacing the NaN elements of an array with tuples (say, when there are several NaNs) is a real problem in itself: the tuples almost always get interpreted as another axis or as a list-like, which then gives creative errors.
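Until something like this is handled inside pandas, one possible workaround is to do the replacement on a plain Python list rather than on an array, which sidesteps the axis/list-like interpretation. The following is only a sketch: `pad_nan_entries` is a hypothetical helper, not part of pandas, and it assumes that every non-NaN entry is a tuple of the same length.

```python
import math


def pad_nan_entries(entries):
    """Replace bare NaN entries with all-NaN tuples of the common width.

    Hypothetical helper (not a pandas function). Assumes every non-NaN
    entry is a tuple and that all tuples share one length.
    """
    widths = {len(e) for e in entries if isinstance(e, tuple)}
    if len(widths) != 1:
        raise ValueError("expected all tuples to share a single length")
    (width,) = widths

    padded = []
    for e in entries:
        # np.nan is an ordinary Python float, so this check also covers it.
        if isinstance(e, float) and math.isnan(e):
            padded.append((math.nan,) * width)
        else:
            padded.append(e)
    return padded


entries = [("a", "b", "c"), math.nan, ("d", "", "")]
padded = pad_nan_entries(entries)
```

The resulting `padded` list has the same shape as the second example above (a full tuple of NaNs in place of the bare NaN), so it should be accepted by `pd.MultiIndex.from_tuples`.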

I'm thinking this extra fail-safe should not be controversial, and I've got a fix prepared already, which is blocked by #23582 (at least in terms of adding tests).