API: what should a 2D indexing operation into a 1D Index do? (eg idx[:, None]) · Issue #27837 · pandas-dev/pandas

Follow-up on #27775 and #27818.

Short recap of what those issues were about:

Currently, indexing into an Index with a 2D (or higher-dimensional) indexer results in an "invalid" Index whose underlying ndarray is not 1D:

In [1]: idx = pd.Index([1, 2, 3])  

In [2]: idx2 = idx[:, None] 

In [3]: idx2
Out[3]: Int64Index([1, 2, 3], dtype='int64')

In [4]: idx2.values
Out[4]: 
array([[1],
       [2],
       [3]])

So from the repr it looks like a proper Index, but the underlying values of an Index should always be 1D (such an invalid Index will also lead to errors once you do operations on it).
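The 1D invariant described above can be checked directly on the underlying values. The following is a minimal sketch using NumPy only; `is_valid_index_values` is a hypothetical helper name for illustration, not part of pandas.

```python
import numpy as np


def is_valid_index_values(values):
    """Return True if `values` is 1D, as an Index's underlying data should be.

    Hypothetical helper for illustration; not a pandas API.
    """
    return np.asarray(values).ndim == 1


values = np.array([1, 2, 3])
expanded = values[:, None]  # the same 2D indexer as in idx[:, None]

print(values.shape, is_valid_index_values(values))      # (3,) True
print(expanded.shape, is_valid_index_values(expanded))  # (3, 1) False
```

On a plain ndarray the extra axis is legitimate; the problem discussed here is only that an Index ends up wrapping such 2D values.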

Before pandas 0.25.0, the shape attribute of the index "correctly" returned the shape of the underlying values: (3, 1), but in 0.25.0 this was changed to (3,) (only checking the length). This caused a regression in matplotlib (#27775), and will be "fixed" in 0.25.1 by again returning the 2D shape of the underlying values (#27818). Of course, this is only about the shape attribute, while the root cause is this invalid Index.

I think it is clear that we should not allow such invalid Index objects to exist.
I currently know of two ways to end up in such a situation:

Passing a multidimensional array to the Index constructor (e.g. pd.Index(np.random.randn(5, 5, 5))). I think this is something we can deprecate now and raise on later, and there is already an issue for this: BUG: Index constructor should not allow an ndarray with ndim > 2 #27125
Indexing into an Index (e.g. idx[:, None]) -> this issue

So let's use this issue to discuss what to do for this second way: a 2D indexing operation on a 1D object.

This is relevant for the Index, but we should probably try to have it consistent with Series as well.
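To compare how Index and Series currently respond to the same 2D indexer, something like the sketch below could be used. `describe_2d_indexing` is a hypothetical helper, and the outcome for the pandas objects depends on the installed pandas version, so no output is shown for them.

```python
import numpy as np
import pandas as pd


def describe_2d_indexing(obj):
    """Apply obj[:, None] and report what came back, or what was raised.

    Hypothetical helper for illustration; not a pandas API.
    """
    try:
        result = obj[:, None]
    except Exception as exc:  # the exception type varies across pandas versions
        return f"raises {type(exc).__name__}"
    # np.asarray exposes the dimensionality of the underlying values,
    # which may differ from what the object's own attributes report.
    return f"returns {type(result).__name__} with ndim={np.asarray(result).ndim}"


for obj in (np.array([1, 2, 3]), pd.Index([1, 2, 3]), pd.Series([1, 2, 3])):
    print(type(obj).__name__, "->", describe_2d_indexing(obj))
```

The plain ndarray serves as the reference case: it returns a 2D ndarray. Whether Index and Series should raise, return an ndarray, or do something else is the question this issue is about.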