Bug in Fancy/Boolean Indexing with nested lists · Issue #2702 · pandas-dev/pandas
Fancy or Boolean indexing on a Series has two strange behaviors. My examples only show the behavior with Fancy indexing, but it's the same for Boolean indexing.
LHS vs RHS length
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = range(27)
>>> list(s)
[0, 1, 2]
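As noted above, I'd expect the boolean-mask form to do the same thing (using an explicit numpy boolean array so it's unambiguously treated as a mask):
>>> s = pd.Series(list('abc'))
>>> s[np.array([True, True, True])] = range(27)
>>> list(s)
[0, 1, 2]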
I would have expected an error, similar to what I get with slice indexing
>>> s = pd.Series(list('abc'))
>>> s[0:3] = range(27)
ValueError: cannot copy sequence with size 27 to array axis with dimension 3
An even odder behavior shows up when the RHS has too few items
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = range(2)
>>> list(s)
[0, 1, 0]
It seems to be using something like itertools.cycle, which strikes me as very arbitrary
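Just to illustrate that hypothesis (this is only a guess at the behavior, not a claim about how pandas is actually implemented):
>>> import itertools
>>> list(itertools.islice(itertools.cycle(range(2)), 3))
[0, 1, 0]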
Nested RHS
This may seem like a strange use of pandas, but I need to store Python lists as the values of a Series.
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = [[100,200], [300,400], [500,600]]
>>> list(s)
[100, 200, 300]
Very strange -- it's as if the input is flattened first.
But this flattening only happens when the nested lists are all the same length.
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = [[100,200], [300,400], [500,600, 601, 602]]
>>> list(s)
[[100,200], [300,400], [500,600, 601, 602]]
I know the numpy array constructor makes a distinction between these two inputs (a list of equal-length lists becomes a 2-D array, while ragged lists do not), so maybe that's the reason for the difference, but I still don't see why the resulting ndarray should be flattened.
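For illustration, here is the distinction the numpy constructor makes (the ragged case may need an explicit dtype=object depending on the numpy version):
>>> np.array([[100, 200], [300, 400], [500, 600]]).shape
(3, 2)
>>> np.array([[100, 200], [300, 400], [500, 600, 601, 602]], dtype=object).shape
(3,)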
I can work around the issue by converting the RHS to a 1-D object array first and passing that in.
>>> s = pd.Series(list('abc'))
>>> rhs = np.empty(3).astype('object')
>>> rhs[:] = [[100,200], [300,400], [500,600]]
>>> s[[0,1,2]] = rhs
>>> list(s)
[[100,200], [300,400], [500,600]]
Slice indexing doesn't have this problem at all
>>> s = pd.Series(list('abc'))
>>> s[0:3] = [[100,200], [300,400], [500,600]]
>>> list(s)
[[100,200], [300,400], [500,600]]
My question: are these behaviors a bug or a "feature"? I think Fancy/Boolean indexing should operate the same way as slice indexing -- i.e., check for matching lengths and don't auto-convert the RHS to a numpy array.