Bug in Fancy/Boolean Indexing with nested lists · Issue #2702 · pandas-dev/pandas
Fancy or Boolean indexing on a Series has two strange behaviors. My examples only show the behavior with Fancy indexing, but it's the same for Boolean indexing.
LHS vs RHS length
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = range(27)
>>> list(s)
[0, 1, 2]
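As noted above, I'd expect the boolean-mask form to do the same thing (using an explicit numpy boolean array so it's unambiguously treated as a mask):
>>> s = pd.Series(list('abc'))
>>> s[np.array([True, True, True])] = range(27)
>>> list(s)
[0, 1, 2]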
I would have expected an error, similar to what I get with slice indexing
>>> s = pd.Series(list('abc'))
>>> s[0:3] = range(27)
ValueError: cannot copy sequence with size 27 to array axis with dimension 3
An even odder behavior shows up when the RHS has too few items
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = range(2)
>>> list(s)
[0, 1, 0]
It seems to be using something like itertools.cycle, which strikes me as very arbitrary
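Just to illustrate that hypothesis (this is only a guess at the behavior, not a claim about how pandas is actually implemented):
>>> import itertools
>>> list(itertools.islice(itertools.cycle(range(2)), 3))
[0, 1, 0]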
Nested RHS
This may seem like a strange use of pandas, but I need to store Python lists as the values of a Series.
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = [[100,200], [300,400], [500,600]]
>>> list(s)
[100, 200, 300]
Very strange -- it's as if the input is flattened first.
But this flattening only happens when the nested lists are all the same length.
>>> s = pd.Series(list('abc'))
>>> s[[0,1,2]] = [[100,200], [300,400], [500,600, 601, 602]]
>>> list(s)
[[100,200], [300,400], [500,600, 601, 602]]
I know the numpy array constructor makes a distinction between these two inputs (a list of equal-length lists becomes a 2-D array, while ragged lists do not), so maybe that's the reason for the difference, but I still don't see why the resulting ndarray should be flattened.
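For illustration, here is the distinction the numpy constructor makes (the ragged case may need an explicit dtype=object depending on the numpy version):
>>> np.array([[100, 200], [300, 400], [500, 600]]).shape
(3, 2)
>>> np.array([[100, 200], [300, 400], [500, 600, 601, 602]], dtype=object).shape
(3,)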
I can work around the issue by converting the RHS to a 1-D object array first and passing that in.
>>> s = pd.Series(list('abc'))
>>> rhs = np.empty(3).astype('object')
>>> rhs[:] = [[100,200], [300,400], [500,600]]
>>> s[[0,1,2]] = rhs
>>> list(s)
[[100,200], [300,400], [500,600]]
Slice indexing doesn't have this problem at all
>>> s = pd.Series(list('abc'))
>>> s[0:3] = [[100,200], [300,400], [500,600]]
>>> list(s)
[[100,200], [300,400], [500,600]]
My question: are these behaviors a bug or a "feature"? I think Fancy/Boolean indexing should operate the same way as slice indexing -- i.e., check for matching lengths and don't auto-convert the RHS to a numpy array.