Series with MultiIndex gives surprising iloc results · Issue #17148 · pandas-dev/pandas

Several puzzling behaviors appear below wherein calling Series.iloc[int] for a Series with a MultiIndex gives confusing results. Namely, I see unexpected indices mutated, values read from unexpected indices and then added to others, and more than one index mutated by a single assignment. Contrast all of the behaviors below with indexing a vanilla numpy array, where none of the mystery occurs. Full reproductions follow, along with why I find them surprising. I've seen similar behavior back in the pandas 0.19.x series.
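For reference, the NumPy behavior being contrasted can be sketched as follows. This is only an illustration using the same values and the same three updates as Code Sample 1; the array name `a` is not part of the original report.

```python
import numpy as np

# Same values as Code Sample 1, held in a plain NumPy array.
a = np.array([555, 666])

a[0] += 200   # touches only position 0
a[-1] += 300  # touches only the last position
a[1] += 100   # position 1 is also the last position in a length-2 array

print(a)  # [ 755 1066]
```

Each integer index reads from and writes to exactly one position, which is the behavior the samples below expect from Series.iloc.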

Code Sample 1

The length-2 case with duplicated keys at the first level.

In [1]: import pandas as pd

In [2]: z = pd.DataFrame()
   ...: z['i'] = [1, 1]
   ...: z['j'] = [22, 33]
   ...: z['k'] = [555, 666]
   ...: z.set_index(['i', 'j'], inplace=True)
   ...: v = z.k
   ...:

In [3]: v
Out[3]:
i  j
1  22    555
   33    666
Name: k, dtype: int64

In [4]: v.values
Out[4]: array([555, 666])

In [5]: v.iloc[0] += 200; v
Out[5]:
i  j
1  22    755
   33    666
Name: k, dtype: int64

In [6]: v.iloc[-1] += 300; v
Out[6]:
i  j
1  22    755
   33    966
Name: k, dtype: int64

In [7]: v.iloc[1] += 100; v
Out[7]:
i  j
1  22    1066
   33    1066
Name: k, dtype: int64

Code Sample 1 Problem / Expected Result

The final statement read the value at key (1, 33), added 100, and then wrote the result to both (1, 22) and (1, 33).
Expected (1, 22) to remain 755 and (1, 33) to become 1066.
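The expectation above can be written as a small self-checking script. This is a sketch, not part of the original report: it computes the expected values on an independent NumPy copy (the name `expected` is introduced here for illustration) and compares them with what Series.iloc produces on the installed pandas.

```python
import pandas as pd

# Rebuild the Series from Code Sample 1 the same way the report does.
z = pd.DataFrame()
z["i"] = [1, 1]
z["j"] = [22, 33]
z["k"] = [555, 666]
z.set_index(["i", "j"], inplace=True)
v = z.k

# Expected values, computed positionally on an independent NumPy copy.
expected = v.values.copy()
expected[0] += 200
expected[-1] += 300
expected[1] += 100

# The same three updates, this time through Series.iloc.
v.iloc[0] += 200
v.iloc[-1] += 300
v.iloc[1] += 100

print("expected:", expected.tolist())
print("observed:", v.values.tolist())
print("iloc behaves positionally:", v.values.tolist() == expected.tolist())
```

On the version reported here (0.20.3), the transcript above implies the last line would print False, since both entries end up as 1066.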

Code Sample 2

The length-2 case with unique keys at each level.

In [1]: import pandas as pd

In [2]: z = pd.DataFrame()
   ...: z['i'] = [1, 2]
   ...: z['j'] = [3, 4]
   ...: z['k'] = [7, 8]
   ...: z.set_index(['i', 'j'], inplace=True)
   ...: v = z.k
   ...:

In [3]: v
Out[3]:
i  j
1  3    7
2  4    8
Name: k, dtype: int64

In [4]: v.values
Out[4]: array([7, 8])

In [5]: v.iloc[0] += 10; v
Out[5]:
i  j
1  3    17
2  4     8
Name: k, dtype: int64

In [6]: v.iloc[-1] += 10; v
Out[6]:
i  j
1  3    17
2  4    18
Name: k, dtype: int64

In [7]: v.iloc[1] += 1000; v
Out[7]:
i  j
1  3    1018
2  4      18
Name: k, dtype: int64

Code Sample 2 Problem / Expected Result

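Above, iloc[0] incremented key (1, 3), but then iloc[1] wrote to that same key: (1, 3) became 1018, which is the value at (2, 4), 18, plus 1000, while (2, 4) itself was left unchanged.
I do not expect two different iloc indices differing by 1 to point to the same location in a length-2 array.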
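One way to probe which location each iloc index actually writes to is sketched below. The helper `written_positions` and its `sentinel` parameter are hypothetical names introduced for this illustration. The helper works on a copy of the Series, which is an assumption that may change the conditions under which the problem appears.

```python
import pandas as pd

def written_positions(series, pos, sentinel=-1):
    """Return the positions whose values change after series.iloc[pos] = sentinel.

    Works on a copy, so the caller's Series is left untouched.
    The sentinel must not already occur in the Series.
    """
    probe = series.copy()
    before = probe.values.copy()
    probe.iloc[pos] = sentinel
    after = probe.values
    return [i for i in range(len(before)) if before[i] != after[i]]

# Same data as Code Sample 2.
z = pd.DataFrame({"i": [1, 2], "j": [3, 4], "k": [7, 8]}).set_index(["i", "j"])
v = z["k"]

for pos in (0, 1, -1):
    print(pos, "->", written_positions(v, pos))
```

With purely positional semantics, 0 should report [0], while 1 and -1 should both report [1]. Any other result would point at the kind of aliasing described in this sample.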

Code Sample 3

The length-3 case.

In [1]: import pandas as pd

In [2]: x = pd.DataFrame()
   ...: x['i'] = [1, 2, 3]
   ...: x['j'] = [11, 22, 33]
   ...: x['k'] = [4, 5, 6]
   ...: x.set_index(['i', 'j'], inplace=True)
   ...: v = x.k
   ...:

In [3]: v
Out[3]:
i  j
1  11    4
2  22    5
3  33    6
Name: k, dtype: int64

In [4]: v.values
Out[4]: array([4, 5, 6])

In [5]: v.iloc[0] += 4; v
Out[5]:
i  j
1  11    8
2  22    5
3  33    6
Name: k, dtype: int64

In [6]: # Expected (3, 33) to be incremented but instead (2, 22) is?
   ...: v.iloc[2] += 7; v
   ...:
Out[6]:
i  j
1  11     8
2  22    13
3  33     6
Name: k, dtype: int64

In [7]: # Expected (2, 22) to be incremented but instead (1, 11) is set to (2, 22) + increment?
   ...: v.iloc[1] += 10; v
   ...:
Out[7]:
i  j
1  11    23
2  22    13
3  33     6
Name: k, dtype: int64

Code Sample 3 Problem / Expected Result

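See the comments inline above: In [6] targets position 2 but changes (2, 22), and In [7] targets position 1 but changes (1, 11).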
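A possible workaround is sketched below, under the assumption that the problem is confined to the Series.iloc setter on a MultiIndex. It has not been verified against 0.20.3. The updates are applied to a private copy of the underlying array, where integer positions are unambiguous, and the result is wrapped in a new Series; the names `values` and `fixed` are introduced here for illustration.

```python
import pandas as pd

# Same data as Code Sample 3.
x = pd.DataFrame({"i": [1, 2, 3], "j": [11, 22, 33], "k": [4, 5, 6]})
v = x.set_index(["i", "j"])["k"]

# Apply the three updates positionally on a copy of the raw values.
values = v.values.copy()
values[0] += 4
values[2] += 7
values[1] += 10

# Rebuild a Series with the original MultiIndex and name.
fixed = pd.Series(values, index=v.index, name=v.name)
print(fixed)
```

The trade-off is that this produces a new Series rather than mutating the original in place, so any other references to the original data do not see the update.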

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.35-pv-ts2
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None