xs is filling NaN in the index with its last item, as if sorted ascending, in the resulting index · Issue #6574 · pandas-dev/pandas

Illustration:

import pandas as pd

acc = [
    ('a', 'abcde', 1),
    ('b', 'bbcde', 2),
    ('y', 'yzcde', 25),
    ('z', 'xbcde', 24),
    ('z', None, 26),   # missing a2 label; shows up as NaN in the index
    ('z', 'zbcde', 25),
    ('z', 'ybcde', 26),
]
df1 = pd.DataFrame(acc, columns=['a1', 'a2', 'cnt']).set_index(['a1', 'a2'])

In [476]: df1
Out[476]: 
          cnt
a1 a2        
a  abcde    1
b  bbcde    2
y  yzcde   25
z  xbcde   24
   NaN     26
   zbcde   25
   ybcde   26

[7 rows x 1 columns]

In [477]: df1.xs('z',level='a1')
Out[477]: 
       cnt
a2        
xbcde   24
zbcde   26
zbcde   25
ybcde   26

[4 rows x 1 columns]

I was expecting:

       cnt
a2        
xbcde   24
NaN     26
zbcde   25
ybcde   26

because I thought it would preserve the index of df1.
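For comparison, here is a minimal sketch of one way to get the expected frame without going through xs: filter on the plain a1 column first, and only then index by a2. The names flat and expected are illustrative and not part of the original report, and I have not verified this against 0.13.1 specifically.

```python
import pandas as pd

acc = [
    ('a', 'abcde', 1),
    ('b', 'bbcde', 2),
    ('y', 'yzcde', 25),
    ('z', 'xbcde', 24),
    ('z', None, 26),
    ('z', 'zbcde', 25),
    ('z', 'ybcde', 26),
]

# Keep an un-indexed copy so a1 and a2 are ordinary columns (hypothetical name).
flat = pd.DataFrame(acc, columns=['a1', 'a2', 'cnt'])
df1 = flat.set_index(['a1', 'a2'])

# The cross-section from the report.
result = df1.xs('z', level='a1')

# Workaround sketch: select the 'z' rows by column value, then index by a2.
# This never asks the MultiIndex to rebuild the a2 labels.
expected = flat.loc[flat['a1'] == 'z', ['a2', 'cnt']].set_index('a2')

# Compare the two indexes; they should match if xs preserves the NaN label.
print(result.index.tolist())
print(expected.index.tolist())
```

If the two printed lists differ only in the NaN position, that isolates the problem to how xs reconstructs the remaining level rather than to the data itself.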

Sorting explicitly before calling xs doesn't change this; the NaN label is still replaced by zbcde:

In [478]: df1.sort('cnt',ascending=False)
Out[478]: 
          cnt
a1 a2        
z  ybcde   26
   NaN     26
   zbcde   25
y  yzcde   25
z  xbcde   24
b  bbcde    2
a  abcde    1

[7 rows x 1 columns]

In [479]: df1.sort('cnt',ascending=False).xs('z',level='a1')
Out[479]: 
       cnt
a2        
ybcde   26
zbcde   26
zbcde   25
xbcde   24

[4 rows x 1 columns]

It might be related to forward filling, but in that case I would expect the NaN to take the preceding label, ybcde:

       cnt
a2        
ybcde   26
ybcde   26
zbcde   25
xbcde   24

which still isn't what I was expecting.
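For anyone reproducing the sorted case on a newer pandas: DataFrame.sort was later removed in favour of sort_values, so the session above will not run there as written. The sketch below assumes a recent pandas and uses illustrative names (flat, ordered, z_rows) that are not in the original report. It sorts the un-indexed frame, selects the 'z' rows by column, and checks that exactly one a2 label is still missing.

```python
import pandas as pd

acc = [
    ('a', 'abcde', 1),
    ('b', 'bbcde', 2),
    ('y', 'yzcde', 25),
    ('z', 'xbcde', 24),
    ('z', None, 26),
    ('z', 'zbcde', 25),
    ('z', 'ybcde', 26),
]
flat = pd.DataFrame(acc, columns=['a1', 'a2', 'cnt'])

# sort_values is the replacement for the old DataFrame.sort call.
ordered = flat.sort_values('cnt', ascending=False)

# Select the 'z' rows from the sorted frame and index them by a2.
z_rows = ordered.loc[ordered['a1'] == 'z', ['a2', 'cnt']].set_index('a2')

# Count missing labels: one NaN is expected, neither forward-filled nor replaced.
n_missing = int(z_rows.index.isna().sum())
print(z_rows)
print(n_missing)
```

The relative order of the two rows with cnt 26 (and of the two with cnt 25) depends on the sort algorithm, so only the counts and the set of labels are worth comparing, not the tie order.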

Versions and dependencies:

In [480]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 12.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8

pandas: 0.13.1
Cython: 0.17.2
numpy: 1.6.2
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.1.0
sphinx: 1.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2012h
bottleneck: None
tables: 3.0.0
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.8.0
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None