xs is filling nan in index with its last item, as if sorted ascending, in the resulting index · Issue #6574 · pandas-dev/pandas
Illustration:
import pandas as pd

acc = [
    ('a', 'abcde', 1),
    ('b', 'bbcde', 2),
    ('y', 'yzcde', 25),
    ('z', 'xbcde', 24),
    ('z', None, 26),
    ('z', 'zbcde', 25),
    ('z', 'ybcde', 26),
]
df1 = pd.DataFrame(acc, columns=['a1', 'a2', 'cnt']).set_index(['a1', 'a2'])
In [476]: df1
Out[476]:
          cnt
a1 a2
a  abcde    1
b  bbcde    2
y  yzcde   25
z  xbcde   24
   NaN     26
   zbcde   25
   ybcde   26
[7 rows x 1 columns]
In [477]: df1.xs('z',level='a1')
Out[477]:
       cnt
a2
xbcde   24
zbcde   26
zbcde   25
ybcde   26
[4 rows x 1 columns]
I was expecting:
       cnt
a2
xbcde   24
NaN     26
zbcde   25
ybcde   26
because I thought it would preserve the index of df1.
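The expectation above can be checked programmatically. The following is only a sketch: the names `is_z`, `expected_a2`, and `nan_preserved` are made up for illustration and are not part of pandas. It derives the expected `a2` labels with a boolean mask on the `a1` level and compares the positions of missing labels against what `xs` returns.

```python
import pandas as pd

acc = [
    ("a", "abcde", 1),
    ("b", "bbcde", 2),
    ("y", "yzcde", 25),
    ("z", "xbcde", 24),
    ("z", None, 26),
    ("z", "zbcde", 25),
    ("z", "ybcde", 26),
]
df1 = pd.DataFrame(acc, columns=["a1", "a2", "cnt"]).set_index(["a1", "a2"])

# Expected labels: the a2 value of every row whose a1 is 'z',
# in original order, including the missing label.
is_z = df1.index.get_level_values("a1") == "z"
expected_a2 = df1.index.get_level_values("a2")[is_z]

result = df1.xs("z", level="a1")

# Compare the positions of missing labels rather than the labels
# themselves, because NaN != NaN under ordinary equality.
expected_nan_mask = expected_a2.isna()
result_nan_mask = result.index.isna()
nan_preserved = bool((expected_nan_mask == result_nan_mask).all())

print("NaN label preserved by xs:", nan_preserved)
```

On an affected version this should print `False`, since the missing label is replaced by `zbcde`.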
Sorting explicitly before calling xs doesn't seem to change this behavior:
In [478]: df1.sort('cnt',ascending=False)
Out[478]:
          cnt
a1 a2
z  ybcde   26
   NaN     26
   zbcde   25
y  yzcde   25
z  xbcde   24
b  bbcde    2
a  abcde    1
[7 rows x 1 columns]
In [479]: df1.sort('cnt',ascending=False).xs('z',level='a1')
Out[479]:
       cnt
a2
ybcde   26
zbcde   26
zbcde   25
xbcde   24
[4 rows x 1 columns]
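For anyone reproducing this on a later pandas release, note that `DataFrame.sort` was eventually removed in favor of `DataFrame.sort_values`. A minimal sketch of the same two steps with the newer call follows; the variable names `sorted_df` and `result` are illustrative only.

```python
import pandas as pd

acc = [
    ("a", "abcde", 1),
    ("b", "bbcde", 2),
    ("y", "yzcde", 25),
    ("z", "xbcde", 24),
    ("z", None, 26),
    ("z", "zbcde", 25),
    ("z", "ybcde", 26),
]
df1 = pd.DataFrame(acc, columns=["a1", "a2", "cnt"]).set_index(["a1", "a2"])

# Same sort as df1.sort('cnt', ascending=False) in the session above,
# written with the newer sort_values API.
sorted_df = df1.sort_values("cnt", ascending=False)

# Cross-section on the outer level; the a1 level is dropped from the result.
result = sorted_df.xs("z", level="a1")
print(result)
```

The order of rows that tie on `cnt` may differ from the session above, because the default sort algorithm is not guaranteed to be stable.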
It might be related to forward filling, but in that case I think the result would be:
       cnt
a2
ybcde   26
ybcde   26
zbcde   25
xbcde   24
which still isn't what I was expecting.
Versions and dependencies:
In [480]: pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 12.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
pandas: 0.13.1
Cython: 0.17.2
numpy: 1.6.2
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.1.0
sphinx: 1.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2012h
bottleneck: None
tables: 3.0.0
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.8.0
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None