BUG: Subtracting two series with unordered index and all-nan index produces unexpected result · Issue #38439 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
>>> import pandas as pd
>>> import numpy as np
>>> a_index = pd.MultiIndex.from_tuples([
...     (81.0, np.nan, '2018-06-01'),
...     (81.0, np.nan, '2018-07-01'),
...     (82.0, np.nan, '2018-07-01'),
...     (82.0, np.nan, '2018-08-01'),
...     (np.nan, np.nan, np.nan)],
...     names=['id', 'sub_ix', 'data']
... )
>>> a_values = [25, 22, 20, 21, np.nan]
>>> b_index = pd.MultiIndex.from_tuples([
...     (81.0, np.nan, '2018-06-01'),
...     (np.nan, np.nan, np.nan),
...     (81.0, np.nan, '2018-07-01'),
...     (82.0, np.nan, '2018-07-01'),
...     (82.0, np.nan, '2018-08-01')],
...     names=['id', 'sub_ix', 'data']
... )
>>> b_values = [28.28, np.nan, 28.28, 25.25, 25.25]
>>> a = pd.Series(a_values, index=a_index)
>>> b = pd.Series(b_values, index=b_index)
>>> a
id    sub_ix  data
81.0  NaN     2018-06-01    25.0
              2018-07-01    22.0
82.0  NaN     2018-07-01    20.0
              2018-08-01    21.0
NaN   NaN     NaN            NaN
dtype: float64
>>> b
id    sub_ix  data
81.0  NaN     2018-06-01    28.28
NaN   NaN     NaN             NaN
81.0  NaN     2018-07-01    28.28
82.0  NaN     2018-07-01    25.25
              2018-08-01    25.25
dtype: float64
>>> a - b
id    sub_ix  data
81.0  NaN     2018-06-01   -3.28
              2018-07-01     NaN   <-- this shouldn't be NaN, the index (81.0, NaN, 2018-07-01) exists in both a and b (it's just not ordered in b)
82.0  NaN     2018-07-01   -8.28   <-- also wrong, expected -5.25
              2018-08-01   -4.25
NaN   NaN     NaN            NaN
dtype: float64
>>> a - b.sort_index()
id    sub_ix  data
81.0  NaN     2018-06-01   -3.28
              2018-07-01   -6.28   <-- expected value
82.0  NaN     2018-07-01   -5.25
              2018-08-01   -4.25
NaN   NaN     NaN            NaN
dtype: float64
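The session above can also be written as a plain script that applies the sorting workaround to both operands. This is a minimal sketch using the same data as the report; the variable names `names` and `workaround` are illustrative and not part of the original.

```python
import numpy as np
import pandas as pd

names = ["id", "sub_ix", "data"]

a = pd.Series(
    [25, 22, 20, 21, np.nan],
    index=pd.MultiIndex.from_tuples(
        [
            (81.0, np.nan, "2018-06-01"),
            (81.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-08-01"),
            (np.nan, np.nan, np.nan),
        ],
        names=names,
    ),
)

# Same labels as `a`, but the all-NaN row sits in the second position.
b = pd.Series(
    [28.28, np.nan, 28.28, 25.25, 25.25],
    index=pd.MultiIndex.from_tuples(
        [
            (81.0, np.nan, "2018-06-01"),
            (np.nan, np.nan, np.nan),
            (81.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-07-01"),
            (82.0, np.nan, "2018-08-01"),
        ],
        names=names,
    ),
)

# Workaround: sort both indices so the rows line up before subtracting.
workaround = a.sort_index() - b.sort_index()
print(workaround)
```

The printed values should match the `a - b.sort_index()` output shown above, since `a` is already in sorted order.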
Problem description
When two series share the same set of index labels but an all-NaN index row sits at a different position in each, the result of an arithmetic operation (+, -, /) is not as expected. The issue can be worked around by sorting both indices (series.sort_index()). I tried a different example with unordered indices but without the all-NaN index row, and the result is as expected, so unsorted indices alone do not cause the problem:
>>> a
id    sub_ix  data
81.0  NaN     2018-07-01    25
              2018-06-01    22
82.0  NaN     2018-07-01    20
              2018-08-01    21
dtype: int64
>>> b
id    sub_ix  data
81.0  NaN     2018-06-01    1
              2018-07-01    2
82.0  NaN     2018-07-01    3
              2018-08-01    4
dtype: int64
>>> a - b
id    sub_ix  data
81.0  NaN     2018-06-01    21
              2018-07-01    23
82.0  NaN     2018-07-01    17
              2018-08-01    17
dtype: int64
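Since the trigger appears to be an index row that is NaN on every level, it may help to detect such rows before doing arithmetic. The helper below is a hypothetical sketch, not a pandas API; the name `has_all_nan_row` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd


def has_all_nan_row(index: pd.MultiIndex) -> bool:
    """Return True if any row of the MultiIndex is missing on every level."""
    # One column per index level, one row per index entry.
    levels = index.to_frame(index=False)
    return bool(levels.isna().all(axis=1).any())


names = ["id", "sub_ix", "data"]
with_nan_row = pd.MultiIndex.from_tuples(
    [(81.0, np.nan, "2018-06-01"), (np.nan, np.nan, np.nan)],
    names=names,
)
without_nan_row = pd.MultiIndex.from_tuples(
    [(81.0, np.nan, "2018-07-01"), (81.0, np.nan, "2018-06-01")],
    names=names,
)

print(has_all_nan_row(with_nan_row))     # True
print(has_all_nan_row(without_nan_row))  # False
```

A series whose index passes this check is a candidate for the sorting workaround described in this report.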
Expected Output
Operands should be aligned by index label, even when the index contains all-NaN rows.
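One way to express this expectation as a check is to compare the direct subtraction against the sorted-operand result. The function below is a sketch under that assumption; `subtraction_respects_index` is a hypothetical name, and the flat-index series `x` and `y` are illustrative data, not taken from the report.

```python
import pandas as pd


def subtraction_respects_index(a: pd.Series, b: pd.Series) -> bool:
    """Return True if a - b matches the result obtained from sorted operands."""
    direct = (a - b).sort_index()
    reference = a.sort_index() - b.sort_index()
    # Series.equals treats NaN values in the same position as equal.
    return direct.equals(reference)


# Same labels in a different order, with no missing labels involved.
x = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
y = pd.Series([10.0, 20.0, 30.0], index=["c", "a", "b"])

print(subtraction_respects_index(x, y))
```

On a pandas build affected by this issue, calling the function with the `a` and `b` series from the code sample would be expected to return False.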
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 65f0463
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.9.11-200.fc33.x86_64
Version : #1 SMP Tue Nov 24 18:18:01 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8
pandas : 1.2.0.dev0+1441.g65f0463d3
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.2
Cython : 0.29.21
pytest : 5.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : 1.3.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : 2.7.1
odfpy : None
openpyxl : 1.8.6
pandas_gbq : None
pyarrow : 1.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None