BUG: For Multi-Index, joining same level names in opposite order results in infinite recursion · Issue #28956 · pandas-dev/pandas
Code Sample
import sys
import traceback

import numpy as np
import pandas as pd

trace_limit = 10

# Make the original index order.
ab_ind = pd.MultiIndex.from_product(
    [['a1', 'a2'], ['b1', 'b2', 'b3']], names=['A', 'B'])

# Make the dataframe with levels in order ['A', 'B'].
ab_df = pd.DataFrame({'val': np.arange(len(ab_ind))}, index=ab_ind)
print(ab_df)

# Now swap the order of the levels to ['B', 'A'].
ba_df = ab_df.swaplevel('A', 'B')
print(ba_df)

# Now try adding them together.
try:
    sum_df = ab_df + ba_df
    print(sum_df)
except RecursionError:
    print('Recursion Error! Here is {limit} levels of the stack trace:'
          .format(limit=trace_limit))
    # print_tb writes the trace to stderr and returns None,
    # so the outer print() also emits a final "None" line.
    print(traceback.print_tb(sys.exc_info()[2], limit=trace_limit))
This gives the following output:
       val
A  B
a1 b1    0
   b2    1
   b3    2
a2 b1    3
   b2    4
   b3    5
       val
B  A
b1 a1    0
b2 a1    1
b3 a1    2
b1 a2    3
b2 a2    4
b3 a2    5
Recursion Error! Here is 10 levels of the stack trace:
File "recursion_error.py", line 27, in <module>
sum_df = ab_df + ba_df
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\ops\__init__.py", line 1493, in f
return self._combine_frame(other, pass_op, fill_value, level)
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 5359, in _combine_frame
this, other = self.align(other, join="outer", level=level, copy=False)
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 3939, in align
broadcast_axis=broadcast_axis,
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 8811, in align
fill_axis=fill_axis,
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 8850, in _align_frame
other.index, how=join, level=level, return_indexers=True
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3522, in join
return self._join_multi(other, how=how, return_indexers=return_indexers)
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3643, in _join_multi
other_jnlevels, how, return_indexers=True
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3522, in join
return self._join_multi(other, how=how, return_indexers=return_indexers)
File "C:\Users\Matthew\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 3643, in _join_multi
other_jnlevels, how, return_indexers=True
None
Problem description
Swapping the order of the levels of a MultiIndex results in infinite recursion whenever an operation tries to join the original level order with the swapped one. The desired result of ab_df + ba_df should simply be the same as 2 * ab_df.
Expected Output
I would expect the same result as 2 * ab_df, with the level order taken from the first operand (ab_df in ab_df + ba_df). So we should get the dataframe:
       val
A  B
a1 b1    0
   b2    2
   b3    4
a2 b1    6
   b2    8
   b3   10
The actual order isn't as important as not resulting in an infinite recursion.
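Until this is fixed, one possible workaround is to put the second operand's levels into the first operand's order before adding, using DataFrame.reorder_levels. This is a sketch assuming the same two frames as in the code sample; once both indexes use the order ['A', 'B'], they line up level by level and the opposite-order join path is avoided.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the code sample.
ab_ind = pd.MultiIndex.from_product(
    [['a1', 'a2'], ['b1', 'b2', 'b3']], names=['A', 'B'])
ab_df = pd.DataFrame({'val': np.arange(len(ab_ind))}, index=ab_ind)
ba_df = ab_df.swaplevel('A', 'B')

# Workaround: reorder ba_df's levels to follow ab_df's order (['A', 'B']).
ba_aligned = ba_df.reorder_levels(ab_df.index.names)
sum_df = ab_df + ba_aligned
print(sum_df)

# The result should match 2 * ab_df, the expected output above.
pd.testing.assert_frame_equal(sum_df, 2 * ab_df)
```

The level order of the result comes from the first operand, which is the behavior asked for above.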
Possible Cause of Bug
In pandas.core.indexes.base.join(), there is first a check of whether self.names == other.names. This is False if the names are the same but in the opposite order.
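The order sensitivity of that check can be seen directly on the two indexes from the code sample: names is compared position by position, so the same names in a different order are unequal, whereas comparing them as sets (as a set intersection of level names does) treats them as equal.

```python
import pandas as pd

ab_ind = pd.MultiIndex.from_product(
    [['a1', 'a2'], ['b1', 'b2', 'b3']], names=['A', 'B'])
ba_ind = ab_ind.swaplevel('A', 'B')

print(list(ab_ind.names), list(ba_ind.names))  # ['A', 'B'] ['B', 'A']

# Positional comparison, like the self.names == other.names check.
names_equal = ab_ind.names == ba_ind.names
# Order-insensitive comparison, like a set intersection of the names.
sets_equal = set(ab_ind.names) == set(ba_ind.names)
print(names_equal, sets_equal)  # False True
```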
Then pandas.core.indexes.base._join_multi() calls pandas.core.indexes.base.join() again on the set intersection of the level names. A set ignores order, so the intersection is the same whichever order the levels are in: the levels are treated as matching even though they are in the opposite order.

For the same set of names in the opposite order, the recursive call therefore receives the same mismatched order, the names check fails again, and the recursion keeps alternating between pandas.core.indexes.base.join() and pandas.core.indexes.base._join_multi().
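The loop can be reproduced without pandas in a small toy model. The functions toy_join, toy_join_multi and toy_join_multi_fixed below are hypothetical stand-ins that mimic only the two steps described above (an order-sensitive name check, then a recursive call on the overlapping names); they are not the pandas implementation. A depth cap turns the endless loop into an explicit RecursionError, and toy_join_multi_fixed sketches one possible remedy, assuming it is acceptable to reorder the second operand's overlapping levels to follow the first operand's order.

```python
from typing import Sequence

MAX_DEPTH = 50  # hypothetical cap so the demo fails fast


def toy_join(self_names: Sequence[str], other_names: Sequence[str],
             fixed: bool = False, depth: int = 0) -> str:
    # Order-sensitive check, like self.names == other.names.
    if list(self_names) == list(other_names):
        return "simple join on " + ", ".join(self_names)
    if fixed:
        return toy_join_multi_fixed(self_names, other_names, depth)
    return toy_join_multi(self_names, other_names, depth)


def toy_join_multi(self_names, other_names, depth):
    if depth >= MAX_DEPTH:
        raise RecursionError("join <-> _join_multi never terminates")
    overlap = set(self_names) & set(other_names)
    # Keep only the overlapping levels, each side in its own order,
    # so names in opposite order stay in opposite order.
    self_jn = [n for n in self_names if n in overlap]
    other_jn = [n for n in other_names if n in overlap]
    return toy_join(self_jn, other_jn, depth=depth + 1)


def toy_join_multi_fixed(self_names, other_names, depth):
    overlap = set(self_names) & set(other_names)
    self_jn = [n for n in self_names if n in overlap]
    # Remedy sketch: sort the other side's overlapping levels into self's order.
    other_jn = sorted((n for n in other_names if n in overlap),
                      key=self_jn.index)
    return toy_join(self_jn, other_jn, fixed=True, depth=depth + 1)


print(toy_join(['A', 'B'], ['A', 'B']))  # simple join on A, B
try:
    toy_join(['A', 'B'], ['B', 'A'])
except RecursionError as exc:
    print("RecursionError:", exc)
print(toy_join(['A', 'B'], ['B', 'A'], fixed=True))  # simple join on A, B
```

In the toy model, the reordering step is what breaks the cycle: after it, the order-sensitive check succeeds on the next call instead of sending the same pair back into toy_join_multi.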
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.2.1
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.6
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None