combinatorial explosion when merging dataframes · Issue #2690 · pandas-dev/pandas
Hi Wes, I'm not sure whether this is a real bug, but I'm opening it to follow up on a comment in http://stackoverflow.com/questions/14199168/combinatorial-explosion-when-merging-dataframes-in-pandas. Here's a fabricated example that reproduces MergeError: Combinatorial explosion! (boom), raised in tools/merge.py:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(1000, 7),
                   columns=list('ABCDEF') + ['G1'])
df2 = pd.DataFrame(np.random.randn(1000, 7),
                   columns=list('ABCDEF') + ['G2'])
df1.merge(df2)  # Boom: merges on the shared columns A..F
The value of group_sizes at the time of the exception is:
[2000, 2000, 2000, 2000, 2000, 2000]
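A rough back-of-the-envelope check of why these sizes might trip the error. Each of the six shared columns holds 1000 random floats per frame, so there are about 2000 distinct values per column across both frames. The sketch below assumes (this is a guess about the internals, not something confirmed in this report) that the merge code combines the per-column group counts into a single 64-bit integer key and raises when their product would not fit:

```python
import math

# Values reported above: one entry per shared key column (A..F).
group_sizes = [2000, 2000, 2000, 2000, 2000, 2000]

# Number of distinct composite keys if every combination of
# per-column groups were possible.
n_combinations = math.prod(group_sizes)

# Largest value a signed 64-bit integer can hold.
INT64_MAX = 2**63 - 1

# Assumption: the error corresponds to this product not fitting in int64.
overflows_int64 = n_combinations > INT64_MAX
print(n_combinations, overflows_int64)
```

With five key columns instead of six, the product would be 2000**5 = 3.2e16, which still fits in an int64, so under this assumption the failure would depend on the number of key columns as well as the number of rows.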
The exception does not occur if the common columns are set as the index, though.
df1 = df1.set_index(list('ABCDEF'))
df2 = df2.set_index(list('ABCDEF'))
df1.merge(df2, left_index=True, right_index=True, how='outer') # OK