combinatorial explosion when merging dataframes · Issue #2690 · pandas-dev/pandas

Hi Wes, I'm not sure whether this is a real bug, but I'm opening this issue to follow up on a comment in http://stackoverflow.com/questions/14199168/combinatorial-explosion-when-merging-dataframes-in-pandas. Here's a fabricated example that reproduces MergeError: Combinatorial explosion! (boom), raised in tools/merge.py:

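import numpy as np
import pandas as pd

# Two frames sharing six float key columns (A-F) plus one value column each
df1 = pd.DataFrame(np.random.randn(1000, 7),
                   columns=list('ABCDEF') + ['G1'])
df2 = pd.DataFrame(np.random.randn(1000, 7),
                   columns=list('ABCDEF') + ['G2'])
df1.merge(df2)  # Boom: merges on all common columns A-F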

The value of group_sizes at the time of the exception is:

[2000, 2000, 2000, 2000, 2000, 2000]
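
A rough arithmetic sketch of why these sizes could trip the check. It assumes the guard in tools/merge.py compares the product of group_sizes against the signed 64-bit integer range; that is my reading and is not confirmed in this report. The names max_groups and int64_limit are illustrative only.

```python
from math import prod

# Values reported above: one entry per shared key column (A-F).
group_sizes = [2000, 2000, 2000, 2000, 2000, 2000]

# Python integers are arbitrary precision, so this product cannot overflow here.
max_groups = prod(group_sizes)

# Assumed threshold: the size of the signed 64-bit range (hypothetical guard).
int64_limit = 2 ** 63

exceeds_int64 = max_groups > int64_limit
print(max_groups)     # 2000**6 == 64 * 10**18
print(int64_limit)
print(exceeds_int64)
```

Under that assumption, six key columns of 2000 groups each give 6.4e19 possible key combinations, which is above 2**63 (about 9.22e18), even though each frame holds only 1000 rows.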

The exception does not occur if the common columns are set as the index, though.

df1 = df1.set_index(list('ABCDEF'))
df2 = df2.set_index(list('ABCDEF'))
df1.merge(df2, left_index=True, right_index=True, how='outer') # OK
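
For a reproducible check of the index-based workaround, here is a minimal sketch. The fixed seed and the names keys, left, right and joined are my additions, not part of the original report. Because independently drawn random floats essentially never coincide, the outer join should keep every row from both frames.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seed added only so the run is repeatable
keys = list('ABCDEF')

# Same shape as the example above, with the shared columns moved into the index.
left = pd.DataFrame(np.random.randn(1000, 7), columns=keys + ['G1']).set_index(keys)
right = pd.DataFrame(np.random.randn(1000, 7), columns=keys + ['G2']).set_index(keys)

# Join on the MultiIndex instead of on the columns.
joined = left.merge(right, left_index=True, right_index=True, how='outer')

print(joined.shape)
print(joined.columns.tolist())
print(joined['G1'].notna().sum(), joined['G2'].notna().sum())
```

Rows that exist in only one frame carry NaN in the other frame's value column, so the non-null counts show how many rows each side contributed.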