API: disallow duplicate level names by toobaz · Pull Request #18882 · pandas-dev/pandas

FYI, this is causing failures on dask due to operations like:

In [1]: import dask.dataframe as dd

In [2]: import pandas as pd

In [3]: pdf = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
   ...:                     'b': [4, 5, 6, 3, 2, 1, 0, 0, 0]},
   ...:                    index=[0, 1, 3, 5, 6, 8, 9, 9, 9]).set_index("a")

In [4]: pdf.groupby(pdf.index).apply(lambda x: x.b)
Out[4]:
a  a
1  1    4
2  2    5
3  3    6
4  4    3
5  5    2
6  6    1
7  7    0
8  8    0
9  9    0
Name: b, dtype: int64

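Essentially, this groups by the index and then calls .apply with a function that returns a Series whose index has the same name as the original index. The result is a two-level MultiIndex in which both levels are named a, as the Out[4] header shows. This seems like a fairly common use case.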

I think we should make this a warning for now and raise later.
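For what it's worth, the warn-now, raise-later path could look roughly like the sketch below. This is not the code in this PR and not pandas API: `validate_level_names` and its `strict` flag are hypothetical names made up for illustration, the use of `FutureWarning` is an assumption about which warning class would be chosen, and skipping `None` names (so several unnamed levels are not reported) is also an assumption.

```python
import warnings
from collections import Counter


def validate_level_names(names, strict=False):
    """Hypothetical helper (not pandas API): flag repeated level names.

    With strict=False a FutureWarning is emitted and the names are
    returned unchanged; with strict=True a ValueError is raised.
    """
    # Assumption: unnamed levels (None) are never treated as duplicates.
    counts = Counter(name for name in names if name is not None)
    duplicates = [name for name, count in counts.items() if count > 1]
    if not duplicates:
        return list(names)

    message = f"Duplicate level names: {duplicates}"
    if strict:
        raise ValueError(message)
    warnings.warn(
        message + "; this will raise in a future version.",
        FutureWarning,
        stacklevel=2,
    )
    return list(names)
```

With something like this, the groupby case above (level names `['a', 'a']`) would keep working and emit a warning during the deprecation period, and switching the default to `strict=True` later would turn it into an error.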