ERR automatic broadcast for merging different levels, #9455 by nbonnotte · Pull Request #12219 · pandas-dev/pandas

@jreback I feel like neither of us understands what the other is saying. I am not talking about actually merging a single level into a single level of a multi-level, either.

What I am asking is: what are the drawbacks of keeping the feature? From my (still naive) point of view, the result of the operation is difficult to guess: here, the multi-level is "flattened" (not sure the term is correct), but it would also have been possible to merge the single-level index into a single level of the multi-level, right?

So I'm not saying we should actually merge a single level to a multi-level, just that we could, and it would seem to me at least as natural as the current result... which therefore might appear as unpredictable. That's a drawback, for me.

Are there any other drawbacks? Why should it be an error?

As for a use case, what about #2024?

Otherwise, I have an example. I often use pandas to prepare the features I'm going to feed to scikit-learn, and those features come from many different origins: for instance, for each day, weather data (temperature, pressure, etc.), calendar data (is it a bank holiday? a school holiday?) and, let's say, the target value of the previous day. At some point, I thus have several multi-level dataframes, e.g. with columns ('weather', 'temperature'), ('weather', 'pressure'), ... and ('calendar', 'is_bank_holiday'), ('calendar', 'is_school_holiday'), ..., alongside single-level dataframes, for instance one containing 'yesterday_target'. I want to merge all of that and give the result to scikit-learn. But I like multi-level dataframes because they make it so much easier to select a subset of the features, e.g. for plotting, or just to exclude some features, so I'd like to end up with one (or with a single-level dataframe with tuples as column names, which can easily be converted).

Sure, I can think of some ways to arrive at the same result without merging a single-level dataframe with a multi-level one. But it would be simpler if it were directly possible.
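To make the use case concrete, here is a minimal sketch of one such workaround. The values are made up, and the "target" group name is a hypothetical label I picked for the example. The idea is to lift the single-level columns to two levels first, so that every frame has the same number of column levels before they are combined:

```python
import pandas as pd

days = pd.date_range("2016-01-01", periods=3, freq="D")

# Multi-level feature frames: columns are (group, feature) tuples.
weather = pd.DataFrame(
    {
        ("weather", "temperature"): [3.0, 5.5, 4.1],
        ("weather", "pressure"): [1012.0, 1009.5, 1015.2],
    },
    index=days,
)
calendar = pd.DataFrame(
    {
        ("calendar", "is_bank_holiday"): [True, False, False],
        ("calendar", "is_school_holiday"): [True, True, False],
    },
    index=days,
)

# Single-level frame (illustrative values).
target = pd.DataFrame({"yesterday_target": [10.0, 12.0, 11.0]}, index=days)

# Lift the single-level columns under a hypothetical "target" group,
# so this frame also has two column levels.
target_ml = pd.concat({"target": target}, axis=1)

# All frames now have two column levels and share the same index,
# so they can be combined side by side.
features = pd.concat([weather, calendar, target_ml], axis=1)

# Selecting or excluding a whole group of features stays easy.
weather_only = features["weather"]
without_calendar = features.drop(columns="calendar", level=0)
```

This gives the multi-level result I am after, but it needs the extra lifting step for every single-level frame, which is the part I would like to avoid.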

Otherwise, we can raise an error, and wait to see if anyone complains. You tell me.