DataFrameGroupby.boxplot fails when subplots=False by charlesdong1991 · Pull Request #28102 · pandas-dev/pandas

hey @jreback @WillAyd @simonjayhawkins @TomAugspurger

Sorry for the very late update here. I just took another look at the original issue, my past commits, and the relevant code, and I feel my earlier approach was quite inefficient and confusing. So I reverted most of those changes, and I think the new solution looks much better.

So the issue was that, with a df like the one below, calling df.groupby('cat').boxplot(subplots=False, column='v') fails:

df = pd.DataFrame({'cat': np.random.choice(list('abcde'), 100), 'v': np.random.rand(100), 'v1': np.random.rand(100)})
df.groupby('cat').boxplot(subplots=False, column='v')

This happens because, after df.groupby('cat'), the data used for plotting has MultiIndex (MI) columns, so v no longer exists as a column label in the transformed data.
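To make the cause concrete, here is a minimal sketch of what the transformed data looks like. It assumes the grouped frames are concatenated side by side with the group keys as the outer column level; the names `keys`, `frames`, and `wide` are only illustrative and not necessarily what the plotting code uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cat": np.random.choice(list("abcde"), 100),
    "v": np.random.rand(100),
    "v1": np.random.rand(100),
})

# Assumption: the per-group frames are concatenated along the columns,
# with the group keys forming the outer level of the column index.
keys, frames = zip(*df.groupby("cat"))
wide = pd.concat(frames, keys=keys, axis=1)

# The columns are now a MultiIndex such as ('a', 'v'), ('a', 'v1'), ...
is_multiindex = isinstance(wide.columns, pd.MultiIndex)

# A plain 'v' is no longer a valid column label ...
plain_label_found = "v" in wide.columns

# ... but the (group key, column) pair is.
paired_label_found = (keys[0], "v") in wide.columns

print(is_multiindex, plain_label_found, paired_label_found)
```

So passing column='v' straight through to the underlying boxplot call looks for a label that is not there.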

Therefore, the new solution is quite simple: I pair the groupby keys (in this case, [a, b, c, d, e]) with the value users pass to the column argument (in this case, v), which gives [(a, v), (b, v), (c, v), (d, v), (e, v)]. These pairs are passed to the boxplot function, which then selects the subset based on the new column labels instead of the v that users specified against the original df.
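A rough sketch of the pairing step, for illustration only: it uses `pd.MultiIndex.from_product` to build the pairs, which is one way to do it and not necessarily the exact code in this PR. The names `wide`, `paired`, and `subset` are hypothetical, and `cat` is built deterministically here so that every group is present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cat": np.tile(list("abcde"), 20),  # deterministic groups for the sketch
    "v": np.random.rand(100),
    "v1": np.random.rand(100),
})

# Same assumed transformation as in the failing path: group keys become
# the outer level of the columns.
keys, frames = zip(*df.groupby("cat"))
wide = pd.concat(frames, keys=keys, axis=1)

# Pair every group key with the user-supplied column value(s).
column = ["v"]
paired = list(pd.MultiIndex.from_product([keys, column]))
print(paired)
# [('a', 'v'), ('b', 'v'), ('c', 'v'), ('d', 'v'), ('e', 'v')]

# Selecting with the paired labels works where a plain 'v' would not.
subset = wide[paired]
print(subset.shape)  # (100, 5)
```

The paired labels are what would then be handed to the boxplot call as its column argument.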

How does it sound now? Feedback and reviews are very welcome!

~~Regarding the error, it seems unrelated to this PR.~~
EDIT:
The CI failure is gone now after committing one more time.