DataFrameGroupby.boxplot fails when subplots=False by charlesdong1991 · Pull Request #28102 · pandas-dev/pandas
hey @jreback @WillAyd @simonjayhawkins @TomAugspurger
Sorry for the very late update. I took another look at the original issue, my past commits, and the relevant code, and I feel my earlier approach was quite inefficient and confusing. So I have reverted most of those changes, and I think the new solution looks much better.
The issue is that, with a df like the one below, `df.groupby('cat').boxplot(subplots=False, column='v')` fails:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'cat': np.random.choice(list('abcde'), 100), 'v': np.random.rand(100), 'v1': np.random.rand(100)})
    df.groupby('cat').boxplot(subplots=False, column='v')

This happens because, after `df.groupby('cat')`, the data used for plotting has MultiIndex (MI) columns, so `v` no longer exists as a column in the transformed data.
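To make the failure mode concrete, here is a minimal sketch of the transformation. It assumes the plotting code places the per-group frames side by side with `pd.concat(..., keys=keys, axis=1)`; the variable names (`keys`, `frames`, `wide`) are only illustrative, and the `cat` column is built deterministically rather than with `np.random.choice` so that every group is guaranteed to appear.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cat": np.tile(list("abcde"), 20),  # 20 rows per group, deterministic
    "v": rng.random(100),
    "v1": rng.random(100),
})

# Split into (key, frame) pairs, then place the frames side by side.
# Assumption: this mirrors what happens internally when subplots=False.
keys, frames = zip(*df.groupby("cat"))
wide = pd.concat(frames, keys=keys, axis=1)

# The columns are now (group key, original column) pairs.
print(type(wide.columns).__name__)
print("v" in wide.columns)         # plain 'v' is not a top-level label
print(("a", "v") in wide.columns)  # the paired label does exist
```

Looking up plain `v` in `wide` therefore fails, which matches the error described above.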
The new solution is therefore quite simple: I pair the groupby keys (in this case `[a, b, c, d, e]`) with the value users pass to the `column` argument (in this case `v`), which gives `[(a, v), (b, v), (c, v), (d, v), (e, v)]`. These pairs are passed to the `boxplot` function, which then selects the subset based on the new column values instead of the `v` from the original df that users specified.
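The pairing step can be sketched as follows. This is an illustration of the idea rather than the exact code in the PR: `MultiIndex.from_product` is one way to build the pairs, and the names `column`, `pairs`, and `subset` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cat": np.tile(list("abcde"), 20),  # 20 rows per group, deterministic
    "v": rng.random(100),
    "v1": rng.random(100),
})

keys, frames = zip(*df.groupby("cat"))
wide = pd.concat(frames, keys=keys, axis=1)

# Pair every group key with every requested column name.
column = ["v"]  # what the user passed as column=, normalised to a list
pairs = list(pd.MultiIndex.from_product([keys, column]))
print(pairs)

# Selecting by the pairs works where selecting by plain 'v' does not.
subset = wide[pairs]
print(subset.shape)
```

Calling `subset.boxplot()` on the result would then draw one box per group for `v`, which is the behaviour users expect from `subplots=False`.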
How does it sound now? Feedback and reviews are very welcome!
Regarding the error, it seems unrelated to this PR.
EDIT: The CI failure is gone now after committing one more time.