BUG: Fix groupby nth with axis=1 by johnzangwill · Pull Request #43926 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation19 Commits16 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

johnzangwill

Fixes a bug that no one has reported and probably no one cares about. But overlaps slightly with #42947, so I thought that I would make it a separate PR.

Current behaviour: df.groupby(df.iloc[1], axis=1).nth(0) crashes ungracefully.

@pep8speaks

Hello @johnzangwill! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-13 07:59:55 UTC

@johnzangwill

jbrockmendel

out = out.reindex(result_index)
out = self._reindex_output(out)
else:
out.columns = result_index[ids[mask]]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we re-use any of the _wrap_foo_result methods?

jbrockmendel

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also pls add a whatsnew note for 1.4, groupby bug fix section

@@ -2545,18 +2545,25 @@ def nth(
# Drop NA values in grouping
mask = mask & (ids != -1)
out = self._selected_obj[mask]
if self.axis == 0:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually let's make a method

self._selected_obj_for_axis(mask: np.ndarray[bool] | None)

as i think we use this in several places.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually let's make a method

self._selected_obj_for_axis(mask: np.ndarray[bool] | None)

as i think we use this in several places.

@jreback, I had already factored these out in my other branch but I have done it here too, and made them both _mask_selected_obj(mask)

@johnzangwill

also pls add a whatsnew note for 1.4, groupby bug fix section

Done

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit, pls merge master and ping on green

@@ -503,6 +503,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`)
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`)
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`)
- Bug in :meth:`GroupBy.nth` failing on column groupings (``axis=1``)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say axis=1 (instead)

@johnzangwill

@johnzangwill

@johnzangwill

@jreback all green apart from Python Dev / actions-310-dev (macOS-latest) (pull_request) 60 minute timeout

@johnzangwill

@johnzangwill

@johnzangwill

small nit, pls merge master and ping on green

Still all green apart from another quite frequent ci testing failure:

>           raise AssertionError(f"Caused unexpected warning(s): {repr(extra_warnings)}")
E           AssertionError: Caused unexpected warning(s): [('ResourceWarning', ResourceWarning('unclosed file <_io.BufferedRandom name=13>'), '/home/runner/work/pandas/pandas/pandas/io/formats/excel.py', 179)]

Which I am sure is nothing to do with my code...

@johnzangwill

jreback

@@ -505,6 +505,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`)
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`)
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`)
- Bug in :meth:`GroupBy.nth` failing on ``axis=1``

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the PR number here (as the issue number)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the PR number here (as the issue number)

Done

return self._mask_selected_obj(mask)
@final
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT:
if self.axis == 0:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string here

Done

@johnzangwill

@johnzangwill

jreback

@jreback

jbrockmendel

return self._mask_selected_obj(mask)
@final
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

npt.NDArray[bool]