BUG: Fix groupby nth with axis=1 by johnzangwill · Pull Request #43926 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation19 Commits16 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Fixes a bug that no one has reported and probably no one cares about. But overlaps slightly with #42947, so I thought that I would make it a separate PR.
Current behaviour: df.groupby(df.iloc[1], axis=1).nth(0)
crashes ungracefully.
Hello @johnzangwill! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻
Comment last updated at 2021-10-13 07:59:55 UTC
out = out.reindex(result_index) |
---|
out = self._reindex_output(out) |
else: |
out.columns = result_index[ids[mask]] |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we re-use any of the _wrap_foo_result methods?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pls add a whatsnew note for 1.4, groupby bug fix section
@@ -2545,18 +2545,25 @@ def nth( |
---|
# Drop NA values in grouping |
mask = mask & (ids != -1) |
out = self._selected_obj[mask] |
if self.axis == 0: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually let's make a method
self._selected_obj_for_axis(mask: np.ndarray[bool] | None)
as i think we use this in several places.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually let's make a method
self._selected_obj_for_axis(mask: np.ndarray[bool] | None)
as i think we use this in several places.
@jreback, I had already factored these out in my other branch but I have done it here too, and made them both _mask_selected_obj(mask)
also pls add a whatsnew note for 1.4, groupby bug fix section
Done
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit, pls merge master and ping on green
@@ -503,6 +503,7 @@ Groupby/resample/rolling |
---|
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`) |
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`) |
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`) |
- Bug in :meth:`GroupBy.nth` failing on column groupings (``axis=1``) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say axis=1 (instead)
@jreback all green apart from Python Dev / actions-310-dev (macOS-latest) (pull_request)
60 minute timeout
small nit, pls merge master and ping on green
Still all green apart from another quite frequent ci testing failure:
> raise AssertionError(f"Caused unexpected warning(s): {repr(extra_warnings)}")
E AssertionError: Caused unexpected warning(s): [('ResourceWarning', ResourceWarning('unclosed file <_io.BufferedRandom name=13>'), '/home/runner/work/pandas/pandas/pandas/io/formats/excel.py', 179)]
Which I am sure is nothing to do with my code...
@@ -505,6 +505,7 @@ Groupby/resample/rolling |
---|
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`) |
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`) |
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`) |
- Bug in :meth:`GroupBy.nth` failing on ``axis=1`` |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the PR number here (as the issue number)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the PR number here (as the issue number)
Done
return self._mask_selected_obj(mask) |
---|
@final |
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT: |
if self.axis == 0: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string here
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string here
Done
return self._mask_selected_obj(mask) |
---|
@final |
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npt.NDArray[bool]