ENH: add GroupBy.pipe method by topper-123 · Pull Request #17871 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation43 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- [x ] closes API: Add pipe method to GroupBy objects #10353, see also ENH: Add .pipe to GroupBy objects #17863
- [x ] tests added / passed
- [ x] passes
git diff upstream/master -u -- "*.py" | flake8 --diff
- [ x] whatsnew entry
This PR adds a .pipe
method to GroupBy objects like for DataFrame.pipe
and Series .pipe
.
This PR is basically #10466 written by @ghl3 with some very minor updates, because that PR somehow got stalled and subsequently was closed.
This PR says it's new in 0.21, but I'll change that if it's too late to add this.
topper-123 changed the title
add GroupBy.pipe method ENH: add GroupBy.pipe method
allow for a cleaner, more readable syntax. To read about ``.pipe`` in general terms, |
---|
see :ref:`here <basics.pipe>`. |
For a concrete example on combining ``.groupby`` and ``.pipe`` , imagine have a |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have -> having
.. ipython:: python |
from numpy.random import choice, random |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't import like like, rather import numpy as np
and write things out (e.g. np.random.choice
(base_df.pipe(lambda x: x[x.A>3]) |
---|
.groupby(['Store', 'Product']) |
.pipe(rapport_func) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit abstract
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've redone it with the previous df. Alternatively, I could remove this example.
element of the tuple. |
---|
func : callable or tuple of (callable, string) |
Function to apply to this GroupBy or, alternatively, a |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove the GroupBy from here and make this more generic
kwargs[target] = obj |
---|
return func(*args, **kwargs) |
else: |
return func(obj, *args, **kwargs) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a return at the end
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@jreback , I've made changes to the PR. Travis was green and newest forcepush is just for some linting issues.
@ghl3, well, thank you for writing most of this:-)
@@ -235,6 +235,9 @@ Other Enhancements |
---|
- Improved the import time of pandas by about 2.25x. (:issue:`16764`) |
- :func:`read_json` and :func:`to_json` now accept a ``compression`` argument which allows them to transparently handle compressed files. (:issue:`17798`) |
- :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`) |
- ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series`` |
that allow for functions that take a ``GroupBy`` to be composed in a clean, readable syntax. |
See the :ref:`documentation <groupby.pipe>` for more. (:issue:`17871`) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this to highlites is ok
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll move this to highlights and add an example in the whatsnew (if I understand you correctly?)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Had some git issue, so squashed all the commits.
I'm off to sleep, I'll look into eventual comments tomorrow.
(df.groupby(['Store', 'Product']).pipe(rapport_func) |
where ``rapport_func`` take an arbitrary GroupBy object and creates a rapport |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'report' is the correct English word? (also above in the code example)
(no native speaker, but in my mother tongue 'rapport' exists, but the english translation is 'report' :-))
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's rapport in my language too. I'm changed that
.. versionadded:: 0.21.0 |
Similar to the functionality provided by ``DataFrames`` and ``Series``, functions |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrames -> DataFrame (or otherwise do not put it as a code object)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor comments
@@ -3508,7 +3510,7 @@ def sample(self, n=None, frac=None, replace=False, weights=None, |
---|
----- |
Use ``.pipe`` when chaining together functions that expect |
on Series or DataFrames. Instead of writing |
Series, DataFrames or GroupBys. Instead of writing |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is needed? (as the docstring of GroupBy.pipe has a separate one?)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "on" was gramatically incorrect, So I changed the line. I think adding "GroupBy objects" add clarity that GroupBy objrcts can be used with piping. I didnt intend for this to only refer to series/DataFrames, but is too easy to misunderstand?
Parameters |
---|
---------- |
func : callable or tuple of (callable, string) |
Function to apply to this GroupBy or, alternatively, a |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The start of "Function ..." does not need to be aligned with the "callable .. " above, but just have a single identation of 4 spaces (numpy docstring specifics ..)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do "this GroupBy" -> "this GroupBy object"
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, changing.
@@ -14,6 +14,8 @@ Highlights include: |
---|
categoricals independent of the data, see :ref:`here <whatsnew_0210.enhancements.categorical_dtype>`. |
- The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck http://berkeleyanalytics.com/bottleneck`__ is installed, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>` |
- Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`. |
- ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``, |
that allows for functions that take a ``GroupBy`` to be composed in a clean, readable syntax. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a ref to the subsection
def pipe(self, func, *args, **kwargs): |
---|
""" Apply a function with arguments to this GroupBy object |
.. versionadded:: 0.21.0 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of mostly repeated doc strings in both places, you can template _pipe and use and Appender/Substitution
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to not do this here. The dosctrings are sufficiently different to just make it a lot harder to develop when trying to merge them in a single one with a lot of substituted parts
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thats fine
expected = pd.Series([-79.5160891089, -78.4839108911, None], |
---|
index=index) |
assert_series_equal(expected, result) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a tests on SeriesGroupBy
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
-------- |
---|
pandas.Series.pipe |
pandas.DataFrame.pipe |
pandas.GroupBy.apply |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a short note on the difference here? (and also add a See also in apply to pipe, again with a short note on the difference)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but admittedly found it very hard to do in so few lines, see result. Do you have suggestions?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like: "Apply function to each group instead of to the full GroupBy object" ?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(IIUC) perhaps
``apply`` applies a function to each group. ``pipe`` applies a function to a ``GroupBy`` object.
would work
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've uploaded new ones.
I've built the documentation, and pipe doesn't show up in the API. Do I need to add something somewhere?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, In groupby.rst line 1178 I have a link ":ref:here <basics.pipe>
".
This link doesn't show up in the docs, can anyone see why? It seems normal
Hmm, I can see the newest code changes on my github repository, but they don't show up in the PR. Any suggestions on how to proceed?
for reference, the newest code her: topper-123@8614c32
EDIT: never mind, it works now.
I think you need to add in api.rst as well. if docs build ok, ping on green.
yeemey pushed a commit to yeemey/pandas that referenced this pull request
alanbato pushed a commit to alanbato/pandas that referenced this pull request
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request