DOC: Add to docs on group_keys in groupby.apply (#47185) · pandas-dev/pandas@a2f5815

```diff
@@ -188,21 +188,33 @@ class providing the base-class of operations.
     >>> df = pd.DataFrame({'A': 'a a b'.split(),
     ...                    'B': [1,2,3],
     ...                    'C': [4,6,5]})
-    >>> g = df.groupby('A')
+    >>> g1 = df.groupby('A', group_keys=False)
+    >>> g2 = df.groupby('A', group_keys=True)
 
-    Notice that ``g`` has two groups, ``a`` and ``b``.
-    Calling `apply` in various ways, we can get different grouping results:
+    Notice that ``g1`` have ``g2`` have two groups, ``a`` and ``b``, and only
+    differ in their ``group_keys`` argument. Calling `apply` in various ways,
+    we can get different grouping results:
 
     Example 1: below the function passed to `apply` takes a DataFrame as
     its argument and returns a DataFrame. `apply` combines the result for
     each group together into a new DataFrame:
 
-    >>> g[['B', 'C']].apply(lambda x: x / x.sum())
+    >>> g1[['B', 'C']].apply(lambda x: x / x.sum())
               B    C
     0  0.333333  0.4
     1  0.666667  0.6
     2  1.000000  1.0
 
+    In the above, the groups are not part of the index. We can have them included
+    by using ``g2`` where ``group_keys=True``:
+
+    >>> g2[['B', 'C']].apply(lambda x: x / x.sum())
+                B    C
+    A
+    a 0  0.333333  0.4
+      1  0.666667  0.6
+    b 2  1.000000  1.0
+
     Example 2: The function passed to `apply` takes a DataFrame as
     its argument and returns a Series. `apply` combines the result for
     each group together into a new DataFrame.
```
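The claim in Example 1 can be checked directly: for a like-indexed (transform-like) result, `group_keys` decides only whether the group labels are prepended to the index. What follows is a minimal sketch that assumes pandas 1.5 or later, where `group_keys` is respected for such results. `normalize` is a hypothetical helper name standing in for the docstring's lambda, and the frame is the one from the docstring.

```python
import pandas as pd

# Same toy frame as in the docstring.
df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})

g1 = df.groupby("A", group_keys=False)
g2 = df.groupby("A", group_keys=True)


def normalize(frame):
    # Hypothetical helper: divide each column by its per-group sum.
    # The result keeps the input's index, so it is like-indexed.
    return frame / frame.sum()


without_keys = g1[["B", "C"]].apply(normalize)
with_keys = g2[["B", "C"]].apply(normalize)

print(without_keys.index.tolist())  # [0, 1, 2]
print(with_keys.index.tolist())     # [('a', 0), ('a', 1), ('b', 2)]

# Dropping the prepended "A" level recovers the unkeyed result,
# so group_keys changes only the index, not the values.
print(with_keys.droplevel("A").equals(without_keys))  # True
```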
```diff
@@ -211,28 +223,41 @@ class providing the base-class of operations.
 
         The resulting dtype will reflect the return value of the passed ``func``.
 
-    >>> g[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
+    >>> g1[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
+         B    C
+    A
+    a  1.0  2.0
+    b  0.0  0.0
+
+    >>> g2[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
          B    C
     A
     a  1.0  2.0
     b  0.0  0.0
 
+    The ``group_keys`` argument has no effect here because the result is not
+    like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
+    to the input.
+
     Example 3: The function passed to `apply` takes a DataFrame as
     its argument and returns a scalar. `apply` combines the result for
     each group together into a Series, including setting the index as
     appropriate:
 
-    >>> g.apply(lambda x: x.C.max() - x.B.min())
+    >>> g1.apply(lambda x: x.C.max() - x.B.min())
     A
     a    5
     b    2
     dtype: int64""",
     "series_examples": """
     >>> s = pd.Series([0, 1, 2], index='a a b'.split())
-    >>> g = s.groupby(s.index)
+    >>> g1 = s.groupby(s.index, group_keys=False)
+    >>> g2 = s.groupby(s.index, group_keys=True)
 
     From ``s`` above we can see that ``g`` has two groups, ``a`` and ``b``.
-    Calling `apply` in various ways, we can get different grouping results:
+    Notice that ``g1`` have ``g2`` have two groups, ``a`` and ``b``, and only
+    differ in their ``group_keys`` argument. Calling `apply` in various ways,
+    we can get different grouping results:
 
     Example 1: The function passed to `apply` takes a Series as
     its argument and returns a Series. `apply` combines the result for
```
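The "no effect" statement for Examples 2 and 3 can be verified by running both variants and comparing the results. This is a minimal sketch that assumes pandas 1.5 or later. One deliberate deviation from the docstring: the scalar case selects `[["B", "C"]]` before calling `apply`, rather than using `g1.apply(...)` on the whole frame. That way the grouping column never reaches the function, which keeps newer pandas releases (which deprecate that behavior) from emitting a warning.

```python
import pandas as pd

# Same toy frame as in the docstring.
df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})

results = {}
for group_keys in (False, True):
    g = df.groupby("A", group_keys=group_keys)
    # Each group collapses to one Series indexed by column name, so the
    # combined result is indexed by the group labels either way.
    results["series", group_keys] = g[["B", "C"]].apply(
        lambda x: x.astype(float).max() - x.min()
    )
    # Each group collapses to a scalar, giving a Series indexed by "A".
    results["scalar", group_keys] = g[["B", "C"]].apply(
        lambda x: x.C.max() - x.B.min()
    )

for kind in ("series", "scalar"):
    # Non-like-indexed results are identical with and without group_keys.
    print(kind, results[kind, False].equals(results[kind, True]))
```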
```diff
@@ -242,18 +267,36 @@ class providing the base-class of operations.
 
         The resulting dtype will reflect the return value of the passed ``func``.
 
-    >>> g.apply(lambda x: x*2 if x.name == 'a' else x/2)
+    >>> g1.apply(lambda x: x*2 if x.name == 'a' else x/2)
     a    0.0
     a    2.0
     b    1.0
     dtype: float64
 
+    In the above, the groups are not part of the index. We can have them included
+    by using ``g2`` where ``group_keys=True``:
+
+    >>> g2.apply(lambda x: x*2 if x.name == 'a' else x/2)
+    a  a    0.0
+       a    2.0
+    b  b    1.0
+    dtype: float64
+
     Example 2: The function passed to `apply` takes a Series as
     its argument and returns a scalar. `apply` combines the result for
     each group together into a Series, including setting the index as
     appropriate:
 
-    >>> g.apply(lambda x: x.max() - x.min())
+    >>> g1.apply(lambda x: x.max() - x.min())
+    a    1
+    b    0
+    dtype: int64
+
+    The ``group_keys`` argument has no effect here because the result is not
+    like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
+    to the input.
+
+    >>> g2.apply(lambda x: x.max() - x.min())
     a    1
     b    0
     dtype: int64""",
```
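The Series examples can be reproduced in the same way. This sketch assumes pandas 1.5 or later, and `scale` is a hypothetical name for the docstring's lambda. Both levels of the keyed index hold the same labels because the Series is grouped by its own index, which explains the `a  a` / `b  b` layout in the docstring output.

```python
import pandas as pd

s = pd.Series([0, 1, 2], index="a a b".split())
g1 = s.groupby(s.index, group_keys=False)
g2 = s.groupby(s.index, group_keys=True)


def scale(x):
    # Hypothetical helper. Inside apply, x.name is the group label ("a" or "b").
    # The returned Series keeps x's index, so the result is like-indexed.
    return x * 2 if x.name == "a" else x / 2


like_indexed_1 = g1.apply(scale)
like_indexed_2 = g2.apply(scale)

# Reductions return one scalar per group, so group_keys has no effect.
reduced_1 = g1.apply(lambda x: x.max() - x.min())
reduced_2 = g2.apply(lambda x: x.max() - x.min())

print(like_indexed_1.index.tolist())  # ['a', 'a', 'b']
print(like_indexed_2.index.tolist())  # [('a', 'a'), ('a', 'a'), ('b', 'b')]
print(reduced_1.equals(reduced_2))    # True
```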