DOC: Add to docs on group_keys in groupby.apply (#47185) · pandas-dev/pandas@a2f5815

@@ -188,21 +188,33 @@ class providing the base-class of operations.
     >>> df = pd.DataFrame({'A': 'a a b'.split(),
     ...                    'B': [1,2,3],
     ...                    'C': [4,6,5]})
-    >>> g = df.groupby('A')
+    >>> g1 = df.groupby('A', group_keys=False)
+    >>> g2 = df.groupby('A', group_keys=True)
 
-    Notice that ``g`` has two groups, ``a`` and ``b``.
-    Calling `apply` in various ways, we can get different grouping results:
+    Notice that ``g1`` and ``g2`` have two groups, ``a`` and ``b``, and only
+    differ in their ``group_keys`` argument. Calling `apply` in various ways,
+    we can get different grouping results:
 
     Example 1: below the function passed to `apply` takes a DataFrame as
     its argument and returns a DataFrame. `apply` combines the result for
     each group together into a new DataFrame:
 
-    >>> g[['B', 'C']].apply(lambda x: x / x.sum())
+    >>> g1[['B', 'C']].apply(lambda x: x / x.sum())
               B    C
     0  0.333333  0.4
     1  0.666667  0.6
     2  1.000000  1.0
 
+    In the above, the groups are not part of the index. We can have them included
+    by using ``g2`` where ``group_keys=True``:
+
+    >>> g2[['B', 'C']].apply(lambda x: x / x.sum())
+                B    C
+    A
+    a 0  0.333333  0.4
+      1  0.666667  0.6
+    b 2  1.000000  1.0
+
     Example 2: The function passed to `apply` takes a DataFrame as
     its argument and returns a Series. `apply` combines the result for
     each group together into a new DataFrame.
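The difference between ``g1`` and ``g2`` in Example 1 can be checked with a short script. This is a minimal sketch, assuming a pandas version in which `group_keys` applies to like-indexed results of `apply` (the behaviour this docstring describes); the helper name `normalize` and the variable names are illustrative and not part of the commit.

```python
import pandas as pd

# Same frame as in the docstring example.
df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})


def normalize(group):
    # Like-indexed result: the output keeps the index of the group passed in.
    return group / group.sum()


# Only the group_keys argument differs between the two calls.
without_keys = df.groupby("A", group_keys=False)[["B", "C"]].apply(normalize)
with_keys = df.groupby("A", group_keys=True)[["B", "C"]].apply(normalize)

print(without_keys.index.tolist())  # original row labels only
print(with_keys.index.tolist())     # (group key, original row label) pairs
```

The values are the same in both results; only the index changes, so `group_keys=False` is the natural choice when the result should align with the original frame.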

@@ -211,28 +223,41 @@ class providing the base-class of operations.
 
     The resulting dtype will reflect the return value of the passed ``func``.
 
-    >>> g[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
+    >>> g1[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
+         B    C
+    A
+    a  1.0  2.0
+    b  0.0  0.0
+
+    >>> g2[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
          B    C
     A
     a  1.0  2.0
     b  0.0  0.0
 
+    The ``group_keys`` argument has no effect here because the result is not
+    like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
+    to the input.
+
     Example 3: The function passed to `apply` takes a DataFrame as
     its argument and returns a scalar. `apply` combines the result for
     each group together into a Series, including setting the index as
     appropriate:
 
-    >>> g.apply(lambda x: x.C.max() - x.B.min())
+    >>> g1.apply(lambda x: x.C.max() - x.B.min())
     A
     a    5
     b    2
     dtype: int64""",
     "series_examples": """
     >>> s = pd.Series([0, 1, 2], index='a a b'.split())
-    >>> g = s.groupby(s.index)
+    >>> g1 = s.groupby(s.index, group_keys=False)
+    >>> g2 = s.groupby(s.index, group_keys=True)
 
     From ``s`` above we can see that ``g`` has two groups, ``a`` and ``b``.
-    Calling `apply` in various ways, we can get different grouping results:
+    Notice that ``g1`` and ``g2`` have two groups, ``a`` and ``b``, and only
+    differ in their ``group_keys`` argument. Calling `apply` in various ways,
+    we can get different grouping results:
 
     Example 1: The function passed to `apply` takes a Series as
     its argument and returns a Series. `apply` combines the result for
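The note that ``group_keys`` has no effect on results that are not like-indexed can be verified directly. The following is a hedged sketch rather than part of the commit: the helper `value_range` and the `results` dictionary are hypothetical names, and the grouped columns are selected explicitly so the function never sees the grouping column.

```python
import pandas as pd

# Same frame as in the DataFrame docstring examples.
df = pd.DataFrame({"A": "a a b".split(), "B": [1, 2, 3], "C": [4, 6, 5]})


def value_range(group):
    # Returns one value per column, so the result is a reduction,
    # not something indexed like the input rows.
    return group.max() - group.min()


# Run the same reduction with group_keys=False and group_keys=True.
results = {
    keys: df.groupby("A", group_keys=keys)[["B", "C"]].apply(value_range)
    for keys in (False, True)
}

# Both results are indexed by the group labels and compare equal.
same = results[False].equals(results[True])
print(same)
```

In both cases the group labels already form the result index, so there is nothing further for ``group_keys`` to add.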

@@ -242,18 +267,36 @@ class providing the base-class of operations.
 
     The resulting dtype will reflect the return value of the passed ``func``.
 
-    >>> g.apply(lambda x: x*2 if x.name == 'a' else x/2)
+    >>> g1.apply(lambda x: x*2 if x.name == 'a' else x/2)
     a    0.0
     a    2.0
     b    1.0
     dtype: float64
 
+    In the above, the groups are not part of the index. We can have them included
+    by using ``g2`` where ``group_keys=True``:
+
+    >>> g2.apply(lambda x: x*2 if x.name == 'a' else x/2)
+    a  a    0.0
+       a    2.0
+    b  b    1.0
+    dtype: float64
+
     Example 2: The function passed to `apply` takes a Series as
     its argument and returns a scalar. `apply` combines the result for
     each group together into a Series, including setting the index as
     appropriate:
 
-    >>> g.apply(lambda x: x.max() - x.min())
+    >>> g1.apply(lambda x: x.max() - x.min())
+    a    1
+    b    0
+    dtype: int64
+
+    The ``group_keys`` argument has no effect here because the result is not
+    like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
+    to the input.
+
+    >>> g2.apply(lambda x: x.max() - x.min())
     a    1
     b    0
     dtype: int64""",
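The Series case behaves the same way as the DataFrame case. As a rough sketch (the `demean` helper and the variable names `flat` and `keyed` are assumptions made for illustration, not taken from the commit), a like-indexed result gains an extra index level only when ``group_keys=True``:

```python
import pandas as pd

# Same Series as in the docstring example.
s = pd.Series([0, 1, 2], index="a a b".split())


def demean(group):
    # Like-indexed result: one output value per input value.
    return group - group.mean()


flat = s.groupby(s.index, group_keys=False).apply(demean)
keyed = s.groupby(s.index, group_keys=True).apply(demean)

print(flat.index.nlevels)   # single level: the original labels
print(keyed.index.nlevels)  # extra outer level holding the group key
```

Dropping the outer level of `keyed` (for example with `keyed.droplevel(0)`) recovers the labels of `flat`, which is one way to compare the two results.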