DOC: Add to docs on group_keys in groupby.apply (#47185) · pandas-dev/pandas@a2f5815

@@ -188,21 +188,33 @@ class providing the base-class of operations.
     >>> df = pd.DataFrame({'A': 'a a b'.split(),
     ...                    'B': [1,2,3],
     ...                    'C': [4,6,5]})
-    >>> g = df.groupby('A')
+    >>> g1 = df.groupby('A', group_keys=False)
+    >>> g2 = df.groupby('A', group_keys=True)
 
-    Notice that ``g`` has two groups, ``a`` and ``b``.
-    Calling `apply` in various ways, we can get different grouping results:
+    Notice that ``g1`` and ``g2`` have two groups, ``a`` and ``b``, and only
+    differ in their ``group_keys`` argument. Calling `apply` in various ways,
+    we can get different grouping results:
 
     Example 1: below the function passed to `apply` takes a DataFrame as
     its argument and returns a DataFrame. `apply` combines the result for
     each group together into a new DataFrame:
 
-    >>> g[['B', 'C']].apply(lambda x: x / x.sum())
+    >>> g1[['B', 'C']].apply(lambda x: x / x.sum())
               B    C
     0  0.333333  0.4
     1  0.666667  0.6
     2  1.000000  1.0
 
+    In the above, the groups are not part of the index. We can have them included
+    by using ``g2`` where ``group_keys=True``:
+
+    >>> g2[['B', 'C']].apply(lambda x: x / x.sum())
+                B    C
+    A
+    a 0  0.333333  0.4
+      1  0.666667  0.6
+    b 2  1.000000  1.0
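
The difference between the two groupers in Example 1 can be checked directly by looking at the index of each result. The following is a minimal sketch, assuming pandas 1.5 or later (the release this docstring change targets); the helper name `normalize` is illustrative and does not appear in the commit.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})


def normalize(group):
    # Like-indexed result: one output row per input row of the group.
    return group / group.sum()


without_keys = df.groupby("A", group_keys=False)[["B", "C"]].apply(normalize)
with_keys = df.groupby("A", group_keys=True)[["B", "C"]].apply(normalize)

# group_keys=False keeps only the original row labels.
print(without_keys.index.nlevels)
# group_keys=True prepends the group label as an extra index level.
print(with_keys.index.nlevels)
print(list(with_keys.index.get_level_values(0)))
```

Only the index differs between the two results; the computed values are the same in both cases.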

+
     Example 2: The function passed to `apply` takes a DataFrame as
     its argument and returns a Series.  `apply` combines the result for
     each group together into a new DataFrame.

@@ -211,28 +223,41 @@ class providing the base-class of operations.
 
     The resulting dtype will reflect the return value of the passed func.
 
-    >>> g[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
+    >>> g1[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
+         B    C
+    A
+    a  1.0  2.0
+    b  0.0  0.0
+
+    >>> g2[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
          B    C
     A
     a  1.0  2.0
     b  0.0  0.0
 
+    The ``group_keys`` argument has no effect here because the result is not
+    like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
+    to the input.
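
The claim that ``group_keys`` has no effect on a reducing function can be verified with a short comparison. This is a sketch under the same assumption of a recent pandas version; `value_range` is a hypothetical name standing in for the lambda used in Example 2.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})


def value_range(group):
    # Reduces each group to one Series (one value per column),
    # so the result is not like-indexed with the input.
    return group.astype(float).max() - group.min()


reduced_without_keys = df.groupby("A", group_keys=False)[["B", "C"]].apply(value_range)
reduced_with_keys = df.groupby("A", group_keys=True)[["B", "C"]].apply(value_range)

# Both results are indexed by the group labels only.
print(list(reduced_without_keys.index))
print(list(reduced_with_keys.index))
```

Because each group collapses to a single row, the group label is the only sensible row label, and it is used regardless of the ``group_keys`` setting.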

+
     Example 3: The function passed to `apply` takes a DataFrame as
     its argument and returns a scalar. `apply` combines the result for
     each group together into a Series, including setting the index as
     appropriate:
 
-    >>> g.apply(lambda x: x.C.max() - x.B.min())
+    >>> g1.apply(lambda x: x.C.max() - x.B.min())
     A
     a    5
     b    2
     dtype: int64""",
     "series_examples": """
     >>> s = pd.Series([0, 1, 2], index='a a b'.split())
-    >>> g = s.groupby(s.index)
+    >>> g1 = s.groupby(s.index, group_keys=False)
+    >>> g2 = s.groupby(s.index, group_keys=True)
 
     From ``s`` above we can see that ``g`` has two groups, ``a`` and ``b``.
-    Calling `apply` in various ways, we can get different grouping results:
+    Notice that ``g1`` and ``g2`` have two groups, ``a`` and ``b``, and only
+    differ in their ``group_keys`` argument. Calling `apply` in various ways,
+    we can get different grouping results:
 
     Example 1: The function passed to `apply` takes a Series as
     its argument and returns a Series.  `apply` combines the result for

@@ -242,18 +267,36 @@ class providing the base-class of operations.
 
     The resulting dtype will reflect the return value of the passed func.
 
-    >>> g.apply(lambda x: x*2 if x.name == 'a' else x/2)
+    >>> g1.apply(lambda x: x*2 if x.name == 'a' else x/2)
     a    0.0
     a    2.0
     b    1.0
     dtype: float64
 
+    In the above, the groups are not part of the index. We can have them included
+    by using ``g2`` where ``group_keys=True``:
+
+    >>> g2.apply(lambda x: x*2 if x.name == 'a' else x/2)
+    a  a    0.0
+       a    2.0
+    b  b    1.0
+    dtype: float64
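
The Series case behaves the same way as the DataFrame case. As a hedged sketch (again assuming pandas 1.5 or later, with `rescale` as an illustrative name for the lambda in the example), the two results carry the same values and differ only in index depth:

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["a", "a", "b"])


def rescale(group):
    # group.name holds the label of the group being processed.
    return group * 2 if group.name == "a" else group / 2


flat = s.groupby(s.index, group_keys=False).apply(rescale)
keyed = s.groupby(s.index, group_keys=True).apply(rescale)

# The values match; only the number of index levels changes.
print(flat.tolist(), flat.index.nlevels)
print(keyed.tolist(), keyed.index.nlevels)
```

Here the group label and the original row label happen to coincide, which is why the keyed result shows each label twice.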

+
     Example 2: The function passed to `apply` takes a Series as
     its argument and returns a scalar. `apply` combines the result for
     each group together into a Series, including setting the index as
     appropriate:
 
-    >>> g.apply(lambda x: x.max() - x.min())
+    >>> g1.apply(lambda x: x.max() - x.min())
+    a    1
+    b    0
+    dtype: int64
+
+    The ``group_keys`` argument has no effect here because the result is not
+    like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
+    to the input.
+
+    >>> g2.apply(lambda x: x.max() - x.min())
     a    1
     b    0
     dtype: int64""",