ENH: Create nlargest and nsmallest methods for pandas.core.groupby.DataFrameGroupBy objects · Issue #23993 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

Current method

import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
dataframe.groupby('col1').apply(lambda df: df.nlargest(2, 'col2'))
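A commonly used way to keep every column without a per-group apply is to sort the whole frame once by the ranking column and then take the first rows of each group with GroupBy.head. This is a sketch of an existing-API workaround, not something proposed in the issue. Two caveats: the rows come out in sorted order rather than grouped by key, and on ties the selected rows are not guaranteed to match nlargest(keep='first').

```python
import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})

# Sort once by the ranking column (descending = "largest"), then keep the
# first 2 rows of each group. All columns are retained.
# kind='stable' makes the ordering of tied values deterministic.
top2 = dataframe.sort_values('col2', ascending=False, kind='stable').groupby('col1').head(2)

# Sorting ascending instead gives the 2 smallest rows per group.
bottom2 = dataframe.sort_values('col2', ascending=True, kind='stable').groupby('col1').head(2)

print(top2)
print(bottom2)
```

The sort is done once over the whole frame, so no Python function is called per group.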

Desired method

import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
# Proposed API: DataFrameGroupBy.nlargest does not exist at the time of this issue
dataframe.groupby('col1').nlargest(2, 'col2')

Problem description

I want to select, from each group, the 2 rows with the largest values in a specific column. If I only need to keep the column used for the ranking, this is easy (see code below). When I want to keep all the columns of the DataFrame, however, I have to fall back on groupby(...).apply(...) with a lambda, which is less efficient and less readable.

import pandas as pd

dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
dataframe.groupby('col1').col2.nlargest(2)
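The Series returned by groupby('col1').col2.nlargest(2) has a MultiIndex of (group key, original row label), so the original labels can be used with .loc to recover the full rows. The sketch below wraps this in a hypothetical helper, groupby_top_rows (the name and its largest flag are illustrative only, not a pandas API), that approximates the proposed groupby('col1').nlargest(2, 'col2') and nsmallest calls. It assumes the DataFrame has a unique index, since rows are looked up by label.

```python
import pandas as pd


def groupby_top_rows(df, by, n, column, keep='first', largest=True):
    """Hypothetical helper: return the full rows holding the n largest
    (or, with largest=False, the n smallest) values of `column` per group.

    Assumes df has a unique index, because rows are fetched by label.
    """
    grouped = df.groupby(by)[column]
    if largest:
        ranked = grouped.nlargest(n, keep=keep)
    else:
        ranked = grouped.nsmallest(n, keep=keep)
    # ranked has a MultiIndex of (group key, original row label);
    # the last level holds the original labels, so use it to fetch whole rows.
    return df.loc[ranked.index.get_level_values(-1)]


dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
print(groupby_top_rows(dataframe, 'col1', 2, 'col2'))
print(groupby_top_rows(dataframe, 'col1', 2, 'col2', largest=False))
```

Because it delegates to SeriesGroupBy.nlargest/nsmallest, the keep argument behaves as it does for those methods, and the rows come out grouped by key.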