ENH: Create nlargest and nsmallest methods for pandas.core.groupby.DataFrameGroupBy objects · Issue #23993 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
Current method
import pandas as pd
dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
dataframe.groupby('col1').apply(lambda df: df.nlargest(2, 'col2'))
Desired method
import pandas as pd
dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
# Proposed API: this method is the subject of the request
dataframe.groupby('col1').nlargest(2, 'col2')
Problem description
I want to select the top 2 entries of each group, ranked by a specific column. This is easy if I only want to keep the column used for the ordering (see code below), but it becomes much less efficient and less clear when I want to keep all the columns of the DataFrame.
import pandas as pd
dataframe = pd.DataFrame({'col1': list('aaabbbcc'), 'col2': range(8), 'col3': list('whatever')})
dataframe.groupby('col1').col2.nlargest(2)
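For comparison, here is a rough sketch of two workarounds that keep every column without going through `apply`. Neither is the requested method. The variable names `top2_sorted` and `top2_by_index` are made up for illustration, and the second variant assumes the original row labels are unique and appear as the last level of the index returned by the Series-level `nlargest`.

```python
import pandas as pd

dataframe = pd.DataFrame({
    'col1': list('aaabbbcc'),
    'col2': range(8),
    'col3': list('whatever'),
})

# Workaround 1: sort the whole frame once by the ordering column,
# then keep the first 2 rows of each group. All columns are preserved.
top2_sorted = (
    dataframe
    .sort_values('col2', ascending=False)
    .groupby('col1')
    .head(2)
)

# Workaround 2: reuse the Series-level nlargest to find the row labels,
# then select those rows from the original frame.
# Assumption: the last index level holds the original (unique) row labels.
labels = (
    dataframe
    .groupby('col1')['col2']
    .nlargest(2)
    .index
    .get_level_values(-1)
)
top2_by_index = dataframe.loc[labels]

print(top2_sorted)
print(top2_by_index)
```

Both variants return the same six rows, although in a different row order, and both break ties arbitrarily in the sense that only 2 rows per group are kept. Whether either one is actually faster than the `apply` version depends on the data, so it would be worth timing on a realistic frame.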