BUG: in _nsorted for frame with duplicated values index · Issue #13412 · pandas-dev/pandas

@Tux1

Description

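The function below is implemented incorrectly. If the frame's index contains duplicated values, the result has more than n rows and is not properly sorted, so nsmallest and nlargest for DataFrame don't return a correct frame in this case. The likely cause is the label-based lookup self.loc[ser.index]: every row that shares a label with one of the selected values is pulled in, not just the selected rows.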

def _nsorted(self, columns, n, method, keep):
    if not com.is_list_like(columns):
        columns = [columns]
    columns = list(columns)
    ser = getattr(self[columns[0]], method)(n, keep=keep)
    ascending = dict(nlargest=False, nsmallest=True)[method]
    return self.loc[ser.index].sort_values(columns, ascending=ascending,
                                           kind='mergesort')
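
A minimal sketch of the mechanism, run outside pandas internals. The frame `df`, the column name `a` and `n = 1` are made-up example values, and the positional selection at the end is only one possible way around the problem, not necessarily how pandas should fix it.

```python
import pandas as pd

# Hypothetical example data: labels 0 and 1 each appear twice in the index.
df = pd.DataFrame({"a": [1, 2, 3, 4]}, index=[0, 0, 1, 1])
n = 1

# Same first step as _nsorted: the n smallest values of the first column.
ser = df["a"].nsmallest(n)

# Same second step as _nsorted: select rows by label.
# Label 0 matches two rows, so this returns more than n rows.
by_label = df.loc[ser.index]

# Assumed alternative: select by position, so each chosen value maps back
# to exactly one row even when labels repeat.
positions = df["a"].to_numpy().argsort(kind="mergesort")[:n]
by_position = df.iloc[positions]

print(len(by_label), len(by_position))
```

With this data the label-based selection returns both rows labelled 0, while the positional selection returns only the row holding the smallest value.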