BUG: merge not always following documented sort behavior by lukemanley · Pull Request #54611 · pandas-dev/pandas

When merging DataFrames, a number of different code paths are hit depending on the arguments passed (e.g. how, sort, merging on index vs. columns) and on characteristics of the left and right inputs (e.g. unique, monotonic).
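
The input characteristics mentioned here can be inspected directly on a set of join keys. The following is a minimal sketch with made-up keys; it only shows the properties themselves, and how merge dispatches on them internally is not something this snippet demonstrates.

```python
import pandas as pd

# Hypothetical join keys, used only to illustrate the properties.
keys = pd.Index(["c", "a", "b", "a"])

# "a" appears twice, so the keys are not unique.
is_unique = keys.is_unique

# The keys are not in ascending order, so they are not monotonic.
is_monotonic = keys.is_monotonic_increasing

# Sorting the keys makes them monotonic (they remain non-unique).
sorted_keys = keys.sort_values()
sorted_is_monotonic = sorted_keys.is_monotonic_increasing
```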

The resulting sort behavior is not always consistent and does not always align with documented behavior.

The docs state:

sort: bool, default False
    Sort the join keys lexicographically in the result DataFrame. If False, the order 
    of the join keys depends on the join type (how keyword).

...

left: preserve the order of the left keys
right: preserve the order of the right keys
outer: sort keys lexicographically
inner: preserve the order of the left keys

This PR aims to fix the sort behavior in cases where it does not follow the documented behavior, and adds tests that validate sort behavior across a wide range of arguments.

NOTE: a few existing tests that relied on incorrect sort behavior were updated.