BUG: merge not always following documented sort behavior by lukemanley · Pull Request #54611 · pandas-dev/pandas
- closes #18776 (Merge with how='inner' does not always preserve the order of the left keys)
- closes #24730 (changing order of keys?)
- closes #33554 (BUG: merge left and merge inner produce different index-order)
- closes #40608 (BUG: join/merge of DataFrame does not keep order of index)
- closes #48021 (BUG: merge: order when merging on index not preserved)
- closes #53157 (DOC: Key order after dataframe inner merge)
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/v2.2.0.rst file if fixing a bug or adding a new feature.
When merging DataFrames, a number of different code paths are hit depending on the arguments passed (e.g. how, sort, on index vs. columns) as well as on characteristics of the left and right inputs (e.g. unique, monotonic).
The resulting sort behavior is not always consistent and does not always align with the documented behavior.
The docs state:
sort: bool, default False
Sort the join keys lexicographically in the result DataFrame. If False, the order
of the join keys depends on the join type (how keyword).
...
left: preserve the order of the left keys
right: preserve the order of the right keys
outer: sort keys lexicographically
inner: preserve the order of the left keys
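As a rough illustration of the documented ordering quoted above, the following sketch merges two small frames on a column and collects the resulting key order for each join type. The frames, column names, and the `merged_keys` helper are hypothetical and chosen only for this example; the comments describe what the documentation promises, and results on pandas versions without this fix may differ in some cases.

```python
import pandas as pd

# Hypothetical toy frames; the key values are deliberately unsorted.
left = pd.DataFrame({"key": ["c", "a", "b"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [10, 20, 30]})


def merged_keys(how, sort=False):
    """Return the join keys of the merged frame, in result order."""
    result = pd.merge(left, right, on="key", how=how, sort=sort)
    return result["key"].tolist()


# Documented behavior with sort=False:
#   left  -> order of the left keys
#   right -> order of the right keys
#   outer -> keys sorted lexicographically
#   inner -> order of the left keys
left_keys = merged_keys("left")
right_keys = merged_keys("right")
outer_keys = merged_keys("outer")
inner_keys = merged_keys("inner")

# With sort=True the join keys are sorted regardless of the join type.
sorted_inner_keys = merged_keys("inner", sort=True)

for name, keys in [
    ("left", left_keys),
    ("right", right_keys),
    ("outer", outer_keys),
    ("inner", inner_keys),
    ("inner, sort=True", sorted_inner_keys),
]:
    print(f"{name}: {keys}")
```

This small case merges on a column with unique keys. The inconsistencies described in this PR depend on other factors as well, such as merging on the index or on non-unique or non-monotonic keys, so a simple example like this one may not reproduce them.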
This PR aims to fix the sort behavior in cases where it does not follow the documented behavior, and adds tests that validate the sort behavior across a wide range of arguments.
NOTE: a few existing tests that relied on incorrect sort behavior were updated.