ENH: Add sort parameter to RangeIndex.union (#24471) by reidy-p · Pull Request #25788 · pandas-dev/pandas

This is WIP for adding a sort parameter to RangeIndex.union that behaves in a similar way to the other index types.

sort=None is the default, for consistency with the union method in the base class. When sort=None, a monotonically increasing RangeIndex is returned if possible, and a sorted Int64Index otherwise.
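
To make "if possible" concrete, here is a rough pure-Python sketch using the built-in range in place of RangeIndex. It is a brute-force illustration only, not the code in this PR, and the helper name union_as_range is made up for the example: the sorted union can be expressed as a range exactly when its values are evenly spaced.

```python
from typing import Optional


def union_as_range(left: range, right: range) -> Optional[range]:
    """Return the sorted union as a range if its values are evenly spaced, else None.

    Hypothetical helper for illustration; it materialises every value, which a
    real RangeIndex implementation would want to avoid.
    """
    values = sorted(set(left) | set(right))
    if not values:
        return range(0)
    if len(values) == 1:
        return range(values[0], values[0] + 1)
    step = values[1] - values[0]
    # Every consecutive gap must equal the first gap for a range to fit.
    if all(b - a == step for a, b in zip(values, values[1:])):
        return range(values[0], values[-1] + step, step)
    return None


# Same step and adjacent: representable as a single range.
print(union_as_range(range(0, 10, 2), range(10, 12, 2)))
# Interleaved values with uneven gaps: not representable, so None.
print(union_as_range(range(0, 10, 2), range(1, 4, 2)))
```

In the None case, the behaviour described above would fall back to a sorted Int64Index.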

I have implemented sort=False so that it returns an Int64Index in all cases. I have been trying to think of cases where it would still make sense to return a RangeIndex when sort=False. For example, if two RangeIndexes have the same step and the second overlaps with the first, we might want to return a RangeIndex even with sort=False. But would it be better to always return an Int64Index when sort=False, as I have done here, both to keep the return type consistent and because this particular case seems quite rare?

# sort=False returns an Int64Index even though we might be able to return a RangeIndex as below
In [1]: RangeIndex(0, 10, 2).union(RangeIndex(10, 12, 2), sort=False)
Out[1]: Int64Index([0, 2, 4, 6, 8, 10], dtype='int64')

In [2]: RangeIndex(0, 10, 2).union(RangeIndex(10, 12, 2), sort=None)
Out[2]: RangeIndex(start=0, stop=12, step=2)
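
For the open question about sort=False, here is a small pure-Python sketch of when an unsorted union would still happen to be a range. It assumes sort=False means "values of the left index in order, followed by values of the right index not already present"; that assumption and the helper names unsorted_union and as_range are illustrative only, not taken from this PR.

```python
from typing import List, Optional


def unsorted_union(left: range, right: range) -> List[int]:
    """Left values in order, then right values not already seen (assumed sort=False semantics)."""
    seen = set(left)
    return list(left) + [x for x in right if x not in seen]


def as_range(values: List[int]) -> Optional[range]:
    """Return an equivalent range if the values are evenly spaced in their given order, else None."""
    if not values:
        return range(0)
    if len(values) == 1:
        return range(values[0], values[0] + 1)
    step = values[1] - values[0]
    # Values are unique here, so step is never zero.
    if all(b - a == step for a, b in zip(values, values[1:])):
        return range(values[0], values[-1] + step, step)
    return None


# The second range continues the first with the same step: still a range without sorting.
print(as_range(unsorted_union(range(0, 10, 2), range(10, 12, 2))))
# Swapping the operands breaks the ordering, so no range fits.
print(as_range(unsorted_union(range(10, 12, 2), range(0, 10, 2))))
```

The result depends on operand order, which is one argument for always returning an Int64Index when sort=False.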