implement assert_tzawareness_compat for DatetimeIndex by jbrockmendel · Pull Request #18376 · pandas-dev/pandas

good if somebody made kind of an overview of the different related issues

That was the idea behind #18435. Feel free to edit it if the OP doesn't provide enough of an overview for your tastes.

And I think we certainly should not regard it as just a bug fix (eg in the whatsnew notes).

How to treat it in the whatsnew notes is above my pay grade. But over the course of this PR's history I've become increasingly convinced that the current comparison behavior is Just Plain Wrong and should be treated like a bug.

The three options on hand are 1) this PR which makes DatetimeIndex comparisons behave like all other datetime-like comparisons, 2) edit the spec to explicitly make string comparisons a special case, 3) do nothing. I'm going to assume away 3 and argue against 2.
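For concreteness, here is a minimal stdlib sketch of what option 1 amounts to. It assumes a plain list of `datetime` objects stands in for a DatetimeIndex. `check_tzawareness_compat` and `compare_strict` are hypothetical names for illustration, not pandas' actual implementation. The only point is that a bound, string or not, goes through the same naive/aware check that Python's own `datetime` applies.

```python
import operator
from datetime import datetime, timezone


def is_aware(ts):
    """True if ts carries a usable UTC offset (Python's definition of "aware")."""
    return ts.tzinfo is not None and ts.tzinfo.utcoffset(ts) is not None


def check_tzawareness_compat(values, other):
    """Hypothetical helper: raise if exactly one side is tz-aware.

    Assumes `values` is a non-empty list whose elements share one tz-awareness.
    """
    if is_aware(values[0]) != is_aware(other):
        raise TypeError("Cannot compare tz-naive and tz-aware datetime-like objects")


def compare_strict(values, op, bound):
    """Option-1 style element-wise comparison: strings get no special treatment."""
    if isinstance(bound, str):
        # Parse the string as written; no timezone is assumed for it.
        bound = datetime.fromisoformat(bound)
    check_tzawareness_compat(values, bound)
    return [op(v, bound) for v in values]


index = [datetime(2017, 1, 1, h, tzinfo=timezone.utc) for h in (10, 12, 14)]

print(compare_strict(index, operator.ge, "2017-01-01 12:00:00+00:00"))
# [False, True, True]

try:
    compare_strict(index, operator.ge, "2017-01-01 12:00:00")  # naive string
except TypeError as exc:
    print("TypeError:", exc)
```

A string that carries its own offset compares fine; a naive one is rejected exactly like a naive `datetime` would be.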

AFAICT the main objection to 1) is that in conjunction with #17920 it breaks the equivalence between ser.loc[lower:upper] and ser[(ser.index >= lower) & (ser.index <= upper)]. But consider what other equivalences are broken by 2:
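The equivalence in question can be modelled without pandas. In the sketch below, which rests on stated assumptions, a sorted list of tz-aware `datetime` objects plays the index, `bisect` plays label slicing with inclusive endpoints, and a list comprehension plays the boolean mask. It does not model how `.loc` parses string bounds (#17920). It only shows that under strict comparison semantics the two selections agree for aware bounds and both raise for naive ones.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timezone

UTC = timezone.utc

# Sorted, tz-aware stand-in for ser.index, with positional values for ser.
index = [datetime(2017, 1, day, tzinfo=UTC) for day in range(1, 8)]
values = list(range(len(index)))


def slice_select(lower, upper):
    """Model of ser.loc[lower:upper]: label slice, both endpoints inclusive."""
    start = bisect_left(index, lower)
    stop = bisect_right(index, upper)
    return values[start:stop]


def mask_select(lower, upper):
    """Model of ser[(ser.index >= lower) & (ser.index <= upper)]."""
    return [v for ts, v in zip(index, values) if (ts >= lower) & (ts <= upper)]


def raises_type_error(fn, *args):
    """Return True if calling fn(*args) raises TypeError."""
    try:
        fn(*args)
    except TypeError:
        return True
    return False


lower = datetime(2017, 1, 2, 12, tzinfo=UTC)
upper = datetime(2017, 1, 5, tzinfo=UTC)
print(slice_select(lower, upper), mask_select(lower, upper))  # [2, 3, 4] [2, 3, 4]

# A naive bound: both selection styles fail loudly rather than disagreeing.
naive_lower = datetime(2017, 1, 2, 12)
print(raises_type_error(slice_select, naive_lower, upper))  # True
print(raises_type_error(mask_select, naive_lower, upper))   # True
```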

- We can no longer count on DatetimeIndex comparisons to be transitive: a >= b and b >= c no longer implies a >= c.
- Comparison-broadcasting becomes inconsistent: (index >= bound)[n] no longer equals index[n] >= bound (unless we then go mess with Timestamp...).
- Comparison-boxing becomes inconsistent: index >= bound no longer necessarily equals index >= Timestamp(bound).
- Box-conversion becomes inconsistent: index >= bound no longer necessarily equals index.astype(object) >= bound.
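To make the first three points concrete, here is a minimal stdlib sketch of option 2 under some stated assumptions. Lists of `datetime` objects stand in for DatetimeIndex objects, fixed UTC offsets stand in for real timezones, `datetime.fromisoformat` stands in for `Timestamp(bound)`, and `compare_special_case` is a hypothetical function that silently interprets a naive string bound in the timezone of whichever operand it is compared with. The fourth point (`astype(object)`) is pandas-specific and is not modelled.

```python
import operator
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EST = timezone(timedelta(hours=-5))  # fixed-offset stand-in for US/Eastern


def compare_special_case(values, op, bound):
    """Hypothetical option-2 comparison: [op(v, bound) for v in values].

    A naive string bound is interpreted in the timezone of `values`; any other
    bound is compared with plain Python semantics (naive vs aware raises).
    """
    if isinstance(bound, str):
        parsed = datetime.fromisoformat(bound)
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=values[0].tzinfo)
        bound = parsed
    return [op(v, bound) for v in values]


def raises_type_error(fn):
    """Return True if calling fn() raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


a = [datetime(2017, 1, 1, 13, tzinfo=UTC)]  # 13:00 UTC
b = "2017-01-01 12:00"                      # naive string bound
c = [datetime(2017, 1, 1, 11, tzinfo=EST)]  # 11:00 EST == 16:00 UTC

# Transitivity: a >= b and b >= c both hold, yet a >= c does not, because
# b means 12:00 UTC in the first comparison and 12:00 EST in the second.
print(compare_special_case(a, operator.ge, b))  # [True]
print(compare_special_case(c, operator.le, b))  # [True], i.e. b >= c
print(a[0] >= c[0])                             # False

# Broadcasting: the element-wise comparison has no string special case.
print(raises_type_error(lambda: a[0] >= b))     # True

# Boxing: converting the bound to a (naive) datetime first changes the outcome.
boxed = datetime.fromisoformat(b)
print(raises_type_error(lambda: compare_special_case(a, operator.ge, boxed)))  # True
```

Under option 1, a >= b would raise TypeError just like the boxed and element-wise versions, so these checks could not silently disagree.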

... and most of all, I am not remotely confident that this list is complete. How many places across the codebase compare DatetimeIndex objects one way or another? I have no idea.

A tz-aware DatetimeIndex is not easy to get by accident. If a user has one, they have made a decision that timezones matter.

Any of the available options introduces an inconsistency somewhere. AFAICT option 1 breaks a convenience equivalence, will do so loudly, and as a result will not snowball into other inconsistencies.

Special-casing string comparisons generates a whole mess of other potential (often silent) problems that can be avoided by enforcing behavior that is already canonical.