Improve to_datetime bounds checking by rebecca-palmer · Pull Request #50183 · pandas-dev/pandas

to_datetime(errors='raise') is supposed to raise an exception on out-of-bounds input. However, the bounds check rounds the value to an integer while the actual conversion does not, so the two can disagree: for input just outside the bounds it may pass the check and then return NaT (on x86) or output clipped to the bounds (on arm) instead of raising.
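As a rough illustration of how a bounds check and a conversion can disagree near the int64 limit, here is a minimal pure-Python sketch. It is not the pandas code: `naive_in_bounds` and `exact_in_bounds` are hypothetical helpers, and the sketch assumes the mismatch comes from the check working on a rounded quantity (here, the upper bound rounded to float64).

```python
INT64_MAX = 2**63 - 1

def naive_in_bounds(x: float) -> bool:
    # Hypothetical check: compare in float64, the way a C comparison would
    # after converting the integer bound to double.
    # float(INT64_MAX) is not exactly representable and rounds up to 2.0**63.
    return -float(2**63) <= x <= float(INT64_MAX)

def exact_in_bounds(x: float) -> bool:
    # Hypothetical stricter check: 2.0**63 is exactly representable, so a
    # strict "<" against it rejects every float that does not fit in int64.
    return -(2.0**63) <= x < 2.0**63

x = 2.0**63  # one past the largest int64 value

print(naive_in_bounds(x))   # True: the rounded check lets it through
print(exact_in_bounds(x))   # False
print(int(x) > INT64_MAX)   # True: the value does not fit in int64
```

In C, converting such a value to a 64-bit integer is undefined behaviour, which is consistent with the result differing between x86 and arm.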

It also assumes that converting NaN to int gives NaT, which may not be true on unusual hardware. (This was how I originally noticed the problem, as Debian tries to build packages on a wide range of hardware.)
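One way to avoid relying on what the hardware does with NaN is to test for it explicitly before converting. The following is a sketch under assumptions, not the patch itself: `float_to_i8` is a hypothetical helper, and it assumes NaT is stored as the minimum int64 value.

```python
import math

NAT_SENTINEL = -(2**63)  # assumed integer representation of NaT

def float_to_i8(x: float) -> int:
    # Handle NaN explicitly rather than relying on the float-to-int cast.
    if math.isnan(x):
        return NAT_SENTINEL
    # Strict bounds: the minimum int64 is reserved for NaT in this sketch,
    # and 2.0**63 is the first float that no longer fits in int64.
    # Infinities fail this comparison and are rejected as well.
    if not (-(2.0**63) < x < 2.0**63):
        raise OverflowError(f"{x!r} is out of bounds for int64")
    return int(x)  # truncates toward zero

print(float_to_i8(float("nan")) == NAT_SENTINEL)  # True
print(float_to_i8(1.5))                           # 1
```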

This patch fixes both issues. (In the Debian logs linked here, search for 'near-limits test'; the first run is unpatched 1.3.5 and the second run is patched 1.5.x.)

Its effect on speed is unclear: it is sometimes faster and sometimes slower, possibly because other load on the build machines varies.