Datetime parsing (PDEP-4): allow mixture of ISO formatted strings? · Issue #50411 · pandas-dev/pandas
In pandas < 2, parsing datetime strings that are all strictly ISO formatted (so there is no ambiguity), but that differ slightly in formatting because they have different resolutions, works fine:
>>> import pandas as pd
>>> pd.to_datetime(["2022-01-01T09:00:00", "2022-01-02T09:00:00.123"])
DatetimeIndex(['2022-01-01 09:00:00', '2022-01-02 09:00:00.123000'], dtype='datetime64[ns]', freq=None)
With the changes related to datetime parsing (PDEP-4), this now infers the format from the first element, and parsing the second element then fails:
>>> pd.to_datetime(["2022-01-01T09:00:00", "2022-01-02T09:00:00.123"])
...
ValueError: time data "2022-01-02T09:00:00.123" at position 1 doesn't match format "%Y-%m-%dT%H:%M:%S"
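To illustrate why a single inferred format cannot cover both strings, here is a minimal sketch using only the Python standard library. It is not pandas's actual code path: `parse_strict` is a hypothetical helper, and the inferred format is hard-coded to the one shown in the error message above.

```python
from datetime import datetime

strings = ["2022-01-01T09:00:00", "2022-01-02T09:00:00.123"]

# Assumption: stand-in for format inference, hard-coded to the format
# that the error message reports for the first element.
inferred_format = "%Y-%m-%dT%H:%M:%S"


def parse_strict(values, fmt):
    """Hypothetical helper: parse every value with one fixed format."""
    parsed = []
    for position, value in enumerate(values):
        try:
            parsed.append(datetime.strptime(value, fmt))
        except ValueError as exc:
            # Re-raise with the position, similar in spirit to the pandas message.
            raise ValueError(
                f"time data {value!r} at position {position} "
                f"doesn't match format {fmt!r}"
            ) from exc
    return parsed


try:
    parse_strict(strings, inferred_format)
    failed = False
    message = ""
except ValueError as err:
    failed = True
    message = str(err)
```

The first string parses, while the second one fails because the fixed format has no place for the fractional seconds.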
This is of course expected and can be explained by the changes in behaviour. But I do wonder whether we want to allow some way to still parse such data quickly.
For this specific case, you can't specify a format, since the formatting is inconsistent. Before, this was actually fast because it took the fast ISO parsing path.
With the current dev version of pandas, I don't see any way to parse this in a performant way?
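As a point of comparison, each of these strings is valid ISO 8601 on its own, so an element-wise parser that does not commit to one format handles the mixture. The following is a rough sketch with the standard library's `datetime.fromisoformat`; `parse_iso_mixed` is a hypothetical name, and the Python-level loop is exactly the kind of slow fallback this issue would like to avoid, not a replacement for a vectorized ISO path.

```python
from datetime import datetime

strings = ["2022-01-01T09:00:00", "2022-01-02T09:00:00.123"]


def parse_iso_mixed(values):
    """Hypothetical helper: parse each ISO string independently.

    No shared format is inferred, so differing resolutions are accepted.
    This loops in Python, so it is slow for large inputs.
    """
    return [datetime.fromisoformat(value) for value in values]


parsed = parse_iso_mixed(strings)
```

The result could then be passed to `pd.DatetimeIndex(parsed)`, at the cost of the per-element overhead.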
(This also came up in pyarrow's CI. It was an easy fix on our side to update the strings in the test, but I still wanted to raise this as a specific case to consider, since with real-world data you can't necessarily update your data easily.)