Issue 32497: datetime.strptime creates tz naive object from value containing a tzname

Created on 2018-01-05 12:24 by akeeman, last changed 2022-04-11 14:58 by admin.

Messages (4)

msg309502

Author: Arjan Keeman (akeeman) *

Date: 2018-01-05 12:24

Consider the following:

from datetime import datetime
tz_naive_object = datetime.strptime("2018-01-05 13:10:00 CET", "%Y-%m-%d %H:%M:%S %Z")

Python's standard library cannot convert the timezone name CET into a tzinfo object, so the call above returns a timezone-naive datetime object.
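A quick way to reproduce this behaviour on any machine is sketched below. Per the datetime documentation, %Z in strptime only accepts "UTC", "GMT" and the names in time.tzname, so the CET example above presumably parses only where the local zone is CET; the sketch therefore uses "UTC", which is always accepted, to show that a matched zone name still yields a naive datetime.

```python
from datetime import datetime

# "UTC" is always accepted by %Z (as are "GMT" and the names in time.tzname),
# so this parse succeeds regardless of the machine's local time zone.
parsed = datetime.strptime("2018-01-05 13:10:00 UTC", "%Y-%m-%d %H:%M:%S %Z")
print(parsed.tzinfo)  # None: the name was matched, but no tzinfo was attached
```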

I propose adding an extra optional argument, tzname_to_tzinfo: Optional[Callable[[str], Optional[tzinfo]]] = None, to both _strptime.py's _strptime_datetime function and datetime.strptime. It can be set to a function that accepts the timezone name and returns a tzinfo object or None (like pytz.timezone). None means that a timezone-naive object is created, as happens today.

Usage: tz_aware_object = datetime.strptime("2018-01-05 13:10:00 CET", "%Y-%m-%d %H:%M:%S %Z", pytz.timezone)
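The proposed semantics can be approximated outside the standard library with a small wrapper. This is a minimal sketch, not the proposed patch: strptime_tz is a hypothetical name, it assumes the format ends in " %Z" and that the zone name is the last space-separated token (so it sidesteps %Z's restricted matching), and it uses a dict's .get with a fixed-offset CET in place of pytz.timezone so it runs with the standard library alone.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable, Optional


def strptime_tz(value: str, fmt: str,
                tzname_to_tzinfo: Optional[Callable[[str], Optional[tzinfo]]] = None,
                ) -> datetime:
    # Hypothetical stand-in for the proposed datetime.strptime(..., tzname_to_tzinfo).
    # ASSUMPTION: fmt ends with " %Z" and the zone name is the last token of value.
    if not fmt.endswith(" %Z"):
        raise ValueError("this sketch only handles formats ending in ' %Z'")
    head, _, name = value.rpartition(" ")
    naive = datetime.strptime(head, fmt[:-3])
    if tzname_to_tzinfo is None:
        return naive  # no callable: naive result, as today
    tz = tzname_to_tzinfo(name)
    return naive if tz is None else naive.replace(tzinfo=tz)


CET = timezone(timedelta(hours=1), "CET")
aware = strptime_tz("2018-01-05 13:10:00 CET", "%Y-%m-%d %H:%M:%S %Z", {"CET": CET}.get)
print(aware.isoformat())  # 2018-01-05T13:10:00+01:00
```

A callable returning None for an unknown name (here, any name other than "CET") falls back to a naive result, matching the proposal.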

msg309509

Author: Paul Ganssle (p-ganssle) * (Python committer)

Date: 2018-01-05 16:55

This is essentially what the tzinfos argument to dateutil.parser.parse does. I do think something like this is the only reasonable way to handle %Z->tzinfo mappings.

In dateutil (https://dateutil.readthedocs.io/en/latest/parser.html#dateutil.parser.parse), you can pass either a mapping or a callable. Most of the problems we have in dateutil come from the fact that we must both infer which tokens should be interpreted as a time zone and then pass them to the mapping or callable. In strptime the first half is already solved, since the format string marks the time zone explicitly with %Z, so implementing this would be much easier.

I think the options for how this could be implemented are:

  1. Mapping only
  2. Callable only
  3. Mapping or callable
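Option 3 mostly comes down to one dispatch step. The following is a minimal sketch with a hypothetical helper name (resolve_tzname); it checks for a mapping first, on the assumption that mappings are the primary interface, and treats anything else as a callable.

```python
import collections.abc
from datetime import timedelta, timezone, tzinfo
from typing import Callable, Mapping, Optional, Union

# Hypothetical type for a tzinfos argument that accepts a mapping or a callable.
TzInfos = Union[Mapping[str, tzinfo], Callable[[str], Optional[tzinfo]]]


def resolve_tzname(tzinfos: TzInfos, name: str) -> Optional[tzinfo]:
    """Look up a %Z string in either a mapping or a callable."""
    if isinstance(tzinfos, collections.abc.Mapping):
        return tzinfos.get(name)  # unknown names give None rather than KeyError
    return tzinfos(name)


CET = timezone(timedelta(hours=1), "CET")
by_mapping = resolve_tzname({"CET": CET}, "CET")
by_callable = resolve_tzname(lambda name: CET if name == "CET" else None, "CET")
missing = resolve_tzname({"CET": CET}, "PST")
```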

Callable-only will probably lead to plenty of problems, and this bug report already contains one: pytz.timezone evidently doesn't do what Arjan thinks it does. It only happens to work for CET because CET is also the name of a zone in the IANA time zone database; it would not work with, say, CST or PST. That said, a callable is the most versatile option, and if we don't support it, people will probably end up working around it by creating mappings whose .get calls arbitrary functions.

#1 is probably the least convenient and #3 the most convenient. Either way, I'd say that the primary documented interface should be mappings, since that's the least error-prone (these mappings could be curated by third-party libraries for a given local context). Another advantage of mappings is that if we ever have a C implementation of strptime, it could have a fast evaluation path for when the mapping is a dict.
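Why curation per local context matters can be shown with CST, which means US Central Standard Time (UTC-6) in one place and China Standard Time (UTC+8) in another. The mappings below are hypothetical examples; fixed offsets keep the sketch self-contained, although a real US mapping would more likely point at a rule-based zone such as America/Chicago.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical curated mappings: "CST" means different offsets depending on
# where the text came from.
US_CENTRAL_TZINFOS = {"CST": timezone(timedelta(hours=-6), "CST")}
CHINA_TZINFOS = {"CST": timezone(timedelta(hours=8), "CST")}

naive = datetime(2018, 1, 5, 13, 10)  # wall time parsed from "... 13:10:00 CST"
in_us = naive.replace(tzinfo=US_CENTRAL_TZINFOS["CST"])
in_china = naive.replace(tzinfo=CHINA_TZINFOS["CST"])
print(in_us - in_china)  # 14:00:00: same text, instants 14 hours apart
```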

msg309511

Author: Paul Ganssle (p-ganssle) * (Python committer)

Date: 2018-01-05 17:12

By the way, one possibly significant problem with this interface is that it would tend to encourage the use of static timezone offsets rather than the rule sets that tzinfo is intended to model. More specifically, a simple mapping from tzname to tzinfo (whether done with a Mapping or a callable) will lose the fold information encoded in the chosen tzname: during the repeated hour at the end of DST, for example, "EST" and "EDT" map to the same zone, and which of the two readings was written is dropped.

In dateutil, I solved this problem by attaching the timezone object and checking whether the .tzname() of the resulting datetime matches the string it was parsed from. If not, I set fold=1 and check again; if that matches, fold=1 is used, otherwise the datetime is returned with fold=0. This is obviously a heuristic that will not always work.
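To make the fold problem concrete, here is a minimal sketch built on a hypothetical, hand-written tzinfo (ToyEastern) that encodes US Eastern rules for 2017 only; it is not a real zone database. Mapping both EST and EDT to the same rule-based zone and attaching it with replace() silently picks fold=0, so an input written as EST during the repeated hour comes back as EDT. The hypothetical attach_tz() follows the heuristic described above (try fold=0, then fold=1, keep whichever reproduces the parsed name); it illustrates the idea and is not dateutil's actual code.

```python
from datetime import datetime, timedelta, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical rule-based zone: US Eastern rules for 2017 only.

    Wall times from 01:00 to 02:00 on 2017-11-05 occur twice: fold=0 is the
    first pass (EDT, UTC-4), fold=1 the second (EST, UTC-5).
    """
    DST_START = datetime(2017, 3, 12, 2)
    AMBIGUOUS_START = datetime(2017, 11, 5, 1)
    DST_END = datetime(2017, 11, 5, 2)

    def _is_dst(self, dt):
        wall = dt.replace(tzinfo=None)  # naive comparisons ignore fold
        if self.DST_START <= wall < self.AMBIGUOUS_START:
            return True
        if self.AMBIGUOUS_START <= wall < self.DST_END:
            return dt.fold == 0  # repeated hour: fold decides
        return False

    def utcoffset(self, dt):
        return timedelta(hours=-4 if self._is_dst(dt) else -5)

    def dst(self, dt):
        return timedelta(hours=1 if self._is_dst(dt) else 0)

    def tzname(self, dt):
        return "EDT" if self._is_dst(dt) else "EST"


EASTERN = ToyEastern()
TZINFOS = {"EST": EASTERN, "EDT": EASTERN}  # both names map to one rule set


def attach_tz(naive, tz, parsed_name):
    # Try fold=0 first; if its tzname() disagrees with the parsed name, try
    # fold=1; fall back to the fold=0 result if neither matches.
    dt = naive.replace(tzinfo=tz)
    if dt.tzname() != parsed_name:
        folded = dt.replace(fold=1)
        if folded.tzname() == parsed_name:
            return folded
    return dt


naive = datetime(2017, 11, 5, 1, 30)  # as if parsed from "2017-11-05 01:30 EST"
lossy = naive.replace(tzinfo=TZINFOS["EST"])
print(lossy.tzname())  # EDT: the mapping alone picked the wrong side of the fold
fixed = attach_tz(naive, TZINFOS["EST"], "EST")
print(fixed.tzname(), fixed.fold)  # EST 1
```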

Three possible more general solutions to this problem:

  1. have a variant of strptime that returns a datetime and the contents of %Z and let users or third party libraries handle converting the string into a timezone and attaching it to the datetime.
  2. have tzinfos take a callable like handle_tzinfo(dt, tzstr) which returns the localized datetime.
  3. have separate tzinfos and apply_tzinfo arguments, the first generating the tzinfo object, the second of the format apply_tzinfo(dt, tz) - if the second one doesn't exist, the default implementation is just lambda dt, tz: dt.replace(tzinfo=tz) (or equivalent)
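The three options above could look roughly like this as plain functions. Everything here is hypothetical: strptime_with_tzname, strptime_handle and strptime_tzinfos are illustrative names, not proposed CPython APIs, and the %Z text is found by assuming the format ends in " %Z" and splitting off the last token. A fixed-offset CET is used, so the default apply_tzinfo (a plain replace()) is sufficient; a rule-based zone would need a fold-aware apply_tzinfo.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable, Mapping, Tuple


def strptime_with_tzname(value: str, fmt: str) -> Tuple[datetime, str]:
    """Option 1 (hypothetical): return the naive datetime and the raw %Z text.

    ASSUMPTION: fmt ends with " %Z" and the zone name is the last
    space-separated token of value.
    """
    if not fmt.endswith(" %Z"):
        raise ValueError("this sketch only handles formats ending in ' %Z'")
    head, _, name = value.rpartition(" ")
    return datetime.strptime(head, fmt[:-3]), name


def strptime_handle(value: str, fmt: str,
                    handle_tzinfo: Callable[[datetime, str], datetime]) -> datetime:
    """Option 2 (hypothetical): one callable turns (naive dt, name) into an aware dt."""
    naive, name = strptime_with_tzname(value, fmt)
    return handle_tzinfo(naive, name)


def default_apply_tzinfo(dt: datetime, tz: tzinfo) -> datetime:
    # Option 3's default: attach the zone; fold keeps strptime's value (0).
    return dt.replace(tzinfo=tz)


def strptime_tzinfos(value: str, fmt: str, tzinfos: Mapping[str, tzinfo],
                     apply_tzinfo: Callable[[datetime, tzinfo], datetime] = default_apply_tzinfo,
                     ) -> datetime:
    """Option 3 (hypothetical): the mapping picks the zone, apply_tzinfo attaches it."""
    naive, name = strptime_with_tzname(value, fmt)
    tz = tzinfos.get(name)
    return naive if tz is None else apply_tzinfo(naive, tz)


CET = timezone(timedelta(hours=1), "CET")
FMT = "%Y-%m-%d %H:%M:%S %Z"
via_option_2 = strptime_handle("2018-01-05 13:10:00 CET", FMT,
                               lambda dt, name: dt.replace(tzinfo={"CET": CET}[name]))
via_option_3 = strptime_tzinfos("2018-01-05 13:10:00 CET", FMT, {"CET": CET})
print(via_option_2 == via_option_3)  # True
```

With a fixed-offset zone, options 2 and 3 produce the same aware datetime; they only diverge once a rule-based zone and an ambiguous wall time are involved.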

#1 is a pretty significant (and possibly awkward) change to the interface, and #2 makes the implementation of these mappings less convenient for the downstream users, but is probably the most elegant from an API perspective. #3 is a somewhat reasonable marriage of #1 and #2, but it's ugly and I'm fairly certain it would lead to a lot of buggy code out there from people who don't realize why you would need to implement the apply function.

msg309512

Author: Paul Ganssle (p-ganssle) * (Python committer)

Date: 2018-01-05 17:13

Sorry, forgot to include the link to the dateutil implementation of the fold-resolution code: https://github.com/dateutil/dateutil/pull/517/files

History

Date                 User       Action  Args
2022-04-11 14:58:56  admin      set     github: 76678
2018-01-05 17:13:13  p-ganssle  set     messages: +
2018-01-05 17:12:21  p-ganssle  set     messages: +
2018-01-05 16:55:21  p-ganssle  set     nosy: + belopolsky, p-ganssle; messages: +
2018-01-05 12:38:37  akeeman    set     keywords: + patch; stage: patch review; pull_requests: + pull_request4973
2018-01-05 12:24:22  akeeman    create