[Python-Dev] Status on PEP-431 Timezones

Lennart Regebro regebro at gmail.com
Wed Apr 8 17:18:15 CEST 2015


Hi!

I wrote PEP-431 two years ago and never got around to implementing it. This year I got some renewed motivation after Berker Peksağ made an attempt at implementing it. I'm planning to work more on this during the PyCon sprints, and also to have a BoF session or similar during the conference.

If you're interested in a session on this, mail me and we'll set up a time and place!

//Lennart

For anyone interested in the details of the problem, here they are.

The big problem is ambiguous times, such as 02:30 on a day when the clock is moved back one hour: there are two different 02:30s that day. The real problem is how to tell the datetime whether it is before or after the changeover. Below I have written down my experiences looking into, and trying to implement, several different solutions.
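To make the ambiguity concrete, here is a minimal sketch assuming a hypothetical zone modeled on Central European time, where DST ends at 01:00 UTC on 2015-10-25 and the offset drops from UTC+2 to UTC+1. `CHANGEOVER_UTC` and `utc_to_local` are illustrative names only. Two instants one hour apart land on the same local wall time:

```python
from datetime import datetime, timedelta

# Hypothetical zone: CEST (UTC+2) until 01:00 UTC on 2015-10-25, then CET (UTC+1).
CHANGEOVER_UTC = datetime(2015, 10, 25, 1, 0)


def utc_to_local(utc):
    """Map a naive UTC datetime to the naive local wall-clock time."""
    offset = timedelta(hours=2) if utc < CHANGEOVER_UTC else timedelta(hours=1)
    return utc + offset


before = utc_to_local(datetime(2015, 10, 25, 0, 30))  # still DST
after = utc_to_local(datetime(2015, 10, 25, 1, 30))   # standard time again
print(before, after)  # 2015-10-25 02:30:00 2015-10-25 02:30:00
```

Going from UTC to local time is always well defined; it is the reverse direction, from a wall time like 02:30 to an instant, that needs extra information.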

== How others have solved it ==

=== dateutil.tz: Ignore the problem ===

dateutil.tz simply ignores the problems with ambiguous datetimes, keeping them ambiguous.
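A minimal sketch of what ignoring the problem amounts to, assuming the same kind of hypothetical single-transition zone. This is not dateutil's code; `IgnoringZone` is an illustrative name. A tzinfo's utcoffset() only sees the wall time, so for 02:30 it has to pick one reading, and the later 02:30 can never be expressed:

```python
from datetime import datetime, timedelta, timezone, tzinfo


class IgnoringZone(tzinfo):
    """Hypothetical zone modeling only the 2015-10-25 fall-back transition.

    Only the wall-clock time is available, so 02:00-02:59 on that day
    cannot be disambiguated; this sketch silently treats it as DST.
    """

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2015, 10, 25, 3, 0):
            return timedelta(hours=1)
        return timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=1) + self.dst(dt)

    def tzname(self, dt):
        return "CEST" if self.dst(dt) else "CET"


wall = datetime(2015, 10, 25, 2, 30, tzinfo=IgnoringZone())
print(wall.astimezone(timezone.utc))  # 2015-10-25 00:30:00+00:00
```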

=== pytz: One timezone instance per changeover ===

pytz handles ambiguous datetimes by having one class per timezone. Each change in the UTC offset, whether because of a DST changeover or because the timezone's rules change, is represented by one instance of that class.

All instances are held in a list that is a class attribute of the timezone class. You flag which period between changeovers you are in by using the corresponding instance as the datetime's tzinfo. Since the timezone instance thereby knows whether it is DST or not, the datetime as a whole knows it too.
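The mechanism can be sketched roughly as follows. This is a toy modeled loosely on pytz's design, not pytz's actual code: `ToyZone`, `_Period` and the transition table are hypothetical and cover only the 2015 Central European changeovers. Each period between changeovers is its own tzinfo instance, and a localize() method (named after pytz's) picks the instance, using is_dst to settle ambiguous wall times:

```python
from bisect import bisect_right
from datetime import datetime, timedelta, tzinfo


class _Period(tzinfo):
    """One fixed-offset tzinfo instance per period between changeovers."""

    def __init__(self, offset, dst, name):
        self._offset, self._dst, self._name = offset, dst, name

    # The instance already knows its offset, so nothing is calculated here.
    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return self._dst

    def tzname(self, dt):
        return self._name


class ToyZone:
    """Hypothetical zone with only the 2015 Central European changeovers."""

    # Naive-UTC instants at which the offset changes.
    transitions = [datetime(2015, 3, 29, 1), datetime(2015, 10, 25, 1)]
    # periods[i] is in effect from transitions[i - 1] up to transitions[i].
    periods = [
        _Period(timedelta(hours=1), timedelta(0), "CET"),
        _Period(timedelta(hours=2), timedelta(hours=1), "CEST"),
        _Period(timedelta(hours=1), timedelta(0), "CET"),
    ]

    def localize(self, naive, is_dst=False):
        """Attach the period instance matching the wall time `naive`."""
        candidates = [
            period
            for i, period in enumerate(self.periods)
            # Keep a period only if its UTC reading really falls inside it.
            if bisect_right(self.transitions, naive - period.utcoffset(None)) == i
        ]
        if not candidates:
            raise ValueError(f"{naive} does not exist in this zone")
        if len(candidates) > 1:  # ambiguous wall time: let is_dst decide
            candidates = [p for p in candidates if bool(p.dst(None)) == is_dst]
        return naive.replace(tzinfo=candidates[0])


zone = ToyZone()
first = zone.localize(datetime(2015, 10, 25, 2, 30), is_dst=True)
second = zone.localize(datetime(2015, 10, 25, 2, 30), is_dst=False)
print(first.tzname(), second.tzname(), second - first)  # CEST CET 1:00:00
```

Because the attached instance carries its own offset, dst() and utcoffset() just return stored values, which is where the fast DST offset comes from. The cost is that plain `datetime(..., tzinfo=...)` cannot pick the right instance on its own.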

Benefits:

The only known possible implementation that doesn't modify the stdlib, which was of course a requirement, since pytz is a third-party library.
DST offset can be returned quickly, as it does not need to be calculated.

Drawbacks:

A complex and highly magical implementation of timezones that is hard to understand.
Requires new normalize()/localize() methods on the timezone, so the API differs from the stdlib's.
Hundreds of instances per timezone means slightly more memory usage.

== Options for PEP 431 ==

=== Stdlib option 0: Ignore it ===

I don't think this is really an option. Listed for completeness.

=== Stdlib option 1: One timezone instance per changeover ===

Option 1 is to do it like pytz: one timezone instance per changeover. However, this is likely not possible without fundamentally changing the datetime API, or making it very hard to use.

For example, today when you create a datetime instance and pass in a tzinfo, that tzinfo is simply attached to the datetime. But with multiple tzinfo instances per timezone, you have to select the correct one to pass in. pytz solves this with the .localize() method, which lets the timezone class choose which instance to attach.

We can't pass the timezone class into datetime(), because that would require datetime.__new__ to create new datetimes as part of the timezone arithmetic. Those, in turn, would create new datetimes in __new__ as part of the timezone arithmetic, which in turn... yeah, you get it.

I haven't been able to solve that issue without either changing the API/usage or running into infinite recursion.

Benefits:

Proven solution through pytz.
Fast dst() call.

Drawbacks:

Trying to use this technique with the current API tends to cause infinite recursion. It seems to require big API changes.
Slow datetime() instance creation.

=== Stdlib option 2: A datetime _is_dst flag ===

By having a flag on the datetime instance that says "this is in DST or not" the timezone implementation can be kept simpler.
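A minimal sketch of the flag idea. Python 3.6 later added a similar per-instance flag to datetime, `fold` (PEP 495), so this sketch uses it as a stand-in for the proposed `_is_dst`; note that fold marks the first or second occurrence of a wall time rather than DST as such. `FlagZone` and the single modeled transition are hypothetical. The tzinfo stays simple because the datetime itself says which 02:30 it means:

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Hypothetical zone: only the 2015-10-25 fall-back transition is modeled.
AMBIGUOUS_START = datetime(2015, 10, 25, 2, 0)  # first wall time seen twice
DST_END = datetime(2015, 10, 25, 3, 0)          # wall time where CEST would end


class FlagZone(tzinfo):
    """A tzinfo that reads the disambiguation flag from the datetime itself."""

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < AMBIGUOUS_START:
            return timedelta(hours=1)
        if wall < DST_END:
            # Ambiguous hour: fold=0 is the first (DST) occurrence,
            # fold=1 the second (standard-time) one.
            return timedelta(0) if dt.fold else timedelta(hours=1)
        return timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=1) + self.dst(dt)

    def tzname(self, dt):
        return "CEST" if self.dst(dt) else "CET"


first = datetime(2015, 10, 25, 2, 30, tzinfo=FlagZone())
second = first.replace(fold=1)
print(first.astimezone(timezone.utc), second.astimezone(timezone.utc))
# 2015-10-25 00:30:00+00:00 2015-10-25 01:30:00+00:00
```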

You also have to calculate whether the datetime is in DST or not. Either you do it when creating the datetime, which requires creating other datetime objects and causes infinite recursion, or you do it when needed, which means you can get "ambiguous datetime" errors at unexpected times later.

Also, when trying to implement this, I get bogged down in the complexities of how tzinfo and datetime call each other back and forth, and when to pass in the current is_dst and when to pass in the desired is_dst, etc. The API and the current implementation are not designed with this case in mind, and it gets very tricky.

Benefits:

Simpler tzinfo implementations.

Drawbacks:

It seems likely that we must change some APIs.
This in turn may affect the pytz implementation. Or not, hard to say.
The DST offset must use slow timezone calculations. However, since datetimes are immutable it can be a cached, lazy, one-time operation.

=== Stdlib option 3: UTC internal representation ===

Having UTC as the internal representation makes the whole issue go away. Datetimes are no longer ambiguous except at creation, so the checks must be done when a datetime is created; in this case that should be possible without creating other datetimes, which resolves the infinite recursion problem.

Benefits:

Problem solved.
Minimal API changes.

Drawbacks:

Backwards compatibility with pickles.
Possible other backwards incompatibility problems.
Both the DST offset and the displayed local time must use slow timezone calculations. However, since datetimes are immutable, this can be a cached, lazy, one-time operation.
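A rough sketch of the UTC-internal idea, assuming the same hypothetical single-transition rule; `UtcDateTime`, `from_local` and `offset_at` are illustrative names, not a proposed API. Ambiguity is resolved once, at creation, using plain datetime arithmetic rather than new `UtcDateTime` objects, and the local time is computed lazily and cached:

```python
from datetime import datetime, timedelta
from functools import cached_property

# Hypothetical zone rule: UTC+2 (CEST) until 2015-10-25 01:00 UTC, UTC+1 after.
CHANGEOVER_UTC = datetime(2015, 10, 25, 1, 0)
DST, STD = timedelta(hours=2), timedelta(hours=1)


def offset_at(utc):
    """UTC offset in effect at a naive-UTC instant."""
    return DST if utc < CHANGEOVER_UTC else STD


class UtcDateTime:
    """Stores only the UTC instant; the local wall time is derived lazily."""

    def __init__(self, utc):
        self._utc = utc

    @property
    def utc(self):
        return self._utc

    @classmethod
    def from_local(cls, wall, is_dst=None):
        # The only ambiguity is here, at creation time. Checking it needs
        # plain datetime arithmetic, not new UtcDateTime objects.
        candidates = [wall - off for off in (DST, STD) if offset_at(wall - off) == off]
        if not candidates:
            raise ValueError(f"{wall} does not exist in this zone")
        if len(candidates) > 1:
            if is_dst is None:
                raise ValueError(f"{wall} is ambiguous; pass is_dst")
            return cls(candidates[0] if is_dst else candidates[1])
        return cls(candidates[0])

    @cached_property
    def local(self):
        # Computed on first access, then stored on the instance.
        return self._utc + offset_at(self._utc)


first = UtcDateTime.from_local(datetime(2015, 10, 25, 2, 30), is_dst=True)
second = UtcDateTime.from_local(datetime(2015, 10, 25, 2, 30), is_dst=False)
print(first.local == second.local, second.utc - first.utc)  # True 1:00:00
```

functools.cached_property (Python 3.8+) stores the result in the instance dict, so the zone calculation runs at most once per object; this sketch does not otherwise enforce immutability.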

I'm currently trying to implement solution #2 above. Feedback is welcome.
