Message 385309 - Python tracker

Like many others here, I've run into this issue because I'm trying to parse timestamps from JSON.

(Specifically, I'm trying to parse timestamps from the JSON serialization of Java POJOs and/or Kotlin data classes, as produced by the Jackson serialization library for JVM languages in conjunction with JavaTimeModule: https://fasterxml.github.io/jackson-modules-java8/javadoc/datetime/2.9/com/fasterxml/jackson/datatype/jsr310/JavaTimeModule.html)

In order to "be lenient in what I accept" (adhering to the robustness principle), I need to add a special case for deserialization of strings ending with 'Z'. This gets tricky and subtle quickly.
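A minimal sketch of that special case, using a hypothetical timestamp string. Whether fromisoformat() accepts the trailing 'Z' directly depends on the interpreter version, so the sketch falls back to stripping the suffix when parsing fails:

```python
from datetime import datetime, timezone

# Hypothetical input: an ISO 8601 timestamp with a trailing 'Z' for UTC.
raw = "2021-01-19T22:14:05Z"

try:
    parsed = datetime.fromisoformat(raw)
except ValueError:
    # Interpreters that reject the 'Z' suffix end up here:
    # strip it and attach UTC by hand.
    parsed = datetime.fromisoformat(raw[:-1]).replace(tzinfo=timezone.utc)

print(parsed.isoformat())
```

Either way the result is an aware datetime in UTC, but only because the caller knew to handle the suffix.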

Here is my Python 3.7+ code path (the strptime-based code path for earlier versions is much, much uglier).

from numbers import Number
from datetime import datetime, timezone

def number_or_iso8601_to_dt(ts, t=datetime):
    # t is the datetime class (or a subclass) used to build the result.
    if isinstance(ts, Number):
        # Numeric input is treated as seconds since the Unix epoch, in UTC.
        return t.fromtimestamp(ts, tz=timezone.utc)
    elif ts.endswith('Z'):
        # This is not strictly correct, since it would accept a string with
        # two timezone specifications (e.g. ending with +01:00Z) and
        # silently pass that erroneous representation:
        #
        # return t.fromisoformat(ts[:-1]).replace(tzinfo=timezone.utc)
        #
        # This version is better:
        d = t.fromisoformat(ts[:-1])
        if d.tzinfo is not None:
            raise ValueError(f"time data '{ts}' contains multiple timezone suffixes")
        return d.replace(tzinfo=timezone.utc)
    else:
        # Anything else is handed to fromisoformat() unchanged.
        return t.fromisoformat(ts)
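
As a rough usage sketch, here is how the helper might be applied to a decoded JSON document. The payload and its field names are made up for illustration, and the helper is repeated in condensed form only so that the snippet runs on its own:

```python
import json
from datetime import datetime, timezone
from numbers import Number


def number_or_iso8601_to_dt(ts):
    # Condensed copy of the helper above, repeated so this snippet is standalone.
    if isinstance(ts, Number):
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    if ts.endswith("Z"):
        d = datetime.fromisoformat(ts[:-1])
        if d.tzinfo is not None:
            raise ValueError(f"time data '{ts}' contains multiple timezone suffixes")
        return d.replace(tzinfo=timezone.utc)
    return datetime.fromisoformat(ts)


# Hypothetical payload: the same instant written three different ways.
payload = json.loads(
    '{"created": "2021-01-19T22:14:05Z",'
    ' "updated": "2021-01-19T23:14:05+01:00",'
    ' "epoch": 1611094445}'
)
parsed = {key: number_or_iso8601_to_dt(value) for key, value in payload.items()}

# Aware datetimes compare by instant, so all three values are equal.
same_instant = parsed["created"] == parsed["updated"] == parsed["epoch"]

# A string carrying two timezone suffixes is rejected rather than passed silently.
try:
    number_or_iso8601_to_dt("2021-01-19T23:14:05+01:00Z")
    double_suffix_rejected = False
except ValueError:
    double_suffix_rejected = True
```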

I don't really understand why .fromisoformat() must be strictly the inverse of .isoformat(). As @mehaase points out, the transformation isn't strictly reversible as is.
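
One way to see that the round trip can lose information (this is only an illustration, and not necessarily the case @mehaase had in mind) is that isoformat() can be asked to truncate its output via the timespec argument:

```python
from datetime import datetime

original = datetime(2021, 1, 19, 22, 14, 5, 123456)

# timespec="minutes" drops the seconds and microseconds from the output.
text = original.isoformat(timespec="minutes")
restored = datetime.fromisoformat(text)

# The parsed value is a valid datetime, but it is not the one we started with.
round_trip_is_exact = restored == original
```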

There are other functions where the Python standard library has special-cased options for extremely common use cases. For example, str.split(None), which is certainly not the inverse of the non-existent None.join().
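
A small sketch of that asymmetry, with an arbitrary sample string: splitting on an explicit separator can be undone with join(), while split(None) collapses runs of whitespace and cannot be undone.

```python
text = "  a  b   c "

# With an explicit separator, join() restores the original string exactly.
explicit = text.split(" ")
explicit_round_trip = " ".join(explicit) == text

# split(None) discards leading, trailing, and repeated whitespace,
# so no join() call can recover the original string.
lenient = text.split(None)
lenient_round_trip = " ".join(lenient) == text
```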

This feels to me like a case where the standard library should simply accept an extremely common real-world variant in the interests of interoperability.

I would also be in favor of @p-ganssle's proposal (3), wherein datetime.isoformat would also output the 'Z' suffix for the UTC timezone.