Issue 26695: pickle and _pickle accelerator have different behavior when unpickling an object with falsy __getstate__ return

According to a note in the pickle docs ( https://docs.python.org/3/library/pickle.html#object.getstate ): "If __getstate__() returns a false value, the __setstate__() method will not be called upon unpickling."

The phrasing is a little odd (according to the __setstate__ docs, classes without __setstate__ get a default behavior that just assigns the contents of the pickled state dict to the __dict__ of the object), but to me, this means that any falsy value should prevent any __setstate__-like behavior.

But this is not how it works. Both the C accelerator and Python code treat None specially (they don't pickle state at all if it's None), which prevents __setstate__ or the __setstate__-like fallback from being executed.
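A minimal sketch of the None special case, assuming CPython, where pickle.dumps/pickle.loads are backed by the C accelerator when it is available and the private pickle._dumps/pickle._loads expose the pure-Python implementation. The class name and the has_build_opcode helper are hypothetical, and protocol 2 is chosen only for illustration:

```python
import pickle
import pickletools


class NoneState:
    """Hypothetical class whose __getstate__ returns None."""

    setstate_calls = 0  # counts __setstate__ invocations across all loads

    def __getstate__(self):
        return None

    def __setstate__(self, state):
        type(self).setstate_calls += 1


def has_build_opcode(data):
    """Report whether the pickle stream contains a BUILD opcode."""
    return any(op.name == "BUILD" for op, _arg, _pos in pickletools.genops(data))


# (dumps, loads) pairs: accelerated (when _pickle is available) and pure Python.
# pickle._dumps and pickle._loads are private names, used here as an assumption.
implementations = {
    "accelerated": (pickle.dumps, pickle.loads),
    "pure Python": (pickle._dumps, pickle._loads),
}

build_seen = {}
for name, (dumps, loads) in implementations.items():
    data = dumps(NoneState(), protocol=2)
    build_seen[name] = has_build_opcode(data)  # was any state stored at all?
    loads(data)

print(build_seen, NoneState.setstate_calls)
```

If no BUILD opcode is written, there is no state for the unpickler to hand to __setstate__, which is why the call counter should stay at zero in both implementations.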

But if it's any other falsy value, the behaviors differ, and diverge from the docs. Specifically, on load of a pickle with a non-None falsy state (say, False itself, or 0, or () or []):

Without __setstate__:
  Pure Python pickle: Does not execute fallback code, behaves as expected (it just stored state it will never use), matching spirit of docs
  C accelerated _pickle: Fails on anything but the empty dict with an UnpicklingError: state is not a dictionary, violating spirit of docs

With __setstate__: Both versions call __setstate__ even though the documentation explicitly says they will not.
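The two cases above can be probed with a small script. This is a sketch under assumptions, not a definitive reproduction: the class names and the roundtrip helper are hypothetical, pickle._dumps/pickle._loads are private names for the pure-Python implementation, protocol 2 is picked for illustration, and the outcome for the class without __setstate__ may depend on the interpreter version, so it is reported rather than hard-coded:

```python
import pickle


class WithSetstate:
    """Hypothetical class with a falsy, non-None state and a __setstate__."""

    received = []  # every state value passed to __setstate__

    def __getstate__(self):
        return 0

    def __setstate__(self, state):
        type(self).received.append(state)


class WithoutSetstate:
    """Hypothetical class with a falsy, non-None state and no __setstate__."""

    def __getstate__(self):
        return 0


def roundtrip(dumps, loads, obj):
    """Return 'ok', or the name of the exception raised while unpickling."""
    data = dumps(obj, protocol=2)
    try:
        loads(data)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


implementations = {
    "accelerated": (pickle.dumps, pickle.loads),
    "pure Python": (pickle._dumps, pickle._loads),
}

results = {}
for name, (dumps, loads) in implementations.items():
    results[name] = {
        "with __setstate__": roundtrip(dumps, loads, WithSetstate()),
        "without __setstate__": roundtrip(dumps, loads, WithoutSetstate()),
    }

print(results)
print("states passed to __setstate__:", WithSetstate.received)
```

Comparing the two "without __setstate__" entries shows whether the implementations disagree on a given interpreter, and WithSetstate.received shows whether the falsy state reached __setstate__.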

Seems like if nothing else, the docs should agree with the code, and the C and Python modules should agree on behavior.

I would not be at all surprised if outside code depends on being able to pickle falsy state and have its __setstate__ receive the falsy state (if nothing else, when the state is a container or number, being empty or 0 would be reasonable; failing to call __setstate__ in that case would be surprising). So it's probably not a good idea to make the implementation match the docs.

My proposal would be that at pickle time, if the class lacks __setstate__, treat any falsy return value as None. This means:

  1. pickles are smaller (no storing junk that the default __setstate__-like behavior can't use)
  2. pickles are valid (no UnpicklingError from the default __setstate__-like behavior)
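The pickle-time rule proposed above could be sketched roughly as follows. The normalize_state helper and both classes are hypothetical and only illustrate the decision; this is not how either pickle implementation is structured:

```python
def normalize_state(obj, state):
    """Hypothetical helper: the state a pickler would store under the proposal.

    Returning None means "store no state at all".
    """
    if state is None:
        return None
    # Falsy state is only dropped when there is no real __setstate__ to receive it.
    if not state and not hasattr(type(obj), "__setstate__"):
        return None
    return state


class Plain:
    """Hypothetical class relying on the default __setstate__-like behavior."""


class Custom:
    """Hypothetical class that defines its own __setstate__."""

    def __setstate__(self, state):
        self.state = state


print(normalize_state(Plain(), []))         # falsy, no __setstate__: dropped
print(normalize_state(Plain(), {"x": 1}))   # truthy: kept
print(normalize_state(Custom(), 0))         # falsy, but __setstate__ exists: kept
```

Under this rule only classes without __setstate__ change behavior, so code that expects its __setstate__ to receive an empty container or 0 is unaffected.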

The docs would also have to change, to indicate that, if defined, __setstate__ will be called even if __getstate__ returned a falsy (but not None) value.

Downside is that the description of what happens is a little complex, since the behavior for non-None falsy values differs depending on the presence of a real __setstate__. Upside is that any code depending on the current behavior of falsy state being passed to __setstate__ keeps working, CPython and other interpreters will match behavior, and classes without __setstate__ will have smaller pickles.