[Python-3000] Heaptypes

Guido van Rossum guido at python.org
Thu Jul 19 05:01:18 CEST 2007


On 7/18/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> You broke backwards compatibility this way; I think that a pickle
> produced by Python 2.x should be readable by Python 3.0.

It is, is it not?

No; {'a': 1} pickled on 2.x results in an error complaining about an unhashable object when the pickle is read in 3.0; this is the error you saw in test_pickle.py.
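For later readers, the case above can be reproduced roughly as follows. The byte string is the protocol 0 pickle that Python 2.x writes for {'a': 1}; the key is stored with the STRING opcode, that is, as an 8-bit string. This sketch assumes a later Python 3 release, where pickle.loads() accepts an encoding argument that controls how such strings are decoded. It is not the py3k code under discussion in this thread.

```python
import pickle

# Protocol 0 pickle of {'a': 1} as written by Python 2.x
# (the key is stored with the STRING opcode, an 8-bit string).
py2_pickle = b"(dp0\nS'a'\np1\nI1\ns."

# Default: 8-bit strings are decoded as ASCII into str.
as_text = pickle.loads(py2_pickle)

# Alternative: keep 8-bit strings as bytes objects.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(as_text)
print(as_bytes)
```

Either way the key is hashable, so the dict can be rebuilt; the choice is only whether 2.x strings come back as text or as bytes.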

> (I haven't decided whether to keep str8 or something like it, or
> whether to try to get rid of it completely).

I assumed the latter - and if it indeed goes away, it's certainly a bug to ever return str8 from pickle, right?

If indeed it goes away, it can't be returned. If it's still around, we can argue about the desirability of returning one.

> One possibility might be to first try to decode the STRING argument as
> utf-8, and if that fails to convert it to str8 instead. What do you
> think? I don't understand all of the changes you made in r56438,
> perhaps you can save most of them.

The question really is what bytes should be pickled as; that needs to be decided before fixing the code. Should it be built-in (and if so, using what code)? If not, it probably needs to go through __reduce__, and if so, what should __reduce__ return for a bytes object?

Either a new opcode (which would make such a pickle fail hard when unpickled with 2.5, but that's probably fine as it would fail anyway), or some variation of what I coded before, using __reduce__.
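Both options can be seen side by side in later Python 3 releases, which is what the sketch below assumes: protocol 2 has no opcode for bytes, so the pickler falls back to a reduce-style call, while protocol 3 added dedicated bytes opcodes. The helper opcode_names is a hypothetical name used only for this illustration.

```python
import pickle
import pickletools

def opcode_names(obj, protocol):
    """Return the names of the opcodes used to pickle obj."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]

# Protocol 2 has no bytes opcode, so bytes go through a reduce-style call.
names_p2 = opcode_names(b"abc", 2)

# Protocol 3 added dedicated opcodes for bytes.
names_p3 = opcode_names(b"abc", 3)

print(names_p2)
print(names_p3)
```

A pickle that uses the new opcode cannot be read by an unpickler that predates it, which is the "fail hard" trade-off mentioned above.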

__reduce__ currently does (O(s#)) with (ob_type, ob_bytes, ob_size). Now, s# creates a Unicode object, and the pickling fails to round-trip correctly.

I thought that before your patch a bytes object roundtripped correctly with all three protocols. Or maybe it got broken when s# was changed?

An additional requirement might be that if bytes are introduced in 2.6, a pickle containing bytes written by 3.0 should be readable by 2.6. Ideally, pickles not containing bytes written in 3.0 should always be readable in 2.6 (assuming the user-defined types it references exist).

If __reduce__ returns a Unicode object, what encoding should be assumed? (which then needs to be symmetric with bytes())
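One way to read the symmetry requirement, sketched under assumptions: if the reduce step hands the pickler a Unicode string, the encoding used to produce that string must be the exact inverse of the one used on load. Latin-1 has this property for arbitrary bytes, because it maps each byte value 0..255 to the code point with the same number. The class Blob below is hypothetical and stands in for a bytes type; it rebuilds as a plain built-in bytes object so the example does not depend on how the class is imported.

```python
import pickle

class Blob:
    """Hypothetical bytes holder, for illustration only."""

    def __init__(self, data):
        self.data = bytes(data)

    def __reduce__(self):
        # Latin-1 maps each byte 0..255 to the code point with the same
        # number, so decode() here and bytes(text, "latin-1") on load
        # are exact inverses for any byte string.
        text = self.data.decode("latin-1")
        return (bytes, (text, "latin-1"))

# Every possible byte value, to exercise the full range.
payload = bytes(range(256))

# Round-trip through the three protocols discussed in this thread.
restored = {
    protocol: pickle.loads(pickle.dumps(Blob(payload), protocol=protocol))
    for protocol in (0, 1, 2)
}
```

With UTF-8 in place of Latin-1 the decode step would raise an error for many byte strings, so that encoding could not be used the same way for arbitrary binary data.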

If __reduce__ returns a str8 object, you will have to keep str8 (or else you cannot pickle bytes).

When __reduce__ returns a string at all, that means it's the name of a global. I guess that should be encoded using UTF-8, so that as long as the name is ASCII, 2.x can unpickle it. But I'm not sure if that's what you were asking.

Anyway, one reason this is such a mess is clearly that the pickle protocol has no independent spec -- it's grown organically in code. Reverse-engineering the intent of the code is a pain.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


