[Python-3000] Heaptypes
Guido van Rossum guido at python.org
Thu Jul 19 05:01:18 CEST 2007
On 7/18/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> You broke backwards compatibility this way; I think that a pickle produced by Python 2.x should be readable by Python 3.0.
It is, is it not?
No; {'a': 1} pickled on 2.x results in an error complaining about an unhashable object when the pickle is read in 3.0; this is the error you saw in test_pickle.py.
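For reference, the failure described above can be probed with a hand-written protocol 0 pickle of {'a': 1}. This is only a sketch: the byte string is an assumption about what Python 2 emits and was not captured from this thread, and the behaviour shown is that of a present-day Python 3, not the 2007 py3k branch under discussion.

```python
import pickle

# Hand-written protocol 0 pickle of {'a': 1}, in the shape Python 2 emits
# (an assumption of this sketch, not data taken from the thread).
PY2_PICKLE = b"(dp0\nS'a'\np1\nI1\ns."

# Default: the STRING opcode argument is decoded to str (encoding="ASCII").
as_text = pickle.loads(PY2_PICKLE)

# encoding="bytes" leaves 8-bit strings undecoded, as bytes objects.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(as_text)
print(as_bytes)
```

Both results are hashable keys, so the dict can be rebuilt either way.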
> (I haven't decided whether to keep str8 or something like it, or whether to try to get rid of it completely).
I assumed the latter - and if it indeed goes away, it's certainly a bug to ever return str8 from pickle, right?
If indeed it goes away, it can't be returned. If it's still around, we can argue about the desirability of returning one.
> One possibility might be to first try to decode the STRING argument as utf-8, and if that fails to convert it to str8 instead. What do you think? I don't understand all of the changes you made in r56438, perhaps you can save most of them.
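The fallback proposed in the quoted paragraph could look roughly like the sketch below. The helper name decode_string_arg is hypothetical, and since str8 does not exist in any released Python, bytes stands in for it here; this is an illustration of the idea, not the code from r56438.

```python
def decode_string_arg(raw: bytes):
    """Hypothetical helper: decode a STRING argument as UTF-8 if possible.

    If the payload is not valid UTF-8, return it unchanged as an 8-bit
    string (bytes here, standing in for str8).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw


text_key = decode_string_arg(b"a")          # valid UTF-8, becomes str
binary_value = decode_string_arg(b"\xff\xfe")  # invalid UTF-8, stays bytes
print(type(text_key).__name__, type(binary_value).__name__)
```

One consequence of this design is that the result type depends on the data, so a binary payload that happens to be valid UTF-8 would come back as text.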
The question really is what bytes should be pickled as; that needs to be decided before fixing the code. Should it be built-in (and if so, using what code)? If not, it probably needs to go through __reduce__, and if so, what should __reduce__ return for a bytes object?
Either a new opcode (which would make such a pickle fail hard when unpickled with 2.5, but that's probably fine as it would fail anyway), or some variation of what I coded before, using __reduce__.
__reduce__ currently does (O(s#)) with (obtype, obbytes, obsize). Now, s# creates a Unicode object, and the pickling fails to round-trip correctly.
I thought that before your patch a bytes object roundtripped correctly with all three protocols. Or maybe it got broken when s# was changed?
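A round-trip check over the three protocols mentioned here (0, 1 and 2) might be sketched as follows. It runs against a present-day Python 3, so it says nothing about the state of the py3k branch at the time; the payload is an arbitrary choice that includes non-ASCII and NUL bytes.

```python
import pickle

# Arbitrary payload with a NUL byte and bytes above 0x7f.
payload = bytes([0, 65, 128, 255])

results = {}
for protocol in (0, 1, 2):
    restored = pickle.loads(pickle.dumps(payload, protocol=protocol))
    # A correct round trip preserves both the type and the contents.
    results[protocol] = type(restored) is bytes and restored == payload

print(results)
```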
An additional requirement might be that if bytes are introduced in 2.6, a pickle containing bytes written by 3.0 should be readable by 2.6. Ideally, pickles not containing bytes written in 3.0 should always be readable in 2.6 (assuming the user-defined types they reference exist).
If __reduce__ returns a Unicode object, what encoding should be assumed? (which then needs to be symmetric with bytes())
If __reduce__ returns a str8 object, you will have to keep str8 (or else you cannot pickle bytes).
When __reduce__ returns a string at all, that means it's the name of a global. I guess that should be encoded using UTF-8, so that as long as the name is ASCII, 2.x can unpickle it. But I'm not sure if that's what you were asking.
Anyway, one reason this is such a mess is clearly that the pickle protocol has no independent spec -- it's grown organically in code. Reverse-engineering the intent of the code is a pain.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)