[Python-3000] Heaptypes

Guido van Rossum guido at python.org
Fri Jul 20 00:25:07 CEST 2007


On 7/19/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> But you can do it using bytes('\xff', 'latin-1'). I think that's a
> reasonable thing for bytes.__reduce__() to return.

That's certainly a choice. Another choice is that bytes defaults to latin-1, rather than the system default encoding. This is roughly equivalent, and gives a slightly more compact pickle result.

I don't like bytes defaulting to anything at all; that they currently do is a transitional issue in the branch. Java used to have a default of Latin-1 for converting bytes <--> string and it was considered a mistake AFAIK.

I've implemented the explicit latin-1 version for now; we can change this later.

> How about the following. It's not perfect but it's the best I can
> think of that doesn't break any pickles.
>
> In 3.0, when an S, T or U pickle code is encountered, the returned
> value is a Unicode string decoded from the bytes using Latin-1. This
> means that all S, T or U pickle codes return Unicode objects. In
> those cases where this was really meant to transfer binary data, the
> application running under 3.0 can fix this by calling bytes(X,
> 'latin-1'). If it was meant to be UTF-8-encoded text, the app can call
> str(Y, 'utf-8') after that.

It would actually have to be Y.encode('latin-1').decode('utf-8') (assuming Y is what you get from unpickling):

That's another way of saying it. I meant for Y to be the result of bytes(X, 'latin-1') but that was non-obvious. Anyway I think we're in agreement here. :-)

py> str('\xc3\xb6', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported
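The Latin-1 round trip discussed above can be sketched as follows. This is an illustration only: it runs against the pickle module of current Python 3 releases (not the 2007 branch under discussion), the pickle bytes are hand-written rather than produced by a real Python 2 session, and the names legacy_pickle, y, raw and text are made up for the example.

```python
import pickle

# Hand-written protocol-0 pickle using the S (STRING) opcode. The payload is
# the two bytes 0xC3 0xB6, which happen to be the UTF-8 encoding of U+00F6.
legacy_pickle = b"S'\\xc3\\xb6'\n."

# Ask the unpickler to decode S payloads as Latin-1, so every byte becomes
# the code point with the same numeric value.
y = pickle.loads(legacy_pickle, encoding="latin-1")
assert y == "\u00c3\u00b6"

# Case 1: the payload was really binary data. Encoding back to Latin-1
# recovers the original bytes exactly.
raw = bytes(y, "latin-1")
assert raw == b"\xc3\xb6"

# Case 2: the payload was UTF-8 text. Go back to bytes first, then decode.
text = y.encode("latin-1").decode("utf-8")
assert text == "\u00f6"
```

The trick works because Latin-1 maps each of the 256 byte values to a distinct code point, so no information is lost in either direction.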

> But 3.0 should only generate the S, T or U pickle codes for str8
> values (as long as that type exists) or for str values containing only
> 7-bit ASCII bytes; for all else it should use the unicode pickle
> codes.

Sounds fine to me.

> For bytes, I propose that b"ab\xff".__reduce__() return (bytes,
> ("ab\xff", "latin-1")).

See above. Unless somebody objects, I'd rather make latin-1 the default for bytes when a string is passed (I'm uncertain myself of how much explicit is better than implicit here).

See above.
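For readers following along, here is a minimal sketch of the reduce value proposed for bytes: a callable plus an argument tuple holding the Latin-1 text and the explicit encoding name. The helper reduce_bytes_latin1 is hypothetical and exists only for this illustration; it is not what any Python release actually does when pickling bytes.

```python
def reduce_bytes_latin1(data: bytes):
    """Hypothetical helper: build a (callable, args) pair in the proposed shape."""
    # Latin-1 decoding never fails and preserves every byte value.
    return (bytes, (data.decode("latin-1"), "latin-1"))


original = b"ab\xff"
func, args = reduce_bytes_latin1(original)
assert args == ("ab\xff", "latin-1")

# An unpickler rebuilds the object by calling func(*args),
# which here is bytes("ab\xff", "latin-1").
rebuilt = func(*args)
assert rebuilt == original
```

Passing the encoding explicitly keeps the reconstruction independent of whatever default bytes() might have, which is the point of the explicit variant.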

I'll look into implementing that strategy.

How about instead you help with fixing pickling of datetime objects? This broke when I fixed test_pickle. Rolling back your changes to datetime pickling didn't seem to help.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


