[Python-3000] Heaptypes

"Martin v. Löwis" martin at v.loewis.de
Thu Jul 19 22:26:35 CEST 2007


But you can do it using bytes('\xff', 'latin-1'). I think that's a reasonable thing for bytes.__reduce__() to return.

That's certainly a choice. Another choice is that bytes defaults to latin-1, rather than the system default encoding. This is roughly equivalent, and gives a slightly more compact pickle result.

How about the following. It's not perfect, but it's the best I can think of that doesn't break any pickles.

In 3.0, when an S, T or U pickle code is encountered, the returned value is a Unicode string decoded from the bytes using Latin-1. This means that all S, T or U pickle codes return Unicode objects. In those cases where this was really meant to transfer binary data, the application running under 3.0 can fix this by calling bytes(X, 'latin-1'). If it was meant to be UTF-8-encoded text, the app can call str(Y, 'utf-8') after that.

It would actually have to be Y.encode('latin-1').decode('utf-8') (assuming Y is what you get from unpickling):

py> str('\xc3\xb6', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported
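
To make the round trip concrete, here is a minimal sketch. It assumes the `encoding` argument that `pickle.loads` accepts in current Python 3 releases behaves like the Latin-1 decoding discussed above; the hand-written pickle and the names `legacy`, `y`, `raw` and `text` are only illustrative.

```python
import pickle

# Hand-written protocol-0 pickle: the S code, a quoted 8-bit string
# literal holding the UTF-8 bytes of "ö", then the STOP code.
legacy = b"S'\\xc3\\xb6'\n."

# Decode the 8-bit payload as Latin-1, so each byte becomes one code point.
y = pickle.loads(legacy, encoding="latin-1")

# If the payload was really binary data, Latin-1 encoding recovers it unchanged.
raw = bytes(y, "latin-1")

# If the payload was UTF-8 text, re-encode to bytes first, then decode.
text = y.encode("latin-1").decode("utf-8")
```

Because Latin-1 maps each of the 256 byte values to the code point with the same number, the encode step is lossless, which is what makes both recoveries possible.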

But 3.0 should only generate the S, T or U pickle codes for str8 values (as long as that type exists) or for str values containing only 7-bit ASCII bytes; for all else it should use the unicode pickle codes.

Sounds fine to me.
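
The rule for str values could be sketched roughly as follows. The helper name `pick_string_code_family` and its return labels are hypothetical, and the str8 case is left out because that type is not available to test against.

```python
def pick_string_code_family(value: str) -> str:
    """Hypothetical helper: name the pickle code family the rule would choose."""
    # Only characters below 128 fit the 7-bit ASCII condition.
    if all(ord(ch) < 128 for ch in value):
        return "S/T/U"
    # Anything else would go through the unicode pickle codes.
    return "unicode"
```

Under this sketch, "abc" would select the S, T or U codes, while "\xff" would select the unicode codes.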

For bytes, I propose that b"ab\xff".__reduce__() return (bytes, ("ab\xff", "latin-1")).

See above. Unless somebody objects, I'd rather make latin-1 the default for bytes when a string is passed (I'm uncertain myself of how much explicit is better than implicit here).
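
As an illustration of the explicit form, the proposed reduce tuple can be rebuilt and called by hand. The function `reduce_bytes` is a hypothetical stand-in for the proposal, not the behaviour of any particular Python release.

```python
def reduce_bytes(value: bytes):
    # Hypothetical stand-in for the proposed reduce result:
    # a constructor plus the arguments needed to rebuild the value.
    return (bytes, (value.decode("latin-1"), "latin-1"))

constructor, args = reduce_bytes(b"ab\xff")

# Unpickling would call the constructor with the stored arguments.
restored = constructor(*args)
```

Here `args` is `("ab\xff", "latin-1")` and `restored` compares equal to the original `b"ab\xff"`. Making latin-1 the default would let the second argument be dropped, which is the more compact pickle mentioned earlier.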

I'll look into implementing that strategy.

Regards, Martin
