[Python-3000] Heaptypes
"Martin v. Löwis" martin at v.loewis.de
Thu Jul 19 22:26:35 CEST 2007
> But you can do it using bytes('\xff', 'latin-1'). I think that's a reasonable thing for bytes.__reduce__() to return.
That's certainly a choice. Another choice is to have bytes() default to Latin-1, rather than to the system default encoding, when it is given a string. This is roughly equivalent, and gives a slightly more compact pickle result.
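A minimal sketch of the reduce-based approach, for illustration only: proposed_reduce, proposed_reduce_implicit and rebuild are hypothetical helpers, not part of any Python API. It shows why Latin-1 works as a carrier for arbitrary bytes (every byte value 0x00-0xFF decodes to exactly one code point and encodes back unchanged), and how leaving out the encoding name shortens the pickled arguments.

```python
import pickle


def proposed_reduce(data: bytes):
    """Hypothetical stand-in for the proposed bytes.__reduce__() result."""
    # Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
    # so this decode can never fail and is exactly reversible.
    return (bytes, (data.decode("latin-1"), "latin-1"))


def proposed_reduce_implicit(data: bytes):
    """Hypothetical variant that assumes bytes(str) defaulted to Latin-1."""
    return (bytes, (data.decode("latin-1"),))


def rebuild(reduced):
    # Roughly what the unpickler does with a reduce tuple:
    # call the callable with the saved arguments.
    func, args = reduced
    return func(*args)


every_byte = bytes(range(256))
assert rebuild(proposed_reduce(every_byte)) == every_byte

# Dropping the encoding name saves a few bytes per pickled bytes object.
explicit_args = proposed_reduce(b"ab\xff")[1]
implicit_args = proposed_reduce_implicit(b"ab\xff")[1]
print(len(pickle.dumps(explicit_args)), len(pickle.dumps(implicit_args)))
```

The implicit variant is only measured, not rebuilt: it depends on a bytes() default that the sketch merely assumes.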
> How about the following? It's not perfect, but it's the best I can think of that doesn't break any pickles.
>
> In 3.0, when an S, T or U pickle code is encountered, the returned value is a Unicode string decoded from the bytes using Latin-1. This means that all S, T or U pickle codes return Unicode objects. In those cases where this was really meant to transfer binary data, the application running under 3.0 can fix this by calling bytes(X, 'latin-1'). If it was meant to be UTF-8-encoded text, the app can call str(Y, 'utf-8') after that.
It would actually have to be Y.encode('latin-1').decode('utf-8') (assuming Y is what you get from unpickling):
py> str('\xc3\xb6', 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported
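The sketch below assumes a current Python 3, whose pickle.loads() accepts an encoding argument for 8-bit strings read from old pickles; the hand-built byte string legacy is an illustrative example, not output captured from Python 2. It walks through both recovery paths discussed above: re-encoding with Latin-1 for binary data, and Latin-1 followed by UTF-8 for text.

```python
import pickle

# Hand-built legacy pickle: opcode 'U' (SHORT_BINSTRING), a one-byte length,
# the raw 8-bit string (here the UTF-8 encoding of "ö"), then '.' (STOP).
legacy = b"U\x02\xc3\xb6."

# Decoding with Latin-1 always succeeds, whatever the bytes were.
y = pickle.loads(legacy, encoding="latin-1")

# Binary data: re-encode with Latin-1 to get the original bytes back.
raw = bytes(y, "latin-1")

# UTF-8 text: str() refuses to decode a str, it only decodes bytes-like objects...
try:
    str(y, "utf-8")
except TypeError as exc:
    print("TypeError:", exc)

# ...so go through bytes first, as suggested above.
text = y.encode("latin-1").decode("utf-8")
print(text)

# encoding="bytes" skips the decode step and returns the raw bytes directly.
as_bytes = pickle.loads(legacy, encoding="bytes")
```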
> But 3.0 should only generate the S, T or U pickle codes for str8 values (as long as that type exists) or for str values containing only 7-bit ASCII bytes; for all else it should use the unicode pickle codes.
Sounds fine to me.
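A hypothetical writer following this rule might look like the sketch below. str8 no longer exists in released Python 3, so it only handles str; dump_str_legacy_safe is an illustrative name, and the 255-character limit comes from SHORT_BINSTRING's one-byte length field, not from the discussion. ASCII-only payloads decode to the same text under any ASCII-compatible encoding, which is why restricting the legacy codes to 7-bit data removes the ambiguity.

```python
import pickle
import struct


def dump_str_legacy_safe(value: str) -> bytes:
    """Hypothetical writer: use SHORT_BINSTRING only for short ASCII-only str."""
    if value.isascii() and len(value) < 256:
        data = value.encode("ascii")
        # 'U' + 1-byte length + raw bytes: the legacy 8-bit string code.
        body = pickle.SHORT_BINSTRING + bytes([len(data)]) + data
    else:
        data = value.encode("utf-8")
        # 'X' + 4-byte little-endian length + UTF-8 data: the unicode code.
        body = pickle.BINUNICODE + struct.pack("<I", len(data)) + data
    return body + pickle.STOP


for s in ["abc", "\u00f6", ""]:
    p = dump_str_legacy_safe(s)
    # The default (ASCII) and Latin-1 decodings agree for 7-bit data.
    assert pickle.loads(p) == s
    assert pickle.loads(p, encoding="latin-1") == s
```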
> For bytes, I propose that b"ab\xff".__reduce__() return (bytes, ("ab\xff", "latin-1")).
See above. Unless somebody objects, I'd rather make latin-1 the default for bytes when a string is passed (I'm uncertain myself how far "explicit is better than implicit" should apply here).
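For comparison, here is a short check of how Python 3 as eventually released behaves, assuming a current CPython. bytes() given a str without an encoding raises TypeError rather than defaulting to Latin-1. Pickle protocols 0-2, which predate the bytes opcodes added in protocol 3, still carry a bytes object as a Latin-1 str plus the codec name, rebuilt through codecs.encode. pickletools.dis() is used only to print the opcodes for inspection.

```python
import pickle
import pickletools

# No implicit default: a str argument without an encoding is rejected.
try:
    bytes("ab\xff")
except TypeError as exc:
    print("TypeError:", exc)

# With the explicit Latin-1 codec the conversion is lossless.
explicit = bytes("ab\xff", "latin-1")

# Protocols 0-2 have no bytes opcode, so the pickler falls back to a reduce
# that rebuilds the value from a Latin-1 str and the codec name.
p2 = pickle.dumps(b"ab\xff", protocol=2)
pickletools.dis(p2)
restored = pickle.loads(p2)
```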
I'll look into implementing that strategy.
Regards, Martin