[Python-Dev] Patch making the current email package (mostly) support bytes
Scott Dial scott+python-dev at scottdial.com
Mon Oct 4 18:32:26 CEST 2010
On 10/2/2010 7:00 PM, R. David Murray wrote:
> The clever hack (thanks ultimately to Martin) is to accept 8bit data by encoding it using the ASCII codec and the surrogateescape error handler.
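A minimal Python 3 sketch of the hack the quote describes (variable names are illustrative, not taken from the email package): decoding bytes with the ASCII codec and the surrogateescape error handler (introduced by PEP 383) never fails, because each byte in 0x80-0xFF becomes a lone surrogate in U+DC80..U+DCFF, and encoding with the same codec and handler restores the original bytes exactly.

```python
# Every possible byte value, including the non-ASCII range 0x80-0xFF.
data = bytes(range(256))

# Strict ASCII decoding would raise on bytes >= 0x80; surrogateescape
# instead maps each such byte to the lone surrogate U+DC00 + byte.
text = data.decode('ascii', 'surrogateescape')

print(repr(text[0x41]))   # 'A'      (ASCII bytes decode normally)
print(repr(text[0xC2]))   # '\udcc2' (byte 0xC2 smuggled in as a surrogate)

# Encoding with the same codec and handler is a lossless round trip.
restored = text.encode('ascii', 'surrogateescape')
print(restored == data)   # True
```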
I've seen this idea pop up in a number of threads. I worry that you are all inventing a new kind of duality that directly parallels Python 2.x strings. That is to say,
3.x>>> b = b'\xc2\xa1'
3.x>>> s = b.decode('utf8')
3.x>>> v = b.decode('ascii', 'surrogateescape')
where s and v should be the same "thing" in 3.x, but they are not, because of an encoding trick. I believe this trick creates more or less the same issues that strings did in 2.x:
2.x>>> b = '\xc2\xa1'
2.x>>> s = b.decode('utf8')
2.x>>> v = b
Any reasonable 2.x code has to guard on str vs. unicode, and it would seem that in 3.x, if this idiom spreads, reasonable code will have to guard on surrogate escapes (which actually seems like a more expensive test). For example:
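To make the guard concrete, here is a hedged Python 3 sketch; `has_escaped_bytes` is a hypothetical helper, not part of the email package or the standard library. It assumes that lone surrogates in U+DC80..U+DCFF only ever come from surrogateescape, and, unlike a 2.x isinstance() check, it has to scan every character of the string.

```python
b = b'\xc2\xa1'
s = b.decode('utf8')                       # U+00A1, one character
v = b.decode('ascii', 'surrogateescape')   # '\udcc2\udca1', two characters


def has_escaped_bytes(text: str) -> bool:
    """Hypothetical guard: True if text appears to carry smuggled bytes.

    surrogateescape maps each undecodable byte 0x80-0xFF to U+DC80-U+DCFF,
    so detecting it means an O(n) scan rather than an O(1) type check.
    """
    return any('\udc80' <= ch <= '\udcff' for ch in text)


print(s == v)                # False
print(has_escaped_bytes(s))  # False
print(has_escaped_bytes(v))  # True

# Turning v into the "real" string requires going back to bytes and
# knowing (or guessing) the original encoding, UTF-8 in this case.
fixed = v.encode('ascii', 'surrogateescape').decode('utf8')
print(fixed == s)            # True
```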
3.x>>> print(v)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc2' in position 0: surrogates not allowed
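Whether print(v) actually raises depends on the console's encoding and error handler, which vary by platform and Python version, so this sketch encodes explicitly to reproduce the failure deterministically. The exact message wording also differs between Python versions, so only the exception's reason and start position are inspected. The last lines show two ways such a string can be displayed or written out without crashing.

```python
v = b'\xc2\xa1'.decode('ascii', 'surrogateescape')

# Strict UTF-8 encoding rejects lone surrogates, as in the traceback above.
try:
    v.encode('utf-8')
    error = None
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is unbound after the except block

print(error.reason)  # surrogates not allowed
print(error.start)   # 0

# Escaped, human-readable forms that never raise:
print(v.encode('utf-8', 'backslashreplace'))  # b'\\udcc2\\udca1'
print(ascii(v))                               # '\udcc2\udca1'

# Encoding with surrogateescape writes the original bytes back out.
print(v.encode('utf-8', 'surrogateescape'))   # b'\xc2\xa1'
```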
It seems like this hack is about making the 3.x unicode type (str) more like the 2.x string type, and I thought we had decided that was a bad idea. How will developers avoid having to ask themselves whether a given string is a "real" string or a byte sequence masquerading as one? Am I missing something here?
--
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu