[Python-3000] BOM handling
Antoine Pitrou solipsis at pitrou.net
Wed Sep 13 22:33:22 CEST 2006
On Wednesday, 13 September 2006 at 09:41 -0700, Josiah Carlson wrote:
> And is generally ignored, as per unicode spec; it's a "zero width non-breaking space" - an invisible character with no effect on wrapping or otherwise.
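Well, it would be better if Py3K (with all strings unicode) made things easy for the programmer and abstracted away those "invisible characters with no textual meaning". Currently that is not the case: a decoded BOM stays in the string as an ordinary character, so it changes the length, breaks equality, and can fail on output: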
>>> import codecs
>>> a = "hello".decode("utf-8")
>>> b = (codecs.BOM_UTF8 + "hello").decode("utf-8")
>>> len(a)
5
>>> len(b)
6
>>> a == b
False

>>> a = "hello".encode("utf-16le").decode("utf-16le")
>>> b = (codecs.BOM_UTF16_LE + "hello".encode("utf-16le")).decode("utf-16le")
>>> len(a)
5
>>> len(b)
6
>>> a == b
False
>>> a
u'hello'
>>> b
u'\ufeffhello'
>>> print a
hello
>>> print b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/encodings/iso8859_15.py", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>
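For comparison, here is a minimal sketch of the same experiment in Python 3 syntax (an assumption; the session above is Python 2.4). It shows that the plain "utf-8" and "utf-16-le" codecs keep the BOM as a leading U+FEFF, while the standard "utf-8-sig" and "utf-16" codecs drop it when decoding. The helper decode_dropping_bom is a hypothetical name used for illustration only, not something proposed in this thread.

```python
import codecs


def decode_dropping_bom(data: bytes, encoding: str) -> str:
    """Decode data, then drop a single leading U+FEFF if one is present.

    Hypothetical helper for illustration; not part of the codecs module.
    """
    text = data.decode(encoding)
    return text[1:] if text.startswith("\ufeff") else text


# Byte strings that start with a BOM, as in the session above.
utf8_data = codecs.BOM_UTF8 + b"hello"
utf16_data = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")

# The plain codecs keep the BOM as an ordinary character.
kept_utf8 = utf8_data.decode("utf-8")
kept_utf16 = utf16_data.decode("utf-16-le")

# The BOM-aware codecs consume the BOM while decoding.
stripped_utf8 = utf8_data.decode("utf-8-sig")
stripped_utf16 = utf16_data.decode("utf-16")

# The helper gives the same result with the plain codecs.
helper_utf8 = decode_dropping_bom(utf8_data, "utf-8")
helper_utf16 = decode_dropping_bom(utf16_data, "utf-16-le")
```

The helper removes at most one leading U+FEFF, so a zero width no-break space that appears later in the text is left alone.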
Regards
Antoine.