[Python-Dev] utf8 issue
M.-A. Lemburg mal@egenix.com
Thu, 05 Sep 2002 11:14:06 +0200
- Previous message: [apug] Re: [Python-Dev] Call for clarity ( clarification ;-) )
- Next message: [Python-Dev] utf8 issue
Guido van Rossum wrote:
>> Guido van Rossum <guido@python.org> writes:
>>
>>> This might belong on SF, except it's already been solved in Python 2.3,
>>> and I need guidance about what to do for Python 2.2.2. In 2.2.1, a lone
>>> surrogate encoded into utf8 gives a utf8 string that cannot be decoded
>>> back. In 2.3, this is fixed. Should this be fixed in 2.2.2 as well?
>>
>> I think this was discussed really quite a long time ago, like six months
>> or so.
>>
>>> I'm asking because it caused problems with reading .pyc files: if
>>> there's a Unicode literal containing a lone surrogate, reading the .pyc
>>> file causes an exception:
>>>
>>>     UnicodeError: UTF-8 decoding error: unexpected code byte
>>>
>>> It looks like revision 2.128 fixed this for 2.3, but that patch doesn't
>>> cleanly apply to the 2.2 maintenance branch. Can someone help?
>>
>> I think the reason this didn't get fixed in 2.2.1 is that it necessitates
>> bumping MAGIC. I can probably dig up more references if you want.
>
> Please do. Bumping MAGIC is a no-no between dot releases. But I don't
> understand why that is necessary?
It would be necessary since marshal uses UTF-8 for storing Unicode literals. Even though it's highly unlikely that the problem cases are used in Python Unicode literals, there's a tiny chance. Without the MAGIC change this could result in PYC files failing to load.
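For illustration only, here is a minimal sketch of the round-trip problem. It assumes a current Python 3 interpreter rather than 2.2 or 2.3, so it shows the general mechanism and not the exact behaviour of either release. The helper name `strict_roundtrip_ok` is made up for this example, and the `"surrogatepass"` error handler is a Python 3 facility that did not exist in 2.2.

```python
import marshal

# A lone (unpaired) high surrogate, the problem case discussed above.
LONE = "\ud800"

def strict_roundtrip_ok(raw):
    """Return True if `raw` decodes under the strict UTF-8 codec."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Force the surrogate through the UTF-8 bit layout, roughly what an
# encoder without surrogate checks would emit.
raw = LONE.encode("utf-8", "surrogatepass")

# A strict decoder rejects those bytes, so data written this way cannot
# be read back unless the reader uses the same lenient rules.
roundtrip_ok = strict_roundtrip_ok(raw)

# In Python 3, marshal reads and writes str objects with matching lenient
# rules, so the literal survives a dump/load cycle.
restored = marshal.loads(marshal.dumps(LONE))

print(raw, roundtrip_ok, restored == LONE)
```

The point of the sketch is that writer and reader have to agree on how surrogates are represented. Changing that representation between two releases means files written by one side may not load on the other, which is the compatibility concern behind the MAGIC question.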
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,... Python Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/