[Python-Dev] 2.2.1 issues

M.-A. Lemburg mal@lemburg.com
Tue, 19 Feb 2002 15:34:24 +0100


Michael Hudson wrote:

Well, we have the first 2.2 bugfix that isn't a no-brainer to port to 2.2.1. This is to do with [ #495401 ] Build troubles: --with-pymalloc bug. As far as I understand it, there were two problems. 1) With wide Unicode characters, some function in unicodeobject.c that interprets escape codes could write into memory it didn't own. 2) Something to do with the handling of "unpaired high surrogates" in the UTF-8 codec. Were these problems related? I think they got fixed at the same time, but I may have gotten confused.

Right. 1) was caused by 2). Both are fixed now.
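To make "unpaired high surrogate" concrete, here is a small sketch in modern Python 3, not the 2.2 codec. A high surrogate (U+D800..U+DBFF) is only meaningful when followed by a low surrogate (U+DC00..U+DFFF); on its own it is not a valid scalar value, so a strict UTF-8 encoder refuses it. The helper names are made up for illustration, and the "surrogatepass" error handler is a Python 3 feature that did not exist in 2.2.

```python
from typing import List

def is_high_surrogate(ch: str) -> bool:
    return 0xD800 <= ord(ch) <= 0xDBFF

def is_low_surrogate(ch: str) -> bool:
    return 0xDC00 <= ord(ch) <= 0xDFFF

def unpaired_high_surrogates(s: str) -> List[int]:
    """Indices of high surrogates not immediately followed by a low surrogate."""
    return [
        i for i, ch in enumerate(s)
        if is_high_surrogate(ch)
        and not (i + 1 < len(s) and is_low_surrogate(s[i + 1]))
    ]

lone_high = "\ud800"

# A strict UTF-8 encoder rejects the lone surrogate outright.
try:
    lone_high.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" writes it as a 3-byte sequence that decodes back to the same string.
encoded = lone_high.encode("utf-8", "surrogatepass")
round_trip = encoded.decode("utf-8", "surrogatepass")

print(unpaired_high_surrogates("a\ud800b"))          # [1]
print(strict_ok, encoded, round_trip == lone_high)  # False b'\xed\xa0\x80' True
```

The point of the sketch is that encoder and decoder must agree on how such a code point is written; if they don't, data written by one cannot be read by the other.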

1) shouldn't be too much of an issue to get into 2.2.1 (there was some contention about which fix performed better, but for 2.2.1 I don't care too much).

2) is more troublesome, because to fix it properly breaks .pycs, in turn because marshal uses the utf-8 codec to store unicode string constants, and this is a no-no according to PEP 6. Is it possible to worm around 2) by reconstructing valid strings from the bad marshal data, or has information been lost? How severe is the bug? Maybe it would be best to leave it unfixed in 2.2.1.

Well, I posted a message about this to python-dev or the checkins list (I don't remember which). The situation is basically like this:

In Python <= 2.2.0, you could write

u = u"\uD800"

in a .py file. The first time you import this file, Python creates a .pyc file for it using the broken UTF-8 encoding, and the import succeeds. The second time you import the module, Python tries to use the .pyc file. This time, reading the file fails with a UnicodeError, and Python does not fall back to the .py file.

As a result, modules using unpaired surrogates in Unicode literals are simply broken in Python <= 2.2.0.
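The failure sequence above can be modelled in a few lines. This is a hypothetical sketch, not CPython's import machinery: a dict stands in for the .pyc file, and the "buggy" encoder and strict decoder are Python 3 stand-ins chosen only so that they disagree about lone surrogates, as the 2.2.0 encoder and decoder did. The exact bytes 2.2.0 wrote are not reproduced here. The `fall_back_to_source` flag is an illustrative alternative, not something Python did.

```python
# Hypothetical model of the import sequence; not CPython's import code.

def buggy_encode(s: str) -> bytes:
    # Stand-in for the pre-fix encoder: emits bytes for a lone surrogate
    # that the decoder below will not accept.
    return s.encode("utf-8", "surrogatepass")

def strict_decode(data: bytes) -> str:
    # Stand-in for the decoder used when reading the cached file back in.
    return data.decode("utf-8")

pyc_cache = {}  # module name -> cached bytes, playing the role of the .pyc file

def import_module(name: str, source_constant: str,
                  fall_back_to_source: bool = False) -> str:
    if name in pyc_cache:
        try:
            return strict_decode(pyc_cache[name])
        except UnicodeDecodeError:
            if not fall_back_to_source:
                raise  # the behaviour described: the import simply fails
            del pyc_cache[name]  # discard the bad cache and recompile below
    # "Compile" from source and write the cache.
    pyc_cache[name] = buggy_encode(source_constant)
    return source_constant

first = import_module("mod", "\ud800")  # succeeds and writes the cache
try:
    import_module("mod", "\ud800")      # reads the cache and fails
    second_failed = False
except UnicodeDecodeError:
    second_failed = True

print(first == "\ud800", second_failed)  # True True
```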

The problem with backporting this patch is that, for Python to properly recompile any broken module, the .pyc magic number will have to be changed. The question is whether this is a reasonable thing to do in a patch-level release...
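Why changing the magic number helps: the loader compares the magic stored at the start of a cached file with its own, and any mismatch makes it ignore the cache and recompile from source, so every stale .pyc written by the broken encoder gets rebuilt. The sketch below is loosely modeled on the classic layout (a 2-byte little-endian magic followed by b"\r\n"); it omits the timestamp field, and the magic values and function names are hypothetical.

```python
import struct

# Hypothetical magic values; real .pyc magic numbers differ per release.
OLD_MAGIC = 60717
NEW_MAGIC = 60718

def write_pyc(magic: int, payload: bytes) -> bytes:
    # 2-byte little-endian magic, then b"\r\n", then the payload (no timestamp here).
    return struct.pack("<H", magic) + b"\r\n" + payload

def load_or_recompile(pyc: bytes, current_magic: int, recompile):
    """Return (payload, recompiled_flag); recompile when the header doesn't match."""
    (magic,) = struct.unpack("<H", pyc[:2])
    if magic != current_magic or pyc[2:4] != b"\r\n":
        return recompile(), True
    return pyc[4:], False

# A cache written by the old release is rejected and rebuilt by the new one.
stale = write_pyc(OLD_MAGIC, b"<bad constant bytes>")
payload, recompiled = load_or_recompile(stale, NEW_MAGIC, lambda: b"<fresh code>")
print(payload, recompiled)  # b'<fresh code>' True
```

The cost is the one raised above: a new magic number also invalidates every good .pyc from the same release line, which is why it is awkward in a bugfix release.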

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH

Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/