[Python-Dev] Fix Unicode-disabled build of Python 2.7

Serhiy Storchaka storchaka at gmail.com
Wed Jun 25 14:55:35 CEST 2014

Previous message: [Python-Dev] Fix Unicode-disabled build of Python 2.7
Next message: [Python-Dev] cpython (3.3): Closes #20872: dbm/gdbm/ndbm close methods are not documented
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

25.06.14 00:03, Jim J. Jewett wrote:

It would be good to fix the tests (and actual library issues). Unfortunately, some of the specifically proposed changes (such as defining and using _unicode instead of unicode within Python code) look to me as though they would trigger problems in the normal build (where the unicode object does exist, but would no longer be used).

This idiom is recommended by MvL [1] and is widely used (19 times in the source code).

[1] http://bugs.python.org/issue8767#msg159473

Other changes, such as the use of \x escapes, appear correct, but make the tests harder to read -- and might end up removing a test for correct unicode functionality across different spellings.

Even if we assume that the tests are fine, and I'm just an idiot who misread them, the fact that there is any confusion means that these particular changes may be tricky enough to be a bad tradeoff for 2.7. It might work if you could make a more focused change. For example, instead of leaving the 'unicode' name unbound, provide an object that simply returns false for isinstance and raises a UnicodeError for any other method call. Even this might be too aggressive for 2.7, but the facts that it would only appear in the --disable-unicode builds, and would make them more similar to the regular build, are points in its favor.
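
The placeholder object proposed in the paragraph above could look roughly like the following. This is only a sketch of the suggestion, not anything that exists in 2.7; the names _PlaceholderMeta and unicode_placeholder are hypothetical.

```python
class _PlaceholderMeta(type):
    """Hypothetical metaclass for a stand-in 'unicode' object."""

    def __instancecheck__(cls, instance):
        # isinstance(x, placeholder) is always false.
        return False

    def __call__(cls, *args, **kwargs):
        # Constructing a value is not supported.
        raise UnicodeError("unicode support is disabled in this build")

    def __getattr__(cls, name):
        # Reached only for attributes not found by normal lookup,
        # e.g. placeholder.encode or placeholder.join.
        raise UnicodeError("unicode.%s is unavailable in this build" % name)


# Built by calling the metaclass so the same code runs on 2.x and 3.x.
unicode_placeholder = _PlaceholderMeta("unicode", (object,), {})
```

Code such as isinstance(value, unicode_placeholder) would keep working unchanged, while any attempt to actually use the type would fail loudly.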

No, the existing code uses a different approach. The name "unicode" doesn't exist, while the encode/decode methods exist but are useless. If my memory doesn't fail me, there is even a special explanatory comment about this historical decision somewhere. This decision was made many years ago.

Before doing that, though, please document what the --disable-unicode mode is actually supposed to do when interacting with byte-streams that a standard defines as UTF-8. (For example, are the changes to _xml_dumps and _xml_loads at http://bugs.python.org/file35758/multiprocessing.patch correct, or do those functions assume they get bytes as input, or should the functions raise an exception any time they are called?)

Looking more carefully, I see that there is a bug in the Unicode-enabled build (an incorrect backport from 3.x). In 2.x, xmlrpclib.dumps produces an already UTF-8 encoded string, while in 3.x, xmlrpc.client.dumps produces a Unicode string. multiprocessing should fail with a non-ASCII str or unicode.
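
A small sketch of the difference, assuming a Unicode-enabled interpreter. The helper to_wire_bytes and the method name "echo" are made up for the example; they are not taken from the patch.

```python
try:
    import xmlrpclib as xmlrpc_client      # module name in 2.x
except ImportError:
    import xmlrpc.client as xmlrpc_client  # module name in 3.x

text_type = type(u"")  # unicode on 2.x, str on 3.x


def to_wire_bytes(payload):
    """Hypothetical helper: return the dumps() result as UTF-8 bytes."""
    if isinstance(payload, text_type):
        # 3.x: dumps() returned text, so it still has to be encoded.
        return payload.encode("utf-8")
    # 2.x: dumps() already returned a UTF-8 encoded str; encoding it
    # again would go through an implicit ASCII decode and fail on
    # non-ASCII data.
    return payload


payload = xmlrpc_client.dumps((u"caf\xe9",), "echo")
wire = to_wire_bytes(payload)
```

Code backported from 3.x that calls .encode('utf-8') unconditionally on the 2.x result is the kind of mistake described above.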

A side benefit of my patches is that they expose existing errors in the Unicode-enabled build.
