Issue 3502: Inconsistency between string.letters and default encoding.

In Python on Windows, under IDLE, string.letters includes extended (non-ASCII) characters, but the default codec used when converting from str to unicode is still ascii. This mismatch causes failures in the Python win32 extensions (pywin32).

>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\x83\x8a\x8c\x8e\x9a\x9c\x9e\x9f\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

Yet unless the user customizes the installation, sys.getdefaultencoding() still returns 'ascii'.

The consequence is that after instantiating a COM object, pywin32 (build 211) raises this exception:

  File "C:\Python25\Lib\site-packages\win32com\client\build.py", line 297, in MakeFuncMethod
    return self.MakeDispatchFuncMethod(entry, name, bMakeClass)
  File "C:\Python25\Lib\site-packages\win32com\client\build.py", line 318, in MakeDispatchFuncMethod
    s = linePrefix + 'def ' + name + '(self' + BuildCallList(fdesc, names, defNamedOptArg, defNamedNotOptArg, defUnnamedArg, defOutArg) + '):'
  File "C:\Python25\Lib\site-packages\win32com\client\build.py", line 604, in BuildCallList
    argName = MakePublicAttributeName(argName)
  File "C:\Python25\Lib\site-packages\win32com\client\build.py", line 542, in MakePublicAttributeName
    return filter( lambda char: char in valid_identifier_chars, className)
  File "C:\Python25\Lib\site-packages\win32com\client\build.py", line 542, in <lambda>
    return filter( lambda char: char in valid_identifier_chars, className)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 52: ordinal not in range(128)

The line that causes this exception is from win32com.client.build.

This fragment is enough to reproduce the bug (from build.py in win32com/client):

valid_identifier_chars = string.letters + string.digits + "_"
...
    return filter( lambda char: char in valid_identifier_chars, className)

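To reproduce, print the expression in the return statement with className set to any unicode string. It raises the UnicodeDecodeError shown above, because testing whether a unicode character is in valid_identifier_chars makes Python decode that byte string with the default (ascii) codec, and the decode fails at the first extended character.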
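The decoding step can be illustrated outside pywin32. This is only a sketch, and it is written for Python 3, where the implicit str-to-unicode coercion of Python 2 no longer exists; the coercion is therefore made explicit with bytes.decode. The names letters, implicit_coercion and reproduce are hypothetical, and letters is a shortened stand-in for the locale-dependent string.letters value quoted earlier, not the real one.

```python
import string

# Stand-in for the Python 2 string.letters value from the report:
# the 52 ASCII letters followed by a few of the locale-specific bytes.
letters = string.ascii_letters.encode("ascii") + b"\x83\x8a\x8c"


def implicit_coercion(byte_string, encoding="ascii"):
    """Mimic the decode Python 2 performs when a byte string meets unicode."""
    return byte_string.decode(encoding)


def reproduce():
    """Return the UnicodeDecodeError raised by the coercion, or None."""
    try:
        implicit_coercion(letters)
    except UnicodeDecodeError as exc:
        return exc
    return None


err = reproduce()
# err.start is the offset of the first byte the codec rejected.
print(err.encoding, hex(err.object[err.start]), err.start)
```

The offset reported by the exception is 52 because the first 52 bytes are the plain ASCII letters; the first extended byte, 0x83, follows them directly.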

It is contradictory that the default codec cannot convert the character 0x83 while string.letters includes it. If one regards this character as a letter, then the default codec should be able to convert it.
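Until the inconsistency is resolved, one possible way for calling code to avoid the problem is to build the set of valid characters from string.ascii_letters, which does not depend on the locale. The sketch below assumes that ASCII-only identifiers are acceptable; filter_identifier and VALID_IDENTIFIER_CHARS are hypothetical names for illustration, not the pywin32 code.

```python
import string

# ascii_letters is locale-independent and contains only a-z and A-Z,
# so it never holds characters the ascii codec cannot handle.
VALID_IDENTIFIER_CHARS = string.ascii_letters + string.digits + "_"


def filter_identifier(name):
    """Keep only the characters that are valid in an ASCII identifier."""
    return "".join(ch for ch in name if ch in VALID_IDENTIFIER_CHARS)


# Spaces, punctuation and non-ASCII letters are dropped.
print(filter_identifier("My Class-Name\u0192"))
```

This sidesteps the failure but does not address the underlying question of whether string.letters and the default encoding should agree.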