[Python-Dev] [Python-checkins] cpython: _PyImport_LoadDynamicModule() encodes the module name explicitly to ASCII
Michael Urman murman at gmail.com
Tue May 10 15:34:38 CEST 2011
On Tue, May 10, 2011 at 03:03, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> If GetProcAddress() expects a byte string encoded to the ANSI code page, my patch is correct because the function used the UTF-8 encoding, not the ANSI code page. We can maybe use GetProcAddressW() to pass a Unicode string. I don't know which encoding is used by GetProcAddressW()...
While I can find references to a GetProcAddressW, most of them seem to agree it doesn't exist. "My kernel32.dll only exports GetProcAddress." This suggests to me that GetProcAddress accepts a null-terminated byte string rather than specifically an ANSI string. What data ends up in the export table is likely similar to the Linux filesystem case, only with less likelihood of the environment telling you its encoding.
> I already patched _PyImport_GetDynLoadFunc() for Windows: the path is now a Unicode object instead of a byte string encoded to the filesystem encoding. _PyImport_GetDynLoadWindows() uses GetFullPathNameW() and LoadLibraryExW(). The work to be fully Unicode compliant (for the path field, not for the name) is not completely done... but I have a pending patch, see: http://bugs.python.org/issue11619
>
> But this patch is huge and creates many functions. I am not sure that we need it, I will work on this later.
I'm comfortable with the idea of requiring UTF-8 encoding for the initmodule entry points of modules named with non-ASCII identifiers, especially if there is nothing which works consistently today. I've only seen pure-ASCII library names in all my C++ work, so I feel it borders on YAGNI (but I like it in theory).
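To illustrate why a fixed encoding matters here, the following is a minimal sketch, not CPython's actual code. It assumes the Python 3 `PyInit_` prefix for the entry point, and both `entry_point_symbol` and the `exports` dictionary are hypothetical stand-ins for the real symbol construction and a DLL export table. The lookup is a plain byte comparison, so the bytes passed in must match the bytes the compiler wrote.

```python
def entry_point_symbol(module_name: str, prefix: bytes = b"PyInit_") -> bytes:
    """Build the byte string that would be passed to a symbol lookup.

    Hypothetical helper: always encodes the module name as UTF-8.
    """
    return prefix + module_name.encode("utf-8")


# Hypothetical export table of an extension whose init symbol was
# emitted as UTF-8 bytes. Keys are raw bytes, as in a real export table.
exports = {entry_point_symbol("caf\u00e9"): "<init function>"}

utf8_symbol = entry_point_symbol("caf\u00e9")
# Same name encoded with one possible ANSI code page instead (assumption: cp1252).
ansi_symbol = b"PyInit_" + "caf\u00e9".encode("cp1252")

print(utf8_symbol in exports)  # True: the bytes match exactly
print(ansi_symbol in exports)  # False: same name, different bytes
```

For pure-ASCII module names the two encodings produce identical bytes, which is why the question only arises for non-ASCII identifiers.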
As an alternate approach, one article I read suggested using ordinals instead of names if you want non-ASCII names. Python could certainly try to load by ordinal on Windows, and fall back to loading by name. I don't have a clue what the rate of false positives would be.
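The ordinal-then-name idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `resolve_init` and the `lookup` callable are hypothetical, with `lookup` standing in for a GetProcAddress-style call that returns None when nothing is found, and the ordinal value 1 is an arbitrary choice. The export table is faked with a dictionary so the sketch runs on any platform.

```python
from typing import Callable, Optional, Tuple, Union

# Stand-in for a GetProcAddress-style lookup: takes an ordinal (int) or a
# symbol name (bytes) and returns the export, or None if it is absent.
Lookup = Callable[[Union[int, bytes]], Optional[object]]


def resolve_init(lookup: Lookup, ordinal: int, symbol: bytes) -> Tuple[object, str]:
    """Try the ordinal first, then fall back to the symbol name."""
    func = lookup(ordinal)
    if func is not None:
        return func, "ordinal"
    func = lookup(symbol)
    if func is not None:
        return func, "name"
    raise ImportError("no init function found for %r" % symbol)


# Fake export tables (hypothetical data).
by_ordinal = {1: "<init via ordinal>"}
by_name = {b"PyInit_spam": "<init via name>"}

print(resolve_init(by_ordinal.get, 1, b"PyInit_spam"))
print(resolve_init(by_name.get, 1, b"PyInit_spam"))
```

The false-positive concern shows up in the first branch: any library that happens to export something at the chosen ordinal would be accepted, whether or not that export is an init function.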
-- Michael Urman