Issue 33242: Support binary symbol names

Created on 2018-04-08 21:19 by smurfix, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg315096 - Author: Matthias Urlichs (smurfix) * Date: 2018-04-08 21:19
ctypes should support binary symbols.

Rationale: There's no requirement that the symbol name in question is encoded as ASCII or UTF-8.

    >>> import ctypes
    >>> t = type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ctypes.CFUNCTYPE(ctypes.c_uint32))]})
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: '_fields_' must be a sequence of (name, C type) pairs
msg315097 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-04-08 21:51
Field names define CField descriptor attributes on the class. Attribute names should be strings, not bytes. There's no syntactically clean way to use a bytes name. Consider the example of a generic property on a class:

    >>> T = type('T', (), {b'p': property(lambda s: 0)})
    >>> t = T()
    >>> t.p
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'T' object has no attribute 'p'
    >>> getattr(t, b'p')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: getattr(): attribute name must be string

We'd have to dig into the class dict and manually bind the property:

    >>> vars(T)[b'p'].__get__(t)
    0
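The same point can be checked against ctypes itself. The following is a minimal sketch, not taken from the report: the `Point` structure and its field names are made up for illustration. It shows that a str field name ends up as a descriptor in the class dict, and that attribute lookup refuses a bytes name.

```python
import ctypes

# Hypothetical structure, used only for illustration.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

p = Point(3, 4)

# Each field name becomes a descriptor object stored in the class dict.
descriptor = vars(Point)["x"]

# Normal attribute access and manual binding read the same field.
via_attribute = p.x
via_descriptor = descriptor.__get__(p, Point)

# Attribute lookup only accepts str names, so a bytes name is rejected.
bytes_lookup_error = None
try:
    getattr(p, b"x")
except TypeError as exc:
    bytes_lookup_error = exc

print(via_attribute, via_descriptor, type(bytes_lookup_error).__name__)
```

Because the descriptors are looked up by attribute name, a bytes field name would leave no clean way to reach the field from an instance.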
msg315098 - Author: Matthias Urlichs (smurfix) * Date: 2018-04-08 22:27
Well, the original problem remains: symbol names aren't constrained to UTF-8 … so if I happen to stumble onto one of those (maybe generated by a code obfuscator), the answer is "don't use Python3 then"?
msg315099 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2018-04-08 23:15
If you're automatically wrapping a C source file and don't know the source encoding, you could naively decode it as Latin-1. You're still faced with the problem of characters that Python doesn't allow in identifiers. For example, gcc allows "$" in C identifiers (e.g. a field named "egg$"), but Python doesn't allow this character. At least you can use getattr() to access such names. For example:

    >>> s = bytes(range(256)).decode('latin-1')
    >>> T = type('T', (), {s: 0})
    >>> t = T()
    >>> getattr(t, s)
    0
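Applied to ctypes, the Latin-1 suggestion might look like the sketch below. The helper `make_struct` and the field names are hypothetical and not part of ctypes. The sketch assumes the raw names arrive as bytes and that a reversible byte-to-character mapping is all that is needed, which Latin-1 provides.

```python
import ctypes

def make_struct(name, raw_fields):
    """Build a ctypes.Structure subclass from (bytes_name, ctype) pairs.

    Hypothetical helper: names are decoded as Latin-1, which maps every
    byte value to exactly one character, so the original bytes can be
    recovered with .encode('latin-1').
    """
    fields = [(raw.decode("latin-1"), ctype) for raw, ctype in raw_fields]
    return type(name, (ctypes.Structure,), {"_fields_": fields})

# Two names that are not ASCII identifiers: one contains "$", one a non-ASCII byte.
raw_names = [b"egg$", b"caf\xe9"]
Iface = make_struct("Iface", [(raw, ctypes.c_uint32) for raw in raw_names])

obj = Iface()
decoded = [raw.decode("latin-1") for raw in raw_names]

# "egg$" is not a valid Python identifier, so go through setattr()/getattr().
setattr(obj, decoded[0], 7)
setattr(obj, decoded[1], 9)
values = [getattr(obj, field_name) for field_name in decoded]

# The decoded names round-trip back to the original bytes.
round_trip = [field_name.encode("latin-1") for field_name in decoded]
print(values, round_trip == raw_names)
```

The trade-off is that the decoded names may not be readable text. They are only a lossless str stand-in for the original bytes.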
History
Date User Action Args
2022-04-11 14:58:59 admin set github: 77423
2018-04-08 23:15:38 eryksun set messages: +
2018-04-08 22:27:31 smurfix set messages: +
2018-04-08 21:54:15 eryksun set resolution: not a bug -> rejected
2018-04-08 21:51:45 eryksun set status: open -> closed; nosy: + eryksun; messages: +; resolution: not a bug; stage: resolved
2018-04-08 21:32:52 ned.deily set nosy: + amaury.forgeotdarc, belopolsky, meador.inge
2018-04-08 21:19:25 smurfix create