Issue 999444: compiler module doesn't support unicode characters in literals
I'm not positive that this is a bug. The built-in compile function accepts unicode with non-ASCII text in literals:
    >>> text = u"print u'''\u0442\u0435\u0441\u0442'''"
    >>> exec compile(text, 's', 'exec')
    тест
    >>> import compiler
    >>> exec compiler.compile(text, 's', 'exec')
    Traceback (most recent call last):
      File "", line 1, in ?
      File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 64, in compile
        gen.compile()
      File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 111, in compile
        tree = self._get_tree()
      File "/usr/local/python/2.3.4/lib/python2.3/compiler/pycodegen.py", line 77, in _get_tree
        tree = parse(self.source, self.mode)
      File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 50, in parse
        return Transformer().parsesuite(buf)
      File "/usr/local/python/2.3.4/lib/python2.3/compiler/transformer.py", line 120, in parsesuite
        return self.transform(parser.suite(text))
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)
Comment from user_id=6656:
The immediate problem is that the parser module doesn't support unicode:
    >>> import parser
    >>> parser.suite(u"print u'''\u0442\u0435\u0441\u0442'''")
    Traceback (most recent call last):
      File "", line 1, in ?
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-13: ordinal not in range(128)
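The error text is exactly what an explicit ASCII encode of the same source produces. That suggests, though the traceback alone doesn't prove it, that parser.suite converts its unicode argument to a byte string with the default 'ascii' codec. A minimal sketch reproducing just that conversion, written for Python 3; `ascii_encode_failure` is a hypothetical helper, not part of any library:

```python
# Python 3: "\u0442" etc. in a normal str literal are the Cyrillic characters.
text = "print u'''\u0442\u0435\u0441\u0442'''"

def ascii_encode_failure(source):
    """Return (start, end, message) if source can't be ASCII-encoded, else None."""
    try:
        source.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start/exc.end delimit the run of unencodable characters
        # (end is exclusive, so the message reports end - 1).
        return exc.start, exc.end, str(exc)
    return None

print(ascii_encode_failure(text))
```

"print u'''" is ten characters long, so positions 10-13 are the four Cyrillic characters, matching both tracebacks above.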
There may well be more bugs lurking in Lib/compiler with respect to this issue, but this is the first one. I don't know how easy this will be to fix; looking at what the built-in compile() function does with unicode might be a good start.
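One possible direction, sketched under assumptions: instead of giving a byte-oriented parser a unicode object, encode the source first and prepend a PEP 263 coding declaration so the parser knows how to decode the literals. This is an illustration of the idea, not the fix adopted for Lib/compiler, and whether the 2.3 compiler.transformer handles the resulting encoding declaration is not established here. It is written for Python 3, where compile() accepts bytes and honours the coding line; `to_encoded_source` is a hypothetical helper:

```python
def to_encoded_source(source, encoding="utf-8"):
    """Encode unicode source to bytes, prefixed with a PEP 263 coding line."""
    header = "# -*- coding: %s -*-\n" % encoding
    return (header + source).encode(encoding)

# Assign instead of print so the result can be inspected afterwards.
text = "greeting = '''\u0442\u0435\u0441\u0442'''"
namespace = {}
# compile() reads the coding declaration and decodes the bytes accordingly.
exec(compile(to_encoded_source(text), "s", "exec"), namespace)
print(namespace["greeting"])
```

Because the declaration travels with the bytes, the same helper works for other encodings that can represent the text (for example "cp1251" for Cyrillic).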