[Python-3000] Invalid \U escape in source code give hard-to-trace error

Kurt B. Kaiser kbk at shore.net
Wed Jul 18 08:04:13 CEST 2007


"Guido van Rossum" <guido at python.org> writes:

> When a source file contains a string literal with an out-of-range \U escape (e.g. "\U12345678"), instead of a syntax error pointing to the offending literal, I get this, without any indication of the file or line:
>
> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character
>
> This is quite hard to track down. (Both the location of the bad literal in the source file, and the origin of the error in the parser. :-) Can someone come up with a fix?
>
> I note that raw escapes show a slightly different error. I also note that the same issue exists for u"..." literals in Python 2.5.
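A minimal way to see why no location appears, assuming a current Python 3 and calling the `unicode_escape` codec directly (the codec named in the error message): the exception only records offsets inside the literal body, so nothing in it can point at a file or line. The helper name `decode_escape` is hypothetical.

```python
import codecs


def decode_escape(body: bytes):
    """Hypothetical helper: decode a string-literal body with the
    'unicode_escape' codec. Returns the decoded text, or the
    UnicodeDecodeError instance if decoding fails."""
    try:
        return codecs.decode(body, "unicode_escape")
    except UnicodeDecodeError as exc:
        return exc


# Backslash, 'U', then eight hex digits. 0x12345678 is far above the
# highest valid code point, U+10FFFF, so the codec rejects it.
err = decode_escape(b"\\U12345678")

# start/end are offsets into the literal body only; there is no
# filename or line number anywhere on the exception.
print(type(err).__name__, err.start, err.end, err.reason)
```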

For what it's worth, I posted a patch to ast.c against the 2.6 trunk which massages the unicode exception into a SyntaxError showing the location.

That approach lets unicodeobject.c handle the gory details while ast.c handles the SyntaxError generation. It could serve as a stopgap until something deeper, along the lines of Martin's thoughts, is developed.
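The division of labour described above can be sketched at the Python level. This is not the ast.c patch itself (that is C code in the parser); it is an illustrative analogue, assuming a hypothetical `decode_literal` helper that receives the literal body plus its source location, lets the codec report the failure, and re-raises it as a `SyntaxError` carrying file, line and column.

```python
import codecs


def decode_literal(body: bytes, filename: str, lineno: int,
                   col: int, line_text: str) -> str:
    """Hypothetical helper: decode a string-literal body, converting any
    codec failure into a SyntaxError that points at the source location.

    col is the 0-based column where the literal body starts on the line.
    """
    try:
        # The codec does the "gory details" of escape decoding.
        return codecs.decode(body, "unicode_escape")
    except UnicodeDecodeError as exc:
        # Keep the codec's own message so the cause stays visible.
        msg = f"(unicode error) {exc}"
        # exc.start is relative to the literal body; shift it to a
        # 1-based column on the source line, as SyntaxError expects.
        offset = col + exc.start + 1
        raise SyntaxError(msg, (filename, lineno, offset, line_text)) from exc


# Usage: the body of "\U12345678" starts at column 5 of this line.
source_line = 's = "\\U12345678"'
try:
    decode_literal(b"\\U12345678", "example.py", 3, 5, source_line)
except SyntaxError as err:
    print(err.filename, err.lineno, err.offset, err.msg)
```

Keeping the codec's message inside the `SyntaxError` preserves the original diagnosis while adding the missing location, which mirrors the split between unicodeobject.c and ast.c described above.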

I don't think that any reference adjustments are needed, but someone should check the patch.

www.python.org/sf/1755885

-- KBK


