[Python-3000] Invalid \U escape in source code give hard-to-trace error
Georg Brandl g.brandl at gmx.net
Wed Jul 18 23:42:56 CEST 2007
- Previous message: [Python-3000] Invalid \U escape in source code give hard-to-trace error
- Next message: [Python-3000] Invalid \U escape in source code give hard-to-trace error
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Guido van Rossum wrote:
On 7/17/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> When a source file contains a string literal with an out-of-range \U
> escape (e.g. "\U12345678"), instead of a syntax error pointing to the
> offending literal, I get this, without any indication of the file or
> line:
>
> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in
> position 0-9: illegal Unicode character
>
> This is quite hard to track down.
I think the fundamental flaw is that a codec is used to implement the Python syntax (or, rather, lexical rules). Not quite sure what the rationale for this design was; doing it on the lexical level is (was) tricky because \u escapes were allowed only for Unicode literals, and the lexer had no knowledge of the prefix preceding a literal. (In 3k, it's still similar, because \U escapes have no effect in bytes and raw literals.)

Still, even if it is "only" handled at the parsing level, I don't see why it needs to be a codec. Instead, implementing escapes in the compiler would still allow for proper diagnostics (notice that in the AST the original lexical form of the string literal is gone).

I guess because it was deemed useful to have a codec for this purpose too, thereby exposing the algorithm to Python code that needs the same functionality (e.g. the compiler package, RIP).
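
To make the diagnostics point concrete, here is a minimal sketch of how the codec's positional error could be re-raised with a file and line attached. The helper name `decode_literal_body` and its `filename`/`lineno` parameters are hypothetical, invented for this illustration. This is not how the compiler actually does it, only one way the location could be carried along.

```python
import codecs


def decode_literal_body(body, filename, lineno):
    """Hypothetical helper: decode backslash escapes in a literal's body.

    Assumes `body` is the pure-ASCII text between the quotes. A codec
    failure is re-raised as a SyntaxError that names the file and line.
    """
    try:
        return codecs.decode(body.encode("ascii"), "unicode_escape")
    except UnicodeDecodeError as exc:
        # exc.start / exc.end are offsets into the literal body.
        message = f"{exc.reason} (offsets {exc.start}-{exc.end} of literal)"
        location = (filename, lineno, exc.start + 1, body)
        raise SyntaxError(message, location) from exc


if __name__ == "__main__":
    try:
        decode_literal_body("\\U12345678", "demo.py", 3)
    except SyntaxError as err:
        print(f"{err.filename}:{err.lineno}: {err.msg}")
```

The codec already knows where in the byte string it failed. What is missing in the reported case is the mapping from that offset back to a source location.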
And it still is useful. If you want to convert a string into a printable representation, you can use repr(), but for the inverse you need this codec. (or eval()...)
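
A small sketch of that round trip in Python 3, assuming a sample string chosen for illustration. It uses `ascii()` for the forward direction, because Python 3's `repr()` leaves printable non-ASCII characters unescaped, and `ast.literal_eval()` as a safer stand-in for `eval()`.

```python
import ast

original = "line one\nline two \u2603"

# Forward direction: a printable, pure-ASCII representation with quotes.
printable = ascii(original)

# Inverse via the codec: drop the surrounding quotes, then decode escapes.
via_codec = printable[1:-1].encode("ascii").decode("unicode_escape")

# Inverse via evaluation of the quoted literal.
via_eval = ast.literal_eval(printable)

print(via_codec == original, via_eval == original)
```

The codec route works on the bare escaped text, so the quotes have to be stripped first. The evaluation route needs the complete literal, quotes included.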
Georg