[Python-Dev] Why aren't escape sequences in literal strings handled by the tokenizer?

Larry Hastings larry at hastings.org
Thu May 17 15:01:05 EDT 2018


I fed this into tokenize.tokenize():

b''' x = "\u1234" '''

I was a bit surprised to see the \u1234 escape sequence passed through verbatim in the output.  Particularly because the token's string (t.string) was a str and not bytes.
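For anyone who wants to reproduce this, here's a minimal check (the variable names are just illustrative):

    import io
    import tokenize

    # Feed the same snippet in; the STRING token still contains the raw
    # backslash escape, and t.string is a str, not bytes.
    source = b''' x = "\\u1234" '''
    for t in tokenize.tokenize(io.BytesIO(source).readline):
        if t.type == tokenize.STRING:
            print(repr(t.string))   # prints '"\\u1234"'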

It turns out Python's tokenizer ignores escape sequences.  All it does is skip the character after a backslash, so that an escaped quote (\") doesn't terminate the string.  But it doesn't do any substitutions.  The escape sequences are only handled when the AST node is created for the string literal!
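A quick look at the AST side shows where the substitution does happen:

    import ast

    # By the time the AST node exists, the escape has been substituted:
    # the Constant's value holds the actual character, not the escape.
    tree = ast.parse('x = "\\u1234"')
    node = tree.body[0].value        # the ast.Constant for the literal
    print(repr(node.value))          # prints 'ሴ' (U+1234)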

Maybe I'm making a parade of my ignorance, but I assumed that string literals were handled by the parser, just like everything else (hey, it seems like a good place for it), and in particular that the escape sequence substitutions would be done in the tokenizer.  Having stared at it a little, I now detect a whiff of "this design solved a real problem".  So... what was the problem, and how does this design solve it?

BTW, my use case is that I hoped to use CPython's tokenizer to parse some Python-ish-looking text and handle double-quoted strings for me.  Especially all the escape sequences--leveraging all CPython's support for funny things like \N{PENGUIN}.  The current behavior of the tokenizer makes me think it'd be easier to roll my own!
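In the meantime, the least-bad workaround I can think of (just an idea, not anything blessed by the docs) is to hand each STRING token back to ast.literal_eval and let CPython do the substitutions for me:

    import ast
    import io
    import tokenize

    # Workaround sketch: tokenize the Python-ish text, then pass each
    # STRING token through ast.literal_eval, which applies all the
    # escape sequence substitutions CPython knows about.
    source = b''' s = "\\N{PENGUIN} and \\u1234" '''
    for t in tokenize.tokenize(io.BytesIO(source).readline):
        if t.type == tokenize.STRING:
            print(repr(ast.literal_eval(t.string)))  # prints '🐧 and ሴ'

Of course that only works if the strings really are valid Python literals, which is exactly the limitation I was hoping to avoid.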

//arry/
