[Python-Dev] Parsing f-strings from PEP 498 -- Literal String Interpolation

Fabio Zadrozny fabiofz at gmail.com
Wed Nov 9 11:20:27 EST 2016


On Sat, Nov 5, 2016 at 10:36 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

On 5 November 2016 at 04:03, Fabio Zadrozny <fabiofz at gmail.com> wrote:
> On Fri, Nov 4, 2016 at 3:15 PM, Eric V. Smith <eric at trueblade.com> wrote:
>> Using PyParser_ASTFromString is the easiest possible way to do this. Given
>> a string, it returns an AST node. What could be simpler?
>
> I think that for implementation purposes, given the python infrastructure,
> it's fine, but for specification purposes, probably incorrect... As I don't
> think f-strings should accept:
>
> f"start {import sys; sys.version_info[0];} end"
>
> (i.e.: PyParser_ASTFromString doesn't just return an expression, it accepts any
> valid Python code, even code which can't be used in an f-string).

f-strings use the "eval" parsing mode, which starts from the "eval_input" node in the grammar (which is only a couple of nodes higher than 'test', allowing tuples via 'testlist' as well as trailing newlines and EOF):

    >>> ast.parse("import sys; sys.version_info[0];", mode="eval")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.5/ast.py", line 35, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        import sys; sys.version_info[0];
             ^
    SyntaxError: invalid syntax

You have to use "exec" mode to get the parser to allow statements, which is why f-strings don't do that:

    >>> ast.dump(ast.parse("import sys; sys.version_info[0];", mode="exec"))
    "Module(body=[Import(names=[alias(name='sys', asname=None)]), Expr(value=Subscript(value=Attribute(value=Name(id='sys', ctx=Load()), attr='version_info', ctx=Load()), slice=Index(value=Num(n=0)), ctx=Load()))])"

The unique aspect for f-strings that means they don't permit some otherwise valid Python expressions is that it also does the initial pre-tokenisation based on:

1. Look for an opening '{'
2. Look for a closing '!', ':' or '}' accounting for balanced string quotes, parentheses, brackets and braces

Ignoring the surrounding quotes, and using the atom node from Python's grammar to represent the nesting tracking, and TEXT to stand in for arbitrary text, it's something akin to:

    fstring: (TEXT ['{' maybe_pyexpr ('!' | ':' | '}')])+
    maybe_pyexpr: (atom | TEXT)+

That isn't quite right, since it doesn't properly account for brace nesting, but it gives the general idea - there's an initial really simple tokenising pass that picks out the potential Python expressions, and then those are run through the AST parser's equivalent of eval().

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
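As a rough illustration of the two-pass idea described above, here is a minimal Python sketch. It is not the CPython implementation: the helper names (find_expression_end, extract_expressions, parse_fields) are hypothetical, string handling is simplified (no escapes or triple quotes), and nested fields inside a format spec are not handled. It only shows a simple scan that picks out candidate expressions, followed by an "eval"-mode parse of each one.

```python
import ast

OPENERS = "([{"
CLOSERS = ")]}"


def find_expression_end(body, start):
    """Return the index of the '!', ':' or '}' that ends the expression
    beginning at `start`, tracking quotes and bracket nesting."""
    depth = 0
    quote = None
    i = start
    while i < len(body):
        ch = body[i]
        if quote:
            if ch == quote:
                quote = None          # closing quote (escapes not handled)
        elif ch in "'\"":
            quote = ch
        elif ch in OPENERS:
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
        elif depth == 0 and ch in "!:}":
            # '!=' is a comparison operator, not a conversion marker
            if not (ch == "!" and body[i + 1:i + 2] == "="):
                return i
        i += 1
    raise SyntaxError("f-string: expecting '}'")


def extract_expressions(body):
    """Yield the source text of each '{...}' expression in `body`
    (the f-string contents without the surrounding quotes)."""
    i = 0
    while i < len(body):
        if body[i:i + 2] in ("{{", "}}"):
            i += 2                    # doubled braces are literal text
        elif body[i] == "{":
            end = find_expression_end(body, i + 1)
            yield body[i + 1:end]
            # Assumption: skip any conversion/format spec up to the next '}'
            i = body.index("}", end) + 1
        else:
            i += 1


def parse_fields(body):
    """Second pass: parse each candidate in 'eval' mode, so statements
    such as 'import sys' are rejected with SyntaxError."""
    return [ast.parse(src.strip(), mode="eval")
            for src in extract_expressions(body)]
```

With this sketch, parse_fields("start {a + b} end") yields one ast.Expression node, while parse_fields("start {import sys; sys.version_info[0];} end") raises SyntaxError, because the second pass never uses "exec" mode.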

Hi Nick and Eric,

Just wanted to say thanks for the feedback and to point to a grammar I ended up writing on my side (in JavaCC). In case someone else decides to write a formal grammar later on, it can probably be used as a reference (it shouldn't be hard to convert it to a BNF grammar):

https://github.com/fabioz/Pydev/blob/master/plugins/org.python.pydev.parser/src/org/python/pydev/parser/grammar_fstrings/grammar_fstrings.jjt

Also, as feedback, I found it a bit odd that there can't be any space or newline between the last format specifiers and '}'.

I.e.:

    f'''{ dict( a = 10 ) !r } '''

is not valid, whereas

    f'''{ dict( a = 10 ) !r} '''

is valid. As a note, this means my grammar has a bug, as both versions are accepted, and I currently don't care enough about that difference from the implementation to fix it ;)
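For anyone who wants to check the two spellings against a given interpreter, a small sketch along these lines should do. The helper name compiles is hypothetical, and the result for each candidate depends on the Python version in use, so no expected output is shown.

```python
def compiles(source):
    """Return True if `source` compiles as a single expression."""
    try:
        compile(source, "<fstring-check>", "eval")
    except SyntaxError:
        return False
    return True


# The two spellings from the message above; results are version-dependent.
for candidate in ("f'''{ dict( a = 10 ) !r } '''",
                  "f'''{ dict( a = 10 ) !r} '''"):
    print(repr(candidate), compiles(candidate))
```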

Cheers,

Fabio


