
On Sat, Nov 5, 2016 at 10:36 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 5 November 2016 at 04:03, Fabio Zadrozny <fabiofz@gmail.com> wrote:
> On Fri, Nov 4, 2016 at 3:15 PM, Eric V. Smith <eric@trueblade.com> wrote:
>> Using PyParser_ASTFromString is the easiest possible way to do this. Given
>> a string, it returns an AST node. What could be simpler?
>
>
> I think that for implementation purposes, given the python infrastructure,
> it's fine, but for specification purposes, probably incorrect... As I don't
> think f-strings should accept:
>
> f"start {import sys; sys.version_info[0];} end" (i.e.:
> PyParser_ASTFromString doesn't just return an expression, it accepts any
> valid Python code, even code which can't be used in an f-string).

f-strings use the "eval" parsing mode, which starts from the
"eval_input" node in the grammar (which is only a couple of nodes
higher than 'test', allowing tuples via 'testlist' as well as trailing
newlines and EOF):

>>> ast.parse("import sys; sys.version_info[0];", mode="eval")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    import sys; sys.version_info[0];
         ^
SyntaxError: invalid syntax

You have to use "exec" mode to get the parser to allow statements,
which is why f-strings don't do that:

>>> ast.dump(ast.parse("import sys; sys.version_info[0];", mode="exec"))
"Module(body=[Import(names=[alias(name='sys', asname=None)]),
Expr(value=Subscript(value=Attribute(value=Name(id='sys', ctx=Load()),
attr='version_info', ctx=Load()), slice=Index(value=Num(n=0)),
ctx=Load()))])"
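
The difference between the two modes can be checked directly with a small helper. This is only a sketch: parses_as_expression is a name made up for this example, and it relies on nothing beyond ast.parse and its mode argument.

```python
import ast

def parses_as_expression(source):
    """Return True if `source` is accepted by the parser in "eval" mode.

    Hypothetical helper for illustration only.
    """
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True

# A plain expression is accepted.
print(parses_as_expression("sys.version_info[0]"))
# Statements are rejected in "eval" mode.
print(parses_as_expression("import sys; sys.version_info[0];"))
# A tuple (via 'testlist') with a trailing newline is still an expression.
print(parses_as_expression("1, 2\n"))
```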

The unique aspect of f-strings that means they don't permit some
otherwise valid Python expressions is that they also do an initial
pre-tokenisation pass based on:

1. Look for an opening '{'
2. Look for a closing '!', ':' or '}' accounting for balanced string
quotes, parentheses, brackets and braces

Ignoring the surrounding quotes, and using the `atom` node from
Python's grammar to represent the nesting tracking, and TEXT to stand
in for arbitrary text, it's something akin to:

fstring: (TEXT ['{' maybe_pyexpr ('!' | ':' | '}')])+
maybe_pyexpr: (atom | TEXT)+

That isn't quite right, since it doesn't properly account for brace
nesting, but it gives the general idea - there's an initial really
simple tokenising pass that picks out the potential Python
expressions, and then those are run through the AST parser's
equivalent of eval().
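
The two passes can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not CPython's actual implementation: the names find_expression_end, extract_expressions and check_fields are invented for this example, and the scanner ignores triple-quoted strings, backslash escapes, the "!=" operator and replacement fields nested inside a format spec.

```python
import ast

OPENERS = {"(": ")", "[": "]", "{": "}"}

def find_expression_end(text, start):
    """Return the index of the '!', ':' or '}' that ends the expression
    starting at `start`, skipping nested brackets and simple string literals."""
    stack = []  # closing brackets still expected
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "'\"":
            # Jump to the matching quote (assumption: no escapes or triple quotes).
            end = text.find(ch, i + 1)
            if end == -1:
                raise SyntaxError("unterminated string in expression")
            i = end
        elif ch in OPENERS:
            stack.append(OPENERS[ch])
        elif stack:
            if ch == stack[-1]:
                stack.pop()
        elif ch in "!:}":
            return i
        i += 1
    raise SyntaxError("expected '}'")

def extract_expressions(body):
    """First pass: pick out the potential expressions from `body`,
    the f-string's text without its surrounding quotes."""
    found = []
    i = 0
    while i < len(body):
        if body.startswith("{{", i) or body.startswith("}}", i):
            i += 2  # doubled braces are literal text
        elif body[i] == "{":
            end = find_expression_end(body, i + 1)
            found.append(body[i + 1:end])
            # Skip any conversion or format spec up to the closing brace.
            close = body.find("}", end)
            if close == -1:
                raise SyntaxError("expected '}'")
            i = close + 1
        else:
            i += 1
    return found

def check_fields(body):
    """Second pass: try each candidate with the "eval" parsing mode."""
    results = []
    for expr in extract_expressions(body):
        try:
            ast.parse(expr.strip(), mode="eval")
            results.append((expr, True))
        except SyntaxError:
            results.append((expr, False))
    return results

print(check_fields("start {import sys; sys.version_info[0];} end"))
print(check_fields("{d['a:b']!r} and {f(x, {1: 2})[0]:>10}"))
```

The first call reports the statement-based field as rejected by the "eval" parse, while the second shows that a ':' inside a string literal or inside nested braces does not end the expression early.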

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Nick and Eric,

Just wanted to say thanks for the feedback and point to a grammar I ended up writing on my side (in JavaCC). If someone else decides to write a formal grammar later on, it can probably be used as a reference (it shouldn't be hard to convert it to a BNF grammar):

https://github.com/fabioz/Pydev/blob/master/plugins/org.python.pydev.parser/src/org/python/pydev/parser/grammar_fstrings/grammar_fstrings.jjt

Also, as feedback, I found it a bit odd that there can't be any space or newline between the last format specifiers and '}'

I.e.:

f'''{
dict(
a = 10
)
}
'''

is not valid, whereas

f'''{
dict(
a = 10
)
!r}
'''

is valid. As a note, this means my grammar has a bug, since it accepts both versions, and I currently don't care enough about that difference from the implementation to fix it ;)
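
For anyone who wants to try both variants on their own interpreter, here is a minimal sketch using the built-in compile(). The compiles helper and the two variable names are made up for this example, and the result for the first variant may well depend on the Python version in use, so no outcome is claimed for it here.

```python
def compiles(source):
    """Return True if `source` compiles as an expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# The two f-strings from above, written out as source text.
bare = "f'''{\ndict(\na = 10\n)\n}\n'''"
with_conversion = "f'''{\ndict(\na = 10\n)\n!r}\n'''"

print("newline before '}':  ", compiles(bare))
print("newline before '!r}':", compiles(with_conversion))
```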

Cheers,

Fabio