[Python-Dev] Python Grammar Ambiguity (original) (raw)

Guido van Rossum guido at python.org
Mon Apr 24 16:12:40 CEST 2006


Well, yes, the syntax is supposed to be something like "for varlist in testlist". Could you report this as a doc bug (if you found this information in the docs)?

On 4/24/06, Michael Foord <fuzzyman at voidspace.org.uk> wrote:

Hello all,

I'm working on a parser for part of the Python language (expressions but not statements basically). I'm using PLY to generate the parser and it's mostly done. I've hit on what looks like a fundamental ambiguity in the Python grammar which is difficult to get round with PLY; and I'm wondering why the grammar is defined in this way. It's possible there is a reason that I've missed, which means I need to rethink my workaround. List displays (list comprehensions) are defined as (from http://docs.python.org/ref/lists.html ) :

test ::= andtest ( "or" andtest )* | lambdaform testlist ::= test ( "," test )* [ "," ] listdisplay ::= "[" [listmaker] "]" listmaker ::= expression ( listfor | ( "," expression )* [","] ) listiter ::= listfor | listif listfor ::= "for" expressionlist "in" testlist [listiter] listif ::= "if" test [listiter] The problem is that listfor is defined as : "for" expressionlist "in" testlist This allows arbitrary expressions in the 'assignment' part of a list comprehension. As a result, the following is valid syntax according to the grammar : [x for x + 1 in y] Obviously it isn't valid ! This parses to an ast, but the syntax error is thrown when you compile the resulting ast. The problem is that for the basic case of a list comprehension ( ``[x for x in y]), x in y`` is a valid expression. That makes it extremely hard to disambiguate the grammar so that the in is treated correctly, and not part of an expression. My question is, why are arbitrary expressions allowed here in the grammar ? As far as I can tell, only identifiers (nested in parentheses or brackets) are valid here. I've got round the problem by creating a new node 'identifierlist' and just having that return the expected syntax tree (actually an expression list). This gets round the ambiguity [#]. It worries me that there might be a valid expression allowed here that I haven't thought of. My current rules allow anything that looks like (a, [b, c, (d, e)], f) - any nested identifier list. Would anything else be allowed ? If not, why not modify the grammar so that the compiler has less possible invalid syntax trees to work with ? (Also the grammar definition of string conversion is wrong as it states that a trailing comma is valid, which isn't the case. As far as I can tell that is necessary to allow nesting string conversions.) Fuzzyman http://www.voidspace.org.uk/python/index.shtml .. [#] If I could make precedence work in PLY I could also solve it I guess. However I can't. :-)


Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org

-- --Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list