[Python-Dev] Python Grammar Ambiguity (original) (raw)
Michael Foord fuzzyman at voidspace.org.uk
Mon Apr 24 21:21:36 CEST 2006
- Previous message: [Python-Dev] Python Grammar Ambiguity
- Next message: [Python-Dev] Python Grammar Ambiguity
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Guido van Rossum wrote:
This is probably because we have a similar ambiguity in assignments: the grammar says something like
exprlist ('=' exprlist)* but what is actually desired is (varlist '=')* exprlist Unfortunately the latter is not LL1 so we lie to the parser and tell it the first form, and then in the code generator checks that the actual parse tree for all exprlists except for the last conforms to the more limited syntax for varlist (which is only in our head). If not, it issues a "phase 2" SyntaxError. (One that doesn't have the exact column information.) We didn't have to do this for the syntax of the for-statement, list-comprehensions etc. because the ambiguity doesn't exist there (the 'for' keyword disambiguates the situation); but the current approach allows more code sharing in the code generator. I suggest you go ahead and write the second form for your own parser. Coming up with the correct rules for varlist is not hard. We've already done that, I was just checking there wasn't some case we'd missed were an expression was possible.
Thanks
Michael
--Guido
On 4/24/06, Michael Foord <michael.foord at resolversystems.com> wrote:
(oops - should have gone to list)
Guido van Rossum wrote:
Well, yes, the syntax is supposed to be something like "for varlist in testlist". Could you report this as a doc bug (if you found this information in the docs)?
I think the documentation (which does put expressionlist there) reflects the current state of the parser. First of all the grammar in SVN also has expressionlist there and the following does successfully parse to an ast (but fails when you compile the ast) : [x for x + 1 in y] All the best, Michael Foord
On 4/24/06, Michael Foord <fuzzyman at voidspace.org.uk> wrote:
Hello all, I'm working on a parser for part of the Python language (expressions but not statements basically). I'm using PLY to generate the parser and it's mostly done. I've hit on what looks like a fundamental ambiguity in the Python grammar which is difficult to get round with PLY; and I'm wondering why the grammar is defined in this way. It's possible there is a reason that I've missed, which means I need to rethink my workaround. List displays (list comprehensions) are defined as (from http://docs.python.org/ref/lists.html ) :
test ::= andtest ( "or" andtest )* | lambdaform testlist ::= test ( "," test )* [ "," ] listdisplay ::= "[" [listmaker] "]" listmaker ::= expression ( listfor | ( "," expression )* [","] ) listiter ::= listfor | listif listfor ::= "for" expressionlist "in" testlist [listiter] listif ::= "if" test [listiter] The problem is that listfor is defined as : "for" expressionlist "in" testlist This allows arbitrary expressions in the 'assignment' part of a list comprehension. As a result, the following is valid syntax according to the grammar : [x for x + 1 in y] Obviously it isn't valid ! This parses to an ast, but the syntax error is thrown when you compile the resulting ast. The problem is that for the basic case of a list comprehension ( ``[x for x in y]
),
x in y`` is a valid expression. That makes it extremely hard to disambiguate the grammar so that thein
is treated correctly, and not part of an expression. My question is, why are arbitrary expressions allowed here in the grammar ? As far as I can tell, only identifiers (nested in parentheses or brackets) are valid here. I've got round the problem by creating a new node 'identifierlist' and just having that return the expected syntax tree (actually an expression list). This gets round the ambiguity [#]. It worries me that there might be a valid expression allowed here that I haven't thought of. My current rules allow anything that looks like(a, [b, c, (d, e)], f)
- any nested identifier list. Would anything else be allowed ? If not, why not modify the grammar so that the compiler has less possible invalid syntax trees to work with ? (Also the grammar definition of string conversion is wrong as it states that a trailing comma is valid, which isn't the case. As far as I can tell that is necessary to allow nesting string conversions.) Fuzzyman http://www.voidspace.org.uk/python/index.shtml .. [#] If I could make precedence work in PLY I could also solve it I guess. However I can't. :-)
Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk
- Previous message: [Python-Dev] Python Grammar Ambiguity
- Next message: [Python-Dev] Python Grammar Ambiguity
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]