
On Sep 5, 2015 11:32 AM, "Eric V. Smith" <eric@trueblade.com> wrote:
>
> > Actually, my current implementation doesn't use the lexer, although I
> > suppose it could. I'm currently manually scanning the string, keeping
> > track of strings and parens. To find the end of an expression, it looks
> > for a '!', ':', or non-doubled '}', not inside of a string or (), [], or
> > {}. There's a special case for '!=' so the bang isn't seen as ending the
> > expression.
>
> Ignore the part about non-doubled '}'. The actual description is:
>
> To find the end of an expression, it looks for a '!', ':', or '}', not
> inside of a string or (), [], or {}. There's a special case for '!=' so
> the bang isn't seen as ending the expression.
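
Just to make the comparison concrete, a hand-rolled scan along those lines might look roughly like this (illustrative only -- not Eric's actual code, and it ignores escapes and triple-quoted strings):

    def scan_expr_end(s, start=0):
        # s is the text just after the opening '{'; return the index of
        # the '!', ':', or '}' that ends the expression, skipping anything
        # inside a string literal or nested (), [], {}.
        i = start
        depth = 0
        while i < len(s):
            c = s[i]
            if c in ('"', "'"):
                # crude string skip: no escapes, no triple quotes
                i += 1
                while i < len(s) and s[i] != c:
                    i += 1
            elif c in '([{':
                depth += 1
            elif c in ')]}':
                if depth == 0 and c == '}':
                    return i
                depth -= 1
            elif depth == 0 and c in '!:':
                # special-case '!=' so the bang doesn't end the expression
                if not (c == '!' and i + 1 < len(s) and s[i + 1] == '='):
                    return i
            i += 1
        raise ValueError("unterminated expression")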

Sounds like you're reimplementing a lot of the lexer... I guess that's doable, but how confident are you that your definition of "inside a string" matches the original in all corner cases?

In any case the abstract language definition part should be phrased in terms of the Python lexer -- the expression ends when you encounter the first } *token* that is not nested inside () [] {} *tokens*, and then you can implement it however makes sense...
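
E.g., here is a rough sketch of that rule using the stdlib tokenize module (my own illustration, not Eric's implementation; it leaves out the '!' conversion and ':' format-spec handling, and it takes the text starting at the opening '{' of the field):

    import io
    import tokenize

    def find_field_end(text):
        # Return the index of the '}' token that closes the replacement
        # field, i.e. the first '}' not nested inside () [] {} tokens.
        depth = 0
        for tok in tokenize.generate_tokens(io.StringIO(text).readline):
            if tok.type == tokenize.OP:
                if tok.string in '([{':
                    depth += 1
                elif tok.string in ')]}':
                    depth -= 1
                    if depth == 0 and tok.string == '}':
                        return tok.start[1]  # column of the closing '}'
        raise ValueError("no closing '}' found")

    # find_field_end("{ {'a': 1}['a'] } tail") == 16 -- the outer '}',
    # even though the expression itself contains '}' and ':' tokens.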

(This is then the same rule that patsy uses to find the end of Python expressions embedded inside patsy formula strings: patsy.readthedocs.org)

-n