[Python-Dev] PEP 498: Literal String Interpolation is ready for pronouncement

Nathaniel Smith njs at pobox.com
Sun Sep 6 01:12:02 CEST 2015

Previous message (by thread): [Python-Dev] PEP 498: Literal String Interpolation is ready for pronouncement
Next message (by thread): [Python-Dev] PEP 498: Literal String Interpolation is ready for pronouncement
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, Sep 5, 2015 at 1:00 PM, Eric V. Smith <eric at trueblade.com> wrote:

> On 9/5/2015 3:23 PM, Nathaniel Smith wrote:
>
>> On Sep 5, 2015 11:32 AM, "Eric V. Smith" <eric at trueblade.com> wrote:
>>
>>> Ignore the part about non-doubled '}'. The actual description is:
>>>
>>> To find the end of an expression, it looks for a '!', ':', or '}', not inside of a string or (), [], or {}. There's a special case for '!=' so the bang isn't seen as ending the expression.
>>
>> Sounds like you're reimplementing a lot of the lexer... I guess that's doable, but how confident are you that your definition of "inside a string" matches the original in all corner cases?
>
> Well, this is 35 lines of code (including comments), and it's much simpler than a lexer (in the sense of "something that generates tokens"). So I don't think I'm reimplementing a lot of the lexer.
>
> However, your point is valid: if I don't do the same thing the lexer would do, I could either prematurely find the end of an expression, or look too far. In either case, when I call ast.parse() I'll get a syntax error, and/or I'll get an error when parsing/lexing the remainder of the string.
>
> But it's not like I have to agree with the lexer: no larger error will occur if I get it wrong. Everything is confined to a single f-string, since I've already used the lexer to find the f-string in its entirety. I only need to make sure the users understand how expressions are extracted from f-strings.
>
> I did look at using the actual lexer (Parser/tokenizer.c) to do this, but it would require a large amount of surgery. I think it's overkill for this task.
>
> So far, I've tested it enough to have reasonable confidence that it's correct. But the implementation could always be swapped out for an improved version. I'm certainly open to that, if we find cases that the simple scanner can't deal with.
>
>> In any case the abstract language definition part should be phrased in terms of the python lexer -- the expression ends when you encounter the first } token that is not nested inside () [] {} tokens, and then you can implement it however makes sense...
>
> I'm not sure that's an improvement on Guido's description when you're trying to explain it to a user. But when time comes to write the documentation, we can discuss it then.
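
For concreteness, the character-level rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as described in the thread, not the actual PEP 498 implementation; the name scan_expression_end is made up, and the string handling is deliberately naive (no triple-quoted strings, no string prefixes), which is exactly the kind of corner case where a hand-written scanner and the real lexer could disagree.

```python
def scan_expression_end(text, start=0):
    """Return the index of the first '!', ':' or '}' that is not inside
    a string or inside (), [] or {}.  '!=' is not treated as an end.

    Hypothetical helper: a naive sketch, not the PEP 498 code.
    """
    depth = 0      # nesting level of (), [], {}
    quote = None   # the quote character we are inside of, if any
    i = start
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == "\\":
                i += 1            # skip the escaped character
            elif ch == quote:
                quote = None      # string closed
        elif ch in "'\"":
            quote = ch            # string opened
        elif ch in "([{":
            depth += 1
        elif ch in ")]}" and depth > 0:
            depth -= 1
        elif depth == 0:
            if ch == "!" and text[i + 1:i + 2] == "=":
                i += 1            # '!=' is an operator, keep scanning
            elif ch in "!:}":
                return i
        i += 1
    raise ValueError("no end of expression found")


text = 'd["a:b"][1:2]:>10}'
end = scan_expression_end(text)
print(text[:end])   # the expression part, before the format spec
```

The colons inside the string and inside the subscript are skipped because of the quote and depth tracking; only the unnested ':' ends the expression.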

I'm not talking about end-user documentation, I'm talking about the formal specification, like in the Python Language Reference.

I'm pretty sure that just calling the tokenizer will be easier for Cython or PyPy than implementing a special purpose scanner :-)
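
A minimal sketch of what "just calling the tokenizer" could look like, using the standard library's tokenize module. The helper name tokenizer_expression_end is an assumption for illustration, and it only implements the "first unnested } token" rule, not the '!' and ':' handling. The input is prefixed with a '{' so that the closing brace is balanced from the tokenizer's point of view.

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def tokenizer_expression_end(text):
    """Return the index in `text` of the first '}' token that is not
    nested inside (), [] or {} tokens.

    Hypothetical helper: a sketch of the rule, not any project's code.
    """
    wrapped = "{" + text          # make the final '}' a matched brace
    lines = wrapped.splitlines(keepends=True)
    depth = 0
    try:
        for tok in tokenize.generate_tokens(io.StringIO(wrapped).readline):
            if tok.type != tokenize.OP:
                continue          # strings, names, numbers, etc.
            if tok.string in OPENERS:
                depth += 1
            elif tok.string in CLOSERS:
                depth -= 1
                if depth == 0:
                    row, col = tok.start
                    # Convert (row, col) to an offset, then undo the
                    # one-character shift from the added '{'.
                    return sum(len(l) for l in lines[:row - 1]) + col - 1
    except tokenize.TokenError as exc:
        raise ValueError("no closing '}' found") from exc
    raise ValueError("no closing '}' found")


text = 'd["}"] + (1, {2: 3})[0]} tail'
end = tokenizer_expression_end(text)
print(text[:end])   # the embedded expression
```

Because string literals arrive as single STRING tokens, the '}' inside "}" is never seen as an operator, so the definition of "inside a string" is the tokenizer's own.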

>> (This is then the same rule that patsy uses to find the end of python expressions embedded inside patsy formula strings: http://patsy.readthedocs.org)
>
> I don't see where patsy looks for expressions in parts of strings. Let me know if I'm missing it.

Patsy parses strings like

"np.sin(a + b) + c"

using a grammar that supports some basic arithmetic-like infix operations (+, *, parentheses, etc.), and in which the atoms are arbitrary Python expressions. So the above string is parsed into a patsy-AST that looks something like:

Add(PyExpr("np.sin(a + b)"), PyExpr("c"))

To do this, it runs the Python tokenizer, counts nesting of () [] {}, and treats the first valid unnested patsy operator it sees as the end of the embedded expression:

https://github.com/pydata/patsy/blob/master/patsy/parse_formula.py#L37
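
As a rough sketch of that idea (not patsy's actual code; the function name and the small operator set are assumptions, and unlike patsy it does not treat top-level parentheses as formula-level grouping), a single-line formula can be split at unnested operators like this:

```python
import io
import tokenize

OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}


def split_on_unnested_operators(formula, operators=("+", "*")):
    """Split a single-line formula into embedded Python expressions and
    the unnested operators between them.

    Hypothetical helper: illustrates the nesting-count rule only.
    """
    pieces = []
    depth = 0
    start = 0   # column where the current embedded expression begins
    for tok in tokenize.generate_tokens(io.StringIO(formula).readline):
        if tok.type != tokenize.OP:
            continue
        if tok.string in OPENERS:
            depth += 1
        elif tok.string in CLOSERS:
            depth -= 1
        elif depth == 0 and tok.string in operators:
            # An unnested operator ends the current embedded expression.
            pieces.append(formula[start:tok.start[1]].strip())
            pieces.append(tok.string)
            start = tok.end[1]
    pieces.append(formula[start:].strip())
    return pieces


print(split_on_unnested_operators("np.sin(a + b) + c"))
```

The '+' inside the call to np.sin is at depth 1 and is left alone; only the '+' at depth 0 separates the two embedded expressions.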

Not tremendously relevant, but that's why I've thought this through before :-)

-n

-- Nathaniel J. Smith -- http://vorpus.org
