[Python-Dev] PEP 3101 Update

Talin talin at acm.org
Fri May 19 10:41:33 CEST 2006

Previous message: [Python-Dev] PEP 3101 Update
Next message: [Python-Dev] PEP 3101 Update
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Guido van Rossum wrote:
> On 5/6/06, Talin <talin at acm.org> wrote:
>> I've updated PEP 3101 based on the feedback collected so far. [http://www.python.org/dev/peps/pep-3101/]
>
> I think this is a step in the right direction.

Cool, and thanks for the very detailed feedback.

> I wonder if we shouldn't borrow more from .NET. I read this URL that you referenced:
>
> http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp
>
> They have special syntax to support field width, e.g. {0,10} formats item 0 in a field (at least) 10 positions wide, right-justified; {0,-10} does the same left-aligned. This is done independently from

We already have that now, don't we? If you look at the docs for "String Formatting Operations" in the library reference, they show that a negative sign on a field width indicates left justification.

> the type-specific formatting. (I'm not proposing that we use .NET's format specifiers after the colon, but I'm also no big fan of keeping the C-specific stuff we have now; we should put some work into designing something with the same power as the current %-based system for floats and ints, which would cover it.)

Agreed. As you say, the main work is in handling floats and ints; everything else can either be formatted as plain str() or use a custom format specifier syntax (as in my strftime example).
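To make the width point concrete: because %-formatting already treats a negative width as left-justification, a .NET-style {0,10} or {0,-10} width maps directly onto a %s specifier. The helper dotnet_width below is hypothetical, written only for illustration (it is not part of the PEP or any library), and it assumes the width arrives as a plain int.

```python
def dotnet_width(value, width):
    """Pad str(value) like .NET's {0,width}: a positive width right-justifies,
    a negative width left-justifies. Hypothetical helper for illustration."""
    # Build a %-specifier such as "%10s" or "%-10s"; "%%" emits a literal "%".
    spec = "%%%ds" % width
    # Wrap value in a tuple so that a tuple argument is not unpacked by %.
    return spec % (value,)


print("[" + dotnet_width("spam", 10) + "]")   # [      spam]
print("[" + dotnet_width("spam", -10) + "]")  # [spam      ]
```

The sign convention is the only thing .NET's syntax adds here; the padding itself is exactly what the existing %-operator does.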

> .NET's solution for quoting { and } as {{ and }} respectively also sidesteps the issue of how to quote \ itself -- since '\{' is a 2-char string containing one \ and one {, you'd have to write either '\\\\{0}' or r'\\{0}' to produce a single literal \ followed by formatted item 0. Any time there's the need to quadruple a backslash I think we've lost the battle. (Or you might search the web for Tcl quoting hell. :-)

> I'm fine with not having a solution for doing variable substitution within the format parameters. That could be done instead by building up the format string with an extra formatting step: instead of "{x:{y}}".format(x=whatever, y=3) you could write "{{x,{y}}}".format(y=3).format(x=whatever). (Note that this is subtle: the final }}} are parsed as } followed by }}. Once the parser has seen a single {, the first } it sees is the matching closing } and adding another } after it won't affect it. The specifier cannot contain { or } at all.)
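Both quoting points can be checked against the str.format that eventually shipped in Python 2.6 and 3.0, which adopted the doubled-brace escapes. In this sketch the value 42 simply stands in for "whatever", and the two-step trick is written with the colon separator that str.format ultimately uses rather than the comma in the example above. With {{ and }} as the escapes, a raw string needs only one backslash before a field; the shipped str.format also accepts the nested {x:{y}} form directly.

```python
whatever = 42  # stand-in value for the field x

# With {{ and }} as the brace escapes, a backslash is an ordinary character:
# one backslash in a raw string yields one literal backslash before item 0.
backslash = r"\{0}".format(whatever)       # a backslash followed by 42
literal_braces = "{{0}}".format(whatever)  # '{0}' (escaped braces, no field)

# Two-step approach: the first pass substitutes only the width y; the doubled
# braces survive as single braces and form the template for the second pass.
template = "{{x:{y}}}".format(y=3)         # '{x:3}'
two_step = template.format(x=whatever)     # ' 42'

# The shipped str.format also allows the nested field in one step.
one_step = "{x:{y}}".format(x=whatever, y=3)  # ' 42'
```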

There is another solution to this which is equally subtle, although fairly straightforward to parse. It involves defining the rules for escapes as follows:

'{{' is an escaped '{'
'}}' is an escaped '}', unless we are within a field.

So you can write things like {0:10,{1}}, and the final '}}' will be parsed as two separate closing braces, since we're within a field definition.

From a parsing standpoint, this is unambiguous; however, I've held off on suggesting it because it might appear ambiguous to a casual reader.
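A minimal sketch of a scanner for the escape rules just described, under stated assumptions: '{{' and '}}' are escapes only in literal text, a single '{' opens a field, and inside a field braces simply nest, so a '}}' there closes two levels. The function name split_fields and its ("text"/"field", string) output are hypothetical, chosen for illustration; this is not the PEP's reference parser.

```python
def split_fields(template):
    """Split template into ("text", s) and ("field", s) pieces.

    Hypothetical illustration of the proposed rules: outside a field, '{{'
    and '}}' are escaped braces; inside a field, braces only nest.
    """
    parts, literal = [], []
    i, n = 0, len(template)
    while i < n:
        ch = template[i]
        if template.startswith("{{", i) or template.startswith("}}", i):
            # Escaped brace in literal text: emit one brace, skip two chars.
            literal.append(ch)
            i += 2
        elif ch == "{":
            # A field: scan to the matching close brace, tracking nesting so
            # that "{0:10,{1}}" ends at the last '}', not the first.
            depth, j = 1, i + 1
            while j < n and depth:
                if template[j] == "{":
                    depth += 1
                elif template[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated field in %r" % template)
            if literal:
                parts.append(("text", "".join(literal)))
                literal = []
            parts.append(("field", template[i + 1:j - 1]))
            i = j
        elif ch == "}":
            raise ValueError("single '}' outside a field in %r" % template)
        else:
            literal.append(ch)
            i += 1
    if literal:
        parts.append(("text", "".join(literal)))
    return parts


print(split_fields("{0:10,{1}}"))   # [('field', '0:10,{1}')]
print(split_fields("a {{b}} {x}"))  # [('text', 'a {b} '), ('field', 'x')]
```

The scanner never needs lookahead beyond two characters, which is the sense in which the rule is easy to parse; the readability concern is about humans, not the code.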

> I like having a way to reuse the format parsing code while substituting something else for the formatting itself.

> The PEP appears silent on what happens if there are too few or too many positional arguments, or if there are missing or unused keywords. Missing ones should be errors; I'm not sure about redundant (unused) ones. On the one hand, complaining about those gives us more certainty that the format string is correct. On the other hand, there are some use cases for passing lots of keyword parameters (e.g. simple web templating could pass a fixed set of variables using **dict). Even in i18n (translation) apps I could see the usefulness of allowing unused parameters.

I am undecided on this issue as well, which is the reason that it's not mentioned in the PEP (yet).
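For reference, the str.format that eventually shipped settled this the way Guido leaned: a missing positional or keyword argument raises IndexError or KeyError, while unused arguments are silently ignored. The string.Formatter class exposes a check_unused_args(used_args, args, kwargs) hook for callers who want the stricter behavior. StrictFormatter below is a hypothetical subclass sketching that option, not something the standard library provides.

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical formatter that rejects unused arguments.

    string.Formatter calls check_unused_args() after formatting; the default
    does nothing. used_args holds the ints (positional) and strings (keywords)
    that the template actually referenced.
    """

    def check_unused_args(self, used_args, args, kwargs):
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += sorted(k for k in kwargs if k not in used_args)
        if unused:
            raise ValueError("unused arguments: %r" % (unused,))


# The stock formatter ignores the extra keyword; the strict one would complain.
print(string.Formatter().format("{name}", name="spam", extra=1))  # spam
print(StrictFormatter().format("{0} {name}", 1, name="spam"))     # 1 spam
```

Keeping the permissive default and making strictness opt-in serves the templating and i18n cases without giving up error checking where it is wanted.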

> On the issue of {a.b.c}: like several correspondents, I don't much like the ambiguity of attribute vs. key refs, even though it appears useful enough in practice in web frameworks I've used. It seems to violate the Zen of Python: "In the face of ambiguity, refuse the temptation to guess."

> Unfortunately I'm pretty lukewarm about the proposal to support {a[b].c}, since b is not a variable reference but a literal string 'b'. It is also relatively cumbersome to parse. I wish I could propose {a+b.c} for this case, but that's so arbitrary...

Actually, it's not all that hard to parse, especially given that there is no need to deal with the 'nested' case.

I will be supplying a Python implementation of the parser along with the PEP. What I would prefer not to supply (although I certainly can if you feel it's necessary) is an optimized C implementation of the same parser, as well as the implementations of the various type-specific formatters.
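To illustrate the claim that a field name like a[b].c is not hard to parse when keys cannot nest, here is a minimal sketch. parse_field_name and resolve are hypothetical names; the bracketed key is taken as a literal string, as in the proposal under discussion, and a key may not contain ']'. (The str.format that eventually shipped additionally converts all-digit keys to integers, which this sketch omits.)

```python
import re
from types import SimpleNamespace

# After the leading name, a field is a chain of ".attr" or "[key]" steps.
# The key is literal text up to the first ']', so no nesting is possible.
_STEP = re.compile(r"\.(\w+)|\[([^\]]+)\]")


def parse_field_name(field):
    """Split e.g. 'a[b].c' into ('a', [('item', 'b'), ('attr', 'c')])."""
    head = re.match(r"\w+", field)
    if head is None:
        raise ValueError("field name must start with a name: %r" % field)
    steps, pos = [], head.end()
    while pos < len(field):
        step = _STEP.match(field, pos)
        if step is None:
            raise ValueError("bad field name at offset %d: %r" % (pos, field))
        if step.group(1) is not None:
            steps.append(("attr", step.group(1)))
        else:
            steps.append(("item", step.group(2)))
        pos = step.end()
    return head.group(), steps


def resolve(field, namespace):
    """Look up a parsed field name: getattr for .attr, [] for [key]."""
    name, steps = parse_field_name(field)
    obj = namespace[name]
    for kind, key in steps:
        obj = getattr(obj, key) if kind == "attr" else obj[key]
    return obj


data = {"a": {"b": SimpleNamespace(c=5)}}
print(parse_field_name("a[b].c"))  # ('a', [('item', 'b'), ('attr', 'c')])
print(resolve("a[b].c", data))     # 5
```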

> Even more unfortunately, I expect that dict key access is a pretty important use case, so we'll have to address it somehow. I don't think there's an important use case for the ambiguity -- in any particular situation I expect that the programmer will know whether they are expecting a dict or an object with attributes.

> Hm, perhaps {a@b.c} might work? It's not an existing binary operator. Or perhaps # or !.

[] is the most intuitive syntax by far IMHO. Let's run it up the flagpole and see if anybody salutes :)
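The bracket syntax did ship in str.format, with [] always meaning item access and . always meaning attribute access, so a template never has to guess. A small check, where the dict d is just sample data chosen so that a key and an attribute share the name "keys":

```python
d = {"keys": "from the dict"}

# [keys] is an item lookup: d["keys"].
by_item = "{d[keys]}".format(d=d)  # 'from the dict'

# .keys is an attribute lookup: the dict's bound keys() method, formatted
# with an empty spec, so the result is str() of that method object.
by_attr = "{d.keys}".format(d=d)
```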

> It's too late to think straight, so this will have to be continued...

One additional issue that I would like some feedback on:

The way I have set up the API for writing custom formatters (not talking about the format method here) allows the custom formatter object to examine the entire output string, not merely the part that it is responsible for; moreover, the custom formatter is free to modify the entire string. So, for example, a custom formatter could tabify or un-tabify all previous text within the string.

The API could be made slightly simpler by eliminating this feature. The reason that I added it was specifically so that custom formatters could perform column-specific operations, like the old BASIC function that would print spaces up to a given column. Having generated my share of reports back in the old days (COBOL programming in the USAF), I thought it might be useful to have the ability to do operations based on the absolute column number.

Currently the API specifies that a custom formatter is passed an array object, and the custom formatter should append its data to the end of the array, but it is also free to examine and modify the rest of the array.

If I were to remove this feature, then instead of using an array, we'd simply have the custom formatter return a string, as format does.
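The trade-off can be sketched with a plain list of string chunks standing in for the array; that substitution is an assumption, and the PEP's actual buffer type and signatures may differ. tab_to_column and shout are hypothetical formatters: the first must read the shared buffer to find the current column, roughly what BASIC's TAB() did; the second needs only its own value and fits the simpler return-a-string API, which on its own could never implement the first.

```python
def tab_to_column(buffer, column):
    """Append-style formatter: inspects everything emitted so far to find the
    current column, then appends spaces up to `column` (hypothetical)."""
    text = "".join(buffer)
    # Column = characters since the last newline (or since the start).
    current = len(text) - (text.rfind("\n") + 1)
    buffer.append(" " * max(0, column - current))


def shout(value):
    """Return-a-string formatter: sees only its own value (hypothetical)."""
    return str(value).upper()


buffer = ["Name:"]
tab_to_column(buffer, 10)  # "Name:" is 5 chars, so 5 spaces are appended
buffer.append(shout("spam"))
print("".join(buffer))     # Name:     SPAM
```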

So the question is - is the use case useful enough to keep this feature? What do people think of the use of the Python array type in this case?

-- Talin
