[Python-Dev] transitioning from % to {} formatting
Glenn Linderman glenn at nevcal.com
Thu Oct 1 18:49:24 CEST 2009
On approximately 9/30/2009 4:03 PM, came the following characters from the keyboard of Vinay Sajip:
Steven Bethard <steven.bethard gmail.com> writes:
There's a lot of code already out there (in the standard library and other places) that uses %-style formatting, when in Python 3.0 we should be encouraging {}-style formatting. We should really provide some sort of transition plan. Consider an example from the logging docs:
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

We'd like to support both this style as well as the following style:

    logging.Formatter("{asctime} - {name} - {levelname} - {message}")

In logging at least, there are two different places where the formatting issue crops up. The first is creating the "message" part of the logging event, which is made up of a format string and arguments. The second is the one Steven's mentioned: formatting the message along with other event data such as time of occurrence, level, logger name etc. into the final text which is output.
It seems to me that most of the discussion in this thread is concerned with the first issue... and yet I see the second as the harder issue, and it has gotten less press.
Abstracting this away from logger, I think the problem has three cases:
1) Both the format message and all the parameters are supplied in a single API call. This is really a foolish API, because

    def API( fmt, p1, p2, p3 ):
        str = fmt % (p1, p2, p3)

could have just as easily been documented originally as

    def API( str ):

where the user is welcome to supply a string such as

    API( fmt % (p1, p2, p3 ))

and if done this way, the conversion to .format is obvious... and all under the user's control.

2) The format message and the parameters are supplied to separate APIs, because the format message is common to many invocations of the other APIs that supply parameters, and is cached by the API. This is sufficient to break the foolishness of #1, but is really just a subset of #3, so any solutions to #3 apply here.

3) The format message and the parameters for it may be supplied by the same or separate APIs, but one or both are incomplete, and are augmented by the API. In other words, one or both of the following cases:

3a) The user supplied format message may include references to named parameters that are documented by the API, and supplied by the API, rather than by the user.

3b) The user supplied format string may be embedded into a larger format string by the API, which contains references to other values that the user must also supply.

In either case of 3a or 3b, the user has insufficient information to perform the whole format operation and pass the result to the API.
In both cases, the API that accepts the format string must be informed whether it is a % or {} string, somehow. This could be supplied to the API that accepts the string, or to some other related API that sets a format mode. Internally, the code would have to be able to manipulate both types of formats.
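As a rough illustration of case 3a with the format type passed to the API that accepts the string, here is a minimal sketch. The class name Report, its style parameter, and the field values are all hypothetical and exist only for this example; nothing here is an existing or proposed API.

```python
class Report:
    """Hypothetical API that fills in fields the caller cannot know (case 3a)."""

    def __init__(self, fmt, style="%"):
        # The caller states which kind of format string this is.
        if style not in ("%", "{"):
            raise ValueError("style must be '%' or '{'")
        self.fmt = fmt
        self.style = style

    def render(self, message):
        # These values are supplied by the API, not by the user.
        fields = {"name": "demo", "levelname": "INFO", "message": message}
        if self.style == "%":
            return self.fmt % fields
        return self.fmt.format(**fields)


old = Report("%(name)s - %(levelname)s - %(message)s")
new = Report("{name} - {levelname} - {message}", style="{")
print(old.render("hi"))
print(new.render("hi"))
```

Because the style is stored on each instance, two parts of a program could use different styles without interfering with each other.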
Support for both % and {} forms in logging would need to be considered in these two places. I sort of liked Martin's proposal about using different keyword arguments, but apart from the ugliness of "dicttemplate" and the fact that "fmt" is already used in Formatter.__init__ as a keyword argument, it's possible that two different keyword arguments "fmt" and "format" both referring to format strings might be confusing to some users.
Benjamin's suggestion of providing a flag to Formatter seems slightly better, as it doesn't change what existing positional or keyword parameters do, and just adds an additional, optional parameter which can start off with a default of False and transition to a default of True. However, AFAICT these approaches only cover the second area where formatting options are chosen - not the creation of the message from the parameters passed to the logging call itself.
The above three paragraphs are unclear to me. I think they might be referring to case 2 or 3, though.
Of course one can pass arbitrary objects as messages which contain their own formatting logic. This has been possible since the very first release but I'm not sure that it's widely used, as it's usually easier to pass strings. So instead of passing a string and arguments such as
    logger.debug("The %s is %d", "answer", 42)

one can currently pass, for a fictitious class PercentMessage,

    logger.debug(PercentMessage("The %s is %d", "answer", 42))

and when the time comes to obtain the formatted message, LogRecord.getMessage calls str() on the PercentMessage instance, whose __str__ will use %-formatting to get the actual message. Of course, one can also do for example

    logger.debug(BraceMessage("The {} is {}", "answer", 42))

where the __str__() method on the BraceMessage will do {} formatting. Of course, I'm not suggesting we actually use the names PercentMessage and BraceMessage, I've just used them there for clarity.
It seems that the above is only referring to case 1? And doesn't help with case 2 or 3?
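For concreteness, the message-object idea could be sketched as follows. PercentMessage and BraceMessage are the fictitious names from the quoted text and are not part of the logging package; the logger name and the StringIO stream are assumptions made only so the example is self-contained.

```python
import io
import logging


class PercentMessage:
    """Fictitious wrapper that defers %-formatting until str() is called."""

    def __init__(self, fmt, *args):
        self.fmt = fmt
        self.args = args

    def __str__(self):
        return self.fmt % self.args


class BraceMessage:
    """Fictitious wrapper that defers {}-formatting until str() is called."""

    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)


# Capture output in memory so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("message_objects_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

# No extra args are passed to debug(), so logging only calls str() on the object.
logger.debug(PercentMessage("The %s is %d", "answer", 42))
logger.debug(BraceMessage("The {} is {}", "answer", 42))
print(stream.getvalue(), end="")
```

Note that this only touches how the message part is built; the format string given to Formatter is unaffected.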
Also, although Raymond has pointed out that it seems likely that no one ever needs both types of format string, what about the case where application A depends on libraries B and C, and they don't all share the same preferences regarding which format style to use? ISTM no-one's brought this up yet, but it seems to me like a real issue. It would certainly appear to preclude any approach that configured a logging-wide or logger-wide flag to determine how to interpret the format string.
Agreed here... a single global state would not make modular upgrades to a complex program easy... the state would be best included with particular instance objects, especially when such instance objects exist already. The format type parameter could be provided to the instance, instead of globally.
Another potential issue is where logging events are pickled and sent over sockets to be finally formatted and output on different machines. What if a sending machine has a recent version of Python, which supports {} formatting, but a receiving machine doesn't? It seems that at the very least, it would require a change to SocketHandler and DatagramHandler to format the "message" part into the LogRecord before pickling and sending. While making this change is simple, it represents a potential backwards-incompatible problem for users who have defined their own handlers for doing something similar.
Apart from thinking through the above issues, the actual formatting only happens in two locations - LogRecord.getMessage and Formatter.format - so making the code do either %- or {} formatting would be simple, as long as it knows which of % and {} to pick. Does it seem too onerous to expect people to pass an additional "useformat" keyword argument with every logging call to indicate how to interpret the message format string? Or does the PercentMessage/BraceMessage type approach have any mileage? What do y'all think?
These last 3 paragraphs seem to be very related to logger, specifically. The first of the 3 does point out a concern for systems that interoperate across networks: if the format strings and parameters are exposed separately across networks, whatever types are sent must be usable at the receiver, or at least appropriate version control must be required so that incompatible systems can be detected and reported.
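The idea of formatting the "message" part into the LogRecord before pickling might look roughly like the sketch below. The helper names record_to_wire and record_from_wire are hypothetical, and the code is not taken from SocketHandler or DatagramHandler; it only shows that the receiver then needs no knowledge of the sender's format style.

```python
import logging
import pickle


def record_to_wire(record):
    """Hypothetical helper: merge the args into the message, then pickle."""
    data = dict(record.__dict__)
    data["msg"] = record.getMessage()  # formatting happens on the sender
    data["args"] = None                # nothing left for the receiver to format
    data["exc_info"] = None            # traceback objects cannot be pickled
    return pickle.dumps(data, protocol=2)


def record_from_wire(payload):
    """Hypothetical helper: rebuild a LogRecord on the receiving side."""
    return logging.makeLogRecord(pickle.loads(payload))


record = logging.LogRecord(
    "demo", logging.INFO, "demo.py", 1, "The %s is %d", ("answer", 42), None
)
received = record_from_wire(record_to_wire(record))
print(received.getMessage())
```

The cost is that the receiver loses access to the original format string and arguments, which may matter to handlers that inspect them.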
On approximately 9/30/2009 5:47 PM, came the following characters from the keyboard of Antoine Pitrou:
Vinay Sajip <vinaysajip yahoo.co.uk> writes:
Does it seem too onerous to expect people to pass an additional "useformat" keyword argument with every logging call to indicate how to interpret the message format string? Or does the PercentMessage/BraceMessage type approach have any mileage? What do y'all think?

What about the proposal I made earlier? (support for giving a callable, so that you pass the "{foobar}".format method when you want new-style formatting)
This "callable" technique seems to only support case 1 and 2, but not 3, unless I misunderstand it.
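One possible reading of the "callable" proposal, sketched under the assumption that a plain string keeps its %-style meaning and anything callable is invoked with the arguments. The function name render is hypothetical and stands in for whatever API would accept the format.

```python
def render(fmt, *args, **kwargs):
    """Hypothetical API: a string means %-formatting, a callable formats itself."""
    if callable(fmt):
        # New style: e.g. the bound method "{} ...".format
        return fmt(*args, **kwargs)
    # Legacy style: a mapping for named fields, otherwise a tuple.
    return fmt % (kwargs if kwargs else args)


print(render("The %s is %d", "answer", 42))
print(render("The {} is {}".format, "answer", 42))
print(render("The {what} is {value}".format, what="answer", value=42))
```

In this sketch the API never sees the format text behind the callable, so it has nothing to embed into a larger format string as case 3b would require.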
-- Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking