[Python-Dev] Backporting PEP 3101 to 2.6

Eric Smith eric+python-dev at trueblade.com
Thu Jan 10 20:12:14 CET 2008


Guido van Rossum wrote:

On Jan 10, 2008 9:57 AM, Eric Smith <eric+python-dev at trueblade.com> wrote:

Eric Smith wrote:

1: How should the builtin format() work? It takes 2 parameters, an object o and a string s, and returns o.__format__(s). If s is None, it returns o.__format__(empty_string). In 3.0, the empty string is of course unicode. For 2.6, should I use u'' or ''?

I just re-read PEP 3101, and it doesn't mention this behavior with None. The way the code actually works is that the specifier is optional, and if it isn't present then it defaults to an empty string. This behavior isn't mentioned in the PEP, either.
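The behavior described above can be sketched in pure Python. This is an illustration only, not the actual C implementation: format_sketch is a hypothetical name, and the sketch assumes the specifier defaults to an empty str and that None is treated like a missing specifier, as the message describes.

```python
def format_sketch(value, format_spec=''):
    """Hypothetical stand-in for the builtin format() discussed here."""
    # Treat None like a missing specifier (the undocumented behavior
    # that the message above questions).
    if format_spec is None:
        format_spec = ''
    # Delegate to the object's __format__, looked up on its type.
    return type(value).__format__(value, format_spec)


print(format_sketch(3, 'd'))   # explicit specifier
print(format_sketch(3))        # specifier omitted, defaults to ''
```

Dropping the None case, as suggested in the thread, would amount to deleting the first two lines of the function body.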

This feature came from a request from Talin[0]. We should either add this to the PEP (and docs), or remove it. If we document it, it should mention the 2.x behavior (as other places in the PEP do). If we removed it, it would remove the one place in the backport that's not just hard, but ambiguous. I'd just as soon see the feature go away, myself.

IIUC, the 's' argument is the format specifier. Format specifiers are written in a very conservative character set, so I'm not sure it matters. Or are you assuming that the type of 's' also determines the type of the output?

Yes, 's' is the format specifier. I should have used its actual name. I am saying that the type of 's' determines the type of the output. Maybe that's a needless assumption for the builtin format(), since it doesn't inspect the value of 's' (other than to verify its type). But for ''.format() and u''.format(), I was thinking it will be true (but see below).

It just seems weird to me that the result of format(3, u'd') would be '3', not u'3'.

I may be in the minority here, but I think I like having a default for 's' (as implemented -- the PEP ought to be updated) and I also think it should default to an 8-bit string, assuming you support 8-bit strings at all -- after all in 2.x 8-bit strings are the default string type (as reflected by their name, 'str').

As long as it's defined, I'm okay with it. I think making the 2.6 default be an empty str is reasonable.

3: Every overridden __format__() method is going to have to check for string or unicode, just like object.__format__() does, and return either a string or unicode object, appropriately. I don't see any way around this, but I'd like to hear any thoughts. I guess there aren't all that many __format__ methods that will be implemented, so this might not be a big burden. I'll of course implement the built-in ones.

The PEP actually mentions that this is how 2.x will have to work. So I'll go ahead and implement it that way, on the assumption that getting string support into 2.6 is desirable.

I think it is. (But then I still live in a predominantly ASCII world. :-)

I live in that same world, which is why I started implementing this to begin with! I've always been more interested in the ASCII version for 2.6 than in the 3.0 unicode version. Doing it first in 3.0 was my way of getting it into 2.6.
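The type check described in point 3 above might look roughly like this in a user-defined class. This is a sketch under assumptions, not code from the backport: Point is a hypothetical type, it ignores the contents of the specifier, and the try/except shim exists only so the example also runs on 3.x, where both branches return the same type.

```python
try:
    unicode            # 2.x has a separate unicode type
except NameError:
    unicode = str      # assumption: on 3.x, str is the only text type


class Point(object):
    """Hypothetical user type, for illustration only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __format__(self, format_spec):
        # Build the text once as a plain (8-bit on 2.x) string.
        text = '(%d, %d)' % (self.x, self.y)
        # Return the same string type as the format specifier.
        if isinstance(format_spec, unicode):
            return unicode(text)
        return text


print(format(Point(1, 2), ''))
```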

For data types whose output uses only ASCII, would it be acceptable if they always returned an 8-bit string and left it up to the caller to convert it to Unicode? This would apply to all numeric types. (The date/time types have a strftime() style API which means the user must be able to specify Unicode.)

I guess in str.format() I could require the result of format(obj, format_spec) to be a str, and in unicode.format() I could convert it to be unicode, which would either succeed or fail. I think all I need to do is have the numeric formatters work with both unicode and str format specifiers, and always return str results. That should be doable. As you say, the format specifiers for the numerics are restricted to 8-bit strings, anyway.
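A rough sketch of that division of labor follows. All three function names are hypothetical and stand in for internals of the numeric formatters, str.format() and unicode.format(); the try/except shim is only there so the example also runs on 3.x, where the conversion step is a no-op.

```python
try:
    unicode            # 2.x has a separate unicode type
except NameError:
    unicode = str      # assumption: on 3.x, str is the only text type


def format_number(value, format_spec):
    """Hypothetical numeric formatter: accepts either specifier type,
    always returns str."""
    return str(format(value, str(format_spec)))


def str_format_field(value, format_spec=''):
    """Hypothetical per-field step of str.format(): require a str."""
    result = format_number(value, format_spec)
    if not isinstance(result, str):
        raise TypeError('formatter must return str')
    return result


def unicode_format_field(value, format_spec=u''):
    """Hypothetical per-field step of unicode.format(): convert the
    str result, which either succeeds or raises."""
    return unicode(format_number(value, format_spec))


print(str_format_field(3, 'd'))
print(unicode_format_field(3, u'05d'))
```

On 2.x the unicode() call would decode using the default encoding, so pure-ASCII numeric output converts cleanly while non-ASCII output would raise.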

Now that I think about it, str.__format__() will also need to accept a unicode format specifier and produce a str, for this to work:

u"{0}{1}{2}".format('a', u'b', 3)

I'll give these ideas a shot and see how far I get. Thanks for the feedback!

Eric.


