[Python-Dev] [Python-checkins] cpython: Implement PEP 393.

"Martin v. Löwis" martin at v.loewis.de
Sat Oct 1 15:26:01 CEST 2011


On 29.09.2011 01:21, Eric V. Smith wrote:

Is there some reason str.format had such major surgery done to it?

Yes: I couldn't figure out how to do it any other way. The formatting code had a few basic assumptions which now break (unless you keep using the legacy API). Primarily, the assumption is that there is a notion of a "STRINGLIB_CHAR", which is the element type of a string representation. With PEP 393, no such type exists anymore - the element type of the representation depends on the individual object.
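To make the point about per-object element types concrete, here is a minimal Python sketch of how PEP 393 picks a width from the largest code point in a string. The function name element_width is made up for illustration; the real selection happens in C inside CPython, and this only models the rule (1 byte up to U+00FF, 2 bytes up to U+FFFF, 4 bytes otherwise).

```python
def element_width(text):
    """Return the bytes per code point a PEP 393 string would need.

    Hypothetical helper: it models the width rule, it does not inspect
    the interpreter's actual internal representation.
    """
    maxchar = max(map(ord, text), default=0)
    if maxchar < 0x100:      # fits in Py_UCS1
        return 1
    if maxchar < 0x10000:    # fits in Py_UCS2
        return 2
    return 4                 # needs Py_UCS4


# Three strings, three different element types.
for sample in ("hello", "\u20ac", "\U0001F600"):
    print(repr(sample), element_width(sample))
```

Because the width is a property of each string object, code written against a single compile-time character type has nothing fixed to bind to.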

In other cases, I worked around that by compiling the stringlib three times, for Py_UCS1, Py_UCS2, and Py_UCS4. For one, this gives considerable code bloat, which I didn't like for the formatting code (as that is already a considerable amount of code). More importantly, this approach wouldn't have worked well, anyway, since the formatting combines multiple Unicode objects (especially with the OutputString buffer), and different inputs may have different representations. On top of that, OutputString needs widening support: it starts out with a narrow string and widens step-by-step whenever an input string is wider than the current output (or not at all, if the input strings are all ASCII).
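The widening behaviour described above can be sketched in Python as a toy model. This is not the OutputString code from CPython; the class WideningBuffer, its attributes, and the little-endian byte layout are all assumptions chosen to show the idea of starting narrow and re-encoding only when a wider input arrives.

```python
class WideningBuffer:
    """Toy output buffer that starts at 1 byte per code point and widens on demand."""

    def __init__(self):
        self.kind = 1             # bytes per code point: 1, 2, or 4
        self.data = bytearray()   # code points, little-endian, `kind` bytes each
        self.widenings = 0        # how many times the storage was re-encoded

    @staticmethod
    def _kind_for(text):
        maxchar = max(map(ord, text), default=0)
        return 1 if maxchar < 0x100 else 2 if maxchar < 0x10000 else 4

    def _widen(self, new_kind):
        # Re-encode every stored code point at the new width.
        old_kind = self.kind
        widened = bytearray()
        for i in range(0, len(self.data), old_kind):
            code_point = int.from_bytes(self.data[i:i + old_kind], "little")
            widened += code_point.to_bytes(new_kind, "little")
        self.data = widened
        self.kind = new_kind
        self.widenings += 1

    def append(self, text):
        needed = self._kind_for(text)
        if needed > self.kind:
            self._widen(needed)
        for ch in text:
            self.data += ord(ch).to_bytes(self.kind, "little")

    def __str__(self):
        k = self.kind
        return "".join(
            chr(int.from_bytes(self.data[i:i + k], "little"))
            for i in range(0, len(self.data), k)
        )


buf = WideningBuffer()
buf.append("abc")          # stays at 1 byte per code point
buf.append("\u20ac")       # forces a widening to 2 bytes
buf.append("\U0001F600")   # forces a widening to 4 bytes
print(buf.kind, buf.widenings, str(buf))
```

If every input is ASCII, the buffer never leaves the 1-byte representation, which is the case the step-by-step approach is meant to keep cheap.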

It would have been possible to keep the basic structure by doing all formatting in Py_UCS4. This would incur significant memory and runtime overhead.
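As a rough illustration of the memory side of that overhead, the following sketch compares the payload size of an ASCII-only string at 1 byte per code point against 4 bytes per code point. It uses the latin-1 and utf-32-le codecs as stand-ins for the Py_UCS1 and Py_UCS4 layouts; object headers and runtime cost are deliberately ignored, so treat the numbers as a lower bound on the difference rather than a measurement.

```python
text = "x" * 1000  # an ASCII-only input of 1000 characters

narrow_bytes = len(text.encode("latin-1"))    # 1 byte per code point
wide_bytes = len(text.encode("utf-32-le"))    # 4 bytes per code point, no BOM

print(narrow_bytes, wide_bytes, wide_bytes // narrow_bytes)
```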

In addition, there are outstanding patches that are now broken.

I'm sorry about that. Try applying them to the new files, though - patch may still be able to figure out how to integrate them, as the algorithms and function structure haven't changed.

I'd prefer it return to how it used to be, and just the minimum changes required for PEP 393 be made to it.

Please try for yourself. On string_format.h, I think there is zero chance, unless you want to compromise on efficiency (in addition to the already-present compromise on code cleanliness, due to the fact that the code is more general than it needs to be).

On formatter.h, it may actually be possible to restore what it was - in particular if you can make a guarantee that all number formatting always outputs ASCII strings only (which I'm not so sure about, as the thousands separator could be any character, in principle). Without that guarantee, it may indeed be reasonable to compile formatter.h in Py_UCS4, since the resulting strings will be small, so the overhead is probably negligible.
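The concern about the thousands separator can be shown with a small Python sketch. The helpers group_digits and is_ascii are hypothetical and are not how formatter.h works; they only show that the same number stays ASCII with a comma separator but stops being ASCII once the separator is, say, U+202F (narrow no-break space).

```python
def group_digits(value, sep):
    """Insert `sep` between groups of three digits (hypothetical helper)."""
    digits = str(abs(value))
    groups = []
    while digits:
        groups.append(digits[-3:])
        digits = digits[:-3]
    sign = "-" if value < 0 else ""
    return sign + sep.join(reversed(groups))


def is_ascii(text):
    """True if every code point is below 128."""
    return all(ord(ch) < 128 for ch in text)


with_comma = group_digits(1234567, ",")
with_nnbsp = group_digits(1234567, "\u202f")
print(with_comma, is_ascii(with_comma))
print(with_nnbsp, is_ascii(with_nnbsp))
```

Since the separator is the only part of the output that can leave the ASCII range here, an ASCII-only guarantee for number formatting comes down to a guarantee about the separator.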

Regards, Martin


