[Python-Dev] Optimize Unicode strings in Python 3.3

Victor Stinner victor.stinner at gmail.com
Wed May 30 13:26:14 CEST 2012


The "two steps" method is not promising: parsing the format string twice is slower than other methods. The "1.5 steps" method is more promising -- first parse the format string into an efficient internal representation, then allocate the output string, and then write the characters (or enlarge and widen the buffer, but with more information in any case). The internal representation can be cached (as in the struct module), so that repeated formatting reduces the cost of parsing to zero.
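To make the "1.5 steps" idea concrete, here is a minimal sketch in pure Python. It is not the patch discussed in this thread: the names parse_format and format_cached are hypothetical, the grammar is reduced to %s, %d and %%, and the output is built with str.join rather than by preallocating a buffer as C code would.

```python
import re
from functools import lru_cache

# Hypothetical, simplified grammar: only %s, %d and %% are recognised.
_SPEC = re.compile(r"%([sd%])")


@lru_cache(maxsize=128)
def parse_format(fmt):
    """Parse fmt once into a tuple of (literal, conversion) pairs.

    The result is cached, so repeated formatting with the same format
    string skips the parsing step entirely.
    """
    items = []
    pos = 0
    for match in _SPEC.finditer(fmt):
        literal = fmt[pos:match.start()]
        conv = match.group(1)
        if conv == "%":
            # '%%' is an escaped percent sign: fold it into the literal text.
            items.append((literal + "%", None))
        else:
            items.append((literal, conv))
        pos = match.end()
    # Trailing literal text after the last conversion.
    items.append((fmt[pos:], None))
    return tuple(items)


def format_cached(fmt, args):
    """Format args using the cached internal representation of fmt."""
    parts = []
    values = iter(args)
    for literal, conv in parse_format(fmt):
        parts.append(literal)
        if conv == "s":
            parts.append(str(next(values)))
        elif conv == "d":
            parts.append(str(int(next(values))))
    return "".join(parts)
```

Calling format_cached twice with the same format string parses it only once; the second call reuses the cached tuple.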

I implemented something like that, and it was inefficient and very complex.

See for example the (incomplete) patch for str%args attached to issue #14687: http://bugs.python.org/file25413/pyunicode_format-2.patch

IMO this approach is less efficient than the "Unicode writer" approach because:

I wrote a much more complex patch for str%args that remembers variables from the first step to avoid most of the parsing work in the second step. The patch was very complex and hard to maintain. I chose not to publish it and to try another approach instead (the Unicode writer).
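For comparison, the writer idea can be sketched in pure Python as a single pass that appends text and tracks the widest character seen so far. This is only an illustration, not CPython's C implementation: the class SketchWriter and its methods are hypothetical, and a real writer would enlarge and widen one buffer in place instead of collecting chunks. The 1, 2 or 4 bytes per character thresholds follow the PEP 393 string representation.

```python
class SketchWriter:
    """Hypothetical single-pass string builder (illustration only)."""

    def __init__(self):
        self.chunks = []
        self.maxchar = 0  # highest code point written so far

    def write(self, text):
        """Append text and update the highest code point seen."""
        if text:
            self.maxchar = max(self.maxchar, max(map(ord, text)))
            self.chunks.append(text)

    @property
    def kind(self):
        """Bytes per character a PEP 393 string would need for this content."""
        if self.maxchar < 0x100:
            return 1
        if self.maxchar < 0x10000:
            return 2
        return 4

    def finish(self):
        """Return the final string."""
        return "".join(self.chunks)
```

The point of the sketch is that nothing is computed twice: each piece is written once, and the only extra work is widening when a wider character shows up.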

Note: I'm only talking about str%args and str.format(args) here; the Unicode writer is not the most efficient method for every function that creates strings!

Victor


