[Python-Dev] Optimize Unicode strings in Python 3.3

Victor Stinner victor.stinner at gmail.com
Wed May 30 13:26:14 CEST 2012


> The "two steps" method is not promising: parsing the format string twice is slower than the other methods. The "1.5 steps" method is more promising: first parse the format string into an efficient internal representation, then allocate the output string, and then write the characters (or enlarge and widen the buffer, but with more information in any case). The internal representation can be cached (as in the struct module), which reduces the parsing cost to zero for repeated formatting.

I implemented something like that, and it was both inefficient and very complex.

See for example the (incomplete) patch for str%args attached to issue #14687: http://bugs.python.org/file25413/pyunicode_format-2.patch
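To make the "1.5 steps" idea concrete, here is a minimal Python sketch, not the code from the patch. It assumes a deliberately tiny format grammar (only %s, %d and %% with an optional width), and the names parse_format and format_cached are hypothetical. The format string is parsed once into a tuple of items, the result is cached, and later calls only walk the cached items.

```python
import re
from functools import lru_cache

# Assumption: a simplified grammar with only %s, %d and %%, plus an optional width.
_SPEC = re.compile(r"%(\d*)([sd%])")


@lru_cache(maxsize=128)
def parse_format(fmt):
    """Step 1: parse fmt once into literal strings and (width, conv) pairs."""
    items = []
    pos = 0
    for match in _SPEC.finditer(fmt):
        if match.start() > pos:
            items.append(fmt[pos:match.start()])  # literal text before the spec
        width, conv = match.groups()
        if conv == "%":
            items.append("%")  # escaped percent sign
        else:
            items.append((int(width or 0), conv))
        pos = match.end()
    if pos < len(fmt):
        items.append(fmt[pos:])  # trailing literal text
    return tuple(items)


def format_cached(fmt, args):
    """Step 2: build the output from the cached items without re-parsing fmt."""
    args = iter(args)
    parts = []
    for item in parse_format(fmt):
        if isinstance(item, str):
            parts.append(item)
        else:
            width, conv = item
            value = next(args)
            text = str(int(value)) if conv == "d" else str(value)
            parts.append(text.rjust(width))  # temporary string per argument
    return "".join(parts)
```

Note that even with the parse result cached, this sketch still creates one temporary string per argument and joins them at the end.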

IMO this approach is less efficient than the "Unicode writer" approach because:

 * you have to create many substrings or temporary strings in the first step, or (worse) compute each argument twice. The writer approach is more efficient here because it avoids creating substrings and temporary strings.
 * you have to parse the format string twice, or you have to write two versions of the code: first create a list of items, then concatenate the items. The PyAccu method concatenates substrings at the end, which is less efficient than the writer method (e.g. it has to create a string of N fill characters to pad to WIDTH characters).
 * the code is more complex than the writer method (which is very similar to what is used in Python 2.7 and 3.2).
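
The difference between the two strategies can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class names AccumulatorSketch and WriterSketch are hypothetical, the real implementations are C code inside CPython, and a Python list of characters stands in for the output buffer. The accumulator has to build a temporary fill string for padding, while the writer stores the fill characters directly in its buffer.

```python
class AccumulatorSketch:
    """Collect substrings in a list and join them at the end."""

    def __init__(self):
        self.items = []

    def write(self, text, width=0):
        if len(text) < width:
            # Padding requires a temporary string of fill characters.
            self.items.append(" " * (width - len(text)))
        self.items.append(text)

    def finish(self):
        return "".join(self.items)


class WriterSketch:
    """Write characters straight into one growable output buffer."""

    def __init__(self, size_hint=0):
        self.buffer = ["\0"] * size_hint  # preallocated, not yet meaningful
        self.pos = 0

    def _reserve(self, extra):
        needed = self.pos + extra
        if needed > len(self.buffer):
            # Overallocate to limit the number of resizes.
            new_size = needed + needed // 4
            self.buffer.extend(["\0"] * (new_size - len(self.buffer)))

    def write(self, text, width=0):
        pad = max(width - len(text), 0)
        self._reserve(pad + len(text))
        for i in range(pad):
            self.buffer[self.pos + i] = " "  # fill written in place
        start = self.pos + pad
        self.buffer[start:start + len(text)] = text  # same-length slice copy
        self.pos = start + len(text)

    def finish(self):
        return "".join(self.buffer[:self.pos])
```

Both classes produce the same result for the same sequence of write() calls; only the way the padding and the intermediate pieces are stored differs.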

I wrote a much more complex patch for str%args that remembers variables from the first step to avoid most of the parsing work in the second step. The patch was very complex and hard to maintain. I chose not to publish it and to try another approach instead (the Unicode writer).

Note: I'm only talking about str%args and str.format(args). The Unicode writer is not the most efficient method for every function that creates strings!

Victor
