[Python-Dev] Optimized string concatenation (original) (raw)

Guido van Rossum guido at python.org
Tue Aug 3 16:13:30 CEST 2004


The SF patch http://www.python.org/sf/980695 about making repeated string concatenations efficient has been reviewed and is acceptable on technical grounds. This is about avoiding the quadratic behavior of

s = '' for x in y: s += somestring(x)

FWIW, I didn't approve it, Raymond did, IMO prematurely. :-(

In the past similar schemes based on refcount were rejected because we couldn't be sure that there wasn't C code with an uncounted reference. Is that ruled out in this scheme?

Also, I believe that even Java doesn't optimize this case in general -- only the much simpler case where you concatenate a number of strings in a single expression.

This leaves open the policy questions:

* first, is that an implementation detail or a published feature? The question is important because the difference in performance is enormous -- we are not talking about 2x or even 10x faster but roughly Nx faster where N is the size of the input data set.

Like tail recursion, it will change the way people write code. I am going to make up a new rule here, and state that implementation features that affect not just the running speed but the O() rate for certain algorithms should be considered language features, because any implementation will be required to implement them in order to ensure code portability.

* if it is a published feature, what about Jython?

* The patch would encourage a coding style that gives program that essentially don't scale with Jython -- nor, for that matter, with 2.3 or older -- and worse, the programs would appear to work on Jython or 2.3 when tested with small or medium-sized data sets, but just appear to hang when run on larger data sets. Obviously, this is problem that has always been here, but if we fix it in 2.4 we can be sure that people will develop and test with 2.4, and less thoroughly on 2.3, and when they deploy on 2.3 platforms it will unexpectedly not scale. * discussed on SF too is whether we should remove the 'a=a+b' acceleration from the patch, keeping only 'a+=b'; see the SF tracker. This seems overkill, but should the acceleration be there but disabled by default? from future import stringconcatenate?

Why do people want this so badly? Please leave well enough alone.

--Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list