[Python-Dev] [pypy-dev] efficient string concatenation (yep, from 2004)

Maciej Fijalkowski fijall at gmail.com
Wed Feb 13 08:35:28 CET 2013


Hi Christian.

We have it, it is just not enabled by default. The option is --objspace-with-strbuf, I think.

On Wed, Feb 13, 2013 at 1:53 AM, Christian Tismer <tismer at stackless.com> wrote:

Hi friends,

Efficient string concatenation was a topic back in 2004. Armin Rigo proposed a patch with the name of the subject, more precisely:

[Patches] [ python-Patches-980695 ] efficient string concatenation

on sourceforge.net, on 2004-06-28. This patch was finally added to Python 2.4 on 2004-11-30.

Some people might remember the larger discussion about whether such a patch should be accepted at all, because it changes the programming style for many of us from "don't do that, stupid" to "well, you may do it in CPython", which has quite some impact on other implementations (is it fast on Jython, now?).

It changed for instance my programming and teaching style a lot, of course!

But I think nobody but people heavily involved in PyPy expected this:

Now, more than eight years after that patch appeared and made it into 2.4, PyPy (!) still does not have it!

Obviously I was misled by other optimizations, and by the fact that this patch came from a/the major author of PyPy, who invented the initial patch for CPython. That this would be in PyPy as well sooner or later was without question for me. Wrong... ;-)

Yes, I agree that for PyPy it is much harder to implement without the refcounting trick, and probably even more difficult in the case of the JIT.

But nevertheless, I tried to find any reference to this missing crucial optimization, with no success after an hour (*).

And I guess many other people are stepping into the same trap.

So I can imagine that PyPy loses some of its speed in many programs, because Armin's great hack did not make it into PyPy, and this is not loudly declared anywhere. I believe the efficiency of string concatenation is something that people assume by default and add to the vague CPython compatibility claim, if not explicitly told otherwise.
----

Some silly proof, using python 2.7.3 vs PyPy 1.9:

$ cat strconc.py
#!env python

from timeit import default_timer as timer

tim = timer()

s = ''
for i in xrange(100000):
    s += 'X'

tim = timer() - tim

print 'time for {} concats = {:0.3f}'.format(len(s), tim)

$ python strconc.py
time for 100000 concats = 0.028
$ pypy strconc.py
time for 100000 concats = 0.804

Something is needed - a patch for PyPy or for the documentation I guess.

This is not just some unoptimized function in some module, but it is used all over the place and became a very common pattern since introduced.

How ironic that a foreseen problem occurs now, and there :-)

cheers -- chris

(*)
http://pypy.readthedocs.org/en/latest/cpythondifferences.html
http://pypy.org/compat.html
http://pypy.org/performance.html

--
Christian Tismer             :^)   <mailto:tismer at stackless.com>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :     Starship http://starship.python.net/
14482 Potsdam                :     PGP key -> http://pgp.uni-mainz.de
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/


pypy-dev mailing list
pypy-dev at python.org
http://mail.python.org/mailman/listinfo/pypy-dev
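For readers comparing interpreters, the following is a minimal sketch of the portable idiom the quoted message alludes to with "don't do that, stupid": collect the pieces in a list and join them once, so the result does not depend on an in-place `+=` optimization. It is not taken from the thread. The function names `concat_inplace`, `concat_join` and `time_call` are hypothetical, and the code uses Python 3 syntax (`range`, `print()`) rather than the Python 2 syntax of strconc.py. No timings are claimed here, since they depend on the interpreter and its build options.

```python
from timeit import default_timer as timer


def concat_inplace(n):
    """Build a string with repeated +=, the pattern measured by strconc.py."""
    s = ''
    for _ in range(n):
        s += 'X'
    return s


def concat_join(n):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for _ in range(n):
        parts.append('X')
    return ''.join(parts)


def time_call(func, n):
    """Return (result, elapsed seconds) for a single call of func(n)."""
    start = timer()
    result = func(n)
    return result, timer() - start


if __name__ == '__main__':
    # Hypothetical driver: same string length as the benchmark in the thread.
    for func in (concat_inplace, concat_join):
        result, elapsed = time_call(func, 100000)
        print('{}: {} chars in {:0.3f}s'.format(
            func.__name__, len(result), elapsed))
```

Both functions return the same string. Only the cost of building it can differ between implementations, which is the point of running the driver under each interpreter of interest.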


