[Python-Dev] Usage of += on strings in loops in stdlib
Lennart Regebro regebro at gmail.com
Wed Feb 13 09:15:35 CET 2013
On Tue, Feb 12, 2013 at 10:03 PM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> Hi
>
> We recently encountered a performance issue in the stdlib for PyPy. It turned out that someone committed a performance "fix" that uses += for strings instead of the "".join() that was there before.
Can someone show the actual diff of this?
I'm preparing a talk about outdated patterns in Python for DjangoCon EU, prompted by this question and by the obsessive avoidance of string concatenation. All the tests I've done show that ''.join() is still faster or as fast, except when you are joining very few strings (for example two), in which case concatenation is faster or as fast. This holds both under PyPy and CPython. So I'd like to know in which case ''.join() is faster on PyPy and += is faster on CPython.
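For the two-string case this is easy to check with timeit. The sketch below is only illustrative (the string sizes and iteration count are arbitrary choices, not the benchmarks behind the numbers further down):

import timeit

# Compare plain concatenation with ''.join() for exactly two strings.
# Sizes and repeat count here are arbitrary.
setup = "a = 'X' * 1000; b = 'Y' * 1000"

t_concat = timeit.timeit("a + b", setup=setup, number=1000000)
t_join = timeit.timeit("''.join((a, b))", setup=setup, number=1000000)

print("a + b:          ", t_concat)
print("''.join((a, b)):", t_join)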
Code with timings:

x = 100000
s1 = 'X' * x
s2 = 'X' * x
for i in range(500):
    s1 += s2

Python 3.3: 0.049 seconds; PyPy 1.9: 24.217 seconds
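Something along these lines reproduces that measurement; the wrapper function and the timeit call are not part of the original numbers, just one way of running the loop (assuming a Python 3 interpreter):

import timeit

def concat_inplace(x=100000, n=500):
    # Same loop as above, wrapped in a function so timeit can time it.
    # CPython can often grow the string in place when its refcount is 1;
    # PyPy copies the whole string on every iteration.
    s1 = 'X' * x
    s2 = 'X' * x
    for i in range(n):
        s1 += s2
    return s1

print(timeit.timeit(concat_inplace, number=1))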
PyPy indeed is much much slower than CPython here. But let's look at the join case:
x = 100000
s1 = 'X' * x
s2 = 'X' * x
for i in range(500):
    s1 = ''.join((s1, s2))

Python 3.3: 18.969 seconds; PyPy 1.9: 62.539 seconds
Here PyPy takes roughly 2.5 times as long as with +=, and CPython about 387 times as long. Both are slower.
The best case is of course to make a long list of strings and join them:
x = 100000
s1 = 'X' * x
s2 = 'X' * x
l = [s1]
for i in range(500):
    l.append(s2)
s1 = ''.join(l)

Python 3.3: 0.052 seconds; PyPy 1.9: 0.117 seconds
That's not always feasible though.
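When the pieces are produced incrementally rather than known up front, ''.join() can still be used, since it accepts any iterable. A minimal sketch, with a made-up producer just for illustration:

import itertools

def pieces(s2, n=500):
    # Hypothetical producer: yields the chunks one at a time instead of
    # collecting them in a list first.
    for i in range(n):
        yield s2

x = 100000
s1 = 'X' * x
s2 = 'X' * x
# join() consumes the iterable itself; the caller never builds the list.
s1 = ''.join(itertools.chain([s1], pieces(s2)))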
//Lennart