On Thu, Aug 7, 2014 at 4:01 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
I don't remember where, but I believe that cPython has an optimization built in for repeated string concatenation, which is probably why you aren't seeing big differences between the + and the sum().

Indeed -- clearly so.

A little testing shows how to defeat that optimization:

blah = ''
for string in ['booyah'] * 100000:
    blah = string + blah



Note the reversed order of the addition.


thanks -- cool trick.
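
For anyone who wants to reproduce the effect, here is a minimal sketch of the two loop shapes side by side. The function names are made up for illustration, and the in-place resize described in the comments is a CPython implementation detail, so the gap may look different on other interpreters or versions.

```python
import timeit

def append_concat(parts):
    # result += part: CPython can often grow the left operand in place
    # when nothing else holds a reference to it (implementation detail).
    result = ''
    for part in parts:
        result += part
    return result

def prepend_concat(parts):
    # part + result: the existing result is on the right-hand side, so a
    # brand-new string has to be built and copied on every iteration.
    result = ''
    for part in parts:
        result = part + result
    return result

parts = ['booyah'] * 10000

# Time one full pass of each loop; absolute numbers depend on the machine.
append_time = timeit.timeit(lambda: append_concat(parts), number=1)
prepend_time = timeit.timeit(lambda: prepend_concat(parts), number=1)
print("append: ", append_time)
print("prepend:", prepend_time)
```

Both loops build a string of the same length; only the order of the operands differs.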

Oh, and the join() timings:
--> timeit.Timer("blah = ''.join(['booya'] * 100000)", "blah = ''").repeat(3, 1)

[0.0014629364013671875, 0.0014190673828125, 0.0011930465698242188]

So, + is three orders of magnitude slower than join.


Only one if you use the optimized form of +, and not even that if you need to build up the list first, which is the common use case.
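
A rough sketch of that common use case, where the pieces are produced one at a time rather than handed over as a ready-made list. The helper names and the use of str(i) as the "piece" are assumptions for illustration only, and no particular timing result is claimed here.

```python
import timeit

def build_with_join(n):
    # Accumulate the pieces in a list, then join once at the end.
    pieces = []
    for i in range(n):
        pieces.append(str(i))
    return ''.join(pieces)

def build_with_plus(n):
    # Accumulate directly into a string with the append-style +=.
    result = ''
    for i in range(n):
        result += str(i)
    return result

n = 100000
join_time = timeit.timeit(lambda: build_with_join(n), number=1)
plus_time = timeit.timeit(lambda: build_with_plus(n), number=1)
print("list + join:", join_time)
print("+= loop:    ", plus_time)
```

Here the cost of building the list is counted against join(), which is the comparison that matters when the strings do not already live in a list.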


So my final question is this:

repeated string concatenation is not the "recommended" way to do this -- but nevertheless, CPython has an optimization that makes it fast and efficient, to the point that there is no practical performance reason to prefer appending to a list and calling join() afterward.

So why not apply a similar optimization to sum() for strings?
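
For context, a small sketch of what sum() does today when given a string start value, as I understand the current behavior: it refuses outright rather than running slowly. The variable names are just for illustration.

```python
words = ['boo', 'yah']

# sum() rejects a str start value with a TypeError instead of concatenating.
error = None
try:
    sum(words, '')
except TypeError as exc:
    error = exc

print("sum() raised:", error)

# The spelling that works today:
joined = ''.join(words)
print(joined)
```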

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division

NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov