[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7? (original) (raw)
Paul Moore p.f.moore at gmail.com
Fri Nov 25 15:48:00 CET 2011
- Previous message: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?
- Next message: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 25 November 2011 11:37, Matt Joiner <anacrolix at gmail.com> wrote:
No "wtf" here, the read() loop is quadratic since you're building a new, larger, bytes object every iteration. Python 2 has a fragile optimization for concatenation of strings, which can avoid the quadratic behaviour on some systems (depends on realloc() being fast). Is there any way to bring back that optimization? a 30 to 100x slow down on probably one of the most common operations... string contatenation, is very noticeable. In python3.3, this is representing a 0.7s stall building a 10MB string. Python 2.7 did this in 0.007s.
It's a fundamental, but sadly not well-understood, consequence of having immutable strings. Concatenating immutable strings in a loop is quadratic. There are many ways of working around it (languages like C# and Java have string builder classes, I believe, and in Python you can use StringIO or build a list and join at the end) but that's as far as it goes.
The optimisation mentioned was an attempt (by mutating an existing string when the runtime determined that it was safe to do so) to hide the consequences of this fact from end-users who didn't fully understand the issues. It was relatively effective, but like any such case (floating point is another common example) it did some level of harm at the same time as it helped (by obscuring the issue further).
It would be nice to have the optimisation back if it's easy enough to do so, for quick-and-dirty code, but it is not a good idea to rely on it (and it's especially unwise to base benchmarks on it working :-))
Paul.
- Previous message: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?
- Next message: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]