[Python-Dev] iterzip() (original) (raw)
Tim Peters tim.one@comcast.net
Mon, 29 Apr 2002 13:43:42 -0400
- Previous message: [Python-Dev] iterzip()
- Next message: [Python-Dev] iterzip()
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Guido]
Probably the list-append still has a slight quadratic component -- you're doing a million elements here.
I get:
justpush 1.39 justzip 7.87
from this:
""" from time import clock as now
n = 1000000 zeroes = [0] * n
def justpush(): x = [] push = x.append for i in zeroes: push(i)
def justzip(): zip(zeroes)
def run(func): start = now() func() finish = now() print "%10s %6.2f" % (func.name, finish - start)
run(justpush) run(justzip) """
That is, at first glance it's much faster to do a million appends in pure Python than it is to let zip do them at C speed. I'm curious about what people get for this program on other platforms (pymalloc on or off may make a dramatic difference too -- or not, depending on how the platform malloc works).
The cause on Win98SE is clear enough with a little instrumentation: in the justzip case, listobject.c's ins1 ends up copying the memory much more often, presumably due to that there's a new tuple allocation after every append (preventing realloc from growing into that space). justpush copies the memory at sizes (skipping some tiny ones, and where this is a count of the number of elements in the list, not the number of bytes (multiply by 4 to get the latter)):
247 639 959 2047 45055 163839
and that's all. justzip copies the memory at sizes:
247 639 959 2047 45055 53247 57343 65535 81919 98303 122879 163839
196607 229375 262143 294911 360447 393215 458751 589823 622591
688127 720895 753663 786431 819199 851967 884735 917503 950271
Copying 3-4MB multiple times at the end ain't cheap.
justpush does relatively worse if the order in which they're called is swapped, as then justzip and the custom tuple free list conspire to leave memory in a more-fragmented state.
But I asked something else. How much does the speed difference affect the apps you have? In my experience it's usually used for small lists where the quadratic effect doesn't occur and the timing doesn't matter compared to the rest of the program.
Same here. High volume "vectorized" operations are NumPy's proper domain.
- Previous message: [Python-Dev] iterzip()
- Next message: [Python-Dev] iterzip()
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]