[Python-Dev] iterzip()
Tim Peters tim.one@comcast.net
Mon, 29 Apr 2002 15:35:37 -0400
- Previous message: [Python-Dev] iterzip()
- Next message: [Python-Dev] iterzip()
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Guido]
This suggests that zip() can be sped up considerably by using a hint for the input sequence sizes. If they all respond properly to PySequence_Length(), zip can preallocate the result list to the correct size. It shouldn't believe the numbers fully, but using them as a hint for initial allocation makes a lot of sense, and should greatly reduce the inefficiency.
I'm afraid I misanalyzed the cause. Giving zip a good guess about the result size and using PyList_SET_ITEM up until then leads (as expected) to no list copying in the test case I posted, but only dropped the runtime from 7.87 to 7.31 seconds (which makes sense looking at the numbers more closely: it doesn't take that long to copy a few dozen megabytes of memory unless you're doing it by hand ).
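For concreteness, here's a pure-Python sketch of the preallocation idea (the real zip lives in C, of course; `zip_prealloc` is just an illustrative name, and this is written in modern Python rather than the 2.x of the benchmark below): size the result list up front from the len() hints, fall back to append() if the hints undershoot, and trim if they overshoot.

```python
def zip_prealloc(*seqs):
    """Sketch of zip()-with-a-size-hint, returning a list of tuples."""
    if not seqs:
        return []
    try:
        # Use len() as a hint only -- inputs are allowed to lie.
        hint = min(len(s) for s in seqs)
    except TypeError:
        hint = 0  # some input has no len(); grow the result normally
    result = [None] * hint  # preallocate; no list copying up to `hint`
    iters = [iter(s) for s in seqs]
    i = 0
    while True:
        row = []
        for it in iters:
            try:
                row.append(next(it))
            except StopIteration:
                del result[i:]  # trim any unfilled preallocated slots
                return result
        if i < len(result):
            result[i] = tuple(row)      # fast path: slot already exists
        else:
            result.append(tuple(row))   # hint undershot; grow as usual
        i += 1
```

The point of the experiment above is that even with this in place, most of the runtime remains.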
Check this out:
""" from time import clock as now
n = 1000000
def juststore(x): for i in x: x[i] = 0
def justtups(x): for i in x: 0,
def storetups(x): for i in x: x[i] = 0,
def run(func): x = range(n) start = now() func(x) finish = now() print "%10s %6.2f" % (func.name, finish - start)
run(juststore) run(justtups) run(storetups) """
I get:
 juststore   0.93
  justtups   0.58
 storetups   7.61
list.append is out of the picture here. Creating a million tuples goes fast so long as they're recycled, and storing a million things goes fast, but storing a million distinct tuples takes very much longer. It's a Mystery to me so far. How does it act on Linux?
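For anyone who wants to try it on their own box, here's the same benchmark transliterated to modern Python (an assumption on my part: `perf_counter` standing in for the old `clock`, `list(range(n))` standing in for the 2.x list-building `range`; absolute timings will of course differ from the numbers above):

```python
from time import perf_counter as now

n = 1000000

def juststore(x):
    for i in x:
        x[i] = 0        # store an existing int; no new objects kept

def justtups(x):
    for i in x:
        0,              # build a tuple and immediately drop it

def storetups(x):
    for i in x:
        x[i] = 0,       # keep every tuple alive: a million live objects

def run(func):
    x = list(range(n))
    start = now()
    func(x)
    finish = now()
    print("%10s %6.2f" % (func.__name__, finish - start))

run(juststore)
run(justtups)
run(storetups)
```

The interesting comparison is justtups versus storetups: the only difference is whether the million tuples stay reachable.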