[Python-Dev] Store x Load x --> DupStore

Michael Hudson mwh at python.net
Sun Feb 20 22:54:43 CET 2005


"Phillip J. Eby" <pje at telecommunity.com> writes:

> At 07:00 PM 2/20/05 +0000, Michael Hudson wrote:
>> "Phillip J. Eby" <pje at telecommunity.com> writes:
>>
>>> At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote:
>>>> Where are the attempts to speed up function/method calls? That's an
>>>> area where we could really use a breakthrough...
>>>
>>> Amen!
>>>
>>> So what happened to Armin's pre-allocated frame patch? Did that get
>>> into 2.4?
>>
>> No, because it slows down recursive function calls, or functions that
>> happen to be called at the same time in different threads. Fixing that
>> would require things like code specific frame free-lists and that's
>> getting a bit convoluted and might waste quite a lot of memory.
>
> Ah. I thought it was just going to fall back to the normal case if the
> pre-allocated frame wasn't available (i.e., didn't have a refcount of 1).

Well, I don't think that's the test, but that might work. Someone should try it :) (I'm trying something else currently).
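To make the fallback idea concrete, here is a toy model in Python rather than C. It is only a sketch: the `Frame` and `Code` classes, the `spare` attribute and the `in_use` flag are all made up for illustration, and `in_use` stands in for whatever availability test the real patch would need (which, as said above, is probably not a refcount check). Threads are not modelled at all.

```python
class Frame:
    """Stand-in for a frame object; it only tracks whether it is in use."""
    def __init__(self, code_name):
        self.code_name = code_name
        self.in_use = False


class Code:
    """Stand-in for a code object that keeps one pre-allocated frame."""
    def __init__(self, name):
        self.name = name
        self.spare = Frame(name)      # hypothetical per-code cached frame
        self.fresh_allocations = 0    # counts trips through the slow path

    def acquire_frame(self):
        # Fast case: the cached frame is idle, so reuse it.
        if not self.spare.in_use:
            self.spare.in_use = True
            return self.spare
        # Fallback: the cached frame is busy (e.g. recursion), so
        # allocate a fresh frame the normal way.
        self.fresh_allocations += 1
        frame = Frame(self.name)
        frame.in_use = True
        return frame

    def release_frame(self, frame):
        frame.in_use = False


def call(code, depth):
    """Simulate a call that recurses `depth` more times."""
    frame = code.acquire_frame()
    try:
        if depth > 0:
            call(code, depth - 1)
    finally:
        code.release_frame(frame)


code = Code("f")
call(code, 0)                      # plain call: reuses the cached frame
flat = code.fresh_allocations
call(code, 3)                      # recursion: inner calls take the fallback
recursive = code.fresh_allocations
```

In this model the non-recursive call never allocates, and each inner recursive call pays for the availability check plus a normal allocation, which is the slowdown being discussed.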

>> Eliminating the blockstack would be nice (esp. if it's enough to get
>> frames small enough that they get allocated by PyMalloc) but this
>> seemed to be tricky too (or at least Armin, Samuele and I spent a
>> couple of hours yakking about it on IRC and didn't come up with a
>> clear approach). Dynamically allocating the blockstack would be
>> simpler, and might achieve a similar win. (This is all from memory, I
>> haven't thought about specifics in a while).
>
> I'm not very familiar with the operation of the block stack, but why
> does it need to be a stack?

Finally blocks are the problem, I think.
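A small language-level illustration of why nesting matters here. This shows only the behaviour visible from Python, not how the interpreter implements it; the function name and the `log` list are made up for the example.

```python
log = []

def nested():
    for attempt in range(3):
        try:
            try:
                log.append("body")
                break              # leaves the loop through two finally blocks
            finally:
                log.append("inner finally")
        finally:
            log.append("outer finally")
    log.append("after loop")

nested()
```

The single `break` has to run the inner finally clause, then the outer one, and only then leave the loop. So while the inner handler is active, something still has to remember the enclosing handler and the loop, which a lone "current handler" slot does not do by itself.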

> For exception handling purposes, wouldn't it suffice to know the offset
> of the current handler, and have an opcode to set the current handler
> location? And for "for" loops, couldn't an anonymous local be used to
> hold the loop iterator instead of using a stack variable?
>
> Hm, actually I think I see the answer; in the case of module-level code
> there can be no "anonymous local variables" the way there can in
> functions. Hmm.

I don't think this is the killer blow. I can't remember the details and it's too late to think about them, so I'm going to wait and see if Samuele replies :)

>> All of it, in easy cases. ISTR that the fast path could be a little
>> wider -- it bails when the called function has default arguments, but
>> I think this case could be handled easily enough.
>
> When it has any default arguments, or only when it doesn't have values
> to supply for them?

When it has any, I think. I also think this is easy to change.
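As a rough Python-level model of the two readings of that question (the real check lives in C and tests more than this; both function names are hypothetical, the no-keyword-arguments condition is an assumption, and `__defaults__` is the modern spelling of what 2.4 calls `func_defaults`):

```python
def takes_fast_path(func, nargs, nkwargs=0):
    """Simplified model of the current check: bail on *any* defaults."""
    code = func.__code__
    return (
        func.__defaults__ is None        # function defines no defaults at all
        and nkwargs == 0                 # assumption: no keyword arguments
        and code.co_argcount == nargs    # every parameter supplied positionally
    )


def widened_fast_path(func, nargs, nkwargs=0):
    """Hypothetical wider check: defaults are fine if none are needed."""
    code = func.__code__
    return nkwargs == 0 and code.co_argcount == nargs


def plain(a, b):
    return a + b


def with_default(a, b=1):
    return a + b


current_plain = takes_fast_path(plain, 2)
current_default = takes_fast_path(with_default, 2)
wider_full = widened_fast_path(with_default, 2)
wider_short = widened_fast_path(with_default, 1)
```

Under the first check `with_default(1, 2)` misses the fast path even though no default is actually used; the wider check accepts it and still rejects `with_default(1)`, where a default would have to be filled in.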

>>> Why are frames so big?
>>
>> Because there are CO_MAXBLOCKS * 12 bytes in there for the block stack.
>> If there was no need for that, frames could perhaps be allocated via
>> pymalloc. They only have around 100 bytes or so in them, apart from
>> the blockstack and locals/value stack.

What I'm trying is allocating the blockstack separately and see if two pymallocs are cheaper than one malloc.
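For a sense of the numbers, a back-of-the-envelope calculation using `ctypes` to model one block-stack entry. The assumptions are loud ones: the three-int layout is inferred from the "12 bytes" figure, `CO_MAXBLOCKS = 20` and a 256-byte pymalloc small-object limit are values I believe apply to this era of CPython but have not re-checked, and the 100 bytes is the rough figure quoted above. Locals and the value stack are left out entirely.

```python
import ctypes


class TryBlock(ctypes.Structure):
    """Model of one block-stack entry: three C ints (assumed layout)."""
    _fields_ = [
        ("b_type", ctypes.c_int),     # what kind of block this is
        ("b_handler", ctypes.c_int),  # where to jump to find the handler
        ("b_level", ctypes.c_int),    # value stack level to pop back to
    ]


CO_MAXBLOCKS = 20            # assumed maximum nesting depth
PYMALLOC_THRESHOLD = 256     # assumed small-object limit, in bytes
OTHER_FIELDS = 100           # "around 100 bytes or so" from the message

entry_size = ctypes.sizeof(TryBlock)          # 12 where int is 4 bytes
blockstack_size = CO_MAXBLOCKS * entry_size
inline_total = OTHER_FIELDS + blockstack_size

fits_inline = inline_total <= PYMALLOC_THRESHOLD
fits_split = (OTHER_FIELDS <= PYMALLOC_THRESHOLD
              and blockstack_size <= PYMALLOC_THRESHOLD)
```

On these assumptions the inline block stack alone pushes a frame past the small-object limit, while the frame header and a separately allocated block stack would each fit under it, which is the two-pymallocs-versus-one-malloc trade being tried.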

>>> Do we need a tpcallmethod that takes an argument array, length, and
>>> keywords, so that we can skip instancemethod allocation in the
>>> common case of calling a method directly?
>>
>> Hmm, didn't think of that, and I don't think it's how the CALL_ATTR
>> attempt worked. I presume it would need to take a method name too :)
>
> Er, yeah, I thought that was obvious. :)

Someone should try this too :)
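For whoever does try it, a Python-level sketch of the shape of the idea. The real thing would be a C-level type slot; `call_method` and `Greeter` are hypothetical names, and the helper only models the lookup and call, not any actual speedup.

```python
import types


class Greeter:
    def greet(self, name, punctuation="!"):
        return "hello " + name + punctuation


def call_method(obj, name, args=(), kwargs=None):
    """Hypothetical helper: call obj.<name>(*args, **kwargs) without
    creating a bound-method object when <name> is a plain function
    defined on the class and not shadowed on the instance."""
    kwargs = kwargs or {}
    shadowed = name in getattr(obj, "__dict__", {})
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            attr = klass.__dict__[name]
            if isinstance(attr, types.FunctionType) and not shadowed:
                # Common case: pass obj as the first argument directly.
                return attr(obj, *args, **kwargs)
            break
    # Anything unusual (descriptors, instance attributes, missing names)
    # goes through the ordinary attribute lookup.
    return getattr(obj, name)(*args, **kwargs)


g = Greeter()
direct = call_method(g, "greet", ("world",))
with_kw = call_method(g, "greet", ("world",), {"punctuation": "?"})

g.greet = lambda name: "hi " + name      # instance attribute shadows the method
fallback = call_method(g, "greet", ("there",))
```

The helper takes the method name, an argument sequence and keywords, and only falls back to the ordinary `getattr` path when the attribute is not a plain function on the type.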

Cheers, mwh

-- 
It is never worth a first class man's time to express a majority
opinion. By definition, there are plenty of others to do that.
                                                  -- G. H. Hardy


