[Python-Dev] optimizing non-local object access

Jeremy Hylton jeremy@zope.com
Thu, 9 Aug 2001 14:15:24 -0400 (EDT)


"SM" == Skip Montanaro <skip@pobox.com> writes:

Jeremy> My worry about your approach is that the track-object
Jeremy> opcodes could add a lot of expense for objects only used
Jeremy> once or twice. If the function uses math.sin inside a loop,
Jeremy> it's an obvious win. If it uses it only once, it's not so
Jeremy> clear.

SM> Even if math.sin is used just once you swap a
SM> LOAD_GLOBAL/LOAD_ATTR pair for a
SM> TRACK_OBJECT/LOAD_FAST/UNTRACK_OBJECT trio, so the hit you take
SM> shouldn't be terrible. (My assumption is that the
SM> register/unregister cost is fairly low and the actual
SM> notification/update code will almost never be executed.) You
SM> break even in total instructions executed with two accesses and
SM> win after that.
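
The break-even claim in the quoted text can be checked with a little arithmetic. This is only a sketch: it assumes every opcode costs the same, which is exactly the assumption about register/unregister cost that the quote flags. The function names `old_cost` and `tracked_cost` are made up for illustration.

```python
def old_cost(accesses):
    # One LOAD_GLOBAL plus one LOAD_ATTR for every use of math.sin.
    return 2 * accesses


def tracked_cost(accesses):
    # One TRACK_OBJECT and one UNTRACK_OBJECT per function call,
    # plus one LOAD_FAST for every use of math.sin.
    return 2 + accesses


for n in (1, 2, 3, 10):
    print(n, old_cost(n), tracked_cost(n))
```

With one access the tracked version executes one extra instruction, with two accesses the counts are equal, and from three accesses on the tracked version executes fewer.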

I'm assuming some kind of memory allocation is necessary to accommodate an arbitrary number of handlers. If you need to resize an array that holds the pointers to the tracking callbacks, it could get expensive.

I also wonder if you have to pay for tracking changes whenever the name is rebound in the module, or just when you need to use the name again.

SM> In addition, this might be a strategy left for
SM> an optimization pass that would only make the change if the
SM> LOAD_ATTR and/or LOAD_GLOBAL instructions are executed in a
SM> loop.

Good point.

Jeremy> To be more concrete: The math module would store the sin
Jeremy> name in slot X. The first time the foobar module used
Jeremy> math.sin it would lookup the slot of sin in the math table.
Jeremy> The foobar module would store a pointer to math's fast
Jeremy> globals and the index of the sin slot. Then math.sin would
Jeremy> be accessed via a single opcode that used the stored
Jeremy> information.

SM> Unfortunately, the code that uses math.sin can't know that math
SM> is a module. It might be an instance with a sin attribute.
SM> Even worse, because of Python's dynamic nature, what the name
SM> "math" is bound to can change. You can't assume it will always
SM> be bound to a module object, even if it is the first time you
SM> set things up. I think you have to work with names and name
SM> bindings. I don't think you can make assumptions about what the
SM> names are bound to.

No assumptions necessary. The compiler only emits the new opcodes for names bound by import or attributes thereof. If the module name ('math') is rebound, the interpreter is responsible for resetting all of the other bindings that depend on it ('math.sin'). If the object 'math' isn't a module (even though the compiler guessed it would be), the opcodes fall back to the old implementation.

The first time math.sin is used, we do the following:

On future uses, the first step above will discover a valid binding. If the name math is rebound, the interpreter marks the fast globals referring to it as uninitialized.

One advantage of this approach is that the work is shared across all code in a module. If many functions use math.sin, the first one initializes the table and all the rest use it.
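
A rough Python model of the idea, not an implementation: the real thing would live in the interpreter and the module object. The classes `FastGlobals` and `ModuleCache` and all of their methods are hypothetical names invented for this sketch. It shows the slot lookup done once per importing module, the cached (table, slot) pair reused afterwards, and the reset when the module name is rebound. The fallback for non-module objects is left out.

```python
import math


class FastGlobals:
    """Hypothetical slot table for one module: each name gets a fixed slot."""

    def __init__(self, namespace):
        self.index = {name: i for i, name in enumerate(namespace)}
        self.slots = list(namespace.values())

    def store(self, name, value):
        # Rebinding a name inside the module only overwrites its slot.
        self.slots[self.index[name]] = value


class ModuleCache:
    """Hypothetical per-importing-module cache shared by all its functions."""

    def __init__(self):
        self.bindings = {}      # module name -> FastGlobals bound to it
        self.entries = {}       # (module name, attr) -> (FastGlobals, slot)
        self.slow_lookups = 0   # how often the name-to-slot lookup ran

    def bind(self, name, table):
        # Rebinding the module name resets every entry that depends on it.
        self.bindings[name] = table
        for key in [k for k in self.entries if k[0] == name]:
            del self.entries[key]

    def load(self, name, attr):
        entry = self.entries.get((name, attr))
        if entry is None:
            # First use: look up the slot by name and remember it.
            table = self.bindings[name]
            self.slow_lookups += 1
            entry = (table, table.index[attr])
            self.entries[(name, attr)] = entry
        table, slot = entry
        return table.slots[slot]


math_table = FastGlobals({"sin": math.sin, "cos": math.cos})
cache = ModuleCache()
cache.bind("math", math_table)

first = cache.load("math", "sin")    # slow path, fills the cache
second = cache.load("math", "sin")   # fast path, indexed access only

math_table.store("sin", math.cos)    # math rebinds sin in place
third = cache.load("math", "sin")    # still the fast path, sees the new value

cache.bind("math", FastGlobals({"sin": abs}))   # the name math is rebound
fourth = cache.load("math", "sin")   # cache was reset, slow path again
```

Because the cache stores a slot index rather than the object itself, a rebinding inside the math table is visible without any notification; only rebinding the name math itself forces the entries to be rebuilt.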

SM> The handwaving bit in my post was there because I am not
SM> familiar enough with the various possibilities for name
SM> rebinding. Does it all boil down to PyDict_SetItem or
SM> PyObject_SetAttr as I suspect? Are those functions too
SM> low-level, that is, have the names been forgotten completely at
SM> that point? If so, perhaps STORE_GLOBAL and STORE_ATTR would
SM> have to be modified to use PyDict_SetItemString and
SM> PyObject_SetAttrString instead.

I think we hook in at the tp_getattr(o) level. Module objects can detect rebindings there and do whatever bookkeeping is necessary to keep references to their names consistent. I think this is the right approach for either technique we're discussing.
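
As a Python-level illustration of a module object noticing rebindings, here is a minimal sketch. It hooks attribute stores through `__setattr__` on a `types.ModuleType` subclass, which is an assumption about where the detection would sit and not a statement about the C-level slot. `TrackingModule` and `watch` are hypothetical names.

```python
import types


class TrackingModule(types.ModuleType):
    """Hypothetical module type that reports attribute rebindings."""

    def __init__(self, name):
        super().__init__(name)
        # Write straight into the module dict so the hook below is skipped.
        self.__dict__["_watchers"] = []

    def watch(self, callback):
        # Register a callback to run whenever an attribute is (re)bound.
        self._watchers.append(callback)

    def __setattr__(self, attr, value):
        super().__setattr__(attr, value)
        for callback in self.__dict__.get("_watchers", ()):
            callback(self.__name__, attr, value)


fakemath = TrackingModule("fakemath")
seen = []
fakemath.watch(lambda module, attr, value: seen.append((module, attr)))

fakemath.sin = 1   # first binding
fakemath.sin = 2   # rebinding, reported the same way
```

Stores that write directly into the module's `__dict__` bypass `__setattr__`, so a hook at this level would not see them.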

Jeremy