[Python-Dev] Accessing globals without dict lookup

Guido van Rossum guido@python.org
Mon, 11 Feb 2002 11:28:59 -0500


All right -- i have attempted to diagram a slightly more interesting example, using my interpretation of Guido's scheme. [...] How does it look? Guido, is it anything like what you have in mind?

Yes, exactly. I've added pointers to your images to PEP 280. Maybe you can also create a diagram for Tim's "more aggressive" scheme?

A couple of observations so far:

1. There are going to be lots of global-cell objects. Perhaps they should get their own allocator and free list.

Yes.
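
As a rough illustration of what a dedicated allocator with a free list could look like, here is a minimal C sketch. The GlobalCell struct, cell_alloc() and cell_free() are hypothetical names invented for this example (the two fields mirror the objptr and cellptr discussed in this thread), and a plain void * stands in for PyObject *. It is not taken from any actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a global-cell object. */
typedef struct GlobalCell {
    void *objptr;                /* the value, or NULL if unset */
    struct GlobalCell *cellptr;  /* fallback cell; doubles as the free-list link */
} GlobalCell;

/* Head of the list of recycled cells (assumption: single-threaded use). */
static GlobalCell *free_list = NULL;

/* Return a cleared cell, reusing a recycled one when available. */
GlobalCell *cell_alloc(void) {
    GlobalCell *c = free_list;
    if (c != NULL) {
        free_list = c->cellptr;  /* pop the head of the free list */
    } else {
        c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;         /* out of memory */
    }
    c->objptr = NULL;
    c->cellptr = NULL;
    return c;
}

/* Put a cell back on the free list instead of returning it to malloc. */
void cell_free(GlobalCell *c) {
    assert(c != NULL);
    c->objptr = NULL;
    c->cellptr = free_list;      /* push onto the free list */
    free_list = c;
}
```

Freed cells are threaded through their own cellptr field, so recycling one costs a couple of pointer assignments and no call into malloc().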

2. Maybe we don't have to change the module dict type. We could just use regular dictionaries, with the special case that if retrieving the value yields a cell object, we then do the objptr/cellptr dance to find the value. (The cell objects have to live outside the dictionaries anyway, since we don't want to lose them on a rehashing.)

And who would do the special dance? If PyDict_GetItem, it would add an extra test to code whose speed is critical in lots of other cases (plus it would be impossible to create a dictionary containing cells without having unwanted special magic). If in a wrapper, then module.__dict__[name] would return a surprise cell instead of a value.
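
To make the first objection concrete, here is a toy C sketch of a lookup that unwraps cells. Everything in it (the Value and Entry types, raw_get(), unwrapping_get(), the linear-scan table) is invented for illustration and has nothing to do with the real dictionary implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { KIND_PLAIN, KIND_CELL } Kind;

/* Toy value: either a plain payload or a cell referring to another value. */
typedef struct Value {
    Kind kind;
    struct Value *target;  /* KIND_CELL only: the value the cell refers to */
    int payload;           /* KIND_PLAIN only: toy payload */
} Value;

typedef struct {
    const char *key;
    Value *value;
} Entry;

/* Raw lookup: returns whatever is stored under key, cell or not. */
Value *raw_get(const Entry *tab, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].key, key) == 0)
            return tab[i].value;
    return NULL;
}

/* Unwrapping lookup: every successful lookup pays one extra kind test. */
Value *unwrapping_get(const Entry *tab, size_t n, const char *key) {
    Value *v = raw_get(tab, n, key);
    if (v != NULL && v->kind == KIND_CELL)
        return v->target;  /* the cell itself is never visible to callers */
    return v;
}
```

Note that unwrapping_get() can never hand back a cell as an ordinary value, which is the "unwanted special magic" side of the objection; raw_get() shows the other side, where a caller gets a surprise cell.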

3. Could we change the name, please? It would really suck to have two kinds of things called "cell objects" in the Python core.

Agreed. Or we could add a cellptr to the existing cell objects; or maybe a scheme could be devised that wouldn't need a cell to have a cellptr, and then we could use the existing cell objects unchanged.

4. I recall Tim asked something about the cellptr-points-to-itself trick. Here's what i make of it -- it saves a branch: instead of

    PyObject* cellget(PyGlobalCell* c) {
        if (c->cellobjptr) return c->cellobjptr;
        if (c->cellcellptr) return c->cellcellptr->cellobjptr;
        return NULL;  /* no value and no fallback cell */
    }

it's

    PyObject* cellget(PyGlobalCell* c) {
        if (c->cellobjptr) return c->cellobjptr;
        return c->cellcellptr->cellobjptr;
    }

That's what my second "additional idea" in PEP 280 proposes:

| - Make c.cellptr equal to c when a cell is created, so that
|   LOAD_GLOBAL_CELL can always dereference c.cellptr without a NULL
|   check.

This makes no difference when c->cellobjptr is filled, but it saves one check when c->cellobjptr is NULL in a non-shadowed variable (e.g. after "del x"). I believe that's the only case in which it matters, and it seems fairly rare to me that a module function will attempt to access a variable that's been deleted from the module.

Agreed. When x is not defined, it doesn't matter how much extra code we execute as long as we don't dereference NULL. :-)
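
A small self-contained C sketch of the self-pointing trick, under stated assumptions: Cell, cell_init() and cell_get() are names made up for this example, void * stands in for PyObject *, and a NULL result is taken to mean "name not defined".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell: objptr is the value, cellptr the cell to fall back to. */
typedef struct Cell {
    void *objptr;
    struct Cell *cellptr;
} Cell;

/* A new cell points at itself, so cellptr is never NULL. */
void cell_init(Cell *c) {
    c->objptr = NULL;
    c->cellptr = c;
}

/* One branch only: cellptr can be dereferenced without a NULL check. */
void *cell_get(const Cell *c) {
    if (c->objptr)
        return c->objptr;
    /* Either the shadowed cell's value, or our own NULL objptr. */
    return c->cellptr->objptr;
}
```

For a deleted, non-shadowed name the second load simply re-reads the cell's own NULL objptr, so the caller still sees "not defined" and no NULL pointer is ever dereferenced.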

Because the module can't know what new variables might be introduced into __builtin__ after the module has been loaded, a failed lookup must finally fall back to a lookup in __builtin__. Given that, it seems like a good idea to set c->cellcellptr = c when c->cellobjptr is set (for both shadowed and non-shadowed variables). In my picture, this would change the cell that spam.max points to, so that it points to itself instead of __builtin__.max's cell. That is:

    void cellset(PyGlobalCell* c, PyObject* v) {
        c->cellobjptr = v;
        c->cellcellptr = c;  /* a set cell always points at itself */
    }

But now you'd have to work harder when you delete the global again (i.e. in cell_delete()); the shadowed built-in must be restored.

This simplifies things further:

    PyObject* cellget(PyGlobalCell* c) {
        return c->cellcellptr->cellobjptr;
    }

This gets us down to no branches at all, which might be a really good thing given today's speculative-execution processors.

Good idea! (I misread your followup earlier, because I hadn't fully digested this msg. I think you're right that we might be able to use just a PyObject **; but I haven't fully digested Tim's more aggressive idea.)
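
Putting the pieces of this exchange together, here is one possible C sketch of the branch-free scheme, including the extra work on deletion. It rests on an assumption that is not in the thread: each cell carries an additional builtin field remembering the cell it shadows, so that cell_delete() can restore the fallback. All names (Cell, cell_init(), cell_set(), cell_delete(), cell_get()) are hypothetical, and void * stands in for PyObject *.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Cell {
    void *objptr;          /* the value, or NULL if unset */
    struct Cell *cellptr;  /* this cell when set, else the fallback cell */
    struct Cell *builtin;  /* ASSUMPTION: shadowed builtin's cell, or NULL */
} Cell;

/* A fresh cell falls back to the builtin cell if there is one, else itself. */
void cell_init(Cell *c, Cell *builtin) {
    c->objptr = NULL;
    c->builtin = builtin;
    c->cellptr = builtin ? builtin : c;
}

/* Setting a value makes the cell point at itself. */
void cell_set(Cell *c, void *v) {
    assert(v != NULL);
    c->objptr = v;
    c->cellptr = c;
}

/* Deleting has to work harder: the shadowed builtin must be restored. */
void cell_delete(Cell *c) {
    c->objptr = NULL;
    c->cellptr = c->builtin ? c->builtin : c;
}

/* No branches: always exactly two dependent loads. */
void *cell_get(const Cell *c) {
    return c->cellptr->objptr;
}
```

The cost moves from every read to the comparatively rare set and delete operations; whether the extra field is acceptable, or the shadowed cell should be found some other way, is left open here.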

--Guido van Rossum (home page: http://www.python.org/~guido/)