[Python-Dev] Accessing globals without dict lookup

Guido van Rossum guido@python.org
Sat, 09 Feb 2002 09:23:01 -0500

I'm not looking for point-by-point answers here, I'm just pointing out things that were hard to follow so that they may get addressed in a revision.

Do you think it's PEP time yet?

> When you use its __getitem__ method, the PyObject * in the cell is
> dereferenced, and if a NULL is found, __getitem__ raises KeyError
> even if the cell exists.

Had a hard time with this: [...] 2. Presumably the first "the cell" in this sentence refers to a different cell than the second "the cell" intends.

No, they are the same. See the __getitem__ pseudo code.

__delitem__ is missing, but presumably straightforward.

I left it out intentionally because it adds nothing new. Maybe that was wrong -- it's important that deleting a global stores NULL in its cell.objptr but does not delete the cell from the celldict.
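To make the __getitem__/__delitem__ behaviour concrete, here is a minimal Python sketch. It is only an illustration, not the proposed C implementation: the class names, the `NULL` sentinel and the `getcell` helper are assumptions made for this example.

```python
# Hypothetical model of a celldict; NULL stands in for a C NULL pointer.
NULL = object()


class Cell:
    def __init__(self):
        self.objptr = NULL  # the global's value, or NULL if unset/deleted


class CellDict:
    def __init__(self):
        self._cells = {}  # name -> Cell

    def getcell(self, key):
        # Assumed helper: return the cell itself (KeyError if there is none).
        return self._cells[key]

    def __getitem__(self, key):
        cell = self._cells[key]      # KeyError if no cell exists
        if cell.objptr is NULL:      # cell exists but holds NULL
            raise KeyError(key)
        return cell.objptr

    def __setitem__(self, key, value):
        cell = self._cells.get(key)
        if cell is None:
            cell = self._cells[key] = Cell()
        cell.objptr = value

    def __delitem__(self, key):
        cell = self._cells.get(key)
        if cell is None or cell.objptr is NULL:
            raise KeyError(key)
        cell.objptr = NULL           # the cell itself stays in the celldict
```

Because the cell survives deletion, any code holding a reference to it sees a later re-assignment of the same name without another lookup.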

> When a function object is created from a regular dict instead of a
> celldict, func_cells is a NULL pointer.

This part is regrettable, since it's Yet Another NULL check at the top of code using this stuff (meaning it slows the normal case, assuming that it's unusual not to get a celldict). I'm not clear on how code ends up getting created from a regular dict instead of a celldict -- is this because of stuff like "exec whatever in mydict"?

Yes, I don't want to break such code because that's been the politically correct way for ages. We do have to deprecate it to encourage people to use celldicts here.

To avoid the NULL check at the top, we could stuff func_cells with empty cells and do the special-case check at the end (just before we would raise NameError). Then there still needs to be a check for STORE and DELETE, because we don't want to store into the dummy cells. Sounds like a hack to assess separately later.

(Another hack probably not worth it right now is to make the module's cell.cellptr point to itself if it's not shadowing a builtin cell -- then the first NULL check for cell.cellptr can be avoided in the case of successfully finding a builtin name.)
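A rough Python sketch of the two lookup variants may help here. It assumes a cell has just the two fields named in this thread (objptr and cellptr); the function names, the `NULL` sentinel and `make_selfptr_cell` are hypothetical, and the real lookup would live in the VM's C code.

```python
# Hypothetical model: NULL stands in for a C NULL pointer.
NULL = object()


class Cell:
    def __init__(self, objptr=NULL, cellptr=None):
        self.objptr = objptr    # value of the global, or NULL
        self.cellptr = cellptr  # cell of the shadowed builtin, or None


def load_global(cell, name):
    """Plain variant: cellptr may be missing, so it needs its own check."""
    if cell.objptr is not NULL:
        return cell.objptr
    if cell.cellptr is not None and cell.cellptr.objptr is not NULL:
        return cell.cellptr.objptr
    raise NameError(name)


def make_selfptr_cell(builtin_cell=None):
    """Variant of the hack: a cell that shadows nothing points to itself."""
    cell = Cell()
    cell.cellptr = builtin_cell if builtin_cell is not None else cell
    return cell


def load_global_selfptr(cell, name):
    """cellptr is never missing here, so one check disappears."""
    if cell.objptr is not NULL:
        return cell.objptr
    obj = cell.cellptr.objptr   # may be the cell itself, i.e. still NULL
    if obj is not NULL:
        return obj
    raise NameError(name)
```

In the self-pointing variant the successful builtin lookup costs two tests instead of three; the failing case still ends in NameError.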

> - There are fallbacks in the VM for the case where the function's
>   globals aren't a celldict, and hence func_cells is NULL.  In that
>   case, the code object's co_globals is indexed with <i> to find the
>   name of the corresponding global and this name is used to index the
>   function's globals dict.

Which may not succeed, so we also need another level to back off to the builtins. I'd like to pursue getting rid of the func_cells == NULL special case, even if it means constructing a celldict out of a regular dict for the duration, and feeding mutations back in to the regular dict afterwards.

The problem is that during execution, accessing the regular dict doesn't give the right results. I don't care about this case being fast (after all it's exec and if people want it faster they can switch to using a celldict). I do care about not changing corners of the semantics.
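For reference, the fallback path under discussion (no celldict, so the name is looked up in the globals dict and then in the builtins) could be sketched in Python as follows. The function name and argument layout are assumptions made for illustration; only the lookup order is taken from the discussion above.

```python
def load_global_fallback(co_globals, i, globals_dict, builtins_dict):
    """Slow path used when the function has no cells (regular globals dict).

    co_globals is assumed to be a sequence of global names, indexed by
    the same integer the fast path would use to index the cells.
    """
    name = co_globals[i]
    try:
        return globals_dict[name]      # first level: module globals
    except KeyError:
        pass
    try:
        return builtins_dict[name]     # second level: builtins
    except KeyError:
        raise NameError(name) from None
```

Because this path reads the regular dict directly on every access, code that mutates the dict during execution sees the same results as it does today.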

Note that a chain of 4 test+branches against NULL in "the usual case" for builtins may not be faster on average than inlining the first few useful lines of lookdict_string twice (the expected path in this routine became fat-free for 2.2):

    i = hash;
    ep = &ep0[i];
    if (ep->me_key == NULL || ep->me_key == key)
        return ep;

Win or lose, that's usually the end of a dict lookup. That is, I'm certain we're paying significantly more for layers of C-level function call overhead today than for what the dict implementation actually does now (in the usual cases).
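The first-probe idea can be modelled in Python roughly like this. It is a toy model, not the dict implementation: the table layout (a list of `None` or `(key, value)` tuples), the power-of-two size and the masking step are assumptions made for this sketch.

```python
def first_probe(table, key):
    """Return the slot index if the first probe settles the lookup.

    Returns None when the slot holds a different key object, in which
    case a real implementation would go on to resolve the collision.
    """
    mask = len(table) - 1          # assumes len(table) is a power of two
    i = hash(key) & mask
    slot = table[i]
    # Empty slot (key absent) or the identical key object (key present):
    # either way the lookup is finished after one probe.
    if slot is None or slot[0] is key:
        return i
    return None
```

The comparison is by identity, mirroring the pointer comparison in the C fragment, which is why it pays off mostly for interned strings such as identifier names.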

This should be tried!!!

> Compare this to Jeremy's scheme using dlicts:
>
> http://www.zope.org/Members/jeremy/CurrentAndFutureProjects/FastGlobals
>
> - My approach doesn't require global agreement on the numbering of the
>   globals; each code object has its own numbering.  This avoids the
>   need for more global analysis,

Don't really care about that.

I do. The C code in compile.c is already at a level of complexity that nobody understands it in its entirety! (I don't understand what Jeremy added, and Jeremy has to ask me about the original code. :-( )

Switching to the compiler.py package is unrealistic for 2.3; there's a bootstrap problem, plus it's much slower. I know that we cache the bytecode, but there are enough situations where we can't and the slowdown would kill us (imagine starting Zope for the first time from a fresh CVS checkout).

> and allows adding code to a module using exec that introduces new
> globals without having to fall back on a less efficient scheme.

That is indeed lovely.

Forgot a <wink> there? It seems a pretty minor advantage to me.

I would like to be able to compare the two schemes more before committing to any implementation. Unfortunately there's no description of Jeremy's scheme that we can compare easily (though I'm glad to see he put up his slides on the web: http://www.python.org/~jeremy/talks/spam10/PEP-267-1.html).

I guess there's so much handwaving in Jeremy's proposal about how to deal with exceptional cases that I'm uncomfortable with it. But that could be fixed.

--Guido van Rossum (home page: http://www.python.org/~guido/)
