The LOAD_GLOBAL bytecode has a fast-path when globals and builtins have exactly the type 'dict'. It calls the _PyDict_LoadGlobal() function. I propose to implement a similar optimization for LOAD_NAME, see attached patch. The patch also fixes LOAD_GLOBAL and LOAD_NAME bytecodes when locals, globals or builtins are not exactly the type 'dict'. It clears the KeyError before trying the next PyObject_GetItem(). The patch changes also _PyDict_LoadGlobal() to call PyObject_Hash() if the hash was not computed yet. It might make it a little bit faster.
When LOAD_NAME is generated? Is it worth to optimize this case? Aren't LOAD_FAST and LOAD_GLOBAL used in performance critical code? It looks to me that there is a bug in fast path of _PyDict_LoadGlobal. If the first lookup fails, it can raise an exception. We have to add if (PyErr_Occurred()) return NULL; before the second lookup. Just for reference, the fast path for LOAD_GLOBAL was added in 8f8fe990e82c.
> When LOAD_NAME is generated? Is it worth to optimize this case? Aren't LOAD_FAST and LOAD_GLOBAL used in performance critical code? I guess that it's only used to execute the body of modules. > Is it worth to optimize this case? Hum, I don't know :-) > It looks to me that there is a bug in fast path of _PyDict_LoadGlobal. If the first lookup fails, it can raise an exception. Sorry, where exactly? Can you maybe put a comment on the review? I see many checks to handle errors.
Since LOAD_NAME is rare, I don't think that it's worth to optimize it. I pushed a partial version of pydict_loadname-2.patch to add comments to the code.