[Python-Dev] patch: speed up name access by up to 80%

Oren Tirosh oren-py-d@hishome.net
Mon, 11 Feb 2002 22:09:54 +0200


Problem: Python name lookup in dictionaries is relatively slow.

Possible solutions:

  1. Find ways to bypass dictionary lookup.
  2. Optimize the hell out of dictionary lookup.

Examples of approach #1 are the existing fastlocals mechanism, PEP 266, PEP 267 and the recent proposal by GvR for speeding up instance attribute access. These proposals all face major difficulties resulting from the dynamic nature of Python's namespace and require tricky code-analysis techniques to check for assignments that may change the visible namespace.

I have chosen to try #2. The biggest difficulty with this approach is that dictionaries are already very well optimized :-) But I have found that Python dictionaries are mostly optimized for general-purpose use, not for use as namespaces for running code. Namespace access has some characteristics that have not yet been exploited for optimization.

The patch adds the function PyDict_GetItem_Fast to dictobject. This function is equivalent to PyDict_GetItem but is much faster for lookups using interned string keys and for lookups with a negative result. LOAD_NAME and LOAD_GLOBAL have been converted to use this function.
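The interned-key fast path rests on a guarantee that still holds in Python today: interning equal strings yields the same object, so a dictionary probe can succeed on a pointer comparison alone, without comparing characters. A small illustration (using `sys.intern`; in the 2.2 era this was the builtin `intern`):

```python
import sys

# Interning guarantees that equal strings share a single object, so a
# dictionary probe against an interned key can short-circuit to an
# identity (pointer) comparison instead of a character-by-character one.
a = sys.intern("some_name")
b = sys.intern("some_" + "name")   # built at runtime, then interned

print(a is b)   # True: same object, so an identity check suffices
```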

Here are the timings for 200,000,000 accesses to fastlocals, locals, globals and builtins for Python 2.2 with and without the fastnames patch:

                 2.2       2.2+fastnames

    builtin.py   68.540s   37.900s
    global.py    54.570s   34.020s
    local.py     41.210s   32.780s
    fastlocal.py 24.530s   24.540s

Fastlocals are still significantly faster than locals, but not by such a wide margin. Note that the speed differences between locals, globals and builtins are almost gone.

The machine is a Pentium III at 866MHz running Linux. Both interpreters were compiled with "./configure ; make". The test code is at the bottom of this message. You will not see any effect on the results of pybench because it uses only fastlocals inside the test loops. Real code that uses a lot of globals and builtins should see a significant improvement.

Get the patch:

http://www.tothink.com/python/fastnames/fastnames.patch

The optimization techniques used:

The new negative-entry slot has a surprisingly small impact on the rest of dictobject.c. Only one assertion had to be removed to accommodate it. All other code treats it as either an active entry (key != NULL, key != dummy) or a deleted entry (value == NULL, key != NULL) and just happens to do the Right Thing for each case.

If an entry with the same key as a negative entry is subsequently inserted into the dictionary it will overwrite the negative entry and be reflected immediately in the namespace. There is no caching and therefore no cache coherency issues.
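The behavior described above can be modeled in a few lines of Python. This is an illustrative sketch of the semantics, not the patch's C code; the names `NamespaceDict` and `NEGATIVE` are mine, and `NEGATIVE` stands in for the patch's value == NULL slot:

```python
NEGATIVE = object()   # stands in for the C slot with value == NULL (hypothetical name)
_ABSENT = object()

class NamespaceDict:
    """Toy model of a dict that records failed lookups as negative entries."""

    def __init__(self):
        self.slots = {}

    def lookup(self, key):
        v = self.slots.get(key, _ABSENT)
        if v is _ABSENT:
            # First miss: leave a negative entry behind so the next failed
            # lookup for this name (e.g. a builtin missed in globals) is
            # a cheap positive probe instead of a full failed search.
            self.slots[key] = NEGATIVE
            return None
        if v is NEGATIVE:
            return None   # fast negative hit
        return v

    def insert(self, key, value):
        # An assignment simply overwrites any negative entry, so the new
        # binding is visible immediately: no caching, no coherency problem.
        self.slots[key] = value
```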

Known bugs:

Negative entries do not trigger a table resize. If there is not enough free space in the table, they are simply not inserted.

The patch assumes CACHE_HASH and INTERN_STRINGS are defined, without checking.

Future directions:

It should be possible to apply this to more than just LOAD_NAME and LOAD_GLOBAL: attribute access, modules, setting items, etc.

The hit rate for the inline macro varies from 100% in most simple cases to 0% in cases where the first hash position in the table happens to be occupied by another entry. Even in these cases it is still very fast, but I want to get more consistent performance. I am starting to experiment with probabilistic techniques that shuffle entries in the hash table and try to ensure that the entries accessed most often are kept in the first hash positions as much as possible.
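One way such shuffling could work is sketched below. This is my own hypothetical illustration of the idea, not code from the patch: a toy open-addressing table with linear probing and no deletion, where a hit found past the first hash slot is probabilistically swapped toward it. With linear probing and no deletions the swap is safe, because every slot between the two positions is occupied, so both entries remain reachable along their own probe paths:

```python
import random

class ShufflingTable:
    """Toy open-addressing hash table (linear probing, no deletion) that
    probabilistically moves frequently accessed entries toward their
    first hash slot.  A hypothetical sketch, not the patch's code."""

    SIZE = 8   # tiny fixed size for illustration; a real table resizes

    def __init__(self):
        self.keys = [None] * self.SIZE
        self.vals = [None] * self.SIZE

    def _slots(self, key):
        i = hash(key) % self.SIZE
        for _ in range(self.SIZE):
            yield i
            i = (i + 1) % self.SIZE   # linear probing

    def insert(self, key, val):
        for i in self._slots(key):
            if self.keys[i] is None or self.keys[i] == key:
                self.keys[i], self.vals[i] = key, val
                return
        raise RuntimeError("table full")

    def lookup(self, key):
        first = hash(key) % self.SIZE
        for i in self._slots(key):
            if self.keys[i] is None:
                return None
            if self.keys[i] == key:
                if i != first and random.random() < 0.25:
                    # Move the hot entry toward its first hash slot; the
                    # displaced entry stays inside the same occupied run,
                    # so it is still found by its own probe sequence.
                    self.keys[first], self.keys[i] = self.keys[i], self.keys[first]
                    self.vals[first], self.vals[i] = self.vals[i], self.vals[first]
                    return self.vals[first]
                return self.vals[i]
        return None
```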

Test code:

f()
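The bodies of the four benchmark scripts did not survive in this archived copy; only the final `f()` call remains. A modern sketch of what a global.py-style benchmark might look like (the loop count, names, and structure here are my assumptions, not the original scripts):

```python
import time

x = 1   # module-level binding, so each read below is a global name lookup

def f():
    total = 0
    for _ in range(1_000_000):
        total += x   # the access being timed: a LOAD_GLOBAL per iteration
    return total

start = time.time()
f()
print("global.py-style loop: %.3fs" % (time.time() - start))
```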

Oren