[Python-Dev] Dictionary sparseness

Tim Peters <tim.one@comcast.net>
Mon, 05 May 2003 13:40:18 -0400


[Jeremy Hylton]

Have you seen the work on gray-box systems?

http://www.cs.wisc.edu/graybox/

The philosophy of this project seems to be "You can observe an awful lot just by watching." (Apologies to Yogi.) The approach is to learn how a particular service is implemented, e.g. what buffer-replacement algorithm is used, by observing its behavior. Then write an application that exploits that knowledge to drive the system into optimized behavior for the application. No madvise() necessary.

I wonder if the same can be done for dicts? My first guess would be no, because the sparseness is a fixed policy.

Well, a dict suffers damaging collisions or it doesn't. If it does, the best thing a user can do is rebuild the dict from scratch, inserting keys by decreasing order of access frequency. Then the most frequently accessed keys come earliest in their collision chains. Collisions simply don't matter for rarely referenced keys. (And, for example, if there are any truly damaging collisions in __builtin__.__dict__, I expect this gimmick would remove the damage.)

The size of the dict can be forced larger by inserting artificial keys, if a user is insane. It's always been possible to eliminate dummy entries by doing "dict = dict.copy()".

Note that because Python exposes the hash function used by dicts, you can write a faithful low-level dict emulator in Python, and deduce what effects a sequence of dict inserts and deletes will have.

So, overall, I expect there's more you could do to speed dict access (in the handful of bad cases where it's not already good enough) yourself than Python could do for you. You'd have to be nuts, though -- or writing papers on gray-box systems.
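[Editor's note: a minimal sketch of the kind of emulator Tim describes, to make the insertion-order point concrete. It assumes the 5*i + perturb + 1 probe recurrence and PERTURB_SHIFT = 5 from CPython's dictobject.c of that era, masks hashes to 64 bits to mimic C unsigned arithmetic, and ignores resizing and deletion dummies; the names (probe_counts, the sample keys) are illustrative, not from the post.]

    # Rough emulator of CPython's open-addressing dict probing,
    # used only to see how insertion order affects collision-chain length.

    PERTURB_SHIFT = 5
    UMASK = (1 << 64) - 1   # treat hashes as 64-bit unsigned, as C does


    def probe_counts(keys, size=8):
        """Insert keys in order into a table of 2**k slots; return {key: probes}."""
        assert size & (size - 1) == 0, "table size must be a power of 2"
        mask = size - 1
        slots = [None] * size
        counts = {}
        for key in keys:
            h = hash(key) & UMASK
            i = h & mask
            perturb = h
            probes = 1
            while slots[i] is not None:
                if slots[i] == key:
                    break
                # same recurrence dictobject.c used: i = 5*i + perturb + 1
                i = (i * 5 + perturb + 1) & mask
                perturb >>= PERTURB_SHIFT
                probes += 1
            slots[i] = key
            counts[key] = probes
        return counts


    if __name__ == "__main__":
        keys = ["spam", "eggs", "ham", "knight", "shrubbery", "ni"]
        # Keys inserted first land earliest in their collision chains, which is
        # why rebuilding a dict by decreasing access frequency helps hot keys.
        print(probe_counts(keys))
        print(probe_counts(list(reversed(keys))))

Running it with the keys in both orders shows the same keys getting different probe counts depending purely on insertion order, which is all the "rebuild by access frequency" gimmick exploits.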