[Python-Dev] Simple dicts (original) (raw)

Tim Peters tim.one@comcast.net
Sat, 17 May 2003 21:12:12 -0400


[Damien Morton]

... On the other hand, as Tim pointed out to me in a private email, there is so much overhead in just getting to the hashtable inner loop, going around that loop one time instead of two or three seems inconsequential.

For in-cache tables. For out-of-cache tables, each trip around the loop is deadly. Since heavily used portions of small dicts are likely to be in-cache no matter how they're implemented, that's what makes me dubious about pouring lots of effort into reducing collisions for small dicts specifically.

... There seem to be two different ways to get/set/del from a dictionary.

The first is using PyDict[Get|Set|Del]Item() The second is using the embarssingly named dictasssub() and its partner dictsubscript(). Which of these two access methods is most likely to be used?

My guess matches Guido's: PyDict_*, except in programs making heavy use of explicit Python dicts. All programs use dicts under the covers for namespace mapping, and, e.g., instance.attr and module.attr end up calling PyDict_GetItem() directly. Python-level explicit dict subscripting ends up calling dict_*, essentially because Python has no idea at compile-time whether the x in

x[y]

is a dict, so generates code that goes thru the all-purpose type-dispatch machinery. On the third hand, some explicit-dict slinging code seems to use

x = somedict.get(y)

everywhere, and dict_get() doesn't call PyDict_GetItem() or dict_subscript().