[Python-Dev] Simple dicts (original) (raw)
Damien Morton damien.morton@acm.org
Thu, 15 May 2003 18:25:13 -0400
- Previous message: [Python-Dev] Re: [PEP] += on return of function call result
- Next message: [Python-Dev] Simple dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Im currently working on an open-chaining dict. Between paying work and coming to grips with the python innards, it might take a little while.
I was working on an implementation optimised for dicts with <256 entries that attempted to squeeze the most out of memory by using bytes for the 'first' and 'next' pointers. This kind of hashtable can be extremely sparse compared to the current dict implementation. With the byte-oriented open-chaining approach, the break-even point for memory usage (compared to the current approach) happens at a max load factor of about 0.1.
Im not sure that alloc()/free() for each dictentry is a win (if only because of pymalloc call overhead), and instead imagine a scheme whereby each dict would pre-alloc() a block of memory and manages its own free-lists. Theoretically, this makes copying and disposing of dicts much easier. It also helps ensure locality of reference. In fact, immediately after a doubling, the open-addressing hashtable scheme still 'uses' (in the sense of potentially addressing) all of the memory allocated to it, whereas the open-chaining approach 'uses' only the first pointers and the actual dictentries in use - about 2/3 of the space the open-addressing scheme uses.
On the other hand, as Tim pointed out to me in a private email, there is so much overhead in just getting to the hashtable inner loop, going around that loop one time instead of two or three seems inconsequential.
On the third hand, first-miss and first-hit lookups are simple enough that they could easily be put into a macro. I will need to take a closer look at Oren Tirosh's fastnames patch.
I have a question that someone may be able to answer:
There seem to be two different ways to get/set/del from a dictionary.
The first is using PyDict_[Get|Set|Del]Item()
The second is using the embarssingly named dict_ass_sub() and its partner dict_subscript().
Which of these two access methods is most likely to be used?
- Previous message: [Python-Dev] Re: [PEP] += on return of function call result
- Next message: [Python-Dev] Simple dicts
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]