[Python-Dev] A new dict for Xmas?

Stefan Behnel stefan_ml at behnel.de
Fri Dec 23 13:03:17 CET 2011


Mark Shannon, 23.12.2011 12:21:

Martin v. Löwis wrote:

- it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case.

It might be tricky switching key tables, and I don't think it would save much memory, as keys that are widely shared take up very little memory anyway, and not many other dicts are long-lived.

Why do you say that? In a plain 3.3 interpreter, I counted 595 dict objects (see script below). Of these, 563 (so nearly all of them) had only strings as keys. Among those, I found 286 different key sets, where 231 key sets occurred only once (i.e. wouldn't be shared). Together, the string dictionaries had 13282 keys, and you could save as many pointers (actually more, because there will be more key slots than keys).

The question is how much memory needs to be saved to be worth adding the complexity. 10 kB: no; 100 MB: yes. So data from "real" benchmarks would be useful.
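The script mentioned in the quoted text is not included in this message. As a rough sketch of how such a count could be taken (not the original script; the function name `survey_dicts` and the returned field names are made up for illustration), one could walk the objects known to the garbage collector:

```python
import gc


def survey_dicts():
    """Count live dicts, how many have only str keys, and how key sets repeat.

    Hypothetical helper, not the script from the thread. It only sees dicts
    tracked by the cyclic garbage collector, so the totals are a lower bound.
    """
    # Exact type check: dict subclasses are skipped on purpose.
    dicts = [obj for obj in gc.get_objects() if type(obj) is dict]

    key_sets = {}       # frozenset of keys -> number of dicts using it
    string_keyed = 0
    total_keys = 0
    for d in dicts:
        keys = tuple(d)  # copy the keys so later changes to d cannot hurt
        # An empty dict counts as "all keys are strings" here.
        if all(type(k) is str for k in keys):
            string_keyed += 1
            total_keys += len(keys)
            ks = frozenset(keys)
            key_sets[ks] = key_sets.get(ks, 0) + 1

    return {
        "dicts": len(dicts),
        "string_keyed": string_keyed,
        "distinct_key_sets": len(key_sets),
        "unshared_key_sets": sum(1 for n in key_sets.values() if n == 1),
        "total_keys": total_keys,
    }


if __name__ == "__main__":
    for name, value in survey_dicts().items():
        print(name, value)
```

The numbers will differ between Python versions and from those quoted above, since the set of dicts alive at startup changes from release to release.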

Consider taking a parsed MiniDOM tree as a benchmark. It contains so many instances of just a couple of different classes that it just has to make a huge difference if each of those instances is even just a bit smaller. It should also make a clear difference for plain Python ElementTree.

I attached a benchmark script that measures the parsing speed as well as the total memory usage of the in-memory tree. You can get data files from the following places, just download them and pass their file names on the command line:

http://gnosis.cx/download/hamlet.xml

http://www.ibiblio.org/xml/examples/religion/ot/ot.xml

Here are some results from my own machine for comparison:

http://blog.behnel.de/index.php?p=197
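For readers without the attachment, here is a minimal sketch of the kind of measurement described above. It is not the attached etbenchmark.py: the helper name `measure` is invented, and it relies on `tracemalloc`, which only exists in Python 3.4 and later. Tracing also slows parsing down, so the timings are only meaningful relative to each other.

```python
import sys
import time
import tracemalloc
import xml.etree.ElementTree as ET
from xml.dom import minidom


def measure(parse, source):
    """Run parse(source); return (tree, seconds, bytes still allocated).

    Hypothetical helper. The byte count covers Python-level allocations
    made during the call that are still alive afterwards, which is taken
    here as an approximation of the in-memory tree size.
    """
    tracemalloc.start()
    start = time.perf_counter()
    tree = parse(source)
    elapsed = time.perf_counter() - start
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return tree, elapsed, current


if __name__ == "__main__":
    # File names are passed on the command line, as with the real script.
    for path in sys.argv[1:]:
        for label, parse in (("ElementTree", ET.parse),
                             ("minidom", minidom.parse)):
            _tree, seconds, size = measure(parse, path)
            print("%s %s: %.3f s, %.1f MB"
                  % (path, label, seconds, size / 1e6))
```

Running it on the same file under two interpreter builds gives a rough idea of whether a smaller per-instance dict shows up in the total tree size.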

Stefan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: etbenchmark.py
Type: text/x-python
Size: 4760 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20111223/cfc8eb7d/attachment.py>


