msg65293
Author: Andreas Eisele (eisele)
Date: 2008-04-10 16:21
I need to count pairs of strings, and I use a defaultdict in a construct like count[a,b] += 1. I am able to count 50K items per second on a very fast machine, which is way too slow for my application. If I instead count concatenated strings, as in count[ab] += 1, I can count 500K items/second, which is more reasonable. I don't see why there is a performance penalty of a factor of 10 for such a simple construct. Do I have to switch to Perl or C to get this done?

Thanks a lot for any insight on this.

Best regards,
Andreas

PS: The problem seems to exist for ordinary dicts as well; it is not related to the fact that I use a defaultdict.

PPS: I also tried nested defaultdicts (count[a][b] += 1) and get the same slow speed (and 50% more memory consumption).
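A minimal runnable sketch of the two counting patterns being compared (the pair data below is just a placeholder, not the real workload):

from collections import defaultdict

# Hypothetical stream of string pairs standing in for the real data.
pairs = [("foo", "bar"), ("foo", "baz"), ("foo", "bar")]

by_tuple = defaultdict(int)
for a, b in pairs:
    by_tuple[a, b] += 1      # tuple key: the slow case reported here (~50K items/s)

by_concat = defaultdict(int)
for a, b in pairs:
    by_concat[a + b] += 1    # concatenated-string key: ~10x faster per the report

print(by_tuple[("foo", "bar")], by_concat["foobar"])  # 2 2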
|
|
msg65296
Author: Alexander Belopolsky (belopolsky) *
Date: 2008-04-10 16:37
My guess is that this is due to the fact that strings cache their hash values while tuples don't. IIRC, caching tuples' hash values has been proposed and rejected some time ago. Furthermore, dict lookups are heavily optimized for the case when all keys are strings.
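A rough way to observe the caching difference (an illustrative sketch only; hash caching on str objects is a CPython implementation detail, not a language guarantee):

import timeit

s = "x" * 1000000             # long string: its hash is computed once and cached on the object
t = tuple(range(1000000))     # large tuple: hash() walks all items on every call

hash(s)                       # prime the string's cached hash

print(timeit.timeit(lambda: hash(s), number=1000))  # essentially constant per call (cache hit)
print(timeit.timeit(lambda: hash(t), number=1000))  # recomputed from the items each call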
|
|
msg65309
Author: Martin v. Löwis (loewis) *
Date: 2008-04-10 19:15
I cannot reproduce this. For me the attached script gives, for 1M iterations, a total time of 0.8s on a 3.2GHz Pentium 4.
|
|
msg65313
Author: Raymond Hettinger (rhettinger) *
Date: 2008-04-10 19:49
Questions about relative performance should be directed to comp.lang.python. Since no bug or suspect behavior was reported, closing as not-a-bug. |
|
|
msg65332
Author: Andreas Eisele (eisele)
Date: 2008-04-11 02:46
Sorry for not giving a good example in the first place. The problem seems to appear only in the presence of sufficiently many distinct tuples. Then I see performance that looks rather like O(n*n). Here is an example that shows the problem:

>>> from time import clock
>>> d = {}
>>> t0 = clock()
>>> for i in range(5):
...     for j in range(i*1000000, (i+1)*1000000):
...         d[str(j), str(j)] = j
...     print clock()-t0
...
13.04
39.51
81.86
134.18
206.66
>>>

The same example with str(j)+str(j) works fine. Sorry if this should be a non-issue. For me it is a reason to implement functionality in C or Perl that I would really love to do in Python. I would call such a thing a performance bug, but maybe I'm just too demanding...

Best regards,
Andreas
|
|
msg65341
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-04-11 08:44
The slowdown is because of the garbage collector, which has more and more objects to traverse (the tuples). If I add "import gc; gc.disable()" at the beginning of your script, it runs much faster, and the timings look linear. Martin's sample is not affected, because there are very few deallocations, and the gc collection is not triggered.

Disabling the gc may not be a good idea in a real application; I suggest you play with the gc.set_threshold function and set larger values, at least while building the dictionary. (700, 1000, 10) seems to yield good results.
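For example, a minimal sketch of raising the thresholds only while the big dictionary is being built (the numbers are the ones suggested above; tune them for your own workload):

import gc

old_thresholds = gc.get_threshold()   # remember the defaults so they can be restored
gc.set_threshold(700, 1000, 10)       # make higher-generation collections much less frequent
try:
    d = {}
    for j in range(5000000):
        d[str(j), str(j)] = j         # many distinct tuple keys, as in the report
finally:
    gc.set_threshold(*old_thresholds) # restore the normal collection behaviour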
|
|
msg65343
Author: Andreas Eisele (eisele)
Date: 2008-04-11 09:18
Great, that really solves my problem. Thank you so much, Amaury! As you say, the problem is unrelated to dicts, and I observe it also when adding the tuples to a set or keeping them in lists. Perhaps your GC thresholds would be better default values than what is currently in force.
|
|
msg65374
Author: Martin v. Löwis (loewis) *
Date: 2008-04-11 21:27
> Perhaps your GC thresholds would be better default
> values than what is currently in force.

No, the defaults are correct for typical applications.
|
|
msg65388
Author: Andreas Eisele (eisele)
Date: 2008-04-12 04:01
Even if they mean that creation of a huge number N of objects requires O(N*N) effort? |
|
|
msg65390
Author: Raymond Hettinger (rhettinger) *
Date: 2008-04-12 04:20
This discussion ought to be moved to comp.lang.python.

The timing script needs work. It doesn't isolate any one issue for discussion. It is affected by GC, dict resizing, details of creating and hashing string objects, the need to periodically call the system malloc, etc. You get different results if:

* a new dictionary is created on each pass
* any of the objects get reused
* range(5) gets replaced with [0]*5
* you use longs instead of ints
* the strings are of fixed length: str(j)[:5]

This is the wrong forum to explain every nuance of what affects performance and whether you think perl is better.
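As a rough illustration of isolating one factor at a time, a hypothetical harness could time the identical insertion loop with the collector enabled and then disabled, keeping everything else fixed:

import gc
import time

def build(n):
    # Insert n distinct tuple keys into a fresh dict and return the elapsed time.
    d = {}
    t0 = time.time()
    for j in range(n):
        d[str(j), str(j)] = j
    return time.time() - t0

for label, enabled in (("gc enabled", True), ("gc disabled", False)):
    if enabled:
        gc.enable()
    else:
        gc.disable()
    print("%s: %.2fs" % (label, build(2000000)))
gc.enable()   # leave the collector in its default state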
|
|