[Python-Dev] Rethinking intern() and its data structure (original) (raw)
John Arbash Meinel john.arbash.meinel at gmail.com
Thu Apr 9 22:22:04 CEST 2009
- Previous message: [Python-Dev] Rethinking intern() and its data structure
- Next message: [Python-Dev] Rethinking intern() and its data structure
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Martin v. Löwis wrote:
I don't have numbers on how much that would improve CPU times, I would imagine improving 'intern()' would impact import times more than run times, simply because import time is interning a lot of strings.
Though honestly, Bazaar would really like this, because startup overhead for us is almost 400ms to 'do nothing', which is a lot for a command line app. Maybe I misunderstand your proposed change: how could the representation of the interning dict possibly change the runtime of interning? (let alone significantly) Regards, Martin
Decreasing memory consumption lets more things fit in cache. Once the size of 'interned' is greater than fits into L2 cache, you start paying the cost of a full memory fetch, which is usually measured in 100s of cpu cycles.
Avoiding double lookups in the dictionary would be less overhead, though the second lookup is probably pretty fast if there are no collisions, since everything would already be in the local CPU cache.
If we were dealing in objects that were KB in size, it wouldn't matter. But as the intern dict quickly gets into MB, it starts to make a bigger difference.
How big of a difference would be very CPU and dataset size specific. But certainly caches make certain things much faster, and once you overflow a cache, performance can take a surprising turn.
So my primary goal is certainly a decrease of memory consumption. I think it will have a small knock-on effect of improving performance, I don't have anything to give concrete numbers.
Also, consider that resizing has to evaluate every object, thus paging in all X bytes, and assigning to another 2X bytes. Cutting X by (potentially 3), would probably have a small but measurable effect.
John =:->
- Previous message: [Python-Dev] Rethinking intern() and its data structure
- Next message: [Python-Dev] Rethinking intern() and its data structure
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]