[Python-Dev] When do sets shrink? (original) (raw)
Josiah Carlson jcarlson at uci.edu
Sun Jan 1 01:42:19 CET 2006
- Previous message: [Python-Dev] When do sets shrink?
- Next message: [Python-Dev] When do sets shrink?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Noam Raphael <noamraph at gmail.com> wrote:
On 12/31/05, Raymond Hettinger <raymond.hettinger at verizon.net> wrote: > [Noam] > > For example, iteration over a set which once had > > 1,000,000 members and now has 2 can take 1,000,000 operations every > > time you traverse all the (2) elements. > > Do you find that to be a common or plausible use case? I don't have a concrete example in this minute, but a set which is repeatedly filled with elements and then emptied by pop operations doesn't seem to me that far-fetched.
It doesn't seem far-fetched, but I've not seen anything like it. List appending and popping, yeah, set differences and intersections and unions, yeah, but not set insertion then removal for large numbers of items.
Note that you provide insertion into a set then repeated popping as an example, which is done faster by other methods.
(I'm speaking of my point of view, which I believe is common. I don't expect programs I write in Python to be super-fast - if that were the case, I would write them in C. I do expect them to take a reasonable amount of time, and in the case of iteration over a set, that means a time proportional to the number of elements in the set.)
That is a reasonable point of view. But realize that depending on the shrinking strategy, popping/deletion will take ammortized 2+ times longer than they do now, and whose benefits include (and are basically limited to): memory cam be freed to the operating system, repeated iteration over a resized-smaller dictionary can be faster.
> Would the case be improved by incurring the time cost of 999,998 tests > for possible resizing (one for each pop) and some non-trivial number of > resize operations along the way (each requiring a full-iteration over > the then current size)? > I believe it would. It seems to me that those 999,998 tests take not much more than a machine clock, which means about 1 milisecond on todays computers. Those resize operations will take some more miliseconds. It all doesn't really matter, since probably all other things will take much more. I now run this code
>>> s = set() >>> for j in xrange(1000000): ... s.add(j) ... >>> while s: ... tmp = s.pop() ... And it took 2.4 seconds to finish. And it's okay - I'm just saying that a few additional clock ticks per operation will usually not matter when the overall complexity is the same, but changes in order of complexity can matter much more.
Doing that while loop will take longer with a constantly resizing set. The only way that resizing a dict/set as it gets smaller will increase overall running speed is if iteration over the dict/set occurs anywhere between 2-100 times (depending on the resizing factor)
> Even if this unique case could be improved, what is the impact on common > cases? Would a downsizing scheme risk thrashing with the > over-allocation scheme in cases with mixed adds and pops? > I think that there shouldn't be additional damage beyond those clock ticks. The simple method I took from "Introduction to Algorithms" works no matter what sequence of adds and pops you have.
You may get more memory fragmentation, depending on the underlying memory manager.
> Is there any new information/research beyond what has been obvious from > the moment the dict resizing scheme was born?
I wanted to say that there isn't any new information, and yet I don't think that I have to assume that everything in current Python is the best that can be. All I did was finding another reason why a downsizing scheme might be good, and posting it to ask if people have thought about it. If you have a document listing all the design decisions that went into dict implementation, then please send it to me and I won't ask about things that were already thought about.
See the source for dictobject.c and dictnotes.txt: http://svn.python.org/view/python/trunk/Objects/dictobject.c?rev=39608&view=auto http://svn.python.org/view/python/trunk/Objects/dictnotes.txt?rev=35428&view=auto
- Josiah
- Previous message: [Python-Dev] When do sets shrink?
- Next message: [Python-Dev] When do sets shrink?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]