[Python-Dev] collections module (original) (raw)
Michael Chermside mcherm at mcherm.com
Tue Jan 13 13:37:46 EST 2004
- Previous message: [Python-Dev] patching webbrowser.py for OS X
- Next message: [Python-Dev] collections module
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Guido:]
My own ideal solution [for queues] would be a subtle hack to the list implementation to make pop(0) and insert(0, x) go a lot faster -- this can be done by adding one or two extra fields to the list head and allowing empty space at the front of the list as well as at the end.
[Tim:]
That Python's builtin list calls realloc() on every append is pretty gross, but it's also a tradeoff, saving the bytes in the list header that would be required to remember how many allocated-but-unused slots exist. If you've got a million tiny lists (and some apps do), burning an additional 8MB for something your app doesn't really need isn't attractive (on a 32-bit box today, the list header is 16 bytes; pymalloc aligns to 8-byte boundaries, so the next possible size is 24 bytes).
[Guido:]
Just in case nobody had thought of it before, couldn't the realloc() call be avoided when roundupsize(oldsize) == roundupsize(newsize)???
[Tim:]
I had thought of it before, but never acted on it . roundupsize() has gotten more expensive over the years too, and it may vary across boxes whether it's cheaper to call that a second time than it is to make the system realloc deduce that nothing needs to be done. roundupsize() is certainly cheaper than Microsoft's realloc() in the do-nothing case.
Note that there's an endcase glitch here when oldsize==0 and newsize==1. roundupsize() returns 8 for both, but it doesn't follow that obitem isn't NULL. realloc(p, n) special-cases the snot out of p==NULL to avoid crashing then. [...] Note in passing that roundupsize() is expensive because lists don't remember the amount of overallocated space too: roundupsize() has to arrange to deliver the same result for many consecutive inputs
I'm sure this is just pointing out the obvious, but IF the list header were increased, we could add TWO fields... on for allocated_size, and another for left_side_slots. According to Tim, this would still "only" bump list objects up to 24 bytes, but would allow: [1] O(1) performance for pop(0) [2] No need to call realloc() on every append(), just on "true" resizes.
I guess I'm proposing the following (all ideas previously mentioned): [1] pop(0) would increment ob_item and left_side_slots. [2] insert(0,x) would decrement ob_item and left_side_slots IF left_side_slots were > 0. Lists should still grow at the tail for efficiency, but we can re-use space at the head left by a pop(0). [3] append(0,x) (and other functions that insert) would compare ob_size and allocated_size (do we need left_side_slots in here too?) to see whether new space is needed at the end. If so, it would EITHER call realloc() for more space, OR it would memmove so left_side_slots became 0. I'm still not quite sure what rule to use to determine which to do (although for MOST lists, left_side_slots would be 0, so there wouldn't be a question).
So, to recap advantages and disadvantages: lists would still grow just as they do now. Lists used as Queues by calling append() and pop(0) would be time efficient so long as we avoid calling memmove for each and every append() (that's why we don't want to ALWAYS memmove when left_side_slots > 0). Lists on which pop(0) was never called (that's most of 'em) would have NO degradation of space used, and IMPROVED speed in cases (like Windows) where a realloc() noop takes noticable time. And (this seems the most significant drawback) the size of a list would increase by 8 bytes, even for very short lists.
I wouldn't want to make the change without trying out the performance in some "real applications", but it seems like there are some potentially useful ideas floating around here.
-- Michael Chermside
- Previous message: [Python-Dev] patching webbrowser.py for OS X
- Next message: [Python-Dev] collections module
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]