[Python-3000] ordered dict for p3k collections? (original) (raw)

Guido van Rossum guido at python.org
Wed Sep 26 16:25:13 CEST 2007


On 9/26/07, Mark Summerfield <mark at qtrac.eu> wrote:

On 2007-09-26, skip at pobox.com wrote: > Mark> I have put a new version (incorporating another implementation > Mark> idea from Paul Hankin) on PyPI: > > Mark> http://pypi.python.org/pypi/sorteddict > > From that: > > The main benefit of sorteddicts is that you never have to explicitly > sort. > > Surely there must be something more than that. Wrapping sorted() around a > keys() or values() call is a pretty trivial operation. I didn't see that > the implementation saved anything.

Assuming you have a good sorteddict implementation (i.e., based on a balanced tree or a skiplist, not the one I've put up which is just showing the API) you can gain significant performance benefits.

That depends very much on the use case, and in general I strongly doubt it. I haven't looked this up in Knuth, but I believe that in a sorted dict implementation, the best performance you can get for random access and random insertions is O(log N), which is always beat by the O(1) of a hash table. This translates in O(N log N) for inserting N elements into a sorted dict, vs. O(N) in a hash table. Sorted traversal is O(N) for the sorted dict and O(N log N) for the hash table. So in order to gain a "significant performance benefit" you'd have to have one pass of insertions and two traversals with a small number of insertions or deletions in between (otherwise the sorted result from the hash table could just be cached).

I don't believe that this pattern is common enough, but I don't know your application.

For example, if you have a large dataset that you need to traverse quite frequently in sorted order, calling sorted() each time will be expensive compared to simply traversing an intrinsically sorted data structure.

When I program in C++/Qt I use QMap (a sorteddict) very often; the STL equivalent is called map. Both the Qt and STL libraries have dict equivalents (QHash and unorderedmap), but my impression is that the sorted data structures are used far more frequently than the unsorted versions.

Perhaps out of ignorance? Or perhaps the hash implementations have suboptimal implementations? Or perhaps because no equivalent to sorted() exists?

If you primarily program in Python, using dict + sorted() is very natural because they are built into the language. But using a sorted data structure and never sorting is a very common practice in other languages.

Ah, now the real reason you want this so badly is finally clear: simply because you're more familiar with it. :-)

Is the number of elements in a typical use case large enough that the performance difference even matters?

-- --Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-3000 mailing list