[Python-Dev] Re: heaps (original) (raw)

David Eppstein eppstein@ics.uci.edu
Mon, 05 May 2003 23:00:24 -0700


In article <LNBBLJKPBEHFEDALKOLCIEAFEFAB.tim.one@comcast.net>, Tim Peters <tim.one@comcast.net> wrote:

> For fairness, it might be interesting to try another run of your test > in which the input sequence is sorted in increasing order rather > than random.

Comparing the worst case of one against the best case of the other isn't my idea of fairness , but sure.

Well, it doesn't seem any fairer to use random data to compare an algorithm with an average time bound that depends on an assumption of randomness in the data...anyway, the point was more to understand the limiting cases. If one algorithm is usually 3x faster than the other, and is never more than 10x slower, that's better than being usually 3x faster but sometimes 1000x slower, for instance.

> I'd have to know more about your application to > have an idea whether the sorted or randomly-permuted case is more > representative.

Of course -- so would I .

My Java KBest code was written to make data subsets for a half-dozen web pages (same data selected according to different criteria). Of these six instances, one is presented the data in roughly ascending order, one in descending order, and the other four are less clear but probably not random.

Robustness in the face of this sort of variation is why I prefer any average-case assumptions in my code's performance to depend only on randomness from a random number generator, and not arbitrariness in the actual input. But I'm not sure I'd usually be willing to pay a 3x penalty for that robustness.

Here's a surprise: I coded a variant of the quicksort-like partitioning method, at the bottom of this mail. On the largest-1000 of a million random-float case, times were remarkably steady across trials (i.e., using a different set of a million random floats each time):

heapq 0.96 seconds sort (micro-optimized) 3.4 seconds KBest (below) 2.6 seconds

Huh. You're almost convincing me that asymptotic analysis works even in the presence of Python's compiled-vs-interpreted anomalies. The other surprise is that (unlike, say, the sort or heapq versions) your KBest doesn't look significantly more concise than my earlier Java implementation.

-- David Eppstein http://www.ics.uci.edu/~eppstein/ Univ. of California, Irvine, School of Information & Computer Science