[Python-Dev] Issue? (original) (raw)

Tim Peters tim.peters at gmail.com
Wed Aug 22 15:49:14 EDT 2018

Previous message (by thread): [Python-Dev] Issue?
Next message (by thread): [Python-Dev] Python 2.7 EOL date
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

I wrote Python's sort, so I may know something about it ;-) To my eyes, no, there's not "an issue" here, but a full explanation would be very involved.

For some sorting algorithms, it's possible to guarantee a redundant comparison is never made. For example, a pure insertion sort.

But Python's sort is quite complex, using a number of distinct strategies dynamically adapting to structure found in the data as it goes along. There's a possibility for a small bit of redundant work whenever switching from one strategy to another. That's typical of "hybrid" strategies.

Your case involves a switch between Python's sort's two simplest strategies. Let's just look at what matters:

[22, 33, 11, 11]

The first step is searching for "a natural run", an initial sub-sequence that's already sorted. 22 < 33 says the first two elements are already in order. Then 11 < 33 says that's the end of the run.

That part is over now.

The next step is using a binary insertion sort to move 11 into the already-sorted [22, 33] prefix. A binary search starts looking "in the middle" first, but a 2-element run is so short that 33 "is the middle", so it first compares 11 to 33 again. So it goes. Code to detect that special case could be added, but it would probably cost more than it saves (it always costs time to test for it, but that time is pure waste unless the tests succeed).

This is specific to an ascending natural run of length 2, so is quite a special case. For example, in

[22, 33, 44, 11, 11]

11 < 44 ends the natural run, and then the binary search first compares 11 to 33 ("the middle" of the 3-element natural run), and no redundant comparison gets made. But in

[22, 33, 44. 43, 11]

43 gets compared to 44 again on the second iteration of the binary search loop.

In general, this particular strategy switch ends up doing at most one redundant comparison only if the first element after an ascending natural run belongs immediately before the last element of the run. For technical reasons, this can happen at most

min(len(the_list) / 32, number_of_natural_runs_in_the_list)

times, so it's generally trivial in an average-case O(N log N) process. It's so rare it would be minor in an O(N) process too, unless N is very small.

A principled way to avoid this would be to skip the "find a run" step if N is very small, leaving the whole thing to binary insertion sort. Then a redundant comparison would never be done for small N. But that would end up doing more comparisons overall in common cases where a short list starts with a relatively (compared to N) sizable ascending or descending run.

So I'm happy enough with the tradeoffs already in place.

On Wed, Aug 22, 2018 at 10:37 AM 楼晓峰 <1520306395 at qq.com> wrote:

Why compare twice?

class Student(object): def init(self, name, score): self.name = name self.score = score def str(self): return '(%s: %s)' % (self.name, self.score) repr_ = _str def lt(self, s): #print(self, '--', s) if(self.score<s.score):_ _print(self, '<', s)_ _return True_ _if(self.score>s.score): print(self, '>', s) return False if(self.score==s.score): if(self.name>s.name): print(self, '>', s) return False if(self.name<s.name):_ _print(self, '<', s)_ _return True_ _if(self.name==s.name):_ _print(self, '==', s)_ _return False_ _def _eq_(self, s):_ _return (self.score == s.score) and (self.name == s.name)_ _def _gt_(self, s):_ _return not ((self == s) or (self < s))_ _def _le_(self, s):_ _return ((self == s) or (self < s))_ _def _ge_(self, s):_ _return ((self == s) or (self > s)) def nq(self, s): return not (self == s) L = [Student('Tim', 22), Student('Bob', 33), Student('Kevin', 11), Student('Alice', 11)] print(sorted(L)) Output: (Bob: 33) > (Tim: 22) (Kevin: 11) < (Bob: 33) (Kevin: 11) < (Bob: 33) (Kevin: 11) < (Tim: 22) (Alice: 11) < (Tim: 22) (Alice: 11) < (Kevin: 11) [(Alice: 11), (Kevin: 11), (Tim: 22), (Bob: 33)] Best regards, Xiaofeng

Python-Dev mailing list Python-Dev at python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/tim.peters%40gmail.com

Previous message (by thread): [Python-Dev] Issue?
Next message (by thread): [Python-Dev] Python 2.7 EOL date
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list