[Python-3000] Please re-add cmp to python 3000 (original) (raw)
David A. Wheeler dwheeler at dwheeler.com
Wed Oct 17 18:40:08 CEST 2007
- Previous message: [Python-3000] Add python-3000-like print function to python 2.6
- Next message: [Python-3000] Please re-add __cmp__ to python 3000
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I said:
I agree with Collin Winter: losing cmp is a loss (see http://oakwinter.com/code/).
Steven Bethard said:
Why can't this just be supplied with a mixin? Here's a recipe providing the appropriate mixins if you want to define a key function: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403
That works from a functional perspective, and if Python3 fails to include direct support for cmp, then I think providing a built-in mixin is necessary.
But mixins for comparison are a BIG LOSER for sort performance if your fundamental operator is a cmp-like function. Sorting is completely dominated by comparison time, and the mixin is a big lose for performance. Basically, sorts always involve an inner loop that does comparisons, so any time comparison is slow, you're potentially dooming the whole program to a slow inner loop. A mixin-calling-cmp model doubles the function call work; it has to find the mixin, call it, which eventually has to find and call the final cmp operator.
I did a test (see below), and the mixin using a simulated cmp took 50% MORE time to sort a list using Python 2.5 (see code below) than when cmp is used directly (as you CAN do in Python 2.5). A few tests with varying list size lead me to believe that this isn't linear - as the lists get longer the % performance hit increases. In other words, it's a LOT slower, and as the size increases it gets far worse. That kind of nasty performance hit will probably lead people to write lots of code that duplicates comparison functionality in each lt, gt, etc. When the comparisons THEMSELVES are nontrivial, that will result in lots of duplicate, potentially-buggy code. All of which is avoidable if cmp is directly supported, as it ALREADY is in Python 1 and 2.
In addition, even IF the performance wasn't a big deal (and I think it is), I believe cmp is the better basic operator in most cases. As a style issue, I strongly prefer cmp unless I have a specific need for comparisons which are atypical, e.g., where sometimes both lt and ge will return false given the same data (IEEE floats do this if you need exactly-IEEE-specified behavior of NaNs, etc.). By preferring cmp I eliminate lots of duplicate code, and once it's right, it's always right for ALL comparisons. Sometimes lt and friends are absolutely needed, e.g., when lt(x,y)==gt(x,y) for some values of x,y, but most of the time I find that they're an indicator of bad code and that cmp should have been used instead. Direct support of cmp is a GOOD thing, not a wart or obsolete feature.
Adding a standard comparison mixin in a library is probably a good idea as well, but restoring cmp is in my mind more important. I can write my own mixin, but working around a failure to call cmp gives a big performance hit.
--- David A. Wheeler
======================================== Here's my quick test code, in two files cmptest and cmptest2. The whitespace may be munged by my mailer or the list, sorry if it is.
==== cmptest2 ====
#!/usr/bin/env python2
cmp-test2.py
import timeit
time1 = timeit.Timer("x = sorted(list)", """ import cmptest import random randomlist = [random.randrange(1,100000) for x in range(100000)] list = [cmptest.NumberWithCmp(x) for x in randomlist] """) time2 = timeit.Timer("x = sorted(list)", """ import cmptest import random randomlist = [random.randrange(1,100000) for x in range(100000)] list = [cmptest.NumberMixinCmp(x) for x in randomlist] """)
finaltime1 = time1.timeit(3) finaltime2 = time2.timeit(3)
print finaltime1 print finaltime2
====== cmptest ======
#!/usr/bin/env python2
cmp-test.py
import random import timeit
class NumberWithCmp(object): "Store 'x' for comparison" def init(self, data): self.x = data def str(self): return str(self.x) def cmp(self, other): if self.x == other.x: return 0 return (-1 if self.x < other.x else 1)
Mixin, similar to http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403
class ComparisonMixin(object): "Implement <, >, etc. by invoking a 'cmp' function." def lt(self, other): return self.cmp(other) < 0 def __le__(self, other): return self.cmp(other) <= 0 def __gt__(self, other): return self.cmp(other) > 0 def ge(self, other): return self.cmp(other) >= 0
class NumberMixinCmp(ComparisonMixin): def init(self, data): self.x = data def str(self): return str(self.x) def cmp(self, other): if self.x == other.x: return 0 return (-1 if self.x < other.x else 1)
- Previous message: [Python-3000] Add python-3000-like print function to python 2.6
- Next message: [Python-3000] Please re-add __cmp__ to python 3000
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]