[Python-Dev] cmp(x,x)
Armin Rigo arigo at tunes.org
Tue May 18 04:57:33 EDT 2004
Hello,
A minor semantic change that crept in some time ago was an implicit assumption that any object x should "reasonably" be expected to compare equal to itself. The arguments are summarized below (should this be documented, inserted in NEWS, turned into a mini-PEP a posteriori ... ?).
The point is that comparisons now behave differently depending on whether they are issued by C or Python code. The expression 'x == y' will always call x.__eq__(y), because in some settings (e.g. Numeric) the result is not just True or False but an arbitrary object (e.g. a Numeric array of zeroes and ones). Because of that, you cannot just say that 'x == x' should return True. So the behavior of PyObject_RichCompare() didn't change.
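To make the Numeric case concrete, here is a minimal sketch in present-day Python 3. The class Vec is a hypothetical stand-in for an array type, not Numeric's actual implementation; it only shows why 'x == x' cannot be short-circuited to True when __eq__ is free to return any object.

```python
class Vec:
    """Hypothetical stand-in for an array type with element-wise ==."""

    def __init__(self, *items):
        self.items = list(items)

    def __eq__(self, other):
        # Return one flag per element instead of a single True/False.
        return [int(a == b) for a, b in zip(self.items, other.items)]


v = Vec(1, 2, 3)
result = (v == v)   # a list of ones, not the boolean True
```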
On the other hand, when C code does a comparison it uses PyObject_RichCompareBool(), meaning it is only interested in a 1 or 0 answer. So PyObject_RichCompareBool() is where the shortcut about comparing an object with itself has been inserted.
The result is the following: say x has a method __cmp__() that always returns -1.
    >>> x == x         # x.__cmp__() is called
    False
    >>> x < x          # x.__cmp__() is called
    True
    >>> cmp(x,x)       # x.__cmp__() is NOT called
    0
    >>> [x] == [x]     # x.__cmp__() is NOT called
    True
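cmp() and __cmp__() no longer exist in Python 3, but the same split between the operator and built-in containers can still be observed with __eq__. The following is a sketch under that assumption; NeverEqual is a hypothetical class, and the call counter is only there to show when the special method runs.

```python
class NeverEqual:
    """Hypothetical class whose instances never compare equal, even to themselves."""

    def __init__(self):
        self.calls = 0

    def __eq__(self, other):
        self.calls += 1
        return False


x = NeverEqual()

direct = (x == x)            # the operator calls x.__eq__(x)
calls_after_direct = x.calls

in_list = ([x] == [x])       # the list comparison sees identical elements first
calls_after_list = x.calls

member = (x in [x])          # containment uses the same identity shortcut
calls_after_member = x.calls
```

Only the bare 'x == x' reaches __eq__; the two container operations answer from identity alone.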
The only way to explain the semantics is that the expression 'x == x' always calls the special methods, but built-in functions are allowed to assume that any object compares equal to itself. In other words, C code usually checks that objects are "identical or equal", which is equivalent to the Python expression 'x is y or x == y'. For example, the equality of lists now works as follows:
    def eq(lst1, lst2):
        if len(lst1) != len(lst2):
            return False
        for x, y in zip(lst1, lst2):
            if not (x is y or x == y):
                return False
        else:
            return True
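As a quick check of that model, the sketch below runs it against the built-in list comparison in present-day Python 3 on a few cases, including a NaN that is shared between both lists and NaNs that are not. The name list_eq and the sample cases are illustrative assumptions.

```python
def list_eq(lst1, lst2):
    """Model of list equality with the identity shortcut on elements."""
    if len(lst1) != len(lst2):
        return False
    for x, y in zip(lst1, lst2):
        if not (x is y or x == y):
            return False
    return True


nan = float("nan")
cases = [
    ([1, 2, 3], [1, 2, 3]),                  # equal by value
    ([1, 2], [1, 2, 3]),                     # different lengths
    ([nan], [nan]),                          # same NaN object on both sides
    ([float("nan")], [float("nan")]),        # two distinct NaN objects
]

# For each case: (result of the model, result of the built-in comparison).
results = [(list_eq(a, b), a == b) for a, b in cases]
```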
Should any of this be documented?
An alternative behavior would have been to leave PyObject_RichCompareBool() alone and only insert the short-cut in specific object types' comparison methods on a case-by-case basis. For example, identical lists would just compare equal, without going through the loop comparing each element. This would remove the surprise of x.__cmp__(x) not always being called. The semantics would be easier to explain too: two lists are equal if they are the same list or if their elements compare pairwise equal. We would have (with x as above):
    >>> cmp(x,x)       # x.__cmp__(x) is called
    -1
    >>> [x] == [x]
    False              # because the lists contain a non-equal element
    >>> lst = [x]; lst == lst
    True               # because it is the same list
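A minimal Python 3 sketch of this alternative, for lists only. The function alt_list_eq and the class NeverEqual are hypothetical names, and __eq__ stands in for __cmp__; this models the proposal and is not how any released interpreter behaves.

```python
class NeverEqual:
    """Hypothetical class whose instances never compare equal."""

    def __eq__(self, other):
        return False


def alt_list_eq(lst1, lst2):
    """Sketch of the alternative: shortcut only on the lists themselves."""
    if lst1 is lst2:
        return True            # the same list is equal to itself
    if len(lst1) != len(lst2):
        return False
    # Every pair goes through ==, so the elements' methods are always called.
    return all(x == y for x, y in zip(lst1, lst2))


x = NeverEqual()
lst = [x]
distinct_lists = alt_list_eq([x], [x])   # two lists holding a non-equal element
same_list = alt_list_eq(lst, lst)        # one list compared with itself
```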
Finally, whatever the final semantics are, we should make sure that existing built-in objects behave in a consistent way. Of course I'm thinking about floats: on my Linux box,
    >>> f = float('nan')
    >>> cmp(f,f)
    0                  # because f is f
    >>> f == f
    False              # because float.__eq__() is called
Note that, as discussed below, the following behavior is expected and in accordance with standards:
    >>> float('nan') is float('nan')
    False
    >>> float('nan') == float('nan')
    False              # not the same object
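The same distinction between one 'nan' object and two separate ones can be reproduced in present-day Python 3, where cmp() is gone but containers still check identity before equality. This is only a sketch; the variable names are illustrative.

```python
f = float("nan")

same_object_equal = (f == f)      # float.__eq__ is called and follows IEEE 754
in_own_list = (f in [f])          # identity is checked before equality
lists_equal = ([f] == [f])        # same object on both sides

# Two separately created NaN objects: neither identical nor equal.
fresh_lists_equal = ([float("nan")] == [float("nan")])

lookup = {f: "found"}
by_same_object = lookup.get(f)              # key found through identity
by_other_nan = lookup.get(float("nan"))     # a different NaN object is a miss
```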
Unless there are serious objections I suggest removing (i.e. I plan to remove) the short-cut in PyObject_RichCompareBool() -- performance is probably not an issue here -- and then reviewing all built-in comparison methods to make sure that they return "equal" for identical objects.
-+- summary of arguments that led to the original change (in cvs head).
The argument in favor of the change is to remove the complex and not useful code trying to compare self-recursive structures: for example, in some settings comparing __builtin__.__dict__ with itself would trigger recursive comparison of __builtin__.__dict__ with itself in an endless loop. The complex algorithm was able to spot that. The new semantics immediately assume __builtin__.__dict__ to be equal to itself. Removing the complex algorithm means that you will now get an endless loop when comparing two non-identical self-recursive structures with the same shape, which is most probably not a problem in practice (on the contrary, this implicit algorithm did hide a bug in one of my programs).
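Both cases can be tried with self-containing lists. The sketch below assumes present-day CPython 3, where the runaway comparison is cut off by a RecursionError rather than looping forever; the variable names are illustrative.

```python
a = []
a.append(a)            # a contains itself
b = []
b.append(b)            # b has the same shape but is a different object

identical = (a == a)   # the identity shortcut on a[0] answers at once

try:
    a == b             # a[0] is a and b[0] is b, so the comparison recurses
    outcome = "returned"
except RecursionError:
    outcome = "RecursionError"
```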
The argument against used to be that e.g. it makes sense that the float
value 'nan' should be different from itself, as various standards require.
This argument does not apply in Python: these standards are about comparing
values, not objects, so it makes perfect sense to say that even if x is the
result of a computation that yielded an unknown answer 'nan', this answer is
still equal to itself; what it is probably not equal to is another 'nan'
which was obtained differently. In other words two float objects both
containing 'nan' should be different, but one 'nan' object is still equal to
itself. This is sane as long as no code considers 'nan' as a singleton, or
tries to reuse 'nan' objects for different 'nan' values.
-+-
Armin