[Python-Dev] PyObject_RichCompareBool identity shortcut

Nick Coghlan ncoghlan at gmail.com
Thu Apr 28 12:11:16 CEST 2011

On Thu, Apr 28, 2011 at 6:30 PM, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:

> On Thu, Apr 28, 2011 at 3:57 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> ..
>>> It is an interesting question of what "sane invariants" are. Why you consider the invariants that you listed essential while say
>>>
>>> if c1 == c2:
>>>     assert all(x == y for x,y in zip(c1, c2))
>>>
>>> optional?
>>
>> Because this assertion is an assertion about the behaviour of comparisons that violates IEEE754, while the assertions I list are all assertions about the behaviour of containers that can be made true regardless of IEEE754 by checking identity explicitly.
>
> AFAIK, IEEE754 says nothing about comparison of containers, so my invariant cannot violate it. What you probably wanted to say is that my invariant cannot be achieved in the presence of IEEE754 conforming floats, but this observation by itself does not make my invariant less important than yours. It just makes yours easier to maintain.

No, I meant what I said. Your assertion includes a direct comparison between values (the "x == y" part) which means that IEEE754 has a bearing on whether or not it is a valid assertion. Every single one of my stated invariants consists solely of relationships between containers, or between a container and its contents. This keeps them all out of the domain of IEEE754 since the container implementers get to decide whether or not to factor object identity into the management of the container contents.
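
To make the distinction concrete, here is a minimal sketch, assuming current CPython list comparison (which checks identity before calling ==). The variable names are illustrative only:

```python
nan = float("nan")

c1 = [nan]
c2 = [nan]  # the same NaN object is stored in both lists

# The container-level comparison treats identical objects as equal.
containers_equal = (c1 == c2)

# The item-level comparison follows IEEE754, where NaN != NaN.
items_equal = all(x == y for x, y in zip(c1, c2))

print(containers_equal, items_equal)  # True False
```

The first result is decided by the container implementation; the second is decided by the float type, which is where IEEE754 comes into play.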

The core containment invariant is really only this one:

for x in c:
    assert x in c

That is, if we iterate over a container, all entries returned should be in the container. Hopefully it is non-controversial that this is a sane and reasonable invariant for a container user to expect.
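
As a quick check of that invariant against the builtin containers (a sketch only, relying on the identity check CPython currently performs during containment tests):

```python
nan = float("nan")
items = [1.0, nan, "spam"]

assert nan != nan  # equality is not reflexive for NaN

# The invariant holds for each builtin container because containment
# checks identity before falling back to ==.
for make in (list, tuple, set, frozenset):
    c = make(items)
    for x in c:
        assert x in c

# A *different* NaN object is not found, since neither identity nor == matches.
other_nan = float("nan")
found_other = other_nan in items
print(found_other)  # False
```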

The comparison invariants follow from the definition of set equivalence as:

set1 == set2 iff all(x in set2 for x in set1) and all(y in set1 for y in set2)

Again, notice that there is no comparison of items here - merely a consideration of the way items relate to containers.
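
That definition can be written down directly. The helper name `set_equivalent` below is made up for illustration; the point of the sketch is only that it agrees with the builtin == on sets, with or without NaN:

```python
def set_equivalent(set1, set2):
    """Set equivalence expressed purely in terms of containment."""
    return all(x in set2 for x in set1) and all(y in set1 for y in set2)

nan = float("nan")

# Same NaN object in both sets: equal under both definitions.
s1 = {1.0, nan}
s2 = {nan, 1.0}
same_object = (s1 == s2, set_equivalent(s1, s2))

# A distinct NaN object: unequal under both definitions.
s3 = {1.0, float("nan")}
distinct_object = (s1 == s3, set_equivalent(s1, s3))

print(same_object, distinct_object)  # (True, True) (False, False)
```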

The rationale behind the count() and index() assertions is harder to define in implementation neutral terms, but their behaviour does follow naturally from the internal enforcement of reflexivity needed to guarantee that core invariant.
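
For example, assuming the assertions in question are of the form "every item yielded by a sequence has a non-zero count() and a valid index()" (the earlier message is not quoted here, so that reading is an assumption), CPython currently behaves like this:

```python
nan = float("nan")
seq = [0.0, nan, nan]

# The stored NaN object is counted and located via the identity check.
count_same = seq.count(nan)   # 2
index_same = seq.index(nan)   # 1

# A distinct NaN object matches nothing in the list.
other = float("nan")
count_other = seq.count(other)  # 0
try:
    seq.index(other)
    other_found = True
except ValueError:
    other_found = False

for x in seq:
    assert seq.count(x) > 0
    assert 0 <= seq.index(x) < len(seq)
```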

In mathematics, this is all quite straightforward and non-controversial, since it can be taken for granted that equality is reflexive (it's part of the definition of what equality means: equivalence relations are relations that are symmetric, transitive and reflexive, and if you lose any one of those three properties it isn't an equivalence relation any more).

However, when we confront the practical reality of IEEE754 floating point values and the lack of reflexivity in the presence of NaN, we're faced with a choice of (at least) 4 alternatives:

1. Deny it. Say equality is reflexive at the language level, and we don't care that it makes it impossible to fully implement IEEE754 semantics. This is what Eiffel does, and if you don't care about interoperability and the possibility of algorithmic equivalence with hardware implementations, it's probably not a bad idea. After all, why discard centuries of mathematical experience based on a decision that the IEEE754 committee can't clearly recall the rationale for, and didn't clearly document?
2. Tolerate it, but attempt to confine the breakage of mathematical guarantees to the arithmetic operations actually covered by the relevant standards. This is what CPython currently does by enforcing the container invariants at an implementation level, and, as I think it's a good way to handle the situation, this is what I am advocating lifting up to the language level through appropriate updates to the library and language reference. (Note that even changing the behaviour of float() leaves Python in this situation, since third party types will still be free to follow IEEE754. Given that, it seems relatively pointless to change the behaviour of builtin floats after all the effort that has gone into bringing them ever closer to IEEE754).
3. Signal it. We already do this in some cases (e.g. for ZeroDivisionError), and I'm personally quite happy with the idea of raising ValueError in other cases, such as when attempting to perform ordering comparisons on NaN values.
4. Embrace it. Promote NaN to a language level construct, define semantics allowing it to propagate through assorted comparison and other operations (including short-circuiting logic operators) without being coerced to True as it is now.
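
To illustrate the "Signal it" option next to the status quo, here is a rough sketch. `SignallingFloat` is a hypothetical subclass invented for this example, not an existing or proposed API:

```python
import math

nan = float("nan")

# Status quo: comparisons involving NaN quietly return False,
# and NaN is coerced to True in a boolean context.
assert not (nan < 1.0) and not (nan > 1.0) and not (nan == nan)
assert bool(nan) is True

class SignallingFloat(float):
    """Hypothetical float whose ordering comparisons raise on NaN."""

    def _check(self, other):
        if math.isnan(self) or (isinstance(other, float) and math.isnan(other)):
            raise ValueError("cannot order NaN")

    def __lt__(self, other):
        self._check(other)
        return float.__lt__(self, other)

    def __le__(self, other):
        self._check(other)
        return float.__le__(self, other)

    def __gt__(self, other):
        self._check(other)
        return float.__gt__(self, other)

    def __ge__(self, other):
        self._check(other)
        return float.__ge__(self, other)

try:
    SignallingFloat("nan") < 1.0
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

Equality is deliberately left alone in this sketch, so `==` still follows IEEE754 and only the ordering operators signal.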

Documenting the status quo is the only necessary step in all of this (and Raymond has already adopted the relevant tracker issue). There are tweaks to the current semantics that may be useful (specifically ValueError when attempting to order NaN), but changing the meaning of equality for floats probably isn't one of them (since that only fixes one type, while fixing the affected algorithms fixes all types).
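
As a sketch of why fixing the algorithm covers every type: the function below (the name `contains` is illustrative) applies the identity shortcut explicitly, so it works for builtin floats and for decimal.Decimal, whose NaN values also compare unequal to themselves:

```python
from decimal import Decimal

def contains(container, item):
    """Containment with an explicit identity check before ==."""
    return any(x is item or x == item for x in container)

def naive_contains(container, item):
    """Containment relying on == alone; breaks for non-reflexive types."""
    return any(x == item for x in container)

float_nan = float("nan")
decimal_nan = Decimal("NaN")
values = [float_nan, decimal_nan]

results = [(contains(values, v), naive_contains(values, v)) for v in values]
print(results)  # [(True, False), (True, False)]
```

Changing float equality alone would fix only the first entry of the naive version, while the identity-aware version handles both without knowing anything about either type.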

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
