[Python-Dev] == on object tests identity in 3.x

Steven D'Aprano steve at pearwood.info
Tue Jul 8 18:57:45 CEST 2014

On Tue, Jul 08, 2014 at 04:53:50PM +0900, Stephen J. Turnbull wrote:

> Chris Angelico writes:
>
> > The reason NaN isn't equal to itself is because there are X bit patterns representing NaN, but an infinite number of possible non-numbers that could result from a calculation.
>
> I understand that. But you're missing at least two alternatives that involve raising on some calculations involving NaN, as well as the fact that forcing inequality of two NaNs produced by equivalent calculations is arguably just as wrong as allowing equality of two NaNs produced by the different calculations.

I don't think so. Floating point == represents numeric equality, not (for example) equality in the sense of "All Men Are Created Equal". Not even numeric equality in the most general sense, but specifically in the sense of (approximately) real-valued numbers, so it's an extremely precise definition of "equal", not fuzzy in any way.

In an earlier post, you suggested that NANs don't have a value, or that they have a value which is not a value. I don't think that's a good way to look at it. I think the obvious way to think of it is that NAN's value is Not A Number, exactly like it says on the box. Now, if something is not a number, obviously you cannot compare it numerically:

"Considered as numbers, is the sound of rain on a tin roof
 numerically equal to the sight of a baby smiling?"

Some might argue that the only valid answer to this question is "Mu",

https://en.wikipedia.org/wiki/Mu_%28negative%29#.22Unasking.22_the_question

but if we're forced to give a Yes/No True/False answer, then clearly False is the only sensible answer. No, Virginia, Santa Claus is not the same number as Santa Claus.

To put it another way, if x is not a number, then x != y for all possible values of y -- including x.
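A minimal sketch of that rule in Python, assuming the usual IEEE-754 doubles that CPython uses for float. The names `nan` and `others` are only illustrative:

```python
nan = float("nan")
others = [0.0, 1.5, -2.0, float("inf"), float("-inf"), nan]

# Every != comparison involving the NAN is True and every == is False,
# including the comparison of the NAN with itself (the last list item).
all_unequal = all(nan != y for y in others)
any_equal = any(nan == y for y in others)

# The ordering comparisons involving a NAN are all False as well.
any_ordered = any([nan < 1.0, nan > 1.0, nan <= nan, nan >= nan])

print(all_unequal, any_equal, any_ordered)
```

Note that the comparisons are written out explicitly rather than using `nan in others`, since the `in` operator on a list checks identity before equality.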

[Disclaimer: despite the name, IEEE-754 arguably does not intend NANs to be Not A Number in the sense that Santa Claus is not a number, but more like "it's some number, but it's impossible to tell which". However, despite that, the standard specifies behaviour which is best thought of in terms of the Santa Claus model.]

> That's where things get fuzzy for me -- in Python I would expect that preserving invariants would be more important than computational efficiency, but evidently it's not.

I'm not sure what you're referring to here. Is it that containers such as lists and dicts are permitted to optimize equality tests with identity tests for speed?

py> NAN = float('NAN')
py> a = [1, 2, NAN, 4]
py> NAN in a  # identity is checked before equality
True
py> any(x == NAN for x in a)
False

When this came up for discussion last time, the clear consensus was that this is reasonable behaviour. NANs and other such "weird" objects are too rare and too specialised for built-in classes to carry the burden of having to allow for them. If you want a "NAN-aware list", you can make one yourself.
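As a rough sketch of what such a "NAN-aware list" might look like: the class name `NanAwareList` is hypothetical, and this version only overrides the three methods that search by value, so that they rely on == alone and never take the identity shortcut.

```python
class NanAwareList(list):
    """Hypothetical list subclass whose searches use == only, never identity."""

    def __contains__(self, value):
        # Plain lists check "item is value or item == value"; here only ==.
        return any(item == value for item in self)

    def count(self, value):
        return sum(1 for item in self if item == value)

    def index(self, value, start=0, stop=None):
        # Normalise start/stop (including negatives) the way slicing does.
        stop = len(self) if stop is None else stop
        for i in range(*slice(start, stop).indices(len(self))):
            if self[i] == value:
                return i
        raise ValueError(f"{value!r} is not in list")


NAN = float("nan")
plain = [1, 2, NAN, 4]
aware = NanAwareList(plain)

print(NAN in plain)   # identity shortcut applies
print(NAN in aware)   # equality only
```

With this sketch, `NAN in aware` agrees with `any(x == NAN for x in aware)`, at the cost of doing the search in Python code rather than C. Other methods such as `remove()` would need the same treatment in a complete version.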

> I assume that I would have a better grasp on why Python chose to go this way rather than that if I understood IEEE 754 better.

See the answer by Stephen Canon here:

http://stackoverflow.com/questions/1565164/

[quote]

It is not possible to specify a fixed-size arithmetic type that satisfies all of the properties of real arithmetic that we know and love. The 754 committee has to decide to bend or break some of them. This is guided by some pretty simple principles:

- When we can, we match the behavior of real arithmetic.
- When we can't, we try to make the violations as predictable and as easy to diagnose as possible.

[end quote]

In particular, reflexivity for NANs was dropped for a number of reasons, some stronger than others:

- One of the weaker reasons for NAN non-reflexivity is that it preserved the identity x == y <=> x - y == 0. Although that is the cornerstone of real arithmetic, it's violated by IEEE-754 INFs, so violating it for NANs is not a big deal either.

- Dropping reflexivity preserves the useful property that NANs compare unequal to everything.

- Practicality beats purity: dropping reflexivity allowed programmers to identify NANs without waiting years or decades for programming languages to implement isnan() functions. E.g. before Python had math.isnan(), I made my own:

      def isnan(x):
          return isinstance(x, float) and x != x

- Keeping reflexivity for NANs would have implied some pretty nasty things, e.g. if log(-3) == log(-5), then -3 == -5.
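Some of these points can be checked directly. A small sketch, again assuming IEEE-754 doubles; note that in Python math.log(-3) raises ValueError rather than returning a NAN, so the NANs below are produced from INF arithmetic instead, and the variable names are only illustrative.

```python
import math

inf = float("inf")

# INFs already break the identity x == y <=> x - y == 0:
# inf equals itself, yet inf - inf is a NAN rather than zero.
inf_equal = (inf == inf)
diff_is_zero = ((inf - inf) == 0)


def isnan(x):
    # Home-made test that relies only on NAN non-reflexivity.
    return isinstance(x, float) and x != x


# Two NANs produced by different calculations.
nan_a = inf - inf
nan_b = 0.0 * inf

print(inf_equal, diff_is_zero)
print(isnan(nan_a), math.isnan(nan_a), isnan(1.0))
print(nan_a == nan_b)
```

The last line is the practical upshot of non-reflexivity: NANs that came from unrelated calculations never compare equal, so no false conclusion can be drawn from them.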

Basically, and I realise that many people disagree with their decision (notably Bertrand Meyer of Eiffel fame, and our own Mark Dickinson), the IEEE-754 committee led by William Kahan decided that the problems caused by having NANs compare unequal to themselves were much less than the problems that would have been caused without it.

-- Steven
