[Python-Dev] == on object tests identity in 3.x
Andreas Maier andreas.r.maier at gmx.de
Fri Jul 11 16:04:35 CEST 2014
- Previous message: [Python-Dev] == on object tests identity in 3.x
- Next message: [Python-Dev] == on object tests identity in 3.x
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 09.07.2014 03:48, Raymond Hettinger wrote:
> On Jul 7, 2014, at 4:37 PM, Andreas Maier <andreas.r.maier at gmx.de> wrote:
>
>> I do not really buy into the arguments that try to show how identity and value are somehow the same. They are not, not even in Python.
>>
>> The argument I can absolutely buy into is that the implementation cannot be changed within a major release. So the real question is how we document it.
>
> Once every few years, someone discovers IEEE-754, learns that NaNs aren't supposed to be equal to themselves and becomes inspired to open an old debate about whether to wreck Python in an effort to make the world safe for NaNs. And somewhere along the way, people forget that practicality beats purity.
>
> Here are a few thoughts on the subject that may or may not add a little clarity ;-)
>
> * Python already has IEEE-754 compliant NaNs:
>
>       assert float('NaN') != float('NaN')
>
> * Python already has the ability to filter out NaNs:
>
>       [x for x in container if not math.isnan(x)]
>
> * In the numeric world, the most common use of NaNs is for missing data (much like we usually use None). The property of not being equal to itself is primarily useful in low-level code optimized to run a calculation to completion without running frequent checks for invalid results (much like #N/A is used in MS Excel).
>
> * Python also lets containers establish their own invariants to establish correctness, improve performance, and make it possible to reason about our programs:
>
>       for x in c:
>           assert x in c
>
> * Containers like dicts and sets have always used the rule that identity implies equality. That is central to their implementation. In particular, the check of interned string keys relies on identity to bypass a slow character-by-character comparison to verify equality.
>
> * Traditionally, a relation R is considered an equality relation if it is reflexive, symmetric, and transitive:
>
>       R(x, x) -> True
>       R(x, y) -> R(y, x)
>       R(x, y) ^ R(y, z) -> R(x, z)
>
> * Knowingly or not, programs tend to assume that all of those hold. Test suites in particular assume that if you put something in a container, assertIn() will pass.
>
> * Here are some examples of cases where non-reflexive objects would jeopardize the pragmatism of being able to reason about the correctness of programs:
>
>       s = SomeSet()
>       s.add(x)
>       assert x in s
>
>       s.remove(x)      # See collections.abc.MutableSet.remove
>       assert not s
>
>       s.clear()        # See collections.abc.MutableSet.clear
>       assert not s
>
> * What the above code does is up to the implementer of the container. If you use the Set ABC, you can choose to implement __contains__() and discard() to use straight equality or identity-implies-equality. Nothing prevents you from making containers that are hard to reason about.
>
> * The builtin containers make the choice for identity-implies-equality so that it is easier to build fast, correct code. For the most part, this has worked out great (dictionaries in particular have had identity checks built in for almost twenty years).
>
> * Years ago, there was a debate about whether to add an __is__() method to allow overriding the is-operator. The push for the change was the "pure" notion that "all operators should be customizable". However, the idea was rejected based on the "practical" notions that it would wreck our ability to reason about code, that it would slow down all code that used identity checks, that library modules (ours and third-party) already made deep assumptions about what "is" means, and that people would shoot themselves in the foot with hard-to-find bugs.
>
> Personally, I see no need to make the same mistake by removing the identity-implies-equality rule from the built-in containers. There's no need to upset the apple cart for nearly zero benefit.
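The behaviour described in the quoted points can be checked directly. The following is only a minimal sketch, assuming CPython 3.x and the built-in list, set and dict types; the variable names are illustrative and not taken from the thread.

```python
import math

nan = float("nan")

# IEEE-754 behaviour: a NaN does not compare equal, not even to itself.
assert nan != nan
assert float("nan") != float("nan")

# Built-in containers test identity before falling back to ==,
# so the very same NaN object is still found.
items = [1.0, nan, 2.0]
assert nan in items
assert items.index(nan) == 1
assert items.count(nan) == 1

# A different NaN object is neither identical nor equal, so it is not found.
assert float("nan") not in items

# The container invariant from the quoted text holds for every element.
for x in items:
    assert x in items

# Sets and dicts apply the same identity-then-equality lookup.
s = {nan}
assert nan in s
s.remove(nan)
assert not s

d = {nan: "missing"}
assert d[nan] == "missing"

# NaNs can still be filtered out explicitly.
cleaned = [x for x in items if not math.isnan(x)]
assert cleaned == [1.0, 2.0]
```

If the lookups used plain == only, the membership tests and s.remove(nan) above would fail for NaN, which is the kind of surprise the quoted argument is about.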
Containers delegate the equality comparison on the container to their elements; they do not apply identity-based comparison to their elements. At least that is the externally visible behavior.
Only the default comparison behavior implemented on type object follows the identity-implies-equality rule.
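As a small illustration of that default, here is a sketch assuming Python 3.x; the class name Plain is hypothetical and stands for any class that defines no comparison methods of its own.

```python
class Plain:
    """Defines no comparison methods, so it inherits the defaults from object."""


a = Plain()
b = Plain()

# Default equality is identity-based: an object equals only itself.
assert a == a
assert a != b
assert not (a == b)

# There is no corresponding default for ordering in Python 3:
# comparing two such objects with < raises TypeError.
try:
    a < b
except TypeError:
    ordering_supported = False
else:
    ordering_supported = True

assert not ordering_supported
```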
As part of my doc patch, I will upload an extension to the test_compare.py test suite, which tests all built-in containers with values whose order differs from the identity order, and it shows that the value order and equality win over identity, if implemented.
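The following is not the test suite extension itself, only a rough sketch of the kind of check it describes, assuming Python 3.x. The class Value is a hypothetical value type that implements equality and ordering on its payload.

```python
from functools import total_ordering


@total_ordering
class Value:
    """Hypothetical value type: compared by its number, not by identity."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Value):
            return NotImplemented
        return self.number == other.number

    def __lt__(self, other):
        if not isinstance(other, Value):
            return NotImplemented
        return self.number < other.number

    def __hash__(self):
        # Keep hash consistent with __eq__ so instances work in sets and dicts.
        return hash(self.number)


a = Value(1)
b = Value(1)   # equal to a by value, but a different object
c = Value(2)

assert a is not b
assert a == b

# Container equality follows the elements' value equality.
assert [a] == [b]
assert (a,) == (b,)
assert {a} == {b}
assert {a: 0} == {b: 0}
assert b in [a]

# Container ordering follows the elements' value order.
assert [a] < [c]
assert (c,) > (b,)
assert sorted([c, a])[0] is a
```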
> IMO, the proposed quest for purity is misguided. There are many practical reasons to let the builtin containers continue to work as they do now.
As I said, I can accept compatibility reasons. Plus, there is the argument brought up by Benjamin about the desire for the identity-implies-equality rule as a default, with no corresponding rule for order comparison (I added both to the doc patch).
Andy