The docs don't seem to make any guarantee about calling \_\_eq\_\_ or \_\_hash\_\_:

d\[key\]
Return the item of d with key key. Raises a KeyError if key is not in the map.

which seems to indicate that this kind of optimization should be fine.

In fact, I would very much like messing with the semantics of \_\_eq\_\_ to be discouraged in the docs. Currently, the docs merely say "The truth of x==y does not imply that x!=y is false." Of course, once \_\_eq\_\_ and \_\_ne\_\_ are separately overridable, nothing can be guaranteed, but I've seen code where x == y and x != y do completely different things, and that was not pretty.
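To make the concern concrete, here is a minimal sketch of the kind of code being described. The class name `Inconsistent` and its attributes are hypothetical, invented purely for illustration; the only point is that nothing stops the two operators from disagreeing once both are overridden.

```python
class Inconsistent:
    """Hypothetical class whose == and != are overridden independently."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Equality compares the wrapped values.
        return self.value == other.value

    def __ne__(self, other):
        # Deliberately NOT the negation of __eq__: compares identity instead.
        return self is not other

    def __hash__(self):
        # Keep instances hashable, consistent with __eq__.
        return hash(self.value)


x = Inconsistent(1)
y = Inconsistent(1)

# Both comparisons succeed: x and y are "equal" and "not equal" at once.
both_true = (x == y) and (x != y)
```

Any optimisation that assumes sensible comparison semantics would have to either detect such a class or accept that its behaviour may change.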

2014-03-18 21:27 GMT-07:00 Nick Coghlan <ncoghlan@gmail.com>:

On 19 March 2014 11:09, Steven D'Aprano <steve@pearwood.info> wrote:
\> Although I have tentatively said I think this is okay, it is a change in
\> actual semantics of Python code: what you write is no longer what gets
\> run. That makes this \*very\* different from changing the implementation
\> of sort -- by analogy, it's more like changing the semantics of
\>
\> a = f(x) + f(x)
\>
\> to only call f(x) once. I don't think you would call that an
\> implementation detail, would you? Even if justified -- f(x) is a pure,
\> deterministic function with no side-effects -- it would still be a
\> change to the high-level behaviour of the code.
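As an illustration of why merging the two calls is observable, consider a sketch in which `f` has a side effect. The function body and the `calls` list are assumptions made up for this example, not anything from the thread.

```python
calls = []


def f(x):
    # Hypothetical function with an observable side effect: it logs each call.
    calls.append(x)
    return x * 2


x = 3

# The code as written: f is invoked twice.
a = f(x) + f(x)
calls_as_written = len(calls)

# What a call-merging optimisation would effectively execute instead.
calls.clear()
tmp = f(x)
b = tmp + tmp
calls_if_merged = len(calls)
```

The results `a` and `b` agree, but the number of recorded calls differs, which is exactly the high-level behaviour change under discussion.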

Ah, I think this is a good alternative example. Given the stated
assumptions (true builtin dict, not modified between operations),
would we be OK with PyPy optimising the following to only do a single
dict lookup:

a = d\[x\] + d\[x\]

It's essentially the same optimisation as the one being discussed - in
the code \*as written\*, there are two lookups visible, but for any
sensible \_\_hash\_\_ and \_\_eq\_\_ implementation, they should always give
the same answer for a true builtin dict that isn't modified between
the two lookups. (and yes, PyPy has the infrastructure to enforce
those constraints safely and fall back to normal execution if they
aren't met - that ability to take opportunistic advantage of known
behaviours of particular types is one of the key things that makes it
such a powerful tool for implicit optimisations, as compared to things
like Cython and Numba, which change semantics more markedly, but also
have to be explicitly applied to specific sections of your code rather
than being applied automatically).
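The two visible lookups can be made observable with a key type that counts its own \_\_hash\_\_ and \_\_eq\_\_ calls. This is only a sketch: `CountingKey` and its counters are hypothetical names, and the exact counts are an implementation detail (the comments describe what CPython is expected to do, not a language guarantee).

```python
class CountingKey:
    """Hypothetical key type that records how often it is hashed and compared."""

    hash_calls = 0
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


d = {CountingKey("k"): 21}
x = CountingKey("k")  # equal to the stored key, but a distinct object

# Reset the counters so only the two lookups below are measured.
CountingKey.hash_calls = 0
CountingKey.eq_calls = 0

a = d[x] + d[x]

# On CPython each lookup hashes x once and compares it with the stored key
# once, so both counters should end up at 2. An implementation that merged
# the lookups would leave them at 1 while producing the same value of a.
hash_calls = CountingKey.hash_calls
eq_calls = CountingKey.eq_calls
```

For a well-behaved key the value of `a` is the same either way; only code that inspects the counters (or has other side effects in these methods) could tell the difference.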

I think it's certainly borderline (it's the kind of surprising
behavioural change that irritates people about C/C++ optimisers), but
I also think it's a defensible optimisation if the gain is significant
enough.

Regards,

Nick.

\--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
