[Python-Dev] Intricacies of calling eq (original) (raw)

Kevin Modzelewski kmod at dropbox.com
Wed Mar 19 03:24:06 CET 2014


I think in this case, though, if we say for the sake of argument that the guaranteed semantics of a dictionary lookup are zero or more calls to hash plus zero or more calls to eq, then two back-to-back dictionary lookups wouldn't have any observable differences from doing only one, unless you start to make assumptions about the behavior of the implementation. To me there seems to be a bit of a gap between seeing a dictionary lookup and knowing the exact sequence of user-functions that get called, far more than for example something like "a < b". I would feel differently if the question was if it's ok to fold something like

x = a < b y = a < b

into a single comparison, since I'd agree with the way you described it that you look at this code and would expect lt to be called twice. I guess maybe I just disagree about whether dictionaries are contractually bound to call hash and eq? For instance, maybe the dict could have a small cache of recently-looked-up elements to skip hashing / equality tests if they get accessed again; I have no idea if that would be profitable or not, but it seems like that would be a roughly-equivalent change that's still "doing two dictionary lookups" except the second one simply wouldn't call back into the user's python code. Or maybe the dictionary could be implemented as a simple list for small sizes and skip calling hash until it decides to switch to a hash-table strategy; again I'd still say it's "doing the lookups" but it just calls a different set of python functions.

Although I have tentatively said I think this is okay, it is a change in actual semantics of Python code: what you write is no longer what gets run. That makes this very different from changing the implementation of sort -- by analogy, its more like changing the semantics of

a = f(x) + f(x) to only call f(x) once. I don't think you would call that an implementation detail, would you? Even if justified -- f(x) is a pure, deterministic function with no side-effects -- it would still be a change to the high-level behaviour of the code.

To me, the optimization stage of a compiler's job is to transform a program to an equivalent one that runs faster, where equivalence is defined w.r.t. a certain set of rules defining the behavior of the language. If f() can really be proven to be a function that is deterministic, has no side-effects, does no runtime introspection, and returns a type that supports the identity "x + x == 2 * x" (quite a bit of work for a dynamic language jit, but definitely possible!), then I'd say that I have a fairly different understanding of the "high-level behavior" the runtime is contracted to follow. As a simpler example, I think the runtime should be very free to condense "a = 1 + 1" into "a = 2" without doing the addition.

Anyway, as I alluded to about the lt / gt usage in sort(), just because I might want alternative implementations to have flexibility to do something doesn't mean it's reasonable to say it's so. I'm biased since the implementation I'm working on uses std::unordered_map to implement Python dictionaries, and I have no idea how that's actually implemented and I'd rather not have to :) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-dev/attachments/20140318/0e16c1ed/attachment.html>



More information about the Python-Dev mailing list