msg243969 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2015-05-24 05:08 |
The specialized lookdict_* variants replace themselves with the generic lookdict as soon as a non-unicode key is looked up. They could reasonably leave the replacement to insertdict (line 819, currently assert rather than a replacement), when a non-unicode key is actually inserted into the dict. While inserts are less common than (all lookups including insert), I see the main advantage as reducing the number of duplications of the replacement logic. |
|
|
msg244014 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2015-05-25 04:31 |
You could have a non-unicode key that compares equal to a unicode string. |
|
|
msg244024 - (view) |
Author: Mark Shannon (Mark.Shannon) *  |
Date: 2015-05-25 09:50 |
I don't understand why this has been closed. I agree with Jim's analysis. Lookups do not change the dict and the choice of lookdict_* variant depends solely on the set of keys. In fact, lookdict_split *doesn't* replace itself, it merely calls look_dict, leaving maintaining the invariant to insertdict. Benjamin, could you please reopen this and mark it as needing a patch. |
|
|
msg244026 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-05-25 10:09 |
It is not obvious that the patch is needed. If you have ready patch and good benchmark results, you could reopen the issue. Otherwise status quo wins. |
|
|
msg385876 - (view) |
Author: Hristo Venev (h.venev) * |
Date: 2021-01-28 18:54 |
I've attached a patch. I will soon provide benchmark results. |
|
|
msg385877 - (view) |
Author: Hristo Venev (h.venev) * |
Date: 2021-01-28 18:58 |
Benchmark program attached. 0. Creates a dict with str keys 1. str lookups 2. str subclass lookups on the same dict 3. str lookups on the same dict Results before patch: 0.9493069459404069 +- 0.004707371313935551 1.47313450980997 +- 0.01350596115630158 1.3181799192904144 +- 0.006550182814933545 Results after patch: 0.9498907704499289 +- 0.003721378313122522 1.4936510094799451 +- 0.057905386684185135 0.9494844124498195 +- 0.0029465760297623235 |
|
|
msg385994 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2021-01-30 20:41 |
This was originally "can be reopened if a patch is submitted" and Hristo Venev has now done so. Therefore, I am reopening. |
|
|
msg385996 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2021-01-30 20:59 |
Based on Hristo's timing, it appears to be a clear win. A near-wash for truly string-only dicts that shouldn't be effected; a near-wash for looking up non-(exact-)strings, and a nearly 40% speedup for the target case of looking up but not inserting a non-string or string subclass, then looking up strings thereafter. Additional comments: Barring objections, I will promote from patch review to commit review when I've had a chance to look more closely. I don't have commit privs, but I think some of the others following this issue do. The test looks pretty good enough -- good enough that I wonder if I'm missing something on the parts that seem odd. It would be great if you either cleaned them up or commented to explain why: Why is the first key vx1, which seems, if anything, like a variable? Why not k1 or string_key? Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"? Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression. This seems like a test that could probably be rolled into a bigger testfile for the actual commit. I don't have the name of such an appropriate file at hand right now, but will try to find it on a deeper review. |
|
|
msg385997 - (view) |
Author: Hristo Venev (h.venev) * |
Date: 2021-01-30 21:28 |
> Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"? I wanted to construct a key that is equal to, but not the same object as, `'x1'`. Consider this example: assert 'x1' is 'x1' spam = 'x1' assert spam is 'x1' eggs = 'x' eggs += '1' assert eggs == 'x1' assert eggs is not 'x1' assert sys.intern(eggs) is 'x1' When doing a dict lookup and the lookup key is the same object as a stored entry, `__eq__` is not called. Lookups are then significantly faster, maybe 20%. Consider this example: class EqTest: def __eq__(self, other): raise RuntimeError def __hash__(self): return id(self) adict = {} k1 = EqTest() k2 = EqTest() adict[k1] = 42 adict[k2] = 43 print(adict[k1], adict[k2]) Here `k1` is considered the same as `k1` and `k2` is considered the same as `k2`. However, `k1` and `k2` are considered distinct and never compared because they have different hashes. However, if we were to set `EqTest.__hash__ = lambda self: 42`, we'd get a RuntimeError when we try to set `adict[k2]` because it would get compared for equality with `k1`. Even if `__eq__` works, we can get some interesting behaviors. For example, when using multiple instances of `float('nan')` as keys. > Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression. I also tested a custom class that compares equal to strings. Other than being much slower, there weren't any significant differences. I also did some checks with int key lookups, which obviously fail with KeyError. They did not make the performance worse for the subsequent str lookups. I will try to make a proper test tomorrow. |
|
|
msg386000 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2021-01-30 22:24 |
Please test also creating a large dict with string-only or non-string keys. Since you add a code in insertdict() it can potentially affects performance. |
|
|
msg386040 - (view) |
Author: Hristo Venev (h.venev) * |
Date: 2021-01-31 20:34 |
I've attached a patch that should also contain a test. I also ran some benchmarks on dict creation/inserts. I couldn't notice any difference in performance. |
|
|
msg387505 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2021-02-22 08:09 |
I'm +1 for this. Would you create a pull request on GitHub? |
|
|
msg389933 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2021-03-31 21:12 |
What is the status on this? If you are losing interest, would you like someone else to turn your patch into a pull request? |
|
|
msg389934 - (view) |
Author: Hristo Venev (h.venev) * |
Date: 2021-03-31 21:15 |
Sorry, I must have missed Inada Naoki's reply. I will try to send a pull request this weekend. |
|
|
msg392273 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2021-04-29 02:06 |
New changeset 8557edbfa8f74514de82feea4c62f5963e4e0aa7 by Hristo Venev in branch 'master': bpo-24275: Don't downgrade unicode-only dicts to mixed on lookups (GH-25186) https://github.com/python/cpython/commit/8557edbfa8f74514de82feea4c62f5963e4e0aa7 |
|
|