msg310427 - (view) |
Author: Yury Selivanov (yselivanov) *  |
Date: 2018-01-22 16:37 |
We should investigate whether we want dicts to compact themselves on del/pop operations. Currently we have to rely on workarounds to have compactable dict.copy (see issue 31179) etc. |
|
|
msg310433 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-01-22 16:57 |
Note: It was recently discussed if "del dict[key]" should keep the insertion order. If I understood correctly: yes, the order must be preserved on deletion. https://mail.python.org/pipermail/python-dev/2017-November/150142.html |
|
|
msg310548 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2018-01-24 01:53 |
For insert/pop loop, reduce table size aggressively on pop may cause performance regression. So reducing size should be conservative. So my opinion is: * When dict size become 0, make the dict shared-empty, like dict.clear() * When (dict size < dk_size/8), call insertion_resize() |
|
|
msg310570 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-01-24 09:29 |
I agree that an heuristic is needed to decide when a dict should be compacted. > * When (dict size < dk_size/8), call insertion_resize() In bpo-31179, I suggested to Yury to use 2/3 ratio... to avoid integer overflow :-) He first used 80%, but I dislike using the FPU in the dictobject.c. I'm not sure of the cost of switching from integers to floats, and more generally I hate rounding issues, so I prefer to use regular integers ;-) + (3) if 'mp' is non-compact ('del' operation does not resize dicts), + do fast-copy only if it has at most 1/3 non-used keys. + + The last condition (3) is important to guard against a pathalogical + case when a large dict is almost emptied with multiple del/pop + operations and copied after that. In cases like this, we defer to + PyDict_Merge, which produces a compacted copy. By the way, if dict automatically compacts itself automatically, do we still need Yury's test "is the dict compact"? |
|
|
msg310571 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-01-24 09:31 |
This issue is a matter of CPU vs memory tradeoff. It reminds me the PEP 393: str type doesn't make tradeoff anymore, CPython guarantee that str always use the most efficient storage (least memory) for characters. But here the tradeoff is different. A dict is mutable, whereas str is immutable ;-) |
|
|
msg310604 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2018-01-24 16:02 |
> * When dict size become 0, make the dict shared-empty, like dict.clear() This will cause significant performance regression for `dict[a]=None; del dict[a]` loop. del/pop shouldn't do clear(). > * When (dict size < dk_size/8), call insertion_resize() This is bad too. When ma_used=127 and dk_size=1024, new size will be 1024! It's because current GROWTH_RATE is used*2 + size/2. This GROWTH_RATE is set in . We should understand it before changing anything. |
|
|
msg310607 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2018-01-24 16:20 |
"This will cause significant performance regression for `dict[a]=None; del dict[a]` loop. del/pop shouldn't do clear()." Should we make sure that all dicts have at least MINSIZE entries? |
|
|
msg310649 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2018-01-25 00:46 |
> Should we make sure that all dicts have at least MINSIZE entries? I don't think so. I think "allocate on first insert" is good idea for dict.clear() And I think it's good idea for new dict too: https://github.com/python/cpython/pull/1080 |
|
|
msg310654 - (view) |
Author: Inada Naoki (methane) *  |
Date: 2018-01-25 02:07 |
I think I understand #17563, and I should fix GROWTH_RATE. Before compact-ordered dict, we can avoid resizing in "the number of deletions is on a par with the number of insertions." scenario, by large GROWTH_RATE. That's because new entry can reuse dummy entries. But in compact-ordered dict, we can't do that. We need resizing always, and resize is much faster than legacy dict. I think GROWTH_RATE should be ma_used*3 for now. |
|
|