Issue 32623: Resize dict on del/pop (original) (raw)

Created on 2018-01-22 16:37 by yselivanov, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked Edit
PR 5297 closed methane,2018-01-24 12:57
PR 5300 closed methane,2018-01-24 15:07
Messages (9)
msg310427 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-01-22 16:37
We should investigate whether we want dicts to compact themselves on del/pop operations. Currently we have to rely on workarounds to have compactable dict.copy (see issue 31179) etc.
msg310433 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-22 16:57
Note: It was recently discussed if "del dict[key]" should keep the insertion order. If I understood correctly: yes, the order must be preserved on deletion. https://mail.python.org/pipermail/python-dev/2017-November/150142.html
msg310548 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-24 01:53
For insert/pop loop, reduce table size aggressively on pop may cause performance regression. So reducing size should be conservative. So my opinion is: * When dict size become 0, make the dict shared-empty, like dict.clear() * When (dict size < dk_size/8), call insertion_resize()
msg310570 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-24 09:29
I agree that an heuristic is needed to decide when a dict should be compacted. > * When (dict size < dk_size/8), call insertion_resize() In bpo-31179, I suggested to Yury to use 2/3 ratio... to avoid integer overflow :-) He first used 80%, but I dislike using the FPU in the dictobject.c. I'm not sure of the cost of switching from integers to floats, and more generally I hate rounding issues, so I prefer to use regular integers ;-) + (3) if 'mp' is non-compact ('del' operation does not resize dicts), + do fast-copy only if it has at most 1/3 non-used keys. + + The last condition (3) is important to guard against a pathalogical + case when a large dict is almost emptied with multiple del/pop + operations and copied after that. In cases like this, we defer to + PyDict_Merge, which produces a compacted copy. By the way, if dict automatically compacts itself automatically, do we still need Yury's test "is the dict compact"?
msg310571 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-24 09:31
This issue is a matter of CPU vs memory tradeoff. It reminds me the PEP 393: str type doesn't make tradeoff anymore, CPython guarantee that str always use the most efficient storage (least memory) for characters. But here the tradeoff is different. A dict is mutable, whereas str is immutable ;-)
msg310604 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-24 16:02
> * When dict size become 0, make the dict shared-empty, like dict.clear() This will cause significant performance regression for `dict[a]=None; del dict[a]` loop. del/pop shouldn't do clear(). > * When (dict size < dk_size/8), call insertion_resize() This is bad too. When ma_used=127 and dk_size=1024, new size will be 1024! It's because current GROWTH_RATE is used*2 + size/2. This GROWTH_RATE is set in . We should understand it before changing anything.
msg310607 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-24 16:20
"This will cause significant performance regression for `dict[a]=None; del dict[a]` loop. del/pop shouldn't do clear()." Should we make sure that all dicts have at least MINSIZE entries?
msg310649 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-25 00:46
> Should we make sure that all dicts have at least MINSIZE entries? I don't think so. I think "allocate on first insert" is good idea for dict.clear() And I think it's good idea for new dict too: https://github.com/python/cpython/pull/1080
msg310654 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-25 02:07
I think I understand #17563, and I should fix GROWTH_RATE. Before compact-ordered dict, we can avoid resizing in "the number of deletions is on a par with the number of insertions." scenario, by large GROWTH_RATE. That's because new entry can reuse dummy entries. But in compact-ordered dict, we can't do that. We need resizing always, and resize is much faster than legacy dict. I think GROWTH_RATE should be ma_used*3 for now.
History
Date User Action Args
2022-04-11 14:58:56 admin set github: 76804
2020-08-04 16:02:32 vstinner set nosy: - vstinner
2020-08-04 04:34:20 rhettinger set nosy: + tim.peters, rhettinger
2020-08-04 03:34:42 gvanrossum set nosy: + gvanrossum
2018-04-02 11:56:14 methane set dependencies: + GROWTH_RATE prevents dict shrinking
2018-01-31 04:23:23 xgdomingo set nosy: + xgdomingo
2018-01-25 02:07:15 methane set messages: +
2018-01-25 00:46:17 methane set messages: +
2018-01-24 16:20:32 vstinner set messages: +
2018-01-24 16:02:09 methane set messages: +
2018-01-24 15:07:04 methane set pull_requests: + <pull%5Frequest5147>
2018-01-24 12:57:18 methane set keywords: + patchstage: patch reviewpull_requests: + <pull%5Frequest5144>
2018-01-24 09:31:37 vstinner set messages: +
2018-01-24 09:29:40 vstinner set messages: +
2018-01-24 01:53:52 methane set messages: +
2018-01-22 16:57:06 vstinner set messages: +
2018-01-22 16:37:38 yselivanov create