Issue 19332: Guard against changing dict during iteration

Created on 2013-10-21 14:02 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                        Uploaded                            Description
dict_mutating_iteration.patch    serhiy.storchaka, 2013-10-21 14:02  review
dict_mutating_iteration_2.patch  serhiy.storchaka, 2013-10-23 19:46  review
Messages (9)
msg200784 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-21 14:02
Currently dict iteration is guarded against changes in the dict's size. However, when a dict is changed during iteration in a way that leaves its size unchanged, the modification goes unnoticed:

>>> d = dict.fromkeys('abcd')
>>> for i in d:
...     print(i)
...     d[i + 'x'] = None
...     del d[i]
... 
d
a
dx
dxx
ax
c
b

In general, iterating over a mutating dict is considered a logic error, and it is good to detect it as early as possible. The proposed patch introduces a counter that is changed every time a key is added or removed. If an iterator detects that this counter has changed, it raises RuntimeError.
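The patch itself works at the C level inside the dict implementation, but the idea can be sketched in pure Python. This is an illustrative model only; the `GuardedDict` name and the guard-in-`__iter__` placement are inventions of this sketch, not part of the actual patch:

```python
class GuardedDict(dict):
    """Toy model of the proposed guard: count every key insertion and
    removal, and make iteration fail as soon as the count changes."""

    def __init__(self, *args, **kwargs):
        self._version = 0              # bumped on every key add or remove
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if key not in self:            # replacing a value is still allowed
            self._version += 1
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._version += 1

    def __iter__(self):
        start = self._version
        for key in super().__iter__():
            yield key
            # Re-check when the consumer asks for the next key: if the key
            # set changed while we were suspended, fail fast.
            if self._version != start:
                raise RuntimeError("dict changed during iteration")
```

Note that this toy version only guards plain `for key in d` iteration; the real patch checks the counter in every dict iterator (keys, values, items), which a Python-level sketch cannot easily reach.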
msg200995 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-23 04:20
The decision to not monitor adding or removing keys was intentional. It is just not worth the cost in either time or space.
msg201062 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-23 19:46
In the first patch the counter was placed in the _dictkeysobject structure. In the second patch it is placed in PyDictObject, so it now has no memory cost. Access time to the new counter for non-modifying operations is the same as in the current code. The only additional cost is the time cost for modifying operations. But modifying operations are usually much rarer than non-modifying ones, and incrementing one field takes only a small part of the time needed for the whole operation. I don't think this will affect the total performance of real programs.
msg201065 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-10-23 20:10
If there's no performance regression, then this sounds like a reasonable idea. The remaining question would be whether it can break existing code. Perhaps you should ask python-dev?
msg201156 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-24 16:34
I disagree with adding such unimportant code to the critical path.
msg201780 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013-10-30 21:47
Raymond, please don't be so concise. Is the code unimportant because the scenario is so rare, or something else?
msg202262 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2013-11-06 12:32
Duplicate of this: http://bugs.python.org/issue6017
msg202287 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-11-06 20:56
A few thoughts:

* No existing, working code will benefit from this patch; however, almost all code will pay a price for it: a bigger size for an empty dict and a runtime cost (possibly very small) on the critical path (every time a value is stored in a dict).

* The sole benefit of the patch is to provide an earlier warning that someone is doing something weird. For most people, this will never come up (we have 23 years of Python history indicating that there isn't a real problem that needs to be solved).

* The normal rule (not just for Python) is that data structures have undefined behavior for mutating while iterating, unless there is a specific guarantee (for example, we guarantee that dicts are allowed to mutate values but not keys during iteration, and we guarantee the behavior of list iteration while iterating).

* It is not clear that other implementations such as IronPython and Jython would be able to implement this behavior (Jython wraps the Java ConcurrentHashMap).

* The current patch second-guesses a decision that was made long ago to only detect size changes (because it is cheap, doesn't take extra memory, isn't on the critical path, and handles the common case).

* The only case where we truly need stronger protection is when it is needed to defend against a segfault. That is why collections.deque() implements a change counter. It has a measurable cost that slows down deque operations (increasing the number of memory accesses per append, pop, or next) but it is needed to prevent the iterator from spilling into freed memory.
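The deque change counter mentioned above is observable from Python code: any mutation of the deque invalidates live iterators at their next step.

```python
from collections import deque

d = deque([1, 2, 3])
it = iter(d)
print(next(it))      # first element is yielded normally

d.append(4)          # any mutation bumps the deque's internal state counter

try:
    next(it)         # the stale iterator refuses to continue
except RuntimeError as exc:
    print("RuntimeError:", exc)
```

This is the stronger, segfault-driven protection Raymond describes, as opposed to dict's cheaper size-only check.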
msg257946 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-01-10 23:43
New changeset a576199a5350 by Victor Stinner in branch 'default': PEP 509 https://hg.python.org/peps/rev/a576199a5350
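A historical footnote, not part of the original thread: besides PEP 509's private version tag, later CPython releases (3.8 and newer, via a separate change) did add a check along the lines Serhiy proposed. Replacing keys during iteration now raises RuntimeError even when the dict's size never changes, so the example from msg200784 fails fast:

```python
# Behavior in modern CPython (3.8+): mutating the key set during
# iteration is detected even though len(d) stays constant throughout.
d = dict.fromkeys('abcd')
try:
    for i in d:
        d[i + 'x'] = None   # add one key...
        del d[i]            # ...and remove one, so the size is unchanged
except RuntimeError as exc:
    print(exc)              # e.g. "dictionary keys changed during iteration"
```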
History
Date User Action Args
2022-04-11 14:57:52 admin set github: 63531
2017-02-02 14:38:17 r.david.murray link issue29420 superseder
2016-01-10 23:43:55 python-dev set nosy: + python-dev; messages: +
2013-11-06 20:56:30 rhettinger set messages: +
2013-11-06 12:32:40 steven.daprano set nosy: - steven.daprano
2013-11-06 12:32:11 steven.daprano set nosy: + steven.daprano; messages: +
2013-10-30 21:47:23 ethan.furman set nosy: + ethan.furman; messages: +
2013-10-28 06:15:54 rhettinger set status: open -> closed; resolution: rejected
2013-10-24 16:34:30 rhettinger set messages: +
2013-10-23 20:10:35 pitrou set messages: +
2013-10-23 19:58:35 pitrou set nosy: + tim.peters
2013-10-23 19:46:02 serhiy.storchaka set files: + dict_mutating_iteration_2.patch; messages: +
2013-10-23 04:20:22 rhettinger set assignee: rhettinger; messages: +
2013-10-21 14:02:54 serhiy.storchaka create