Issue 28629: Emit ResourceWarning when implicitly terminating a suspended frame?
Created on 2016-11-07 02:50 by ncoghlan, last changed 2022-04-11 14:58 by admin.
Messages (11)
msg280185 - Author: Alyssa Coghlan (ncoghlan)
Date: 2016-11-07 02:50
There have been a few discussions recently regarding the fact that generator and coroutine cleanup can suppress ResourceWarnings from other components. For example:

    def mygen(fname):
        with open(fname) as f:
            yield from f

    print(next(mygen("README.md")))

Here, the opened file is actually being cleaned up by __del__ on the generator-iterator instance (when it throws GeneratorExit into the suspended frame), but there's no ResourceWarning, as the *file itself* is being cleaned up by the file's __exit__ method.

I've been thinking about how we might be able to help developers better manage this, and one conclusion I've come to is that we need to start thinking of suspended frames that don't terminate naturally during the course of program execution as resources to be managed - their locals can and do reference scarce system resources, and folks expect "with" and "try-finally" statements to provide reliable management of those resources, even when there's a "yield", "yield from" or "await" in the nested suite.

So what do folks think of the idea of making __del__ on generator-iterator objects emit ResourceWarning in cases where:

- gi_frame is not None (i.e. the frame hasn't completed execution)
- gi_frame.f_lasti isn't -1 (i.e. the frame has started execution)

and similarly for coroutines and cr_frame?
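The proposed check can be approximated in pure Python. The helper name `warn_on_suspended_frame` is hypothetical; the real change would live in the generator's C-level finalizer, and this is only a sketch of the two conditions listed above.

```python
import tempfile
import warnings

def warn_on_suspended_frame(gen):
    # Proposed conditions: gi_frame is not None (the frame hasn't
    # completed) and f_lasti != -1 (the frame has started executing).
    frame = gen.gi_frame
    if frame is not None and frame.f_lasti != -1:
        warnings.warn("suspended generator %r is being discarded" % (gen,),
                      ResourceWarning, stacklevel=2)

def mygen(fname):
    with open(fname) as f:
        yield from f

with tempfile.NamedTemporaryFile("w+", suffix=".txt") as tmp:
    tmp.write("first line\n")
    tmp.flush()
    g = mygen(tmp.name)
    print(next(g), end="")       # the frame is now started and suspended
    warn_on_suspended_frame(g)   # would fire under the proposal
    g.close()                    # explicit close runs the file's __exit__
```

Note that ResourceWarning is ignored by the default warning filters, so a `-W always::ResourceWarning` run (or dev mode) is needed to actually see it.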
msg280189 - Author: Nathaniel Smith (njs)
Date: 2016-11-07 06:06
+1 to the change for generators. This is actually in the PEP 533 draft: https://www.python.org/dev/peps/pep-0533/#modifications-to-basic-iterator-types

For coroutines, doesn't this overlap with the existing "coroutine not awaited" warning?
msg280193 - Author: STINNER Victor (vstinner)
Date: 2016-11-07 08:56
To make a decision, I would suggest implementing the warning, running the Python test suite, and running the test suites of a few applications like Twisted and Django, to see how much code needs to be modified.

Well, let's say that my code consumes a generator but emits the warning because it doesn't completely consume the generator. What am I supposed to do? Let's say the generator is an infinite sequence like (2**i for i in itertools.count()). Should I call gen.close()? If a generator holds a resource like a file, does gen.close() immediately release the resource? Or does it rely on the garbage collector?

I'm curious to see if using source=frame when emitting the warning helps to quickly identify the origin of the bug. I added the source parameter in Python 3.6. It requires the use of tracemalloc to log where the source object (the frame in this case) was allocated.
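A small experiment answers Victor's question about whether gen.close() releases resources immediately: close() throws GeneratorExit into the suspended frame, so with/finally cleanup runs right away rather than waiting for the garbage collector. This is a minimal sketch; the `log` list just stands in for a real resource release.

```python
log = []

def gen_with_cleanup():
    try:
        yield 1
        yield 2
    finally:
        # stands in for releasing a real resource (closing a file, etc.)
        log.append("cleaned up")

g = gen_with_cleanup()
next(g)            # suspend inside the try block
assert log == []   # resource still held
g.close()          # GeneratorExit is thrown in; finally runs now
assert log == ["cleaned up"]
```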
msg280200 - Author: Alyssa Coghlan (ncoghlan)
Date: 2016-11-07 14:29
As Victor suggests, the reason I ended the issue title with a question mark is because I'm worried that doing this by default is likely to be too noisy to be useful. It's worth trying out though, even if it ends up being something we have to hide behind a -X flag.

If that concern does prove well-founded, another possible way to go would be to change the way generators interact with the context management protocol. In order to ensure a clear exception when you accidentally leave out `contextlib.contextmanager`, they currently simply don't implement it, so you have to use `contextlib.closing` instead. However, you can't wrap an existing generator in that unconditionally, as you'll break direct calls (`closing` relies on `__enter__` to give callers transparent access to the original object).

One way to thread that needle would be to offer a `contextlib.managed_generator` alternative decorator that tweaked the behaviour of that particular generator-iterator instance to:

- emit a ResourceWarning in __del__
- support the context management protocol

Unfortunately, I haven't been able to think of a particularly clean implementation strategy that wouldn't introduce undesirable runtime performance overheads.
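For reference, the existing `contextlib.closing` workaround mentioned above looks like this in practice: it gives the caller deterministic cleanup of a partially consumed generator, at the cost of an explicit wrapper at every use site.

```python
import contextlib
import tempfile

def lines(fname):
    with open(fname) as f:
        yield from f

with tempfile.NamedTemporaryFile("w+", suffix=".txt") as tmp:
    tmp.write("a\nb\nc\n")
    tmp.flush()
    with contextlib.closing(lines(tmp.name)) as gen:
        first = next(gen)   # generator suspended inside its with block
    # closing() has called gen.close(): GeneratorExit ran the file's
    # __exit__ even though the generator was never exhausted
    print(first.strip())
```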
msg280728 - Author: Martin Panter (martin.panter)
Date: 2016-11-14 03:10
This would be hard to get right, but maybe not impossible. If the frame is suspended, and will do special handling of GeneratorExit, a ResourceWarning would be nice in most cases. But if the frame will not catch any exception, there should be no ResourceWarning. I think this is the big difference between mygen() and Victor's generator expression:

    gen = (2**i for i in itertools.count())
    next(gen)  # Generator suspended but won't catch any exception
    del gen    # Like deleting a plain iterator; no ResourceWarning expected

One more complication: if the frame is suspended at something like "yield from open(fname)", it should trigger a ResourceWarning, because GeneratorExit would cause the file's close() method to be called. But if "yield from" is calling a simpler generator-iterator with nothing to clean up, it should not emit a ResourceWarning.
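Martin's "yield from open(fname)" case can be demonstrated with today's semantics: per PEP 380, throwing GeneratorExit into a frame suspended at a yield from calls the subiterator's close() method if it has one, so the file is closed immediately. A sketch, taking the file object as a parameter so its state can be observed:

```python
import tempfile

def reader(f):
    # GeneratorExit thrown at this yield from invokes f.close() (PEP 380)
    yield from f

with tempfile.NamedTemporaryFile("w+", suffix=".txt") as tmp:
    tmp.write("x\ny\n")
    tmp.flush()
    f = open(tmp.name)
    g = reader(f)
    next(g)                # suspended at the yield from
    assert not f.closed    # file still open while suspended
    g.close()              # propagates to the file's close()
    assert f.closed
```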
msg280736 - Author: STINNER Victor (vstinner)
Date: 2016-11-14 07:15
> del gen # Like deleting a plain iterator; no ResourceWarning expected

Hum, wait, I'm not sure that I got the whole point of this issue. If the generator uses "with obj:", "del gen" will not call obj.__exit__(). Do you consider that "del gen" is a better way to clean up obj than not knowing when gen and obj are destroyed? Is there a solution to call obj.__exit__()?
msg280737 - Author: Martin Panter (martin.panter)
Date: 2016-11-14 07:38
Perhaps I shouldn’t have mentioned “del gen”. That was meant to represent the garbage collector running and calling gen.__del__() or equivalent.

Basically what I was trying to say is I think there will be two classes of generators:

1. Simple generators just implementing the plain iterator protocol, where there is nothing to clean up. In these cases a ResourceWarning is unwanted. Example: generator expressions.

2. Complex generators that have special cleanup code. Ideally the user should either exhaust the generator or call its close() method, and a ResourceWarning would be useful if neither of these happens. Example: Nick’s mygen().
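The two classes can be put side by side: close() on a plain generator expression is a harmless no-op, while close() on a generator with cleanup code actually runs that code. The `events` list below is just a stand-in for real cleanup work.

```python
import itertools

# Class 1: a simple generator expression - nothing to clean up
simple = (2**i for i in itertools.count())
next(simple)
simple.close()        # no-op; no warning is wanted for this case

# Class 2: a generator with special cleanup code
events = []

def complex_gen():
    try:
        yield
    finally:
        events.append("cleanup ran")

c = complex_gen()
next(c)
c.close()             # cleanup runs; forgetting this is what merits a warning
assert events == ["cleanup ran"]
```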
msg280738 - Author: Martin Panter (martin.panter)
Date: 2016-11-14 07:59
BTW my understanding is that currently, if the garbage collector deletes a generator, it effectively calls its close() method. When close() is called, it throws GeneratorExit into the suspended generator. This is meant to cause all the “with” and “try” statements to clean up. So, yes, explicitly calling close() should immediately release the resources, call obj.__exit__(), etc. The idea of the ResourceWarning is to tell the programmer they forgot to call close().
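Martin's description can be checked directly on CPython, where reference counting makes finalization deterministic: dropping the last reference to a suspended generator runs the same GeneratorExit-driven cleanup as an explicit close(). (On implementations with a different GC, such as PyPy, the cleanup would be delayed.)

```python
finalized = []

def holds_resource():
    try:
        yield "working"
    finally:
        finalized.append("cleanup ran")

it = holds_resource()
next(it)                  # frame suspended inside the try block
assert finalized == []
del it                    # refcount hits zero; __del__ effectively calls close()
assert finalized == ["cleanup ran"]
```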
msg280752 - Author: Alyssa Coghlan (ncoghlan)
Date: 2016-11-14 11:37
Martin raises an interesting point - the cases where ResourceWarning is likely to be appropriate are going to be those where the frame has been suspended inside a with statement or a try/finally statement.

However, we don't currently track those at runtime or at compile time in any way - they're implicit in the state of the frame stack.

That doesn't mean we *couldn't* track them though, and I think co_lnotab provides a hint as to how we could do it with minimal runtime overhead: add a new co_cleanup data table to code objects that provides efficiently packed storage for a sequence of "setup/cleanup" bytecode offset 2-tuples.

All code objects would gain an additional pointer, and those that used with and try/finally would get fractionally larger than that, but frame objects would remain the same size they are today.

Glossing over the complexities of large bytecode offsets (ala co_lnotab), consider:

    >>> def g_with():
    ...     with ...:
    ...         pass
    ...
    >>> dis.dis(g_with)
      2           0 LOAD_CONST               1 (Ellipsis)
                  3 SETUP_WITH               5 (to 11)
                  6 POP_TOP

      3           7 POP_BLOCK
                  8 LOAD_CONST               0 (None)
            >>   11 WITH_CLEANUP_START
                 12 WITH_CLEANUP_FINISH
                 13 END_FINALLY
                 14 LOAD_CONST               0 (None)
                 17 RETURN_VALUE

Here, co_cleanup would include a single 2-tuple with '0x03' as an absolute offset (for the setup) and '0x08' as a relative offset (for the cleanup). If '(f_lasti - 0x03) < 0x08' then __del__ on a generator-iterator or coroutine would raise ResourceWarning.

    >>> def g_finally():
    ...     try:
    ...         pass
    ...     finally:
    ...         pass
    ...
    >>> dis.dis(g_finally)
      2           0 SETUP_FINALLY            4 (to 7)

      3           3 POP_BLOCK
                  4 LOAD_CONST               0 (None)

      5     >>    7 END_FINALLY
                  8 LOAD_CONST               0 (None)
                 11 RETURN_VALUE

Here, co_cleanup would include a single 2-tuple with '0x00' as an absolute offset (for the setup) and '0x07' as a relative offset (for the cleanup). If '(f_lasti - 0x00) < 0x07' then __del__ on a generator-iterator or coroutine would raise ResourceWarning.
The key is that *only* __del__ would get slower (not explicit close() calls), and it would only get slower in cases where the frame was still suspended *and* co_cleanup was populated.
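The raw ingredients this proposal builds on are already inspectable from Python: a suspended generator's f_lasti is a bytecode offset into its code object, which dis can map back to instructions. This sketch only demonstrates those ingredients, not the co_cleanup table itself; also note the SETUP_* opcodes shown in the disassembly above were removed in CPython 3.11+, where the equivalent information moved into the code object's exception table.

```python
import dis

def sample():
    try:
        yield "a"
        yield "b"
    finally:
        pass

g = sample()
next(g)
frame = g.gi_frame
print("suspended at bytecode offset", frame.f_lasti)
# Map the offset back to an instruction for inspection (opcode names
# vary between CPython versions, so no specific name is assumed here).
for instr in dis.get_instructions(frame.f_code):
    if instr.offset == frame.f_lasti:
        print("instruction:", instr.opname)
g.close()
```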
msg280762 - Author: Alyssa Coghlan (ncoghlan)
Date: 2016-11-14 12:16
Oops, I forgot to mention one other potential cost-minimisation strategy for a `co_cleanuptab` field: only populate it with setup/cleanup opcode pairs that include a yield, yield from, or await operation between them.

The potential benefit I can see to *not* doing that is that if the information is always available (even on synchronous frames), then greenlet based frameworks like gevent may be able to make use of it.
msg290953 - Author: Nathaniel Smith (njs)
Date: 2017-04-01 02:10
It does make sense to skip emitting a warning if there's no try or with block active.

Don't we already have the ability to check for this, though, without any extensions to the frame or code objects? That's what the public PyGen_NeedsFinalizing does, right? It's implemented as a for loop over the frame's block stack, which in most cases should be extremely small, so the performance cost of running this from generator.__del__ seems likely to be minimal.

(I think currently the implementation is not *quite* correct if there's a 'yield from' active - in most cases it won't much matter, because if A is yielding from B and A is collected, then B will probably be collected a moment later and have its own __del__ called. But this is not *quite* the same as what should happen, which is that collecting A should immediately call B.close(), which in principle might do something arbitrarily different from B.__del__. But adding a check for whether a 'yield from' is active would be pretty trivial.)
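The close()-versus-__del__ distinction njs draws can be made concrete: per PEP 380, explicitly closing a delegating generator A does call close() on the subiterator B, even when B is not itself a generator. The `Sub` class below is a hypothetical subiterator used only to record that its close() method ran.

```python
closed = []

class Sub:
    # a minimal non-generator iterator with a close() method
    def __iter__(self):
        return self

    def __next__(self):
        return 1

    def close(self):
        closed.append("B.close")

def a():
    yield from Sub()

g = a()
next(g)            # A suspended at the yield from, delegating to Sub
g.close()          # GeneratorExit at the yield from calls Sub.close()
assert closed == ["B.close"]
```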
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:39 | admin | set | github: 72815 |
2017-07-19 09:12:49 | jstasiak | set | nosy: + jstasiak |
2017-04-01 02:10:10 | njs | set | messages: + |
2016-11-14 12:16:23 | ncoghlan | set | messages: + |
2016-11-14 11:37:44 | ncoghlan | set | messages: + |
2016-11-14 07:59:46 | martin.panter | set | messages: + |
2016-11-14 07:38:50 | martin.panter | set | messages: + |
2016-11-14 07:15:53 | vstinner | set | messages: + |
2016-11-14 03:10:56 | martin.panter | set | nosy: + martin.panter, messages: +
2016-11-07 14:29:24 | ncoghlan | set | messages: + |
2016-11-07 08:56:04 | vstinner | set | messages: + |
2016-11-07 06:06:55 | njs | set | messages: + |
2016-11-07 02:50:31 | ncoghlan | create |