[Python-Dev] PEP 525, third round, better finalization
Nick Coghlan ncoghlan at gmail.com
Sat Sep 3 11:42:41 EDT 2016
On 2 September 2016 at 19:13, Nathaniel Smith <njs at pobox.com> wrote:
> This works OK on CPython because the reference-counting gc will call handle.__del__() at the end of the scope (so on CPython it's at level 2), but it famously causes huge problems when porting to PyPy with its much faster and more sophisticated gc that only runs when triggered by memory pressure. (Or for "PyPy" you can substitute "Jython", "IronPython", whatever.) Technically this code doesn't actually "leak" file descriptors on PyPy, because handle.__del__() will get called eventually (this code is at level 1, not level 0), but by the time "eventually" arrives your server process has probably run out of file descriptors and crashed. Level 1 isn't good enough. So now we have all learned to instead write
>     # good modern Python style:
>     def get_file_contents(path):
>         with open(path) as handle:
>             return handle.read()
This only works if the file fits in memory - otherwise you just have to accept the fact that you need to leave the file handle open until you're "done with the iterator", which means deferring the resource management to the caller.
> and we have fancy tools like the ResourceWarning machinery to help us catch these bugs.
> Here's the analogous example for async generators. This is a useful, realistic async generator, that lets us incrementally read from a TCP connection that streams newline-separated JSON documents:
>
>     async def read_json_lines_from_server(host, port):
>         async for line in asyncio.open_connection(host, port)[0]:
>             yield json.loads(line)
>
> You would expect to use this like:
>
>     async for data in read_json_lines_from_server(host, port):
>         ...
The actual synchronous equivalent to this would look more like:
    def read_data_from_file(path):
        with open(path) as f:
            for line in f:
                yield line
(Assume we're doing something interesting to each line, rather than reproducing normal file iteration behaviour)
And that has the same problem as your asynchronous example: the caller needs to worry about resource management on the generator and do:
    from contextlib import closing

    with closing(read_data_from_file(path)) as itr:
        for line in itr:
            ...
Which means the problem causing your concern doesn't arise from the generator being asynchronous - it comes from the fact that the generator actually needs to hold the FD open in order to work as intended (if it didn't, then the code wouldn't need to be asynchronous).
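The reason closing() suffices here can be sketched concretely: calling close() on a suspended generator raises GeneratorExit at the paused yield, which unwinds the generator's own 'with' block and closes the file immediately, rather than whenever the GC gets around to it. A minimal demonstration (the temporary file and its contents are purely illustrative):

```python
import os
import tempfile

def read_data_from_file(path):
    # Holds the file handle open for as long as iteration continues.
    with open(path) as f:
        for line in f:
            yield line

# Create a throwaway file to iterate over.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\n")

gen = read_data_from_file(tmp.name)
first = next(gen)            # generator is now suspended at the yield
gen.close()                  # what closing() does on exit: GeneratorExit
                             # propagates out of the 'with open(...)' block
assert gen.gi_frame is None  # frame torn down, so the file is closed
os.unlink(tmp.name)
```

contextlib.closing(...) simply automates that close() call at the end of the caller's 'with' block.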
> BUT, with the current PEP 525 proposal, trying to use this generator in this way is exactly analogous to the open(path).read() case: on CPython it will work fine -- the generator object will leave scope at the end of the 'async for' loop, cleanup methods will be called, etc. But on PyPy, the weakref callback will not be triggered until some arbitrary time later, you will "leak" file descriptors, and your server will crash.
That suggests the PyPy GC should probably be tracking pressure on more resources than just memory when deciding whether or not to trigger a GC run.
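In the absence of that kind of GC support, an application can approximate it in userspace. A hedged sketch (Linux-specific, and the function name and threshold are purely illustrative, not any real interpreter API): poll the process's open descriptor count against the RLIMIT_NOFILE soft limit, and force a collection when it gets close, so unreachable handles are finalized before the hard failure:

```python
import gc
import os
import resource

def collect_on_fd_pressure(threshold=0.9):
    # Illustrative workaround, not a real PyPy/CPython API: force a GC
    # run when this process is using most of its file descriptor quota.
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))  # Linux-only
    if open_fds >= soft_limit * threshold:
        gc.collect()  # finalizes unreachable objects, releasing their FDs
        return True
    return False
```

Called from a periodic task, something like this keeps "eventually" bounded on collectors that don't otherwise notice descriptor pressure.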
> For correct operation, you have to replace the simple 'async for' loop with this lovely construct:
>
>     async with aclosing(read_json_lines_from_server(host, port)) as ait:
>         async for data in ait:
>             ...
>
> Of course, you only have to do this on loops whose iterator might potentially hold resources like file descriptors, either currently or in the future. So... uh... basically that's all loops, I guess? If you want to be a good defensive programmer?
At that level of defensiveness in asynchronous code, you need to start treating all external resources (including file descriptors) as a managed pool, just as we have process and thread pools in the standard library, and many database and networking libraries offer connection pooling. It limits your per-process concurrency, but that limit exists anyway at the operating system level - modelling it explicitly just lets you manage how the application handles those limits.
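As a rough illustration of that pooling idea (all names here are hypothetical, not a stdlib API), a semaphore can cap the number of live resources, making the OS-level descriptor limit an explicit part of the program:

```python
import asyncio

class ResourcePool:
    # Hypothetical sketch: bounds how many resources are live at once.
    def __init__(self, open_resource, close_resource, limit):
        self._open = open_resource
        self._close = close_resource
        self._sem = asyncio.Semaphore(limit)

    async def acquire(self):
        await self._sem.acquire()  # waits when the pool is exhausted
        try:
            return await self._open()
        except BaseException:
            self._sem.release()  # don't leak a slot if opening fails
            raise

    async def release(self, resource):
        try:
            await self._close(resource)
        finally:
            self._sem.release()

async def demo():
    live = []  # stands in for open connections / file descriptors

    async def open_res():
        res = object()
        live.append(res)
        return res

    async def close_res(res):
        live.remove(res)

    pool = ResourcePool(open_res, close_res, limit=2)
    a = await pool.acquire()
    b = await pool.acquire()
    # A third acquire() would now block until a release() frees a slot.
    await pool.release(a)
    c = await pool.acquire()
    await pool.release(b)
    await pool.release(c)
    return len(live)

leaked = asyncio.run(demo())
```

The same shape is what database connection pools provide: acquisition blocks at the configured limit instead of the process blowing through RLIMIT_NOFILE.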
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia