[Python-Dev] PEP 525, third round, better finalization

Oscar Benjamin oscar.j.benjamin at gmail.com
Sat Sep 3 14:38:15 EDT 2016


On 3 September 2016 at 16:42, Nick Coghlan <ncoghlan at gmail.com> wrote:

On 2 September 2016 at 19:13, Nathaniel Smith <njs at pobox.com> wrote:

This works OK on CPython because the reference-counting gc will call handle.__del__() at the end of the scope (so on CPython it's at level 2), but it famously causes huge problems when porting to PyPy with its much faster and more sophisticated gc that only runs when triggered by memory pressure. (Or for "PyPy" you can substitute "Jython", "IronPython", whatever.) Technically this code doesn't actually "leak" file descriptors on PyPy, because handle.__del__() will get called eventually (this code is at level 1, not level 0), but by the time "eventually" arrives your server process has probably run out of file descriptors and crashed. Level 1 isn't good enough. So now we have all learned to instead write ... BUT, with the current PEP 525 proposal, trying to use this generator in this way is exactly analogous to the open(path).read() case: on CPython it will work fine -- the generator object will leave scope at the end of the 'async for' loop, cleanup methods will be called, etc. But on PyPy, the weakref callback will not be triggered until some arbitrary time later, you will "leak" file descriptors, and your server will crash.

That suggests the PyPy GC should probably be tracking pressure on more resources than just memory when deciding whether or not to trigger a GC run.
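
For concreteness, the two patterns Nathaniel is contrasting look roughly like this (a sketch, not the exact code from earlier in the thread):

def get_file_contents(path):
    # relies on the GC calling handle.__del__(): prompt under CPython's
    # reference counting, arbitrarily delayed under PyPy
    handle = open(path)
    return handle.read()

def get_file_contents(path):
    # closes the file deterministically on every implementation
    with open(path) as handle:
        return handle.read()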

PyPy's GC is conformant to the language spec AFAICT: https://docs.python.org/3/reference/datamodel.html#object.__del__

""" object.del(self)

Called when the instance is about to be destroyed. This is also called a destructor. If a base class has a __del__() method, the derived class’s __del__() method, if any, must explicitly call it to ensure proper deletion of the base class part of the instance. Note that it is possible (though not recommended!) for the __del__() method to postpone destruction of the instance by creating a new reference to it. It may then be called at a later time when this new reference is deleted. It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits. """

Note the last sentence. It is also not guaranteed (across different Python implementations and regardless of the CPython-specific notes in the docs) that any particular object will cease to exist before the interpreter exits. Taken together these two imply that it is not guaranteed that any __del__ method will ever be called.

Antoine's excellent work in PEP 442 has improved the situation in CPython, but the language spec (covering all implementations) remains the same, and changing that requires a new PEP and coordination with other implementations. Without changing it, it is a mistake to base a new core language feature (async finalisation) on CPython-specific implementation details. Already, using with (or try/finally etc.) inside a generator function behaves differently under PyPy:

$ cat gentest.py

def generator_needs_finalisation():
    try:
        for n in range(10):
            yield n
    finally:
        print('Doing important cleanup')

for obj in generator_needs_finalisation():
    if obj == 5:
        break

print('Process exit')

$ python gentest.py
Doing important cleanup
Process exit

So here the cleanup is triggered by the reference count of the generator falling to zero at the break statement. Under CPython this corresponds to Nathaniel's "level 2" cleanup. If we keep another reference around, it only gets done at process exit:

$ cat gentest2.py

def generator_needs_finalisation():
    try:
        for n in range(10):
            yield n
    finally:
        print('Doing important cleanup')

gen = generator_needs_finalisation()
for obj in gen:
    if obj == 5:
        break

print('Process exit')

$ python gentest2.py
Process exit
Doing important cleanup

So that's Nathaniel's "level 1" cleanup. However, if you run either of these scripts under PyPy, the cleanup simply won't occur (i.e. "level 0" cleanup):

$ pypy gentest.py
Process exit
$ pypy gentest2.py
Process exit

I don't think PyPy is in breach of the language spec here. Python made a decision a long time ago to shun RAII-style implicit cleanup in favour of with-style explicit cleanup.
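
Note that for plain generators the with-style answer already exists today: wrapping the generator in contextlib.closing() makes the cleanup explicit and deterministic on any implementation. A minimal sketch, reusing the generator from gentest.py above:

from contextlib import closing

with closing(generator_needs_finalisation()) as gen:
    for obj in gen:
        if obj == 5:
            break
# 'Doing important cleanup' is printed here, on CPython and PyPy alike,
# because closing() calls gen.close() when the with block exits.

print('Process exit')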

The solution to this problem is to move resource management outside of the generator function. This is just as true for ordinary generators without an event loop etc. The example in the PEP is:

async def square_series(con, to):
    async with con.transaction():
        cursor = con.cursor(
            'SELECT generate_series(0, $1) AS i', to)
        async for row in cursor:
            yield row['i'] ** 2

async for i in square_series(con, 1000):
    if i == 100:
        break

The normal generator equivalent of this is:

def square_series(con, to):
    with con.transaction():
        cursor = con.cursor(
            'SELECT generate_series(0, $1) AS i', to)
        for row in cursor:
            yield row['i'] ** 2

This code is already broken: the fix is to move the with statement out into the caller of the generator function.
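
Concretely, and assuming a synchronous connection object with the same transaction()/cursor() API as the PEP's asyncpg-style example, the fixed version would look something like:

def square_series(cursor):
    # pure processing: the generator no longer owns any resources
    for row in cursor:
        yield row['i'] ** 2

with con.transaction():
    cursor = con.cursor(
        'SELECT generate_series(0, $1) AS i', 1000)
    for i in square_series(cursor):
        if i == 100:
            break

The transaction now begins and ends in the caller, so breaking out of the loop early cannot leave it dangling.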

Going back to Nathaniel's example:

def get_file_contents(path):
    with open(path) as handle:
        return handle.read()

Nick wants it to be a generator function so we don't have to load the whole file into memory, i.e.:

def get_file_lines(path):
    with open(path) as handle:
        yield from handle

However, this is now broken if the iterator is not fully consumed:

for line in get_file_lines(path):
    if line.startswith('#'):
        break

The answer is to move the with statement outside and pass the handle into your generator function:

def get_file_lines(handle):
    yield from handle

with open(path) as handle:
    for line in get_file_lines(handle):
        if line.startswith('#'):
            break

Of course in this case get_file_lines is trivial and can be omitted, but the fix works more generally when get_file_lines actually does some processing on the lines of the file: move the with statement outside and turn the generator function into an iterator-style filter.
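
For example, if instead of yielding raw lines the generator strips out comments and blank lines (the processing here is made up purely for illustration), the shape stays the same:

def significant_lines(handle):
    # an iterator-style filter: no resource ownership, just processing
    for line in handle:
        line = line.strip()
        if line and not line.startswith('#'):
            yield line

with open(path) as handle:
    for line in significant_lines(handle):
        print(line)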

-- Oscar


