[Python-Dev] PEP 525, third round, better finalization

Yury Selivanov yselivanov.ml at gmail.com
Thu Sep 1 18:34:06 EDT 2016


Hi,

I've spent quite a while thinking about and experimenting with PEP 525, trying to figure out how to make the finalization of asynchronous generators (AGs) reliable. I've tried replacing the callback for GCed AGs with a callback that intercepts the first iteration of AGs. It turns out that it's very hard to work with weak-refs and make the asyncio event loop reliably track and shut down all open AGs.

My new approach is to replace the "sys.set_asyncgen_finalizer(finalizer)" function with "sys.set_asyncgen_hooks(firstiter=None, finalizer=None)".

This design allows us to:

  1. intercept the first iteration of an AG. That makes it possible for event loops to keep a weak set of all "open" AGs, and to implement a "shutdown" method that closes the loop and all AGs reliably.

  2. intercept AG garbage collection. That makes it possible to call "aclose" on GCed AGs to guarantee that 'finally' blocks and 'async with' statements are properly finalized.

  3. in later Python versions we can add more hooks, although I can't think of anything else we need to add right now.
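The two hooks can be observed directly with sys.set_asyncgen_hooks() and sys.get_asyncgen_hooks() as they shipped in CPython 3.6 and later. The released semantics may differ in details from this proposal. The minimal sketch below drives a coroutine by hand, without an event loop, and only records when each hook fires. The names ticker, take_first and run_once are illustrative. The immediate finalizer call relies on CPython's reference counting, and gc.collect() is only a fallback for other implementations.

```python
import gc
import sys

events = []

def firstiter(agen):
    # Called once, when the generator is iterated for the first time.
    events.append(("firstiter", agen.__qualname__))

def finalizer(agen):
    # Called when a suspended generator is about to be garbage collected.
    # An event loop would schedule agen.aclose() here; this sketch only logs.
    events.append(("finalizer", agen.__qualname__))

async def ticker():
    yield 1
    yield 2

async def take_first():
    agen = ticker()
    return await agen.__anext__()  # agen is dropped when this coroutine returns

def run_once(coro):
    """Drive a coroutine that never suspends on a real future."""
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("coroutine did not finish in one step")

old_hooks = sys.get_asyncgen_hooks()
sys.set_asyncgen_hooks(firstiter=firstiter, finalizer=finalizer)
try:
    first = run_once(take_first())
    gc.collect()  # no-op on CPython here; helps without refcounting
finally:
    sys.set_asyncgen_hooks(*old_hooks)  # always restore the previous hooks

print(first, events)
```

A real event loop would do more in both hooks. In firstiter it would add the generator to a weak set, and in finalizer it would schedule aclose().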

Below is the only updated section of the PEP. The latest PEP revision should also be available on python.org shortly.

All new proposed changes are available to play with in my fork of CPython here: https://github.com/1st1/cpython/tree/async_gen

Finalization


PEP 492 requires an event loop or a scheduler to run coroutines. Because asynchronous generators are meant to be used from coroutines, they also require an event loop to run and finalize them.

Asynchronous generators can have try..finally blocks, as well as async with statements. It is important to guarantee that generators can be safely finalized even when they are partially iterated and then garbage collected. For example::

 async def square_series(con, to):
     async with con.transaction():
         cursor = con.cursor(
             'SELECT generate_series(0, $1) AS i', to)
         async for row in cursor:
             yield row['i'] ** 2

 async for i in square_series(con, 1000):
     if i == 100:
         break

The above code defines an asynchronous generator that uses async with to iterate over a database cursor in a transaction. The generator is then iterated over with async for, which interrupts the iteration at some point.

The square_series() generator will then be garbage collected. Without a mechanism to close the generator asynchronously, the Python interpreter would not be able to do anything: the cleanup code of the async with block needs to await, and that requires an event loop.
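The sketch below is a self-contained variant of the example above. It uses the aclose() method described in the proposal and asyncio.run() from released Python. The database is replaced by a hypothetical FakeTransaction class that only logs. After the early break, the generator stays suspended inside async with until aclose() is awaited. At that point GeneratorExit reaches __aexit__, so the hypothetical transaction "rolls back".

```python
import asyncio

log = []

class FakeTransaction:
    """Hypothetical stand-in for con.transaction(); only logs enter/exit."""

    async def __aenter__(self):
        log.append("begin")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)  # cleanup that genuinely needs to await
        log.append("rollback" if exc_type else "commit")
        return False  # do not suppress the exception (GeneratorExit here)

async def square_series(to):
    async with FakeTransaction():
        for i in range(to + 1):  # like generate_series(0, to)
            yield i ** 2

async def main():
    agen = square_series(1000)
    async for i in agen:
        if i == 100:
            break
    # The generator is suspended inside `async with`; awaiting aclose()
    # throws GeneratorExit into it so that __aexit__ runs.
    await agen.aclose()

asyncio.run(main())
print(log)
```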

To solve this problem we propose to do the following:

  1. Implement an aclose method on asynchronous generators returning a special awaitable. When awaited, it throws a GeneratorExit into the suspended generator and iterates over it until either a GeneratorExit or a StopAsyncIteration occurs.

    This is very similar to what the close() method does to regular Python generators, except that an event loop is required to execute aclose().

  2. Raise a RuntimeError when an asynchronous generator executes a yield expression in its finally block (using await is fine, though)::

      async def gen():
          try:
              yield
          finally:
              await asyncio.sleep(1)   # Can use 'await'.
    
              yield                    # Cannot use 'yield',
                                       # this line will trigger a
                                       # RuntimeError.

  3. Add two new methods to the sys module: set_asyncgen_hooks() and get_asyncgen_hooks().
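Rule 2 can be checked against released CPython with the sketch below. It reuses the gen() example from item 2: the await in the finally block is allowed, but the yield makes aclose() fail with a RuntimeError. The exact message text is a CPython detail, so the sketch only returns it rather than relying on it.

```python
import asyncio

async def gen():
    try:
        yield
    finally:
        await asyncio.sleep(0)  # awaiting during cleanup is allowed
        yield                   # yielding during cleanup is not

async def main():
    agen = gen()
    await agen.__anext__()      # suspend the generator inside `try`
    try:
        await agen.aclose()     # throws GeneratorExit into the generator
    except RuntimeError as exc:
        return str(exc)
    return "no error"

message = asyncio.run(main())
print(message)
```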

The idea behind sys.set_asyncgen_hooks() is to allow event loops to intercept the first iteration and the finalization of asynchronous generators, so that end users do not need to care about the finalization problem, and everything just works.

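sys.set_asyncgen_hooks() accepts two arguments:

  1. firstiter: a callable that is called when an asynchronous generator is iterated for the first time.

  2. finalizer: a callable that is called when an asynchronous generator is about to be garbage collected.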

When an asynchronous generator is iterated for the first time, it stores a reference to the current finalizer. If there is none, a RuntimeError is raised. This provides a strong guarantee that every asynchronous generator object will always have a finalizer installed by the correct event loop.

When an asynchronous generator is about to be garbage collected, it calls its cached finalizer. The assumption is that the finalizer will schedule an aclose() call with the loop that was active when the iteration started.
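In released CPython (3.6+), the finalizer is likewise cached on the generator at its first iteration. The minimal sketch below shows that switching hooks afterwards does not change which finalizer a generator uses. The "loop A" and "loop B" finalizers and the step() helper are hypothetical. Unlike this proposal, released CPython does not raise RuntimeError when no finalizer is installed, so the sketch exercises only the caching behaviour. The immediate call on del relies on CPython's reference counting.

```python
import sys

calls = []

def make_finalizer(name):
    def finalizer(agen):
        calls.append(name)  # a loop would schedule agen.aclose() here
    return finalizer

async def counter():
    yield 1
    yield 2

def step(agen):
    """Advance an async generator that never awaits, without an event loop."""
    try:
        agen.__anext__().send(None)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("generator awaited something")

old_hooks = sys.get_asyncgen_hooks()
try:
    sys.set_asyncgen_hooks(firstiter=None, finalizer=make_finalizer("loop A"))
    agen = counter()
    value = step(agen)   # first iteration caches the "loop A" finalizer
    sys.set_asyncgen_hooks(firstiter=None, finalizer=make_finalizer("loop B"))
    del agen             # CPython finalizes the suspended generator here
finally:
    sys.set_asyncgen_hooks(*old_hooks)

print(value, calls)
```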

For instance, here is how asyncio is modified to allow safe finalization of asynchronous generators::

    # asyncio/base_events.py

    import sys

    class BaseEventLoop:

        def run_forever(self):
            ...
            old_hooks = sys.get_asyncgen_hooks()
            sys.set_asyncgen_hooks(finalizer=self._finalize_asyncgen)
            try:
                ...
            finally:
                sys.set_asyncgen_hooks(*old_hooks)
                ...

        def _finalize_asyncgen(self, gen):
            self.create_task(gen.aclose())
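Released asyncio (3.6 and later) follows this pattern, although its internal hook methods are named differently from the _finalize_asyncgen shown above. The sketch below assumes only the public asyncio.run() API. It abandons a generator mid-iteration and drops the last reference. The loop's finalizer hook then schedules aclose(), so the asynchronous finally block runs a few loop iterations later. The five asyncio.sleep(0) calls are an arbitrary margin, not a documented guarantee.

```python
import asyncio

cleanup = []

async def numbers():
    try:
        for i in range(10):
            yield i
    finally:
        await asyncio.sleep(0)  # asynchronous cleanup step
        cleanup.append("closed")

async def main():
    agen = numbers()
    async for n in agen:
        if n == 2:
            break
    del agen  # drop the last reference while the generator is suspended
    # Let the loop run the aclose() task scheduled by its finalizer hook.
    for _ in range(5):
        await asyncio.sleep(0)
    return list(cleanup)

seen_inside_main = asyncio.run(main())
print(seen_inside_main)
```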

The other argument, firstiter, allows event loops to maintain a weak set of asynchronous generators instantiated under their control. This makes it possible to implement "shutdown" mechanisms to safely finalize all open generators and close the event loop.
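Released asyncio implements this idea. Its firstiter hook adds each generator to a weak set, and the coroutine method loop.shutdown_asyncgens() calls aclose() on all generators still in that set. asyncio.run() calls it automatically; the sketch below calls it explicitly. The generators are kept alive on purpose, so only the shutdown mechanism can close them. The ticker() generator is illustrative.

```python
import asyncio

closed = []

async def ticker(name):
    try:
        while True:
            yield name
    finally:
        closed.append(name)

async def main():
    # Strong references keep the generators from being garbage collected;
    # the loop knows about them only through its firstiter hook.
    gens = [ticker("a"), ticker("b")]
    for g in gens:
        await g.__anext__()  # first iteration registers the generator
    await asyncio.get_running_loop().shutdown_asyncgens()
    return sorted(closed)

result = asyncio.run(main())
print(result)
```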

sys.set_asyncgen_hooks() is thread-specific, so several event loops running in parallel threads can use it safely.

sys.get_asyncgen_hooks() returns a namedtuple-like structure with firstiter and finalizer fields.
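The sketch below illustrates both points with released CPython. Hooks installed in one thread are not visible in another, and the returned structure exposes its firstiter and finalizer fields by name. It also supports the positional unpacking used to restore old hooks. The no-op finalizer is a placeholder.

```python
import sys
import threading

def finalizer(agen):
    pass  # placeholder; a real loop would schedule agen.aclose()

seen_in_thread = {}

def worker():
    hooks = sys.get_asyncgen_hooks()
    # A freshly started thread has no hooks installed.
    seen_in_thread["firstiter"] = hooks.firstiter
    seen_in_thread["finalizer"] = hooks.finalizer

old_hooks = sys.get_asyncgen_hooks()
sys.set_asyncgen_hooks(finalizer=finalizer)
try:
    main_hooks = sys.get_asyncgen_hooks()
    t = threading.Thread(target=worker)
    t.start()
    t.join()
finally:
    sys.set_asyncgen_hooks(*old_hooks)  # positional unpacking restores both

print(main_hooks.finalizer is finalizer, seen_in_thread)
```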


