[Python-Dev] PEP 550 v4

Yury Selivanov yselivanov.ml at gmail.com
Fri Aug 25 18:32:22 EDT 2017

Hi,

This is the 4th iteration of the PEP, which Elvis and I have rewritten from scratch.

The specification section has been separated from the implementation section, which makes them easier to follow.

During the rewrite, we realized that generators and coroutines should work with the EC in exactly the same way (coroutines used to be created with no LC in prior versions of the PEP).

We also renamed Context Keys to Context Variables, which seems to be a more appropriate name.

Hopefully this update will resolve the remaining questions about the specification and the proposed implementation, and will allow us to focus on refining the API.

Yury

PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury at magic.io>, Elvis Pranskevichus <elvis at magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017

Abstract

This PEP adds a new generic mechanism for ensuring consistent access to non-local state in the context of out-of-order execution, such as in Python generators and coroutines.

Thread-local storage, such as threading.local(), is inadequate for programs that execute concurrently in the same OS thread. This PEP proposes a solution to this problem.

Rationale

Prior to the advent of asynchronous programming in Python, programs used OS threads to achieve concurrency. The need for thread-specific state was solved by threading.local() and its C-API equivalent, PyThreadState_GetDict().

A few examples of where thread-local storage (TLS) is commonly relied upon:

Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings.
Request-related data, such as security tokens and request data in web applications, language context for gettext etc.
Profiling, tracing, and logging in large code bases.

Unfortunately, TLS does not work well for programs which execute concurrently in a single thread. A Python generator is the simplest example of a concurrent program. Consider the following::

import decimal
from decimal import Decimal

def fractions(precision, x, y):
    with decimal.localcontext() as ctx:
        ctx.prec = precision
        yield Decimal(x) / Decimal(y)
        yield Decimal(x) / Decimal(y**2)

g1 = fractions(precision=2, x=1, y=3)
g2 = fractions(precision=6, x=2, y=3)

items = list(zip(g1, g2))

The expected value of items is::

[(Decimal('0.33'), Decimal('0.666667')),
 (Decimal('0.11'), Decimal('0.222222'))]

Rather surprisingly, the actual result is::

[(Decimal('0.33'), Decimal('0.666667')),
 (Decimal('0.111111'), Decimal('0.222222'))]

This is because the decimal context is stored as a thread-local, so concurrent iteration of the fractions() generator corrupts the state. A similar problem exists with coroutines.

Applications also often need to associate certain data with a given thread of execution. For example, a web application server commonly needs access to the current HTTP request object.

The inadequacy of TLS in asynchronous code has led to the proliferation of ad-hoc solutions, which are limited in scope and do not support all required use cases.

The current status quo is that any library (including the standard library) that relies on TLS is likely to be broken when used in asynchronous code or with generators (see [3]_ as an example issue).

Some languages that support coroutines or generators recommend passing the context manually as an argument to every function; see [1]_ for an example. This approach, however, has limited use for Python, where there is a large ecosystem that was built to work with a TLS-like context. Furthermore, libraries like decimal or numpy rely on context implicitly in overloaded operator implementations.

The .NET runtime, which has support for async/await, has a generic solution for this problem, called ExecutionContext (see [2]_).

Goals

The goal of this PEP is to provide a more reliable threading.local() alternative, which:

provides the mechanism and the API to fix non-local state issues with coroutines and generators;
has no or negligible performance impact on the existing code or the code that will be using the new mechanism, including libraries like decimal and numpy.

High-Level Specification

The full specification of this PEP is broken down into three parts:

High-Level Specification (this section): the description of the overall solution. We show how it applies to generators and coroutines in user code, without delving into implementation details.
Detailed Specification: the complete description of new concepts, APIs, and related changes to the standard library.
Implementation Details: the description and analysis of data structures and algorithms used to implement this PEP, as well as the necessary changes to CPython.

For the purpose of this section, we define execution context as an opaque container of non-local state that allows consistent access to its contents in the concurrent execution environment.

A context variable is an object representing a value in the execution context. A new context variable is created by calling the new_context_var() function. A context variable object has two methods:

lookup(): returns the value of the variable in the current execution context;
set(): sets the value of the variable in the current execution context.

Regular Single-threaded Code

In regular, single-threaded code that doesn't involve generators or coroutines, context variables behave like globals::

var = new_context_var()

def sub():
    assert var.lookup() == 'main'
    var.set('sub')

def main():
    var.set('main')
    sub()
    assert var.lookup() == 'sub'

Multithreaded Code

In multithreaded code, context variables behave like thread locals::

var = new_context_var()

def sub():
    assert var.lookup() is None  # The execution context is empty
                                 # for each new thread.
    var.set('sub')

def main():
    var.set('main')

    thread = threading.Thread(target=sub)
    thread.start()
    thread.join()

    assert var.lookup() == 'main'
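
The behaviour in the two examples above can be reproduced with a small pure-Python model, assuming a per-thread stack of dicts kept in a threading.local() is an acceptable stand-in for the execution context. ContextVarModel and _ec_stack are hypothetical names used only for this sketch; they are not the API proposed by this PEP.

```python
import threading

# Hypothetical model: each OS thread gets its own execution context,
# represented as a list (stack) of dicts stored in a threading.local().
_tls = threading.local()


def _ec_stack():
    # A brand-new thread starts with a single empty logical context.
    if not hasattr(_tls, "stack"):
        _tls.stack = [{}]
    return _tls.stack


class ContextVarModel:
    """Hypothetical stand-in for the object returned by new_context_var()."""

    def lookup(self):
        # Search the logical contexts from the top of the stack down.
        for lc in reversed(_ec_stack()):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        # Always write into the topmost logical context.
        _ec_stack()[-1][self] = value


var = ContextVarModel()
seen_in_thread = []


def sub():
    seen_in_thread.append(var.lookup())  # the new thread's EC is empty
    var.set('sub')                       # invisible to the main thread


def main():
    var.set('main')
    thread = threading.Thread(target=sub)
    thread.start()
    thread.join()
    return var.lookup()


result = main()
```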

Generators

In generators, changes to context variables are local and are not visible to the caller, but are visible to the code called by the generator. Once set in the generator, the context variable is guaranteed not to change between iterations::

var = new_context_var()

def gen():
    var.set('gen')
    assert var.lookup() == 'gen'
    yield 1

    assert var.lookup() == 'gen'
    yield 2

def main():
    var.set('main')

    g = gen()
    next(g)
    assert var.lookup() == 'main'

    var.set('main modified')
    next(g)
    assert var.lookup() == 'main modified'

Changes to the caller's context variables are visible to the generator (unless they were also modified inside the generator)::

var = new_context_var()

def gen():
    assert var.lookup() == 'var'
    yield 1

    assert var.lookup() == 'var modified'
    yield 2

def main():
    g = gen()

    var.set('var')
    next(g)

    var.set('var modified')
    next(g)
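
A rough way to see both generator rules in action is to model the per-generator logical context in pure Python. The sketch below assumes a single thread, represents the execution context as a list of dicts, and uses a hypothetical isolated decorator to push a generator's logical context around each step, which approximates what this PEP proposes to do inside the interpreter.

```python
import functools

# Hypothetical single-threaded model: the execution context is a list of
# dicts (logical contexts); the last element is the top of the stack.
EC = [{}]


class Var:
    def lookup(self):
        for lc in reversed(EC):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        EC[-1][self] = value


class GeneratorWithLC:
    """Wraps a generator so each step runs with its own logical context."""

    def __init__(self, gen):
        self._gen = gen
        self.logical_context = {}  # created empty, like __logical_context__

    def __iter__(self):
        return self

    def __next__(self):
        EC.append(self.logical_context)  # push on resume
        try:
            return next(self._gen)
        finally:
            EC.pop()                     # pop on suspend or exit


def isolated(genfunc):
    # Hypothetical decorator standing in for the interpreter-level change.
    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        return GeneratorWithLC(genfunc(*args, **kwargs))
    return wrapper


var = Var()


@isolated
def gen():
    var.set('gen')
    assert var.lookup() == 'gen'
    yield 1
    assert var.lookup() == 'gen'
    yield 2


@isolated
def reader():
    # Never sets `var`, so it sees whatever the caller has set.
    yield var.lookup()
    yield var.lookup()


def main():
    var.set('main')
    g = gen()
    next(g)
    assert var.lookup() == 'main'
    var.set('main modified')
    next(g)
    assert var.lookup() == 'main modified'
    return g


def caller_changes():
    r = reader()
    var.set('var')
    first = next(r)
    var.set('var modified')
    second = next(r)
    return first, second


g = main()
pair = caller_changes()
```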

Now, let's revisit the decimal precision example from the Rationale_ section, and see how the execution context can improve the situation::

import decimal

decimal_prec = new_context_var()  # create a new context variable

# Pre-PEP 550 Decimal relies on TLS for its context.
# This subclass switches the decimal context storage
# to the execution context for illustration purposes.
#
class MyDecimal(decimal.Decimal):
    def __init__(self, value="0"):
        prec = decimal_prec.lookup()
        if prec is None:
            raise ValueError('could not find decimal precision')
        context = decimal.Context(prec=prec)
        super().__init__(value, context=context)

def fractions(precision, x, y):
    # Normally, this would be set by a context manager,
    # but for simplicity we do this directly.
    decimal_prec.set(precision)

    yield MyDecimal(x) / MyDecimal(y)
    yield MyDecimal(x) / MyDecimal(y**2)

g1 = fractions(precision=2, x=1, y=3)
g2 = fractions(precision=6, x=2, y=3)

items = list(zip(g1, g2))

The value of items is::

[(Decimal('0.33'), Decimal('0.666667')),
 (Decimal('0.11'), Decimal('0.222222'))]

which matches the expected result.
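
The MyDecimal subclass above is illustrative only. Here is a runnable sketch of the same idea under stated assumptions: the execution context is modelled as a stack of dicts, a hypothetical IsolatedGenerator wrapper stands in for the proposed interpreter changes, and the precision is applied explicitly through decimal.Context.divide() rather than through a Decimal subclass.

```python
import decimal
from decimal import Decimal

# Hypothetical single-threaded execution context: a stack of dicts.
EC = [{}]


class Var:
    def lookup(self):
        for lc in reversed(EC):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        EC[-1][self] = value


class IsolatedGenerator:
    # Each generator object owns a logical context that is pushed onto the
    # stack for the duration of every step and popped afterwards.
    def __init__(self, gen):
        self._gen = gen
        self._lc = {}

    def __iter__(self):
        return self

    def __next__(self):
        EC.append(self._lc)
        try:
            return next(self._gen)
        finally:
            EC.pop()


decimal_prec = Var()


def divide(x, y):
    # Stand-in for MyDecimal: read the precision from the execution
    # context instead of from the thread-local decimal context.
    prec = decimal_prec.lookup()
    if prec is None:
        raise ValueError('could not find decimal precision')
    return decimal.Context(prec=prec).divide(Decimal(x), Decimal(y))


def _fractions(precision, x, y):
    decimal_prec.set(precision)
    yield divide(x, y)
    yield divide(x, y ** 2)


def fractions(precision, x, y):
    return IsolatedGenerator(_fractions(precision, x, y))


g1 = fractions(precision=2, x=1, y=3)
g2 = fractions(precision=6, x=2, y=3)
items = list(zip(g1, g2))
```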

Coroutines and Asynchronous Tasks

In coroutines, like in generators, context variable changes are local and are not visible to the caller::

import asyncio

var = new_context_var()

async def sub():
    assert var.lookup() == 'main'
    var.set('sub')
    assert var.lookup() == 'sub'

async def main():
    var.set('main')
    await sub()
    assert var.lookup() == 'main'

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

To establish the full semantics of execution context in coroutines, we must also consider tasks. A task is the abstraction used by asyncio, and other similar libraries, to manage the concurrent execution of coroutines. In the example above, a task is created implicitly by the run_until_complete() method. asyncio.wait_for() is another example of implicit task creation::

async def sub():
    await asyncio.sleep(1)
    assert var.lookup() == 'main'

async def main():
    var.set('main')

    # waiting for sub() directly
    await sub()

    # waiting for sub() with a timeout
    await asyncio.wait_for(sub(), timeout=2)

    var.set('main changed')

Intuitively, we expect the assertion in sub() to hold true in both invocations, even though the wait_for() implementation actually spawns a task, which runs sub() concurrently with main().

Thus, tasks must capture a snapshot of the current execution context at the moment of their creation and use it to execute the wrapped coroutine whenever that happens. If this is not done, then innocuous-looking changes like wrapping a coroutine in a wait_for() call would cause surprising breakage. This leads to the following::

import asyncio

var = new_context_var()

async def sub():
    # Sleeping will make sub() run after
    # `var` is modified in main().
    await asyncio.sleep(1)

    assert var.lookup() == 'main'

async def main():
    var.set('main')
    loop.create_task(sub())  # schedules asynchronous execution
                             # of sub().
    assert var.lookup() == 'main'
    var.set('main changed')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

In the above code we show how sub(), running in a separate task, sees the value of var as it was when loop.create_task(sub()) was called.
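
The snapshot rule for tasks and callbacks can be sketched with a toy event loop. This is a hypothetical model, not asyncio: ToyLoop, get_execution_context() and run_with_execution_context() here are simplified stand-ins that capture a squashed copy of the context when a callback is scheduled and later run the callback on top of that copy.

```python
from collections import deque

# Hypothetical single-threaded model: the execution context is a list of dicts.
EC = [{}]


class Var:
    def lookup(self):
        for lc in reversed(EC):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        EC[-1][self] = value


def get_execution_context():
    # Snapshot: squash the stack into one new dict (upper contexts win).
    squashed = {}
    for lc in EC:
        squashed.update(lc)
    return [squashed]


def run_with_execution_context(ec, func, *args):
    global EC
    saved = EC
    EC = list(ec) + [{}]  # fresh top logical context for the callback
    try:
        return func(*args)
    finally:
        EC = saved


class ToyLoop:
    """Hypothetical event loop that snapshots the EC at scheduling time."""

    def __init__(self):
        self._ready = deque()

    def call_soon(self, callback, *args):
        self._ready.append((get_execution_context(), callback, args))

    def run(self):
        while self._ready:
            ec, callback, args = self._ready.popleft()
            run_with_execution_context(ec, callback, *args)


var = Var()
loop = ToyLoop()
observed = []

var.set('main')
loop.call_soon(lambda: observed.append(var.lookup()))
loop.call_soon(var.set, 'set inside callback')  # confined to the callback
var.set('main changed')
loop.run()
```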

Like tasks, the intuitive behaviour of callbacks scheduled with either Loop.call_soon(), Loop.call_later(), or Future.add_done_callback() is to also capture a snapshot of the current execution context at the point of scheduling, and use it to run the callback::

import asyncio
import logging

current_request = new_context_var()

def log_error(e):
    logging.error('error when handling request %r',
                  current_request.lookup())

async def render_response():
    ...

async def handle_get_request(request):
    current_request.set(request)

    try:
        return await render_response()
    except Exception as e:
        asyncio.get_event_loop().call_soon(log_error, e)
        return '500 - Internal Server Error'

Detailed Specification

Conceptually, an execution context (EC) is a stack of logical contexts. There is one EC per Python thread.

A logical context (LC) is a mapping of context variables to their values in that particular LC.

A context variable is an object representing a value in the execution context. A new context variable object is created by calling the sys.new_context_var(name: str) function. The value of the name argument is not used by the EC machinery, but may be used for debugging and introspection.

The context variable object has the following methods and attributes:

name: the value passed to new_context_var().
lookup(): traverses the execution context top-to-bottom until the variable value is found. Returns None if the variable is not present in the execution context;
set(): sets the value of the variable in the topmost logical context.
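
A minimal sketch of this data model, assuming plain dicts for logical contexts and a module-level list for the current thread's stack, shows the top-to-bottom lookup and the top-only set(). The explicit push and pop stand in for what generators and the sys functions described later would do.

```python
# Hypothetical model of the data model described above: an execution
# context is a stack of logical contexts (plain dicts here).
EC = [{}]  # bottom-most logical context of the current thread


class ContextVar:
    def __init__(self, name):
        self.name = name  # informational only; not used for lookups

    def lookup(self):
        # Traverse the stack top-to-bottom; the first hit wins.
        for lc in reversed(EC):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        # Only the topmost logical context is ever modified.
        EC[-1][self] = value

    def __repr__(self):
        return f'<ContextVar {self.name!r}>'


var1 = ContextVar('var1')
var2 = ContextVar('var2')

var1.set('outer')
var2.set('outer')

EC.append({})      # push a new, empty logical context
var1.set('inner')  # shadows var1; var2 is still found below
inner_view = (var1.lookup(), var2.lookup())
EC.pop()           # pop it again

outer_view = (var1.lookup(), var2.lookup())
```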

Generators

When created, each generator object has an empty logical context object stored in its __logical_context__ attribute. This logical context is pushed onto the execution context at the beginning of each generator iteration and popped at the end::

var1 = sys.new_context_var('var1')
var2 = sys.new_context_var('var2')

def gen():
    var1.set('var1-gen')
    var2.set('var2-gen')

    # EC = [
    #     outer_LC(),
    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
    # ]
    n = nested_gen()  # nested_gen_LC is created
    next(n)
    # EC = [
    #     outer_LC(),
    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
    # ]

    var1.set('var1-gen-mod')
    var2.set('var2-gen-mod')
    # EC = [
    #     outer_LC(),
    #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'})
    # ]
    next(n)
    yield  # makes gen() a generator, as list(g) below expects

def nested_gen():
    # EC = [
    #     outer_LC(),
    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
    #     nested_gen_LC()
    # ]
    assert var1.lookup() == 'var1-gen'
    assert var2.lookup() == 'var2-gen'

    var1.set('var1-nested-gen')
    # EC = [
    #     outer_LC(),
    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
    #     nested_gen_LC({var1: 'var1-nested-gen'})
    # ]
    yield

    # EC = [
    #     outer_LC(),
    #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}),
    #     nested_gen_LC({var1: 'var1-nested-gen'})
    # ]
    assert var1.lookup() == 'var1-nested-gen'
    assert var2.lookup() == 'var2-gen-mod'

    yield

# EC = [outer_LC()]

g = gen()  # gen_LC is created for the generator object `g`
list(g)

# EC = [outer_LC()]

The snippet above shows the state of the execution context stack throughout the generator lifespan.

contextlib.contextmanager

Earlier, we've used the following example::

import decimal

# create a new context variable
decimal_prec = sys.new_context_var('decimal_prec')

# ...

def fractions(precision, x, y):
    decimal_prec.set(precision)

    yield MyDecimal(x) / MyDecimal(y)
    yield MyDecimal(x) / MyDecimal(y**2)

Let's extend it by adding a context manager::

@contextlib.contextmanager
def precision_context(prec):
    old_prec = decimal_prec.lookup()

    try:
        decimal_prec.set(prec)
        yield
    finally:
        decimal_prec.set(old_prec)

Unfortunately, this would not work straight away, as the modification to the decimal_prec variable is confined to the precision_context() generator, and therefore will not be visible inside the with block::

def fractions(precision, x, y):
    # EC = [{}, {}]

    with precision_context(precision):
        # EC becomes [{}, {}, {decimal_prec: precision}] in the
        # *precision_context()* generator,
        # but here the EC is still [{}, {}]

        # raises ValueError('could not find decimal precision')!
        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

The way to fix this is to set the generator's __logical_context__ attribute to None. This will cause the generator to avoid modifying the execution context stack.

We modify the contextlib.contextmanager() decorator to set genobj.__logical_context__ to None to produce well-behaved context managers::

def fractions(precision, x, y):
    # EC = [{}, {}]

    with precision_context(precision):
        # EC = [{}, {decimal_prec: precision}]

        yield MyDecimal(x) / MyDecimal(y)
        yield MyDecimal(x) / MyDecimal(y**2)

    # EC becomes [{}, {decimal_prec: None}]
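
The effect of setting __logical_context__ to None can be checked with a small pure-Python model. The ModelGenerator class and model_generator decorator are hypothetical stand-ins for the proposed interpreter behaviour; the real contextlib.contextmanager is used unchanged to drive them.

```python
import contextlib
import functools

# Hypothetical single-threaded model of the execution context.
EC = [{}]


class Var:
    def lookup(self):
        for lc in reversed(EC):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        EC[-1][self] = value


class ModelGenerator:
    # logical_context=None mimics genobj.__logical_context__ = None:
    # the generator then runs directly in its caller's context.
    def __init__(self, gen, logical_context):
        self._gen = gen
        self.logical_context = logical_context

    def __iter__(self):
        return self

    def __next__(self):
        if self.logical_context is None:
            return next(self._gen)
        EC.append(self.logical_context)
        try:
            return next(self._gen)
        finally:
            EC.pop()


def model_generator(share_caller_context):
    def decorator(genfunc):
        @functools.wraps(genfunc)
        def wrapper(*args, **kwargs):
            lc = None if share_caller_context else {}
            return ModelGenerator(genfunc(*args, **kwargs), lc)
        return wrapper
    return decorator


decimal_prec = Var()


def _precision_context(prec):
    old_prec = decimal_prec.lookup()
    try:
        decimal_prec.set(prec)
        yield
    finally:
        decimal_prec.set(old_prec)


# Without the fix: the generator's own LC swallows the set().
isolated_cm = contextlib.contextmanager(
    model_generator(share_caller_context=False)(_precision_context))
# With the fix: the LC is None, so set() reaches the caller's context.
shared_cm = contextlib.contextmanager(
    model_generator(share_caller_context=True)(_precision_context))

with isolated_cm(2):
    inside_isolated = decimal_prec.lookup()

with shared_cm(6):
    inside_shared = decimal_prec.lookup()

after_shared = decimal_prec.lookup()
```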

asyncio

asyncio uses Loop.call_soon, Loop.call_later, and Loop.call_at to schedule the asynchronous execution of a function. asyncio.Task uses call_soon() to further the execution of the wrapped coroutine.

We modify Loop.call_{at,later,soon} to accept the new optional execution_context keyword argument, which defaults to a copy of the current execution context::

def call_soon(self, callback, *args, execution_context=None):
    if execution_context is None:
        execution_context = sys.get_execution_context()

    # ... some time later

    sys.run_with_execution_context(
        execution_context, callback, *args)

The sys.get_execution_context() function returns a shallow copy of the current execution context. By shallow copy we mean a new execution context such that:

lookups in the copy provide the same results as in the original execution context, and
any changes in the original execution context do not affect the copy, and
any changes to the copy do not affect the original execution context.

Either of the following satisfies the copy requirements:

a new stack with shallow copies of logical contexts;
a new stack with one squashed logical context.

The sys.run_with_execution_context(ec, func, *args, **kwargs) function runs func(*args, **kwargs) with ec as the execution context. The function performs the following steps:

Set ec as the current execution context stack in the current thread.
Push an empty logical context onto the stack.
Run func(*args, **kwargs).
Pop the logical context from the stack.
Restore the original execution context stack.
Return or raise the func() result.

These steps ensure that ec cannot be modified by func, which makes run_with_execution_context() idempotent.
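
The copy semantics and the six steps above can be sketched as follows, assuming an execution context is a list of dicts. get_execution_context() here uses the first option listed above (a new stack of shallow LC copies); both functions are simplified models, not the proposed sys implementations.

```python
# Hypothetical single-threaded model of the two functions described above.
_current = [{}]  # the current execution context: a stack of dicts


class Var:
    def lookup(self):
        for lc in reversed(_current):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        _current[-1][self] = value


def get_execution_context():
    # A new stack holding shallow copies of every logical context.
    return [dict(lc) for lc in _current]


def run_with_execution_context(ec, func, *args, **kwargs):
    global _current
    saved = _current                  # remember the caller's stack
    _current = list(ec) + [{}]        # set ec, then push an empty LC
    try:
        return func(*args, **kwargs)  # func's set() calls hit the new LC
    finally:
        _current = saved              # drop the LC and restore the stack


var = Var()
var.set('original')
snapshot = get_execution_context()
var.set('changed after snapshot')     # does not affect the copy


def task():
    seen = var.lookup()
    var.set('set by task')            # lands in the pushed empty LC
    return seen


seen_by_task = run_with_execution_context(snapshot, task)
```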

asyncio.Task is modified as follows::

class Task:
    def __init__(self, coro):
        ...
        # Get the current execution context snapshot.
        self._exec_context = sys.get_execution_context()

        self._loop.call_soon(
            self._step,
            execution_context=self._exec_context)

    def _step(self, exc=None):
        ...
        self._loop.call_soon(
            self._step,
            execution_context=self._exec_context)
        ...

Generators Transformed into Iterators

Any Python generator can be represented as an equivalent iterator. Compilers like Cython rely on this axiom. With respect to the execution context, such an iterator should behave the same way as the generator it represents.

This means that there needs to be a Python API to create new logical contexts and run code with a given logical context.

The sys.new_logical_context() function creates a new empty logical context.

The sys.run_with_logical_context(lc, func, *args, **kwargs) function can be used to run functions in the specified logical context. The lc can be modified as a result of the call.

The sys.run_with_logical_context() function performs the following steps:

Push lc onto the current execution context stack.
Run func(*args, **kwargs).
Pop lc from the execution context stack.
Return or raise the func() result.
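
A sketch of these steps, assuming logical contexts are plain mutable dicts so that "lc can be modified as a result of the call" holds literally. The Implementation section uses immutable mappings instead, so this only models the observable behaviour; new_logical_context() and run_with_logical_context() here are local stand-ins, not the sys functions.

```python
# Hypothetical model using a list of mutable dicts as the EC stack.
EC = [{}]


class Var:
    def lookup(self):
        for lc in reversed(EC):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        EC[-1][self] = value


def new_logical_context():
    return {}


def run_with_logical_context(lc, func, *args, **kwargs):
    EC.append(lc)                     # push lc onto the stack
    try:
        return func(*args, **kwargs)  # run func; result or error propagates
    finally:
        EC.pop()                      # pop lc from the stack


var = Var()
var.set('outer')
lc = new_logical_context()


def step(value):
    before = var.lookup()
    var.set(value)  # stored in lc, which persists between calls
    return before


first = run_with_logical_context(lc, step, 'one')
second = run_with_logical_context(lc, step, 'two')
```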

By using new_logical_context() and run_with_logical_context(), we can replicate the generator behaviour like this::

class Generator:

    def __init__(self):
        self.logical_context = sys.new_logical_context()

    def __iter__(self):
        return self

    def __next__(self):
        return sys.run_with_logical_context(
            self.logical_context, self._next_impl)

    def _next_impl(self):
        # Actual __next__ implementation.
        ...

Let's see how this pattern can be applied to a real generator::

# create a new context variable
decimal_prec = sys.new_context_var('decimal_precision')

def gen_series(n, precision):
    decimal_prec.set(precision)

    for i in range(1, n):
        yield MyDecimal(i) / MyDecimal(3)

# gen_series is equivalent to the following iterator:

class Series:

    def __init__(self, n, precision):
        # Create a new empty logical context on creation,
        # like the generators do.
        self.logical_context = sys.new_logical_context()

        # run_with_logical_context() pushes
        # self.logical_context onto the execution context stack,
        # runs self._init, and pops self.logical_context
        # from the stack.
        sys.run_with_logical_context(
            self.logical_context, self._init, n, precision)

    def _init(self, n, precision):
        self.i = 1
        self.n = n
        self.precision = precision
        decimal_prec.set(precision)

    def __iter__(self):
        return self

    def __next__(self):
        return sys.run_with_logical_context(
            self.logical_context, self._next_impl)

    def _next_impl(self):
        # Stop where range(1, n) in gen_series() stops.
        if self.i >= self.n:
            raise StopIteration
        decimal_prec.set(self.precision)
        result = MyDecimal(self.i) / MyDecimal(3)
        self.i += 1
        return result

For regular iterators, such an approach to logical context management is normally not necessary, and it is recommended to set and restore context variables directly in __next__::

class Series:

    def __next__(self):
        old_prec = decimal_prec.lookup()

        try:
            decimal_prec.set(self.precision)
            ...
        finally:
            decimal_prec.set(old_prec)

Asynchronous Generators

The execution context semantics of asynchronous generators do not differ from those of regular generators and coroutines.

Implementation

Execution context is implemented as an immutable linked list of logical contexts, where each logical context is an immutable weak key mapping. A pointer to the currently active execution context is stored in the OS thread state::

                  +-----------------+
                  |                 |     ec
                  |  PyThreadState  +-------------+
                  |                 |             |
                  +-----------------+             |
                                                  |
ec_node             ec_node             ec_node   v
+------+------+     +------+------+     +------+------+
| NULL |  lc  |<----| prev |  lc  |<----| prev |  lc  |
+------+--+---+     +------+--+---+     +------+--+---+
          |                   |                   |
LC        v         LC        v         LC        v
+-------------+     +-------------+     +-------------+
| var1: obj1  |     |    EMPTY    |     | var1: obj4  |
| var2: obj2  |     +-------------+     +-------------+
| var3: obj3  |
+-------------+

The choice of the immutable list of immutable mappings as a fundamental data structure is motivated by the need to efficiently implement sys.get_execution_context(), which is to be frequently used by asynchronous tasks and callbacks. When the EC is immutable, get_execution_context() can simply copy the current execution context by reference::

def get_execution_context():
    return PyThreadState_Get().ec
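
The by-reference copy is safe because nothing reachable from a node is ever mutated. The following sketch models this with a NamedTuple for ec_node and a LogicalContext class whose set() returns a new object; ThreadState, ECNode and LogicalContext are hypothetical stand-ins for the C structures, and the dict copy in set() is the O(N) approach that the Logical Context section replaces with a HAMT.

```python
from typing import NamedTuple, Optional


class LogicalContext:
    """Immutable mapping: set() returns a new LogicalContext."""

    __slots__ = ('_data',)

    def __init__(self, data=None):
        self._data = dict(data or {})

    def set(self, key, value):
        new = dict(self._data)  # O(N) copy in this sketch
        new[key] = value
        return LogicalContext(new)

    def __contains__(self, key):
        return key in self._data

    def __getitem__(self, key):
        return self._data[key]


class ECNode(NamedTuple):
    prev: Optional['ECNode']
    lc: LogicalContext


class ThreadState:
    def __init__(self):
        self.ec = ECNode(prev=None, lc=LogicalContext())


tstate = ThreadState()  # hypothetical stand-in for PyThreadState


class ContextVar:
    def lookup(self):
        node = tstate.ec
        while node is not None:
            if self in node.lc:
                return node.lc[self]
            node = node.prev
        return None

    def set(self, value):
        top = tstate.ec
        # Replace only the top node; older nodes are shared untouched.
        tstate.ec = ECNode(prev=top.prev, lc=top.lc.set(self, value))


def get_execution_context():
    # Returning by reference is safe: nothing reachable from it mutates.
    return tstate.ec


var = ContextVar()
var.set('before')
snapshot = get_execution_context()
var.set('after')
```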

Let's review all possible context modification scenarios:

The ContextVar.set() method is called::

def ContextVar_set(self, val):
    # See a more complete set() definition
    # in the Context Variables section.

    tstate = PyThreadState_Get()
    top_ec_node = tstate.ec
    top_lc = top_ec_node.lc
    new_top_lc = top_lc.set(self, val)
    tstate.ec = ec_node(
        prev=top_ec_node.prev,
        lc=new_top_lc)

sys.run_with_logical_context() is called, in which case the passed logical context object is appended to the execution context::

def run_with_logical_context(lc, func, *args, **kwargs):
    tstate = PyThreadState_Get()

    old_top_ec_node = tstate.ec
    new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)

    try:
        tstate.ec = new_top_ec_node
        return func(*args, **kwargs)
    finally:
        tstate.ec = old_top_ec_node

sys.run_with_execution_context() is called, in which case the current execution context is set to the passed execution context with a new empty logical context appended to it::

def run_with_execution_context(ec, func, *args, **kwargs):
    tstate = PyThreadState_Get()

    old_top_ec_node = tstate.ec
    new_lc = sys.new_logical_context()
    new_top_ec_node = ec_node(prev=ec, lc=new_lc)

    try:
        tstate.ec = new_top_ec_node
        return func(*args, **kwargs)
    finally:
        tstate.ec = old_top_ec_node

genobj.send(), genobj.throw(), or genobj.close() is called on a genobj generator, in which case the logical context recorded in genobj is pushed onto the stack::

PyGen_New(PyGenObject *gen):
    gen.logical_context = sys.new_logical_context()

gen_send(PyGenObject *gen, ...):
    tstate = PyThreadState_Get()

    if gen.logical_context is not None:
        old_top_ec_node = tstate.ec
        new_top_ec_node = ec_node(
            prev=old_top_ec_node,
            lc=gen.logical_context)

        try:
            tstate.ec = new_top_ec_node
            return _gen_send_impl(gen, ...)
        finally:
            gen.logical_context = tstate.ec.lc
            tstate.ec = old_top_ec_node
    else:
        return _gen_send_impl(gen, ...)

Coroutines and asynchronous generators share the implementation with generators, and the above changes apply to them as well.

In certain scenarios the EC may need to be squashed to limit the size of the chain. For example, consider the following corner case::

async def repeat(coro, delay):
    await coro()
    await asyncio.sleep(delay)
    loop.create_task(repeat(coro, delay))

async def ping():
    print('ping')

loop = asyncio.get_event_loop()
loop.create_task(repeat(ping, 1))
loop.run_forever()

In the above code, the EC chain will grow as long as repeat() is called. Each new task will call sys.run_with_execution_context(), which will append a new logical context to the chain. To prevent unbounded growth, sys.get_execution_context() checks if the chain is longer than a predetermined maximum, and if it is, squashes the chain into a single LC::

def get_execution_context():
    tstate = PyThreadState_Get()

    if tstate.ec_len > EC_LEN_MAX:
        squashed_lc = sys.new_logical_context()

        # Use a separate name so the ec_node() constructor
        # is not shadowed by the loop variable.
        node = tstate.ec
        while node:
            # The LC.merge() method does not replace existing keys.
            squashed_lc = squashed_lc.merge(node.lc)
            node = node.prev

        return ec_node(prev=NULL, lc=squashed_lc)
    else:
        return tstate.ec
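
The squashing step can be sketched in pure Python. This model computes the chain length on demand rather than storing it in a tstate.ec_len field, uses plain dicts as read-only logical contexts, and picks EC_LEN_MAX = 4 purely for illustration, since the PEP does not fix a value.

```python
from typing import NamedTuple, Optional

EC_LEN_MAX = 4  # hypothetical threshold chosen for this sketch


class ECNode(NamedTuple):
    prev: Optional['ECNode']
    lc: dict  # treated as read-only in this sketch


def ec_len(node):
    n = 0
    while node is not None:
        n += 1
        node = node.prev
    return n


def lookup(node, key):
    while node is not None:
        if key in node.lc:
            return node.lc[key]
        node = node.prev
    return None


def squash(top):
    squashed = {}
    node = top
    while node is not None:
        # Like LC.merge(): keys already taken from a higher LC win.
        for key, value in node.lc.items():
            squashed.setdefault(key, value)
        node = node.prev
    return ECNode(prev=None, lc=squashed)


def get_execution_context(top):
    return squash(top) if ec_len(top) > EC_LEN_MAX else top


# Build a chain like the one repeat() produces: one LC per task.
top = None
for i in range(10):
    top = ECNode(prev=top, lc={'task': i, f'only_{i}': True})

ec_copy = get_execution_context(top)
```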

Logical Context

Logical context is an immutable weak key mapping which has the following properties with respect to garbage collection:

ContextVar objects are strongly-referenced only from the application code, not from any of the Execution Context machinery or values they point to. This means that there are no reference cycles that could extend their lifespan longer than necessary, or prevent their collection by the GC.
Values put in the Execution Context are guaranteed to be kept alive while there is a ContextVar key referencing them in the thread.
If a ContextVar is garbage collected, all of its values will be removed from all contexts, allowing them to be GCed if needed.
If a thread has ended its execution, its thread state will be cleaned up along with its ExecutionContext, cleaning up all values bound to all context variables in the thread.

As discussed earlier, we need sys.get_execution_context() to be consistently fast regardless of the size of the execution context, so logical context is necessarily an immutable mapping.

Choosing dict for the underlying implementation is suboptimal, because LC.set() will cause dict.copy(), which is an O(N) operation, where N is the number of items in the LC.

get_execution_context(), when squashing the EC, is an O(M) operation, where M is the total number of context variable values in the EC.

So, instead of dict, we choose Hash Array Mapped Trie (HAMT) as the underlying implementation of logical contexts. (Scala and Clojure use HAMT to implement high performance immutable collections [5], [6].)

With HAMT, .set() becomes an O(log N) operation, and get_execution_context() squashing is more efficient on average due to structural sharing in HAMT.

See Appendix: HAMT Performance Analysis_ for a more elaborate analysis of HAMT performance compared to dict.
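
For readers unfamiliar with HAMTs, here is a minimal persistent hash array mapped trie, assuming 64-bit hashes consumed 5 bits per level. It is a hypothetical illustration, not CPython's implementation: it supports only set() and get(), has no deletion or length tracking, and stores colliding keys in a small leaf tuple. Each set() rebuilds one node per level and shares every sibling, which is where the O(log N) cost and the structural sharing come from.

```python
BITS = 5                   # hash bits consumed per trie level
MASK = (1 << BITS) - 1
HASH_MASK = (1 << 64) - 1  # treat hashes as unsigned 64-bit integers


def _hash(key):
    return hash(key) & HASH_MASK


def _slot(h, shift):
    # One-hot bit selecting this level's 5-bit chunk of the hash.
    return 1 << ((h >> shift) & MASK)


def _pos(bitmap, bit):
    # Index into the compact children tuple: count occupied lower slots.
    return bin(bitmap & (bit - 1)).count('1')


class _Leaf:
    __slots__ = ('h', 'pairs')

    def __init__(self, h, pairs):
        self.h = h          # full hash shared by every key in the leaf
        self.pairs = pairs  # ((key, value), ...); >1 entry only on collision


class _Node:
    __slots__ = ('bitmap', 'children')

    def __init__(self, bitmap, children):
        self.bitmap = bitmap
        self.children = children


def _insert(node, shift, h, key, value):
    bit = _slot(h, shift)
    pos = _pos(node.bitmap, bit)
    kids = node.children
    if not node.bitmap & bit:
        leaf = _Leaf(h, ((key, value),))
        return _Node(node.bitmap | bit, kids[:pos] + (leaf,) + kids[pos:])
    child = kids[pos]
    if isinstance(child, _Node):
        new_child = _insert(child, shift + BITS, h, key, value)
    elif child.h == h:
        pairs = tuple(p for p in child.pairs if p[0] != key)
        new_child = _Leaf(h, pairs + ((key, value),))
    else:
        # Different hashes share this slot: move the old leaf one level down.
        sub = _Node(_slot(child.h, shift + BITS), (child,))
        new_child = _insert(sub, shift + BITS, h, key, value)
    # Path copying: only this node is rebuilt; siblings are shared.
    return _Node(node.bitmap, kids[:pos] + (new_child,) + kids[pos + 1:])


class HAMT:
    __slots__ = ('_root',)

    def __init__(self, root=None):
        self._root = root if root is not None else _Node(0, ())

    def set(self, key, value):
        # Returns a new HAMT; self is left untouched.
        return HAMT(_insert(self._root, 0, _hash(key), key, value))

    def get(self, key, default=None):
        h, node, shift = _hash(key), self._root, 0
        while True:
            bit = _slot(h, shift)
            if not node.bitmap & bit:
                return default
            child = node.children[_pos(node.bitmap, bit)]
            if isinstance(child, _Node):
                node, shift = child, shift + BITS
                continue
            if child.h == h:
                for k, v in child.pairs:
                    if k == key:
                        return v
            return default


empty = HAMT()
m1 = empty
for i in range(1000):
    m1 = m1.set(f'var{i}', i)
m2 = m1.set('var0', 'changed')  # m1 still maps 'var0' to 0
```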

Context Variables

The ContextVar.lookup() and ContextVar.set() methods are implemented as follows (in pseudo-code)::

class ContextVar:

    def lookup(self):
        tstate = PyThreadState_Get()

        ec_node = tstate.ec
        while ec_node:
            if self in ec_node.lc:
                return ec_node.lc[self]
            ec_node = ec_node.prev

        return None

    def set(self, value):
        tstate = PyThreadState_Get()
        top_ec_node = tstate.ec

        if top_ec_node is not None:
            top_lc = top_ec_node.lc
            new_top_lc = top_lc.set(self, value)
            tstate.ec = ec_node(
                prev=top_ec_node.prev,
                lc=new_top_lc)
        else:
            top_lc = sys.new_logical_context()
            new_top_lc = top_lc.set(self, value)
            tstate.ec = ec_node(
                prev=NULL,
                lc=new_top_lc)

For efficient access in performance-sensitive code paths, such as in numpy and decimal, we add a cache to ContextVar.lookup(), making it an O(1) operation when the cache is hit. The cache key is composed of the following:

The new uint64_t PyThreadState->unique_id, which is a globally unique thread state identifier. It is computed from the new uint64_t PyInterpreterState->ts_counter, which is incremented whenever a new thread state is created.
The uint64_t ContextVar->version counter, which is incremented whenever the context variable value is changed in any logical context in any thread.

The cache is then implemented as follows::

class ContextVar:

    def set(self, value):
        ...  # implementation
        self.version += 1


    def lookup(self):
        tstate = PyThreadState_Get()

        if (self.last_tstate_id == tstate.unique_id and
                self.last_version == self.version):
            return self.last_value

        value = self._get_uncached()

        self.last_value = value  # borrowed ref
        self.last_tstate_id = tstate.unique_id
        self.last_version = self.version

        return value

Note that last_value is a borrowed reference. The assumption is that if the version checks are fine, the object will be alive. This allows the values of context variables to be properly garbage collected.
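
The cache can be modelled in pure Python as below. This is a hypothetical sketch: a threading.local() plays the role of PyThreadState, an itertools.count() plays the role of ts_counter, each thread has a single dict instead of a stack of logical contexts, and last_value is an ordinary strong reference because Python code cannot hold borrowed references. The slow_lookups counter exists only to make cache misses observable.

```python
import itertools
import threading

_ts_counter = itertools.count(1)  # stand-in for PyInterpreterState->ts_counter
_local = threading.local()        # stand-in for PyThreadState


def _tstate():
    # Lazily create this thread's state: a unique id plus its storage.
    if not hasattr(_local, 'unique_id'):
        _local.unique_id = next(_ts_counter)
        _local.storage = {}
    return _local


class CachedVar:
    def __init__(self):
        self.version = 0
        self.last_tstate_id = None
        self.last_version = None
        self.last_value = None
        self.slow_lookups = 0  # instrumentation for this sketch only

    def _lookup_uncached(self):
        self.slow_lookups += 1
        return _tstate().storage.get(self)

    def set(self, value):
        _tstate().storage[self] = value
        self.version += 1  # invalidates cached values in all threads

    def lookup(self):
        tid = _tstate().unique_id
        if self.last_tstate_id == tid and self.last_version == self.version:
            return self.last_value  # cache hit
        value = self._lookup_uncached()
        self.last_value = value
        self.last_tstate_id = tid
        self.last_version = self.version
        return value


var = CachedVar()
var.set('a')
r1 = var.lookup()  # miss: first lookup
r2 = var.lookup()  # hit: same thread, same version
var.set('b')       # bumps the version
r3 = var.lookup()  # miss: version changed
slow_after_three = var.slow_lookups

other = []
t = threading.Thread(target=lambda: other.append(var.lookup()))
t.start()
t.join()
```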

This generic caching approach is similar to what the current C implementation of decimal does to cache the current decimal context, and has similar performance characteristics.

Performance Considerations

Tests of the reference implementation based on the prior revisions of this PEP have shown a 1-2% slowdown on generator microbenchmarks and no noticeable difference in macrobenchmarks.

The performance of non-generator and non-async code is not affected by this PEP.

Summary of the New APIs

Python

The following new Python APIs are introduced by this PEP:

The sys.new_context_var(name: str='...') function to create ContextVar objects.
The ContextVar object, which has:
- the read-only .name attribute,
- the .lookup() method which returns the value of the variable in the current execution context;
- the .set() method which sets the value of the variable in the current execution context.
The sys.get_execution_context() function, which returns a copy of the current execution context.
The sys.new_execution_context() function, which returns a new empty execution context.
The sys.new_logical_context() function, which returns a new empty logical context.
The sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs) function, which runs func with the provided execution context.
The sys.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs) function, which runs func with the provided logical context on top of the current execution context.

C API

PyContextVar * PyContext_NewVar(char *desc): create a PyContextVar object.
PyObject * PyContext_LookupVar(PyContextVar *): return the value of the variable in the current execution context.
int PyContext_SetVar(PyContextVar *, PyObject *): set the value of the variable in the current execution context.
PyLogicalContext * PyLogicalContext_New(): create a new empty PyLogicalContext.
PyExecutionContext * PyExecutionContext_New(): create a new empty PyExecutionContext.
PyExecutionContext * PyExecutionContext_Get(): return the current execution context.
int PyExecutionContext_Set(PyExecutionContext *): set the passed EC object as the current for the active thread state.
int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *, PyLogicalContext *): allows to implement sys.run_with_logical_context Python API.

Design Considerations

Should `PyThreadState_GetDict()` use the execution context?

No. PyThreadState_GetDict is based on TLS, and changing its semantics will break backwards compatibility.

PEP 521

:pep:`521` proposes an alternative solution to the problem, which extends the context manager protocol with two new methods: __suspend__() and __resume__(). Similarly, the asynchronous context manager protocol is extended with __asuspend__() and __aresume__().

This allows implementing context managers that manage non-local state and behave correctly in generators and coroutines.

For example, consider the following context manager, which uses the execution context::

    from sys import new_context_var  # API proposed by this PEP

    class Context:

        def __init__(self):
            self.var = new_context_var('var')

        def __enter__(self):
            self.old_x = self.var.lookup()
            self.var.set('something')

        def __exit__(self, *err):
            self.var.set(self.old_x)

An equivalent implementation with PEP 521::

    import threading

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

The downside of this approach is the addition of significant new complexity to the context manager protocol and the interpreter implementation. This approach is also likely to negatively impact the performance of generators and coroutines.

Additionally, the solution in :pep:`521` is limited to context managers, and does not provide any mechanism to propagate state in asynchronous tasks and callbacks.
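The problem that both designs address can be reproduced with the standard library alone. In this runnable sketch, `PlainContext` is the PEP 521 example above without __suspend__() and __resume__(). A generator can suspend while it is still inside its `with` block. When it does, the thread-local value it set leaks into the code that called `next()`.

```python
import threading

local = threading.local()


class PlainContext:
    """The PEP 521 example above without __suspend__() and __resume__()."""

    def __enter__(self):
        self.old_x = getattr(local, "x", None)
        local.x = "something"

    def __exit__(self, *err):
        local.x = self.old_x


def gen():
    with PlainContext():
        # The generator suspends here while still inside the ``with`` block.
        yield getattr(local, "x", None)
        yield getattr(local, "x", None)


g = gen()
inside = next(g)                    # "something", set by the generator
leaked = getattr(local, "x", None)  # the caller now sees "something" as well
g.close()                           # raises GeneratorExit inside gen(), running __exit__
restored = getattr(local, "x", None)
```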

Can Execution Context be implemented outside of CPython?

No. Proper generator behaviour with respect to the execution context requires changes to the interpreter.

Should we update sys.displayhook and other APIs to use EC?

APIs like redirecting stdout by overwriting sys.stdout, or specifying new display hooks by overwriting the sys.displayhook function, affect the whole Python process by design. Their users assume that the effect of changing them will be visible across OS threads. Therefore we cannot simply make these APIs use the new Execution Context.
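A short stdlib-only illustration shows this process-wide behaviour. `contextlib.redirect_stdout()` works by rebinding `sys.stdout` itself. As a result, output printed by a different OS thread while the redirection is active is captured as well.

```python
import io
import threading
from contextlib import redirect_stdout

buf = io.StringIO()


def worker():
    print("hello from worker thread")


# redirect_stdout() rebinds the process-wide sys.stdout for the duration of the block.
with redirect_stdout(buf):
    t = threading.Thread(target=worker)
    t.start()
    t.join()

captured = buf.getvalue()  # "hello from worker thread\n"
```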

That said, we think it is possible to design new APIs that will be context-aware, but that is outside of the scope of this PEP.

Greenlets

Greenlet is an alternative implementation of cooperative scheduling for Python. Although the greenlet package is not part of CPython, popular frameworks like gevent rely on it, and it is important that greenlet can be modified to support execution contexts.

Conceptually, the behaviour of greenlets is very similar to that of generators, which means that similar changes around greenlet entry and exit can be done to add support for execution context.
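The entry/exit idea can be sketched without the greenlet package. This is a hypothetical model:

- A generator stands in for a greenlet.
- `Task.switch()` stands in for switching into it.
- The "context" is a plain dict held in `threading.local()`.

Each task keeps its own context, installs it on entry, and restores the caller's context on exit. Snapshotting the creator's context when the task is created is a simplifying assumption. It is not the PEP's stacked logical-context semantics.

```python
import threading

_state = threading.local()  # hypothetical per-thread "current context" slot


def current_context():
    if not hasattr(_state, "ctx"):
        _state.ctx = {}
    return _state.ctx


class Task:
    """Hypothetical cooperative task; a generator stands in for a greenlet."""

    def __init__(self, gen_func):
        self._gen = gen_func()
        self._ctx = dict(current_context())  # assumption: snapshot the creator's context

    def switch(self):
        caller_ctx = current_context()
        _state.ctx = self._ctx          # entry: install the task's own context
        try:
            return next(self._gen)
        finally:
            self._ctx = _state.ctx      # exit: keep whatever the task left behind
            _state.ctx = caller_ctx     # and give the caller its context back


def worker():
    current_context()["request_id"] = 42
    yield current_context().get("request_id")
    yield current_context().get("request_id")


current_context()["request_id"] = 1
task = Task(worker)
first = task.switch()                      # 42, set inside the task
outside = current_context()["request_id"]  # still 1 in the caller
second = task.switch()                     # 42 again: the task's value survived
```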

Backwards Compatibility

This proposal preserves 100% backwards compatibility.

Appendix: HAMT Performance Analysis

.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

The above chart demonstrates that:

HAMT displays near O(1) performance for all benchmarked dictionary sizes.
dict.copy() becomes very slow around 100 items.
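The contrast in Figure 1 comes from structural sharing. The following is a minimal pure-Python sketch of a HAMT; it is not CPython's reference implementation, and it omits deletion and iteration.

- Entries live in 32-way nodes, indexed by 5-bit slices of the key's hash.
- A bitmap records which slots are occupied, so each node stores only its occupied entries.
- `set()` copies only the nodes on the path to the key and shares everything else with the previous version. By contrast, copying a dict before each modification costs O(n).

All class and helper names here are hypothetical.

```python
_BITS = 5                      # each trie level consumes 5 hash bits (32-way nodes)
_MASK = (1 << _BITS) - 1
_HASH_BITS = 64


def _hash(key):
    return hash(key) & ((1 << _HASH_BITS) - 1)  # treat the hash as unsigned


def _slot(bitmap, bit):
    # Index into the compact entries tuple = number of occupied slots below `bit`.
    return bin(bitmap & (bit - 1)).count("1")


class _Node:
    """Bitmap-indexed node; each entry is a (key, value) leaf or a child node."""
    __slots__ = ("bitmap", "entries")

    def __init__(self, bitmap=0, entries=()):
        self.bitmap, self.entries = bitmap, entries


class _Collision:
    """Bucket for distinct keys whose full 64-bit hashes are equal."""
    __slots__ = ("pairs",)

    def __init__(self, pairs):
        self.pairs = pairs


def _assoc(node, key, value, h, shift):
    if isinstance(node, _Collision):
        others = tuple(p for p in node.pairs if p[0] != key)
        return _Collision(others + ((key, value),))
    bit = 1 << ((h >> shift) & _MASK)
    idx = _slot(node.bitmap, bit)
    entries = list(node.entries)  # copy this node only; children are shared
    if not node.bitmap & bit:
        entries.insert(idx, (key, value))
        return _Node(node.bitmap | bit, tuple(entries))
    entry = entries[idx]
    if isinstance(entry, tuple):
        if entry[0] == key:
            entries[idx] = (key, value)
        else:  # two keys share this slot: push both one level down
            entries[idx] = _split(entry, (key, value), h, shift + _BITS)
    else:
        entries[idx] = _assoc(entry, key, value, h, shift + _BITS)
    return _Node(node.bitmap, tuple(entries))


def _split(old, new, new_hash, shift):
    if shift >= _HASH_BITS:  # all hash bits are equal: fall back to a bucket
        return _Collision((old, new))
    child = _assoc(_Node(), old[0], old[1], _hash(old[0]), shift)
    return _assoc(child, new[0], new[1], new_hash, shift)


def _lookup(node, key, h, default):
    shift = 0
    while True:
        if isinstance(node, _Collision):
            return next((v for k, v in node.pairs if k == key), default)
        bit = 1 << ((h >> shift) & _MASK)
        if not node.bitmap & bit:
            return default
        entry = node.entries[_slot(node.bitmap, bit)]
        if isinstance(entry, tuple):
            return entry[1] if entry[0] == key else default
        node, shift = entry, shift + _BITS


class Hamt:
    """Immutable mapping: set() returns a new Hamt and leaves self unchanged."""
    __slots__ = ("_root", "_len")

    def __init__(self, root=None, length=0):
        self._root = root if root is not None else _Node()
        self._len = length

    def get(self, key, default=None):
        return _lookup(self._root, key, _hash(key), default)

    def set(self, key, value):
        h = _hash(key)
        missing = object()
        is_new = _lookup(self._root, key, h, missing) is missing
        return Hamt(_assoc(self._root, key, value, h, 0), self._len + is_new)

    def __len__(self):
        return self._len


v1 = Hamt()
for i in range(1000):
    v1 = v1.set(i, i * i)
v2 = v1.set(500, "changed")  # v1 still maps 500 -> 250000
```

Each set() copies at most one node per level, and there are about 13 levels for 64-bit hashes. Keeping an old version alive therefore costs little beyond the copied path.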

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 compares the lookup costs of dict versus a HAMT-based immutable mapping. HAMT lookup time is 30-40% slower than Python dict lookups on average, which is a very good result, considering that the latter is very well optimized.

There is research [8]_ showing that there are further possible improvements to the performance of HAMT.

The reference implementation of HAMT for CPython can be found here: [7]_.

Acknowledgments

Thanks to Victor Petrovykh for countless discussions around the topic and PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ContextVar design [17]_ [18]_, for pushing the PEP towards a more complete design, and coming up with the idea of having a stack of contexts in the thread state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the mailing list, and for coming up with a case that caused the complete rewrite of the initial PEP version [19]_.

Version History

Initial revision, posted on 11-Aug-2017 [20]_.
V2 posted on 15-Aug-2017 [21]_.

The fundamental limitation that caused a complete redesign of the first version was that it was not possible to implement an iterator that would interact with the EC in the same way as generators (see [19]_).

Version 2 was a complete rewrite, introducing new terminology (Local Context, Execution Context, Context Item) and new APIs.

V3 posted on 18-Aug-2017 [22]_.

Updates:
- Local Context was renamed to Logical Context. The term "local" was ambiguous and conflicted with local name scopes.
- Context Item was renamed to Context Key, see the thread with Nick Coghlan, Stefan Krah, and Yury Selivanov [23]_ for details.
- Context Item get cache design was adjusted, per Nathaniel Smith's idea in [25]_.
- Coroutines are created without a Logical Context; the ceval loop no longer needs to special-case the await expression (proposed by Nick Coghlan in [24]_).

V4 posted on 25-Aug-2017: the current version.
- The specification section has been completely rewritten.
- Context Key renamed to Context Var.
- Removed the distinction between generators and coroutines with respect to logical context isolation.