[Python-Dev] PEP 567 -- Context Variables (original) (raw)
Yury Selivanov yselivanov.ml at gmail.com
Tue Dec 12 12:33:24 EST 2017
- Previous message (by thread): [Python-Dev] Last call for PEP approvals before the holidays
- Next message (by thread): [Python-Dev] PEP 567 -- Context Variables
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,
This is a new proposal to implement context storage in Python.
It is a successor to PEP 550 and builds on some of its API ideas and data structures. Unlike PEP 550, though, this proposal focuses only on adding new APIs and implementing support for them in asyncio. There are no changes to the interpreter or to the behaviour of generator or coroutine objects.
PEP: 567
Title: Context Variables
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov <yury at magic.io>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Dec-2017
Python-Version: 3.7
Post-History: 12-Dec-2017
Abstract
This PEP proposes the new ``contextvars`` module and a set of new
CPython C APIs to support context variables.  This concept is
similar to thread-local variables but, unlike TLS, it allows
correctly keeping track of values per asynchronous task, e.g.
``asyncio.Task``.
This proposal builds directly upon concepts originally introduced
in :pep:`550`.  The key difference is that this PEP is only concerned
with solving the case for asynchronous tasks, and not generators.
There are no proposed modifications to any built-in types or to the
interpreter.
Rationale
Thread-local variables are insufficient for asynchronous tasks that
execute concurrently in the same OS thread.  Any context manager that
needs to save and restore a context value and uses
``threading.local()`` will have its context values bleed to other
code unexpectedly when used in async/await code.
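The bleed described above can be reproduced in a few lines.  The
following is a minimal sketch rather than part of the proposal: the
``handler`` coroutine and the ``seen`` dictionary are hypothetical
names, and ``asyncio.run()`` assumes Python 3.7 or later.

```python
import asyncio
import threading

# Both coroutines run in the same OS thread, so they share this object.
_local = threading.local()


async def handler(name, delay, seen):
    # Store a per-"request" value in thread-local storage.
    _local.request = name
    # Yield to the event loop; the other handler runs in the meantime.
    await asyncio.sleep(delay)
    # Record whatever value is visible once this handler resumes.
    seen[name] = _local.request


async def main():
    seen = {}
    # 'a' sleeps longer, so 'b' overwrites the shared value before
    # 'a' reads it back.
    await asyncio.gather(
        handler('a', 0.05, seen),
        handler('b', 0.01, seen),
    )
    return seen


seen = asyncio.run(main())
print(seen)
```

Handler ``'a'`` reads back the value stored by handler ``'b'``, which
is the problem context variables are meant to solve.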
A few examples where having a working context local storage for
asynchronous code is desired:

* Context managers like decimal contexts and ``numpy.errstate``.

* Request-related data, such as security tokens and request
  data in web applications, language context for ``gettext`` etc.

* Profiling, tracing, and logging in large code bases.
Introduction
The PEP proposes a new mechanism for managing context variables.
The key classes involved in this mechanism are ``contextvars.Context``
and ``contextvars.ContextVar``.  The PEP also proposes some policies
for using the mechanism around asynchronous tasks.
The proposed mechanism for accessing context variables uses the
``ContextVar`` class.  A module (such as decimal) that wishes to
store a context variable should:

* declare a module-global variable holding a ``ContextVar`` to
  serve as a "key";

* access the current value via the ``get()`` method on the
  key variable;

* modify the current value via the ``set()`` method on the
  key variable.
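The three steps above can be sketched as follows.  This assumes the
``contextvars`` module as it later shipped in Python 3.7; the names
``precision``, ``get_precision`` and ``set_precision`` are
hypothetical and only stand in for what a module such as decimal
might expose.

```python
from contextvars import ContextVar

# Step 1: a module-global ContextVar serves as the "key".
# (Hypothetical name and default, for illustration only.)
precision = ContextVar('precision', default=28)


def get_precision():
    # Step 2: read the current value through the key variable.
    return precision.get()


def set_precision(value):
    # Step 3: modify the current value through the key variable.
    precision.set(value)


before = get_precision()
set_precision(10)
after = get_precision()
```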
The notion of "current value" deserves special consideration:
different asynchronous tasks that exist and execute concurrently
may have different values.  This idea is well-known from thread-local
storage, but in this case the value is not necessarily local to a
thread.  Instead, there is the notion of the "current ``Context``",
which is stored in thread-local storage and is accessed via the
``contextvars.get_context()`` function.
Manipulation of the current ``Context`` is the responsibility of the
task framework, e.g. asyncio.
A ``Context`` is conceptually a mapping, implemented using an
immutable dictionary.  The ``ContextVar.get()`` method does a
lookup in the current ``Context`` with ``self`` as a key, raising a
``LookupError`` or returning a default value specified in
the constructor.
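The lookup behaviour can be illustrated with a short sketch.  It
assumes the ``contextvars`` module as released in Python 3.7, where
``get()`` also accepts an optional fallback argument (as in the
pseudo-code of the Implementation section); ``describe`` is a
hypothetical helper used only to capture the outcome.

```python
from contextvars import ContextVar

plain = ContextVar('plain')                            # no default
with_default = ContextVar('with_default', default=0)   # constructor default


def describe(var, *args):
    # Hypothetical helper: return the looked-up value, or a marker
    # string when the lookup fails.
    try:
        return var.get(*args)
    except LookupError:
        return 'LookupError'


results = [
    describe(plain),              # unset, no default anywhere
    describe(plain, 'fallback'),  # unset, fallback passed to get()
    describe(with_default),       # unset, constructor default applies
    describe(with_default, 1),    # fallback passed to get() wins
]

plain.set('x')
after_set = describe(plain, 'fallback')  # a set value always wins
```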
The ``ContextVar.set(value)`` method clones the current ``Context``,
assigns the ``value`` to it with ``self`` as a key, and makes the
new ``Context`` the current one.  Because ``Context`` uses an
immutable dictionary, cloning it is O(1).
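One visible consequence of the immutable mapping is that a context
captured earlier is unaffected by later ``set()`` calls.  The sketch
below assumes the module as released in Python 3.7, where the
snapshot function is spelled ``copy_context()``; this draft names
the accessor ``get_context()``, so treat the spelling as an
assumption.

```python
from contextvars import ContextVar, copy_context

var = ContextVar('var')

var.set('old')
# Capture the current context; this is cheap because the underlying
# mapping is immutable and can be shared rather than copied.
snapshot = copy_context()

# set() builds a new mapping for the current context and leaves the
# captured one untouched.
var.set('new')

current_value = var.get()
snapshot_value = snapshot[var]
```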
Specification
A new standard library module ``contextvars`` is added with the
following APIs:

* ``get_context() -> Context`` function is used to get the current
  ``Context`` object for the current OS thread.

* ``ContextVar`` class to declare and access context variables.

* ``Context`` class encapsulates context state.  Every OS thread
  stores a reference to its current ``Context`` instance.
  It is not possible to control that reference manually.
  Instead, the ``Context.run(callable, *args)`` method is used to run
  Python code in another context.
contextvars.ContextVar
The ``ContextVar`` class has the following constructor signature:
``ContextVar(name, *, default=no_default)``.  The ``name`` parameter
is used only for introspection and debug purposes.  The ``default``
parameter is optional.  Example::

    # Declare a context variable 'var' with the default value 42.
    var = ContextVar('var', default=42)
``ContextVar.get()`` returns a value for context variable from the
current ``Context``::

    # Get the value of `var`.
    var.get()
``ContextVar.set(value) -> Token`` is used to set a new value for
the context variable in the current ``Context``::

    # Set the variable 'var' to 1 in the current context.
    var.set(1)
``contextvars.Token`` is an opaque object that should be used to
restore the ``ContextVar`` to its previous value, or remove it from
the context if it was not set before.  The ``ContextVar.reset(token)``
method is used for that::

    old = var.set(1)
    try:
        ...
    finally:
        var.reset(old)
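The ``set()``/``reset()`` pair fits naturally into a context manager.
The following is a minimal sketch assuming the ``contextvars`` module
as released in Python 3.7; ``override`` is a hypothetical helper, not
an API proposed here.

```python
from contextlib import contextmanager
from contextvars import ContextVar

var = ContextVar('var', default=42)


@contextmanager
def override(value):
    # Hypothetical helper: temporarily set `var`, then restore it.
    token = var.set(value)
    try:
        yield
    finally:
        # Restores the previous value, or unsets the variable if it
        # had no value before set() was called.
        var.reset(token)


with override(1):
    inside = var.get()
outside = var.get()
```

Because ``var`` had no value before the ``with`` block, resetting the
token leaves it unset again, and ``get()`` falls back to the
constructor default.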
The ``Token`` API exists to make the current proposal forward
compatible with :pep:`550`, in case there is demand to support
context variables in generators and asynchronous generators in the
future.
The ``ContextVar`` design allows for a fast implementation of
``ContextVar.get()``, which is particularly important for modules
like ``decimal`` and ``numpy``.
contextvars.Context
``Context`` objects are mappings of ``ContextVar`` to values.

To get the current ``Context`` for the current OS thread, use the
``contextvars.get_context()`` function::

    ctx = contextvars.get_context()
To run Python code in some ``Context``, use the ``Context.run()``
method::

    ctx.run(function)
Any changes to any context variables that ``function`` causes will
be contained in the ``ctx`` context::

    var = ContextVar('var')
    var.set('spam')

    def function():
        assert var.get() == 'spam'

        var.set('ham')
        assert var.get() == 'ham'

    ctx = get_context()
    ctx.run(function)

    # The change made inside 'function' is not visible here.
    assert var.get() == 'spam'
Any changes to the context will be contained and persisted in the
``Context`` object on which ``run()`` is called.
``Context`` objects implement the ``collections.abc.Mapping`` ABC.
This can be used to introspect context objects::

    ctx = contextvars.get_context()

    # Print all context variables and their values in 'ctx':
    print(ctx.items())

    # Print the value of 'some_variable' in context 'ctx':
    print(ctx[some_variable])
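A slightly more readable dump can be built from the same Mapping
interface.  This sketch assumes the module as released in Python 3.7
(``copy_context()`` rather than this draft's ``get_context()``); the
variables ``user`` and ``locale`` are hypothetical.

```python
from contextvars import ContextVar, copy_context

# Hypothetical variables, for illustration only.
user = ContextVar('user')
locale = ContextVar('locale')

user.set('alice')
locale.set('en')

ctx = copy_context()

# Keys of the mapping are ContextVar objects; use their `name`
# attribute to build a human-readable view.
by_name = {var.name: value for var, value in ctx.items()}

has_user = user in ctx   # Mapping membership test
count = len(ctx)         # number of variables set in this context
```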
asyncio
``asyncio`` uses ``Loop.call_soon()``, ``Loop.call_later()``,
and ``Loop.call_at()`` to schedule the asynchronous execution of a
function.  ``asyncio.Task`` uses ``call_soon()`` to run the
wrapped coroutine.
We modify ``Loop.call_{at,later,soon}`` to accept the new
optional ``context`` keyword-only argument, which defaults to
the current context::

    def call_soon(self, callback, *args, context=None):
        if context is None:
            context = contextvars.get_context()

        # ... some time later
        context.run(callback, *args)
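From the caller's side, the new argument can be used as sketched
below.  The example assumes asyncio and ``contextvars`` as released in
Python 3.7, where ``call_soon()`` accepts ``context=`` and the
snapshot function is named ``copy_context()``; the variable ``var``
and the future ``done`` are hypothetical.

```python
import asyncio
import contextvars

var = contextvars.ContextVar('var', default='unset')


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    # Prepare a separate context and set the variable only there.
    ctx = contextvars.copy_context()
    ctx.run(var.set, 'from-ctx')

    # The callback runs inside `ctx`, so it sees the value set above.
    loop.call_soon(lambda: done.set_result(var.get()), context=ctx)

    seen_by_callback = await done
    seen_by_caller = var.get()   # the caller's own context is untouched
    return seen_by_callback, seen_by_caller


result = asyncio.run(main())
```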
Tasks in asyncio need to maintain their own isolated context.
``asyncio.Task`` is modified as follows::

    class Task:
        def __init__(self, coro):
            ...
            # Get the current context snapshot.
            self._context = contextvars.get_context()
            self._loop.call_soon(self._step, context=self._context)

        def _step(self, exc=None):
            ...
            # Every advance of the wrapped coroutine is done in
            # the task's context.
            self._loop.call_soon(self._step, context=self._context)
            ...
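The effect of per-task contexts can be observed with a short sketch.
It assumes asyncio and ``contextvars`` as released in Python 3.7; the
``request_id`` variable, the ``handler`` coroutine and the ``seen``
dictionary are hypothetical names.

```python
import asyncio
from contextvars import ContextVar

request_id = ContextVar('request_id', default=None)


async def handler(name, delay, seen):
    # Each task sets the variable in its own context.
    request_id.set(name)
    # Yield to the event loop so the other task runs in between.
    await asyncio.sleep(delay)
    # The value set by this task is still visible after resuming.
    seen[name] = request_id.get()


async def main():
    seen = {}
    # gather() wraps each coroutine in a Task, and every Task
    # captures its own context when it is created.
    await asyncio.gather(
        handler('a', 0.05, seen),
        handler('b', 0.01, seen),
    )
    return seen, request_id.get()


seen, value_in_main = asyncio.run(main())
```

Each handler reads back its own value, and the value set inside the
handlers does not leak into ``main()``.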
CPython C API
TBD
Implementation
This section explains high-level implementation details in pseudo-code. Some optimizations are omitted to keep this section short and clear.
The internal immutable dictionary for ``Context`` is implemented
using Hash Array Mapped Tries (HAMT).  They allow for O(log N) ``set``
operation, and for O(1) ``get_context()`` function.  For the purposes
of this section, we implement an immutable dictionary using
``dict.copy()``::

    class _ContextData:

        def __init__(self):
            self.__mapping = dict()

        def get(self, key):
            return self.__mapping[key]

        def set(self, key, value):
            copy = _ContextData()
            copy.__mapping = self.__mapping.copy()
            copy.__mapping[key] = value
            return copy

        def delete(self, key):
            copy = _ContextData()
            copy.__mapping = self.__mapping.copy()
            del copy.__mapping[key]
            return copy
Every OS thread has a reference to the current ``_ContextData``.
``PyThreadState`` is updated with a new ``context_data`` field that
points to a ``_ContextData`` object::

    PyThreadState:
        context_data : _ContextData
``contextvars.get_context()`` is implemented as follows::

    def get_context():
        ts : PyThreadState = PyThreadState_Get()

        if ts.context_data is None:
            ts.context_data = _ContextData()

        ctx = Context()
        ctx.__data = ts.context_data
        return ctx
``contextvars.Context`` is a wrapper around ``_ContextData``::

    class Context(collections.abc.Mapping):

        def __init__(self):
            self.__data = _ContextData()

        def run(self, callable, *args):
            ts : PyThreadState = PyThreadState_Get()
            saved_data : _ContextData = ts.context_data

            try:
                ts.context_data = self.__data
                # Pass the callable's result back to the caller.
                return callable(*args)
            finally:
                self.__data = ts.context_data
                ts.context_data = saved_data

        # Mapping API methods are implemented by delegating
        # `get()` and other Mapping calls to `self.__data`.
``contextvars.ContextVar`` interacts with
``PyThreadState.context_data`` directly::

    class ContextVar:

        def __init__(self, name, *, default=NO_DEFAULT):
            self.__name = name
            self.__default = default

        @property
        def name(self):
            return self.__name

        def get(self, default=NO_DEFAULT):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                return data.get(self)
            except KeyError:
                pass

            if default is not NO_DEFAULT:
                return default

            if self.__default is not NO_DEFAULT:
                return self.__default

            raise LookupError

        def set(self, value):
            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            try:
                old_value = data.get(self)
            except KeyError:
                old_value = NO_VALUE

            ts.context_data = data.set(self, value)
            return Token(self, old_value)

        def reset(self, token):
            if token.__used:
                return

            ts : PyThreadState = PyThreadState_Get()
            data : _ContextData = ts.context_data

            if token.__old_value is NO_VALUE:
                ts.context_data = data.delete(token.__var)
            else:
                ts.context_data = data.set(token.__var,
                                           token.__old_value)

            token.__used = True


    class Token:

        def __init__(self, var, old_value):
            self.__var = var
            self.__old_value = old_value
            self.__used = False
Backwards Compatibility
This proposal preserves 100% backwards compatibility.
Libraries that use ``threading.local()`` to store context-related
values currently work correctly only for synchronous code.  Switching
them to use the proposed API will keep their behavior for synchronous
code unmodified, but will automatically enable support for
asynchronous code.
Appendix: HAMT Performance Analysis
.. figure:: pep-0550-hamt_vs_dict-v2.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [1]_.
The above chart demonstrates that:

* HAMT displays near O(1) performance for all benchmarked
  dictionary sizes.

* ``dict.copy()`` becomes very slow around 100 items.
.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [2]_.
Figure 2 compares the lookup costs of ``dict`` versus a HAMT-based
immutable mapping.  HAMT lookup time is 30-40% slower than Python dict
lookups on average, which is a very good result, considering that the
latter is very well optimized.
The reference implementation of HAMT for CPython can be found here: [3]_.
References
.. [1] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
.. [2] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
.. [3] https://github.com/1st1/cpython/tree/hamt
Copyright
This document has been placed in the public domain.
..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: