[Python-Dev] Timeout for PEP 550 / Execution Context discussion

Guido van Rossum guido at python.org
Wed Oct 18 13:06:24 EDT 2017


On Tue, Oct 17, 2017 at 9:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

On 18 October 2017 at 05:55, Yury Selivanov <yselivanov.ml at gmail.com> wrote:

I actually like what you did in https://github.com/gvanrossum/pep550/blob/master/simpler.py, it seems reasonable. The only thing that I'd change is to remove "setctx" from the public API and add "Context.run(callable)". This makes the API more flexible to potential future changes and amendments.

Yep, with that tweak, I like Guido's suggested API as well.

I've added the suggested Context.run() method.

Attempting to explain why I think we want "Context.run(callable)" rather than "contextvars.setctx()" by drawing an analogy to thread local storage:

1. In C, the compiler & CPU work together to ensure you can't access another thread's thread locals.

But why is that so important? I wouldn't recommend doing it, but it might be handy for a debugger to be able to inspect a thread's thread-locals. As it is, it seems a debugger can only access thread-locals for the thread in which the debugger itself runs. It has better access to the real locals on the thread's stack of frames!

2. In Python's thread locals API, we do the same thing: you can only get access to the running thread's thread locals, not anyone else's

But there's no real benefit in this. In C, I could imagine a compiler optimizing access to thread-locals, but in Python that's moot.

At the Python API layer, we don't expose the ability to switch explicitly to another thread state while remaining within the current function. Instead, we only offer two options: starting a new thread, and waiting for a thread to finish execution. The lifecycle of the thread local storage is then intrinsically linked to the lifecycle of the thread it belongs to.
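The thread-local behaviour described above can be seen in a short sketch using the standard threading module. The names `local`, `worker`, and `seen` are illustrative only and do not come from the thread.

```python
import threading

# Thread-local storage: each thread sees its own attributes on this object.
local = threading.local()
local.value = "main"

seen = {}

def worker():
    # A new thread starts with empty thread-local storage, so the
    # attribute set by the main thread is not visible here.
    seen["has_value"] = hasattr(local, "value")
    local.value = "worker"
    seen["worker_value"] = local.value

t = threading.Thread(target=worker)
t.start()   # starting a thread ...
t.join()    # ... and waiting for it are the only lifecycle operations used

# The main thread's value is unaffected by what the worker stored.
main_value = local.value
```

The worker's storage is created when the thread first touches `local` and disappears with the thread, which is the lifecycle link Nick refers to.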

To me this feels more a side-effect of the implementation (perhaps inherited from C's implementation) than an intentional design.

To be clear, I think it's totally fine for clients of the ContextVar API -- e.g. numpy or decimal -- to assume that their context doesn't change arbitrarily while they're happily executing in a single frame or calling stuff they trust not to change the context. (IOW all changes to a particular ContextVar would be through that ContextVar object, not through behind-the-scenes manipulation of the thread's current context).

But for frameworks (e.g. asyncio or Twisted) I find it simpler to think about the context in terms of set_ctx and get_ctx, and I worry that hiding these might block off certain API design patterns that some framework might want to use -- who knows, maybe Nathaniel (who is fond of "with" <http://trio.readthedocs.io/en/latest/reference-core.html#a-simple-timeout-example>) might come up with a context manager to run a block of code in a different context (perhaps cloned from the current one).
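To make the context-manager idea concrete, here is a toy sketch. It assumes a hypothetical model in which a context is a plain dict stored per thread; `get_ctx`, `set_ctx`, and `cloned_context` are stand-ins written for this illustration, not the prototype's actual code.

```python
import contextlib
import threading

# Hypothetical per-thread holder for the active context (a plain dict here).
_state = threading.local()

def get_ctx():
    """Return the active context, creating an empty one on first use."""
    if not hasattr(_state, "ctx"):
        _state.ctx = {}
    return _state.ctx

def set_ctx(ctx):
    """Make ctx the active context for the current thread."""
    _state.ctx = ctx

@contextlib.contextmanager
def cloned_context():
    """Run the with-block in a shallow clone of the current context."""
    saved = get_ctx()
    set_ctx(dict(saved))
    try:
        yield
    finally:
        # Restore the original context even if the block raised.
        set_ctx(saved)

get_ctx()["precision"] = 28
with cloned_context():
    get_ctx()["precision"] = 50      # only visible inside the block
    inside = get_ctx()["precision"]
after = get_ctx()["precision"]
```

A helper like this can only be written if something equivalent to set_ctx is reachable from Python code, which is the design pattern Guido is worried about blocking off.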

That intrinsic link makes various aspects of thread local storage easier to reason about, since the active thread state can't change in the middle of a running function - even if the current thread gets suspended by the OS, resuming the function also implies resuming the original thread.

I don't feel reasoning would be much impaired. When reasoning about code we make assumptions that are theoretically unsafe all the time (e.g. "nobody will move the clock back").

Including a "contextvars.setctx" API would be akin to making PyThreadState_Swap a public Python-level API, rather than only exposing _thread.start_new_thread the way we do now.

It's different for threads, because they are the bedrock of execution, and nobody is interested in implementing their own threading framework that doesn't build on this same bedrock.

One reason we don't do that is because it would make thread locals much harder to reason about - every function call could have an implicit side effect of changing the active thread state, which would mean the thread locals at the start of the function could differ from those at the end of the function, even if the function itself didn't do anything to change them.

Hm. Threads are still hard to reason about, because for everything but thread-locals there is always the possibility that it's being mutated by another thread... So I don't think we should get our knickers twisted over thread-local variables.

Only offering Context.run(callable) provides a similar "the only changes to the execution context will be those this function, or a function it called, explicitly initiates" protection for context variables, and Guido's requested API simplifications make this aspect even easier to reason about: after any given function call, you can be certain of being back in the context you started in, because we wouldn't expose any Python level API that allowed an execution context switch to persist beyond the frame that initiated it.
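The guarantee described above can be sketched with the contextvars module that later shipped in Python 3.7, which may differ in detail from the prototype under discussion. The variable `request_id` and the function `handler` are invented for illustration.

```python
import contextvars

# An illustrative context variable with a default value.
request_id = contextvars.ContextVar("request_id", default="none")

def handler():
    # This change is recorded in whichever context is active during the call.
    request_id.set("abc123")
    return request_id.get()

# Take a copy of the current context and run the handler inside it.
ctx = contextvars.copy_context()
inside = ctx.run(handler)

# Back in the caller, the original context is active again and unchanged.
outside = request_id.get()

# The value set by the handler lives on in the Context object it ran in.
stored = ctx[request_id]
```

The switch into `ctx` lasts exactly as long as the call to `ctx.run()`, so the caller never observes the handler's change unless it inspects `ctx` explicitly.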

And as long as you're not calling something that's a specific framework's API for messing with the context, that's a fine assumption. I just don't see the need to try to "enforce" this by hiding the underlying API. (Especially since I presume that at the C API level it will still be possible -- else how would Context.run() itself be implemented?)
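As a rough illustration of how a run() operation could sit on top of get/set primitives, here is a toy pure-Python model. It is an assumption-laden sketch: contexts are plain dicts, and `get_ctx`, `set_ctx`, `run`, and `leaky` are hypothetical names, not the C-level implementation or the prototype's code.

```python
import threading

# Hypothetical per-thread holder for the active context (a plain dict here).
_state = threading.local()

def get_ctx():
    """Return the active context, creating an empty one on first use."""
    if not hasattr(_state, "ctx"):
        _state.ctx = {}
    return _state.ctx

def set_ctx(ctx):
    """Switch the active context; nothing restores the old one."""
    _state.ctx = ctx

def run(ctx, func, *args, **kwargs):
    """Call func with ctx active, always restoring the previous context."""
    saved = get_ctx()
    set_ctx(ctx)
    try:
        return func(*args, **kwargs)
    finally:
        set_ctx(saved)

outer = get_ctx()
outer["name"] = "outer"
inner = {"name": "inner"}

# With run(), the switch cannot outlive the call.
seen = run(inner, lambda: get_ctx()["name"])
restored = get_ctx() is outer

def leaky():
    # With bare set_ctx(), the switch persists after the function returns.
    set_ctx(inner)

leaky()
leaked = get_ctx() is inner
set_ctx(outer)  # the caller has to undo the change by hand
```

In this model run() is just set_ctx() plus a try/finally, so the disagreement in the thread is about whether the lower-level primitive should also be public, not about what run() does.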

====

The above is my main rationale for preferring contextvars.Context.run() to contextvars.setctx(), but it's not the only reason I prefer it. At a more abstract design philosophy level, I think the distinction between symmetric and asymmetric coroutines is relevant here [2]:

* in symmetric coroutines, there's a single operation that says "switch to running this other coroutine"
* in asymmetric coroutines, there are separate operations for starting or resuming a coroutine and for suspending the currently running one

Python's native coroutines are asymmetric - we don't provide a "switch to this coroutine" primitive, we instead provide an API for starting or resuming a coroutine (via cr.__next__(), cr.send() & cr.throw()), and an API for suspending one (via await).

The contextvars.setctx() API would be suitable for symmetric coroutines, as there's no implied notion of parent context/child context, just a notion of switching which context is active. The Context.run() API aligns better with asymmetric coroutines, as there's a clear distinction between the parent frame (the one initiating the context switch) and the child frame (the one running in the designated context).

Sure. But a framework might build something different.

As a practical matter, Context.run also composes nicely (in combination with functools.partial) for use with any existing API based on submitting functions for delayed execution, or execution in another thread or process:

- sched
- concurrent.futures
- arbitrary callback APIs
- method based protocols (including iteration)

By contrast, "contextvars.setctx" would need various wrappers to handle correctly reverting the context change, and would hence be prone to "changed the active context without changing it back" bugs (which can be especially fun when you're dealing with a shared pool of worker threads or processes).
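The composition with functools.partial mentioned above might look like the following sketch. It uses the contextvars module that later shipped in Python 3.7 together with concurrent.futures; `user`, `whoami`, and `submit_in_current_context` are hypothetical names chosen for this example.

```python
import contextvars
import functools
from concurrent.futures import ThreadPoolExecutor

# An illustrative context variable.
user = contextvars.ContextVar("user", default="anonymous")

def whoami():
    return user.get()

def submit_in_current_context(executor, func, *args):
    """Submit func so that it runs in a copy of the caller's context."""
    # Each task gets its own copy, so no Context object is entered twice.
    ctx = contextvars.copy_context()
    return executor.submit(functools.partial(ctx.run, func, *args))

with ThreadPoolExecutor(max_workers=1) as pool:
    user.set("alice")
    # The worker thread runs whoami() inside the copied context.
    with_ctx = submit_in_current_context(pool, whoami).result()
```

Because ctx.run() restores the worker's previous context when the task returns, no wrapper is needed to undo the switch, and the next task picked up by the same worker thread starts clean.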

So let's have both.

Cheers,

Nick.

[1] Technically C extensions can play games with this via PyThreadState_Swap, but I'm not going to worry about that here

[2] https://stackoverflow.com/questions/41891989/what-is-the-difference-between-asymmetric-and-symmetric-coroutines

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

--
--Guido van Rossum (python.org/~guido)


