Message 291467 - Python tracker (original) (raw)

In trip_signal [1], the logic goes:

  1. set the flag saying that this particular signal was tripped
  2. write to the wakeup fd
  3. set the global is_tripped flag saying "at least one signal was tripped", and do Py_AddPendingCall (which sets some global flags that the bytecode interpreter checks on every pass through the loop)

So the problem here is that it's step (2) that wakes up the main thread to check for signals, but it's step (3) that actually arranges for the Python-level signal handler to run. (Step (1) turns out to be irrelevant, because no-one looks at the per-signal flags unless the global is_tripped flag is set. This might be why no-one noticed this bug through code inspection though – I certainly missed it, despite explicitly checking for it several times!)

The result is that the following sequence of events is possible:

It turns out that this is a real thing that can actually happen; it's causing an annoying intermittent failure in the trio testsuite on appveyor; and under the correct conditions I can reproduce it very reliably in my local Windows VM. See [2].

I think the fix is just to swap the order of steps (2) and (3), so we write to the wakeup fd last. Unfortunately I can't easily test this because I don't have a way to build CPython on Windows. But [2] has some IMHO pretty compelling evidence that this is what's happening.

[1] https://github.com/python/cpython/blob/6fab78e9027f9ebd6414995580781b480433e595/Modules/signalmodule.c#L238-L291 [2] https://github.com/python-trio/trio/issues/119