Design: alternative scheduling models · Issue #32 · python-trio/trio
Currently we use a simple round-robin scheduling strategy. It isn't technically FIFO, because we randomize the order of execution within each "batch" in an attempt to force people not to make too many assumptions about the scheduling strategy :-). But it's basically FIFO in its level of sophistication.
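For concreteness, the current strategy amounts to something like this. (A simplified sketch, not trio's actual run loop; `run_queue` and `run_one_step` are invented placeholders.)

```python
import random

run_queue = []  # tasks that are currently runnable

def run_one_step(task):
    """Placeholder: run `task` until it next yields to the scheduler."""

def run_batch():
    # Snapshot everything that's runnable right now: that's one "batch".
    batch = run_queue[:]
    del run_queue[:]
    # Deliberately randomize execution order within the batch, so user
    # code can't accidentally grow dependencies on the exact order.
    random.shuffle(batch)
    for task in batch:
        run_one_step(task)
```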
Can we do better?
There's a rich literature on fancy scheduling, e.g. the weighted-fair queuing (WFQ) used by the Linux kernel's CFS ("completely fair scheduler"). And there are a lot of challenges in building async applications that come down to scheduling issues (e.g., #14 and https://stackoverflow.com/questions/40601119). But AFAICT a lot of the existing scheduling literature assumes that you have a pre-emptive scheduler; there's very little out there on applying these ideas to cooperative scheduling systems. In some ways the network packet scheduling literature is more relevant to us, because packets come in different sizes and the scheduler doesn't get to change that. (OTOH, packet scheduling algorithms tend to assume you can look into the future and predict how long a packet will spend transmitting, whereas when we start a task step we don't know how long it will run before yielding.)
Does it matter though?
It's not entirely clear that "fairness" is actually the solution to the problems linked above! Fair task scheduling is in part about negotiating between somewhat adversarial users (e.g. different TCP connections fighting for their share of a link), which isn't as obvious a fit to the tasks in our system. Though OTOH those tasks may be doing work on behalf of different competing users. And scheduler algorithms only matter when a program is CPU-bound, which hopefully trio programs usually aren't. But OTOH even programs that are I/O-bound overall will become CPU-bound in bursts, and it's exactly when things are melting down under load that you'd most like to handle things gracefully. OTOOH the real solution is often going to be something like load shedding or otherwise rate-limiting incoming work; better scheduling isn't a silver bullet to fix load issues.
So, I really don't know whether this is actually a good/useful idea for trio or not. But it might be a place where we can make a substantial difference to trio programs' usability in a central, principled way, so that seems worth exploring!
Options
In principle, a WFQ scheduler is actually very simple to implement (see below). What's not so clear is whether this would actually help in practice. There are two obvious issues:
- If some task hogs the CPU for 100 ms, then there is no scheduling policy that can possibly avoid this causing a 100 ms spike for everyone else. You can get amortized fairness in the long run by exiling that task from the CPU for a comparatively long time afterwards, but the end result is still going to be jittery and spiky. Conclusion: nothing we do at the scheduler level is going to have a great effect unless user code is written to yield with a good frequency. I guess the solution here is to make sure we have great tools for profiling user code and warning of CPU hogs. The instrumentation API is a good start.
- In a cooperative scheduler, tasks run until they block and become non-runnable. In pure WFQ, the scheduler only cares about runnable tasks; a task that blocks effectively disappears, and when it becomes runnable again it's treated like a new task with zero history. This is okay in a pre-emptive scheduler, where tasks don't have a chance to "overdraw" CPU time, but it doesn't make much sense for us: how you treat a task that wakes from sleep is the whole question. (The sketch below makes one possible choice explicit.)
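To make the "(see below)" concrete, here's roughly what a minimal WFQ core could look like. This is just a sketch, not a proposal for trio's internals: `Task`, `reschedule`, `run_next`, and the wake-up clamping policy are all assumptions made for illustration.

```python
import heapq
import time

class Task:
    def __init__(self, weight=1.0):
        self.weight = weight  # bigger weight = bigger share of the CPU
        self.vtime = 0.0      # virtual time this task has been charged so far

runnable = []  # heap of (vtime, seq, task); seq just breaks ties
_seq = 0

def run_one_step(task):
    """Placeholder: run `task` until it next yields to the scheduler."""

def reschedule(task):
    """Make `task` runnable (newly spawned, or waking from sleep)."""
    global _seq
    # The policy question from the second bullet above: clamp a waking
    # task's vtime up to the current front of the queue, so time spent
    # blocked doesn't bank unbounded CPU credit.
    if runnable:
        task.vtime = max(task.vtime, runnable[0][0])
    heapq.heappush(runnable, (task.vtime, _seq, task))
    _seq += 1

def run_next():
    # Always run the runnable task that's been charged the least vtime.
    _, _, task = heapq.heappop(runnable)
    start = time.monotonic()
    run_one_step(task)
    # Charge real elapsed time scaled by weight: a weight-2.0 task is
    # charged half as fast, so it ends up with ~2x the CPU share.
    task.vtime += (time.monotonic() - start) / task.weight
    # (If the task is still runnable, the caller calls reschedule(task).)
```

The interesting line is the clamp in `reschedule`: that one `max()` call *is* the entire "how do we treat a task that wakes from sleep" policy, and it's easy to imagine other choices there.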
Maybe what we really need is better knobs to let users set the priority of different tasks. (WFQ makes it easy to implement relative priorities, and relatively easy to implement hierarchical scheduling policies; it's also pretty straightforward to implement strictly tiered priorities like the Linux kernel's realtime priorities.) But "here's 100 knobs to tune, try tweaking some whenever your server performance degrades" is a pretty bad UX too. There's something to be said for the predictability of FIFO.
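For instance, building on the sketch above, both kinds of knob are small changes (the `tier` concept is invented here for illustration, not anything trio has):

```python
# Relative priority is just WFQ's weight knob: this task is charged vtime
# at half rate, so over time it gets ~2x the CPU of a weight-1.0 task.
important = Task(weight=2.0)

# Strict tiers (like the Linux kernel's realtime priorities) only need the
# heap key extended: sort by (tier, vtime, seq) instead of (vtime, seq), so
# a runnable tier-0 task always runs before anything in tier 1, with WFQ
# applied within each tier.
def heap_key(tier, task, seq):
    return (tier, task.vtime, seq)
```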
In general, this is a deep problem that will definitely require experience and data and visualizations of real apps on top of trio.
Prior art
Ironically, the Python GIL is essentially a cooperative scheduling system (though a weird one), and probably the one with the best public analysis! Interesting documents:
- http://www.dabeaz.com/python/UnderstandingGIL.pdf / https://www.youtube.com/watch?v=Obt-vMVdM8s
- https://mail.python.org/pipermail/python-dev/2009-October/093321.html
- https://bugs.python.org/issue7946
- http://www.dabeaz.com/blog/2010/02/revisiting-thread-priorities-and-new.html
The classic Mac OS used cooperative scheduling heavily; this was known as the Thread Manager. I'm very curious what their scheduling strategy looked like, but I don't know how well documented it is. The "Inside Macintosh: Thread Manager" book (PDFs are floating around on the web) might have more details. [Edit: there are now some notes on this on the reading list wiki page.]
Alternatively...
Alternatively, if we decide that we don't want a fancy scheduling system, we could go the other way, and actually guarantee deterministic FIFO scheduling, on the theory that determinism is generally a nice thing if you aren't losing anything else to get it.
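That version would be the smallest change of all: keep the batching from the first sketch and just drop the shuffle. (Same hypothetical `run_one_step` as before.)

```python
from collections import deque

run_queue = deque()  # tasks are appended as they become runnable

def run_one_step(task):
    """Placeholder: run `task` until it next yields to the scheduler."""

def run_batch():
    # Deterministic FIFO: run this batch in exactly the order tasks became
    # runnable, every time, so a given program schedules reproducibly.
    batch = list(run_queue)
    run_queue.clear()
    for task in batch:
        run_one_step(task)
```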