Prometheus metrics blocks tornado main thread · Issue #123 · jupyter-server/jupyter-resource-usage

Description

A bug was reported in jtpio/jupyterlab-system-monitor#87 about the UI lagging when several kernels are running. The issue was traced to the system monitor extension: disabling that extension while keeping the same load on the system made the lag go away.

Reproduce

Create multiple notebooks with contents:

import time

i = 0
while True:
    print(f"i={i}")
    i += 1
    time.sleep(1)
Run four or more kernels, each executing this cell.

Open a terminal and hold down a character key, e.g. "x", to send continuous input to the terminal (this assumes your key repeat rate is high enough). Typing should be very smooth: characters should appear rapidly and without pause.

Now relaunch the server with --ResourceUseDisplay.track_cpu_percent=True.
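If you prefer a config file to the command-line flag, the same trait can presumably be set in the server's Python config file (for example `jupyter_server_config.py`, or `jupyter_notebook_config.py` under the classic notebook server). This is a sketch that assumes the standard traitlets mapping from `--Class.trait=value` flags to config attributes; the file name depends on which server you run.

```python
# jupyter_server_config.py (assumed location; adjust for your server)
c = get_config()  # noqa: provided by the Jupyter config loader

# Equivalent to --ResourceUseDisplay.track_cpu_percent=True
c.ResourceUseDisplay.track_cpu_percent = True
```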

Repeat the process. While holding down a key in the terminal, you will notice frequent lags and pauses.

Expected behavior

The UI should not lag when the extension is enabled.

Problem

The API handler does the right thing by running the call to psutil on a separate thread: https://github.com/jupyter-server/jupyter-resource-usage/blob/master/jupyter_resource_usage/api.py#L66

However, the Prometheus metrics code uses a different implementation (why?) and performs the same expensive operation on the main Tornado thread, which blocks other calls: https://github.com/jupyter-server/jupyter-resource-usage/blob/master/jupyter_resource_usage/metrics.py#L40
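To illustrate why the difference matters, here is a minimal stdlib-only sketch using asyncio (Tornado 5+ runs on the asyncio event loop), not the extension's actual code. `expensive_metrics_call` is a hypothetical stand-in that sleeps 200 ms in place of a slow psutil scan. A probe coroutine that wakes every 10 ms records the worst scheduling delay it sees, while a collector runs the expensive call either directly on the event loop (like the metrics callback) or via `loop.run_in_executor` (like the API handler).

```python
import asyncio
import time


def expensive_metrics_call() -> float:
    # Hypothetical stand-in for a slow psutil scan over many processes.
    time.sleep(0.2)
    return 42.0


async def measure_max_lag(duration: float, interval: float = 0.01) -> float:
    """Return the worst extra delay seen by a coroutine waking every `interval` s."""
    worst = 0.0
    end = time.monotonic() + duration
    while time.monotonic() < end:
        start = time.monotonic()
        await asyncio.sleep(interval)
        worst = max(worst, time.monotonic() - start - interval)
    return worst


async def blocking_collector(stop: asyncio.Event) -> None:
    # Runs the expensive call on the event loop thread, blocking everything else.
    while not stop.is_set():
        expensive_metrics_call()
        await asyncio.sleep(0.05)


async def offloaded_collector(stop: asyncio.Event) -> None:
    # Runs the expensive call on the default thread pool; the loop stays free.
    loop = asyncio.get_running_loop()
    while not stop.is_set():
        await loop.run_in_executor(None, expensive_metrics_call)
        await asyncio.sleep(0.05)


async def max_lag_with(collector) -> float:
    stop = asyncio.Event()
    task = asyncio.create_task(collector(stop))
    lag = await measure_max_lag(1.0)
    stop.set()
    await task  # let the collector finish its current iteration and exit
    return lag


if __name__ == "__main__":
    print(f"blocking:  max lag {asyncio.run(max_lag_with(blocking_collector)):.3f}s")
    print(f"offloaded: max lag {asyncio.run(max_lag_with(offloaded_collector)):.3f}s")
```

On a typical machine the blocking variant should report a worst-case lag close to the 200 ms sleep, while the offloaded variant should stay in the low milliseconds; exact numbers will vary with system load.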

You can confirm that this is the root cause by disabling this line and the lines that follow it: https://github.com/jupyter-server/jupyter-resource-usage/blob/master/jupyter_resource_usage/server_extension.py#L22

When this callback is removed, the UI no longer lags every second.