Issue 26534: subprocess.check_output with shell=True ignores the timeout (original) (raw)

Basically - subprocess.check_output("ping localhost", shell=True, timeout=5) Will hang forever.

What happens is that subprocess.run times out on communicate, sends SIGKILL to the shell and then calls communicate again without the timeout. But nothing actually closes the "ping" command so the second communicate will never exit.

Even if we put a timeout on the second communicate, it won't be good enough because that "ping" will still be "leaked".

This SO question is about the fact that kill doesn't work when shell=True, and might be relevant for this issue: http://stackoverflow.com/questions/4789837/how-to-terminate-a-python-subprocess-launched-with-shell-true

As a possible fix I propose doing the following:

Add killpg to the Popen class.
Add an argument to run - "kill_group" that makes run use killpg instead of kill
Users can all: subprocess.check_output("ping localhost", shell=True, start_new_session=True, kill_group=True, timeout=5)

And have it work fine. This is the best I could come up with, without breaking any existing behavior.

Other options to consider:

Not have kill_group argument and to implicitly kill by group when start_new_session is True (but this is not really backwards compatible).
Having kill_group as an argument for Popen and not run, and when kill is called, it will call killpg if the option was specified.

A patch is attached with my proposed solution.

I don’t know enough about process groups and sessions to properly review, but some things that stand out:

Patch is missing documentation and tests
If the “killpg” just wraps os.killpg, perhaps adding the method is not justified
Are there any race conditions with killing a process group that has already exited. When does a process group get freed and potentially reused (so you may kill the wrong group)? The “send_signal” method (and others) have a check to avoid signalling an unrelated process.
The method is called killpg, and the doc string mentions SIGKILL, but the implementation says SIGTERM
What happens if you use killpg without starting a new session? If it kills the parent process as well, that sounds like a source of subtle bugs that may only be detected in unexpected cases (e.g. Ctrl+C or timeout)
Be aware of Issue 25942. It is not clear what should happen to the child process(es) when the timeout happens, or when the “communicate” call is interrupted.
What platforms does this support and what happens if there is no platform support?