[Python-Dev] Stable buildbots (original) (raw)

Trent Nelson trent at snakebite.org
Tue Nov 23 09:40:50 CET 2010


On 14-Nov-10 3:48 AM, David Bolen wrote:

This is a completely separate issue, though probably around just as long, and like the popup problem its frequency changes over time. By "hung" here I'm referring to cases where something must go wrong with a test and/or its cleanup such that a pythond process remains running, usually several of them at the same time.

My guess: the "hung" (single-threaded) Python process has called select() without a timeout in order to wait for some data. However, the data never arrives (due to a broken/failed test), and the select() never returns.

On Windows, processes seem harder to kill when they get into this state. If I purposely wedge a Windows process via select() via the interactive interpreter, ctrl-c has absolutely no effect (whereas on Unix, ctrl-c will interrupt the select()).

As for why kill_python.exe doesn't seem to be able to kill said wedged processes, the MSDN documentation on TerminateProcess1 states the following:

The terminated process cannot exit until all
pending I/O has been completed or canceled. (sic)

It's not unreasonable to assume a wedged select() constitutes pending I/O, so that's a possible explanation as to why kill_python.exe isn't able to terminate the processes.

(Also, kill_python currently assumes TerminateProcess() always works; perhaps this optimism is misplaced. Also note the XXX TODO regarding the fact that we don't kill processes that have loaded our python*.dll, but may not be named python_d.exe. I don't think that's the issue here, though.)

On 14-Nov-10 5:32 AM, David Bolen wrote:

"Martin v. Löwis"<martin at v.loewis.de> writes:

This is what kill_python.exe is supposed to solve. So I recommend to investigate why it fails to kill the hanging Pythons.

Yeah, I know, and I can't say I disagree in principle - not sure why Windows doesn't let the kill in that module work (or if there's an issue actually running it under all conditions).

At the moment though, I do know that using the sysinternals pskill utility externally (which is what I currently do interactively) definitely works so to be honest,

That's interesting. (That kill_python.exe doesn't kill the wedged processes, but pskill does.) kill_python is pretty simple, it just calls TerminateProcess() after acquiring a handle with the relevant PROCESS_TERMINATE access right. That being said, that's the recommended way to kill a process -- I doubt pskill would be going about it any differently (although, it is sysinternals... you never know what kind of crazy black magic it's doing behind the scenes).

Are you calling pskill with the -t flag? i.e. kill process and all dependents? That might be the ticket, especially if killing the child process that wedged select() is waiting on causes it to return, and thus, makes it killable.

Otherwise, if it happens again, can you try kill_python.exe first, then pskill, and confirm if the former fails but the latter succeeds?

Trent.


More information about the Python-Dev mailing list