Issue 26456: import _tkinter + TestForkInThread leaves zombie with stalled thread

Created on 2016-02-29 06:29 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name         Uploaded                          Description
child-exit.patch  martin.panter, 2016-03-01 05:48
Messages (8)
msg260993 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-29 06:29
After running the 2.7 test suite many times, my Linux OS’s memory slowly gets eaten up. It seems to be because of zombie Python processes that never get cleaned up unless I kill them explicitly. I never get this problem with the Python 3 test suite.

I narrowed it down to running test_tcl followed by test_thread, and then narrowed it even further to importing _tkinter and running TestForkInThread.test_forkinthread(). Now I have it minimized to the following:

    $ ./python -c 'import _tkinter, thread, os; thread.start_new_thread(os.fork, ())'

A process is left behind listed with the “defunct” or Z (zombie) status. However it has a child thread; maybe this is why it does not automatically get cleaned up. Extract from “htop”:

      PID USER     PRI NI  VIRT  RES  SHR S CPU% MEM%   TIME+ Command
        1 root      20  0 35412 4528 3448 S  0.0  0.2 0:01.25 /sbin/init
    12615 vadmium   20  0     0    0    0 Z  0.0  0.0 0:00.00 ├─ python
    12616 vadmium   20  0  142M 5952 2220 S  0.0  0.3 0:00.00 │  └─ ./python -c import _tkinter, thread, os; thread.start_new_thread(os.fork, ())

    $ sudo strace -p 12616
    Process 12616 attached - interrupt to quit
    select(4, [3], [], [], NULL^C <unfinished ...>
    Process 12616 detached

    $ ls -l /proc/12616/fd
    total 0
    lrwx------ 1 vadmium users 64 Feb 29 05:57 0 -> /dev/pts/1
    lrwx------ 1 vadmium users 64 Feb 29 05:57 1 -> /dev/pts/1
    lrwx------ 1 vadmium users 64 Feb 29 05:57 2 -> /dev/pts/1
    lr-x------ 1 vadmium users 64 Feb 29 05:57 3 -> pipe:[946176]
    lr-x------ 1 vadmium users 64 Feb 29 05:57 4 -> pipe:[946321]
    l-wx------ 1 vadmium users 64 Feb 29 05:57 5 -> pipe:[946176]

    $ pacman -Q systemd glibc
    systemd 222-1
    glibc 2.22-4
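For anyone trying to spot leftover processes like the one described above, the following is a minimal sketch, assuming a Linux /proc filesystem. It looks for processes whose leader is in state Z while other thread IDs are still listed under /proc/<pid>/task. The helper names parse_state and zombies_with_live_threads are hypothetical and are not part of Python or of the test suite.

```python
import os


def parse_state(stat_text):
    """Return the one-letter state field from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state.
    return stat_text.rsplit(")", 1)[1].split()[0]


def zombies_with_live_threads(proc="/proc"):
    """Yield (pid, other_thread_ids) for zombie leaders that still have threads.

    Hypothetical helper: assumes the Linux /proc layout and yields nothing
    if the directory does not exist.
    """
    if not os.path.isdir(proc):
        return
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            stat_path = os.path.join(proc, name, "stat")
            with open(stat_path, errors="replace") as f:
                state = parse_state(f.read())
            tids = os.listdir(os.path.join(proc, name, "task"))
        except (OSError, IndexError):
            continue  # The process vanished or could not be read.
        others = [int(tid) for tid in tids if tid != name]
        if state == "Z" and others:
            yield int(name), others


if __name__ == "__main__":
    for pid, tids in zombies_with_live_threads():
        print(pid, tids)
```

An ordinary zombie that is merely waiting to be reaped lists only its own ID under task, so it is not reported; only the "defunct leader with a surviving thread" shape is.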
msg261003 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-02-29 11:23
I should point out that I think this problem did not use to happen. As far as I know, it could be a bug in a recently upgraded glibc or something similar. On another Linux computer I cannot reproduce the problem. When I get a chance I will try upgrading packages to see if that triggers the problem.
msg261020 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2016-02-29 20:32
I suspect this may be what causes several of the 2.7 builders to fail. The ones that fail look like they complete successfully, but then sit with no output until buildbot kills them. For example:

    http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%202.7/builds/506/steps/compile/logs/stdio
    http://buildbot.python.org/all/builders/s390x%20Debian%202.7/builds/258/steps/test/logs/stdio
    http://buildbot.python.org/all/builders/s390x%20RHEL%202.7/builds/261/steps/test/logs/stdio
    http://buildbot.python.org/all/builders/s390x%20SLES%202.7/builds/264/steps/test/logs/stdio
    http://buildbot.python.org/all/builders/x86-64%20Ubuntu%2015.10%20Skylake%20CPU%202.7/builds/159/steps/test/logs/stdio

In each of those cases, test_tcl runs before test_thread (except on the SLES builder; but test_tk also imports _tkinter and does come before test_thread). I can reproduce on Ubuntu 14.04.3, but not on a freshly updated Gentoo.
msg261038 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-01 05:48
Yes, it looks like you might be right about those hanging buildbots. The occasional successes (e.g. <http://buildbot.python.org/all/builders/s390x%20Debian%202.7/builds/205/steps/test/logs/stdio>) seem to happen when test_thread runs before any of the Tk, Tcl, and IDLE tests.

This probably does not affect Python 3 because the test calls sys.exit() only in Python 2; this code was added in r78527. In Python 3, the code was merged in revision 58c35495a934, but it was apparently changed to call os._exit() at the same time. So one potential fix or workaround could be to change to os._exit(), as in child-exit.patch.

It seems Tcl_FindExecutable() creates a thread, and this thread survives fork(). (Perhaps it is re-created?) Python exiting does not cause this thread to be stopped. Playing with “strace”, it seems the threads that return from fork() in the parent and child both finish with _exit(0). However the “main” thread in the parent finishes with exit_group(0), which is documented as terminating all threads. Calling os._exit() also seems to call exit_group(), which explains why that fixes the problem in the child.

I can produce the problem in all versions of Python without using _tkinter, using the following code instead:

    import _thread, os, time

    def thread1():
        pid = os.fork()
        if not pid:
            # In the child, the original main thread no longer exists.  Start a
            # new thread that will stall for 60 s.
            _thread.start_new_thread(time.sleep, (60,))

    _thread.start_new_thread(thread1, ())
    time.sleep(2)  # Give fork() a chance to run

I’m not really sure, but maybe Python could improve its handling of this case, when fork() is called on a non-“main” thread and another thread is also running in the child process.
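To illustrate the workaround discussed in this message, here is a minimal Python 3 sketch of forking from a non-main thread and ending the child with os._exit(). It is not the content of child-exit.patch, which is not reproduced here. The function name fork_and_reap, the result dictionary, and the pipe handshake are assumptions made for the illustration, and the stalling time.sleep thread stands in for whatever extra thread happens to exist in the child.

```python
import _thread
import os
import threading
import time


def fork_and_reap(result):
    """Fork from the calling (non-main) thread, then reap the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists at this point.
        try:
            os.close(read_fd)
            # Stand-in for a stray thread that would otherwise keep the
            # child alive for 60 s after the forking thread finishes.
            _thread.start_new_thread(time.sleep, (60,))
            os.write(write_fd, b"OK")
        finally:
            # os._exit() ends the whole process, including the extra
            # thread, without running Python-level cleanup handlers.
            os._exit(0)
    # Parent: wait for the child's message, then collect its exit status
    # so that no zombie is left behind.
    os.close(write_fd)
    result["data"] = os.read(read_fd, 2)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    result["status"] = status


result = {}
worker = threading.Thread(target=fork_and_reap, args=(result,))
worker.start()
worker.join(timeout=10)
```

If the child left the thread by returning or raising SystemExit instead, the stalled thread could keep the process around, which is the situation reported in this issue.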
msg261178 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2016-03-03 21:30
I can confirm that child-exit.patch fixes the immediate issue, so I'm +1 on just committing it since it will make several buildbots useful again. Improving general handling of the situation can be done in a new issue. For the record, I agree that this seems to be a relatively recent phenomenon. I tried bisecting cpython to find a source for it (using `./python -m test.regrtest test_tcl test_thread`), but the bisect just came up with the first changeset that allows _tkinter to actually build. Perhaps Tcl_FindExecutable starting a thread is a new thing?
msg261184 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-04 08:20
But please add a reference to this issue.
msg261331 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-03-08 07:45
New changeset 613196986c09 by Martin Panter in branch '2.7': Issue #26456: Force all child threads to terminate in TestForkInThread https://hg.python.org/cpython/rev/613196986c09
msg261377 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-03-08 20:36
The change to the test seems to have the desired effect. The buildbots are no longer timing out (tests are failing for other reasons).
History
Date User Action Args
2022-04-11 14:58:28 admin set github: 70643
2016-03-08 20:36:35 martin.panter set status: open -> closed; resolution: fixed; messages: +; stage: commit review -> resolved
2016-03-08 07:45:16 python-dev set nosy: + python-dev; messages: +
2016-03-04 08:20:01 serhiy.storchaka set messages: +
2016-03-03 21:30:02 zach.ware set assignee: martin.panter; messages: +; stage: commit review
2016-03-01 05:48:57 martin.panter set files: + child-exit.patch; keywords: + patch; messages: +
2016-02-29 20:32:19 zach.ware set nosy: + zach.ware; messages: +
2016-02-29 11:23:50 martin.panter set messages: +
2016-02-29 07:19:34 serhiy.storchaka set nosy: + pitrou, serhiy.storchaka
2016-02-29 06:29:11 martin.panter create