msg134839 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2011-04-30 06:43 |
The FreeBSD-AMD64 bot exhibits sporadic hanging in unspecific places. FreeBSD is running under kvm in the background. When the hanging occurs, the virtual machine uses 100% CPU and I can't log in via ssh, so I have to kill the kvm process. The fact that the ssh login fails if a user process is misbehaving seems like a FreeBSD/kvm issue to me. However, this problem did not occur when I set up the bot a couple of weeks ago. I've started a series of older revision builds to see if anything recent causes this. |
|
|
msg134890 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-04-30 23:15 |
> The FreeBSD-AMD64 bot exhibits sporadic hanging in unspecific places.

You can try a shorter regrtest timeout. Edit Lib/test/regrtest.py near:

    if hasattr(faulthandler, 'dump_tracebacks_later'):
        timeout = 60*60

(or use the --timeout option of the regrtest.py program).

If you have access to a terminal (using ssh), you can also set a signal to dump the traceback: edit regrtest.py to add "import signal; faulthandler.register(signal.SIGUSR1, all_threads=True)" after "faulthandler.enable()". Then use "kill -USR1 pid" to dump the traceback.

Or the problem is an infinite loop while dumping the traceback because of a timeout :-D In this case, disable the timeout using the --timeout=0 option of regrtest.py. |
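The two suggestions above (a watchdog timeout and a signal-triggered dump) can be sketched as a small helper. This is only an illustration, not the code in regrtest.py: the function name `install_hang_diagnostics` and its parameters are made up for this example. Note also that released versions of the faulthandler module spell the timeout function `dump_traceback_later`, not `dump_tracebacks_later` as in the development version quoted above.

```python
import faulthandler
import signal
import sys


def install_hang_diagnostics(log_file=sys.stderr, timeout=None):
    """Hypothetical helper: make a hanging process report where it is stuck.

    log_file must be a real file (it needs a fileno()) and must stay open
    for as long as the handlers are installed.
    """
    # Dump tracebacks on fatal errors such as SIGSEGV or SIGABRT.
    faulthandler.enable(file=log_file)

    # "kill -USR1 <pid>" then dumps the tracebacks of all threads.
    # SIGUSR1 does not exist on Windows, hence the guard.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)

    # Optional watchdog: dump the tracebacks after `timeout` seconds.
    # exit=True also terminates the process, so a hang cannot last forever.
    if timeout:
        faulthandler.dump_traceback_later(timeout, file=log_file, exit=True)
```

Calling `install_hang_diagnostics(timeout=600)` at the start of a test run would, under these assumptions, give a traceback either on demand via `kill -USR1 <pid>` or automatically after ten minutes.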
|
|
msg134901 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2011-05-01 06:03 |
Thanks Victor, I can try some of that. Could this also be a problem with the buildbot software or a networking problem? The Ubuntu PPC bot might have the same issue. Here the tests appear to be finished but the clean doesn't start: http://www.python.org/dev/buildbot/all/builders/PPC%20Ubuntu%203.1/builds/387/steps/test/logs/stdio http://www.python.org/dev/buildbot/all/builders/PPC%20Ubuntu%203.1/builds/387 |
|
|
msg134922 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2011-05-01 19:36 |
That might be another instance of this: http://thread.gmane.org/gmane.comp.python.devel/123698 You might want to bring this up on python-dev. |
|
|
msg134997 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2011-05-02 18:23 |
Going through the logs, this indeed looks like a buildbot software issue to me. I attach the logs that correspond to this incident:

http://www.python.org/dev/buildbot/all/builders/AMD64%20FreeBSD%208.2%203.2/builds/85

After

...
2011-04-30 01:10:56+0200 [Broker,client] closing stdin
2011-04-30 01:10:56+0200 [Broker,client] using PTY: False
...

normally you should see:

... [-] command finished with signal None, exit code 0, elapsedTime:

But there is nothing until I restarted the bot. |
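The pattern described above (a command starts, but no "command finished" line ever follows) can be searched for mechanically. The sketch below is an assumption-laden illustration, not part of buildbot: the function name `find_unfinished_commands` is invented here, the two marker strings are taken from the excerpts in this message, and it assumes the slave runs one command at a time.

```python
import re

# Marker strings copied from the log excerpts quoted in this thread.
START_MARKER = "using PTY:"
FINISH_MARKER = "command finished"

# Timestamp prefix such as "2011-04-30 01:10:56+0200".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4})")


def find_unfinished_commands(lines):
    """Return the start timestamps of commands with no matching finish line.

    Assumes commands are run sequentially, so each start marker must be
    followed by a finish marker before the next start marker appears.
    """
    unfinished = []
    pending = None  # timestamp of the command currently running, if any
    for line in lines:
        match = TIMESTAMP.match(line)
        stamp = match.group(1) if match else "<no timestamp>"
        if START_MARKER in line:
            if pending is not None:
                # A new command started before the previous one finished.
                unfinished.append(pending)
            pending = stamp
        elif FINISH_MARKER in line:
            pending = None
    if pending is not None:
        # The log ends while a command is still outstanding.
        unfinished.append(pending)
    return unfinished
```

Feeding it the lines of the slave's log file would list the times at which a command started without ever being reported as finished.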
|
|
msg135084 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2011-05-03 22:15 |
Another instance:

2011-05-03 20:18:08+0200 [Broker,client] closing stdin
2011-05-03 20:18:08+0200 [Broker,client] using PTY: False
2011-05-03 20:20:38+0200 [-] sending app-level keepalive

Again this is missing:

... [-] command finished with signal None, exit code 0, elapsedTime:

Also, as we speak the Ubuntu PPC bot is hanging as well:

http://www.python.org/dev/buildbot/all/builders/PPC%20Ubuntu%202.7/builds/386/steps/test/logs/stdio

Antoine, do you have access to the server logs for the relevant times? My bot is on CEST. |
|
|
msg135085 - (view) |
Author: Barry A. Warsaw (barry) *  |
Date: 2011-05-03 22:40 |
My Ubuntu PPC server is having hardware problems. It will just intermittently shut off. I've reset the SMU and the PRAM, vacuumed out the guts, reseated the RAM, pulled any possibly problematic 3rd party boards, and it still crashes. I was watching the syslog and it didn't look like a thermal shutdown, though it acted like that. The only thing I can think of is a power supply problem, so I'm going to see if I can find an inexpensive replacement. In the meantime, this machine will be offline for a couple of weeks at least. |
|
|
msg135174 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2011-05-05 07:10 |
The FreeBSD bot had these error messages in the log files:

1) kernel: swap_pager: indefinite wait buffer: device
2) Approaching the limit on PV entries, consider increasing either the vm.pmap.shpgperproc or the vm.pmap.pv_entry_max sysctl.

I set up the bot from scratch with these changes:

a) Use a swap partition (2GB) instead of a swap file (2GB).
b) Use these sysctls: kern.ipc.shm_use_phys=1 vm.pmap.shpgperproc=4096 vm.pmap.pv_entry_max=16777216
c) Use self-compiled Python2.7 instead of the system Python2.6.

Let's see how that works out. Error 1) is bad; perhaps FreeBSD does not play well with the qcow2 image format under high load. |
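To confirm that the values from b) are actually in effect after a reboot, the output of `sysctl` can be compared against the intended settings. The following is a minimal sketch under stated assumptions: the function name `check_sysctls` is invented for this example, and it expects the "name: value" text format that `sysctl name ...` prints on FreeBSD, passed in as a string rather than collected by the script itself.

```python
# Values taken from item b) above.
EXPECTED_SYSCTLS = {
    "kern.ipc.shm_use_phys": "1",
    "vm.pmap.shpgperproc": "4096",
    "vm.pmap.pv_entry_max": "16777216",
}


def check_sysctls(expected, sysctl_output):
    """Return {name: (wanted, actual)} for every setting that differs.

    sysctl_output is text in "name: value" form, one setting per line.
    A setting missing from the output is reported with actual=None.
    """
    actual = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not in "name: value" form
            actual[name.strip()] = value.strip()
    return {
        name: (wanted, actual.get(name))
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }
```

An empty result means every expected value was found; anything else names the settings that still need attention.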
|
|
msg135175 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2011-05-05 07:36 |
On second thought, I don't want to debug possible qcow2 issues, so I made another change: d) Use raw format for the image. |
|
|
msg135421 - (view) |
Author: Stefan Krah (skrah) *  |
Date: 2011-05-07 09:06 |
I think the FreeBSD bot changes are working out fine. The Ubuntu-PPC issues were unrelated, so I'm closing this. |
|
|