Issue 21901: test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell

Created on 2014-07-01 22:40 by r.david.murray, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (14)
msg222059 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-07-01 22:40
On one particular Linux vserver virtual machine (which is unfortunately my development platform for Python), test.test_selectors.PollSelectorTestCase.test_above_fd_setsize fails with the following message:

  zsh: killed

At that point the test suite stops running, regardless of whether or not I started it with -j. As far as I can tell, the configuration of this vserver is the same as the one my buildbots run on, but they are on different host machines, so there could be some differences I'm not remembering. On the buildbots, the test gets skipped with the message 'FD limit reached'. Does anyone have any clues on how to debug this?
msg222062 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-07-02 00:20
The test changes the maximum number of open files. What is the limit in your shell? You can try to modify the test to add print(soft, hard) after getrlimit().

On Fedora 20:

  $ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
  (1024, 4096)

The test tries to raise the soft limit (1024) to the hard limit (4096).
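A minimal sketch of the check suggested above, assuming a Unix platform where the resource module is available. The helper names nofile_limits and format_limit are hypothetical and are not part of the test suite; the sketch only reads the limits and does not change them.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def format_limit(limit):
    """Render a limit, showing RLIM_INFINITY as 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft:", format_limit(soft), "hard:", format_limit(hard))
```

Comparing the printed hard limit on the failing machine with the one on a machine where the test is skipped or passes is the quickest way to see whether the two environments really differ.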
msg222075 - (view)	Author: Charles-François Natali (neologix) *	Date: 2014-07-02 06:50
There's probably a special mechanism due to vserver which makes the kernel kill the process instead of failing with EPERM, but it's really surprising.

What happens if you try the following:

  $ python -c "from resource import *; _, hard = getrlimit(RLIMIT_NOFILE); setrlimit(RLIMIT_NOFILE, (hard, hard))"

You could run the process under strace to see what's going on: you'll likely just see the reception of a signal though. Maybe "dmesg" would show interesting logs.
msg222534 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-07-07 22:55
ping?
msg222951 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-07-13 16:29
The python command just returns. The dmesg was a good call:

  python invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
  python cpuset=pydev mems_allowed=0
  [...]
  Out of memory: kill process python(28623:#112) score 85200 or a child
  Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

I thought I had this virtual server configured with the same resources as I do the buildbots, but I could be wrong. It's been quite some time since I set both of them up, and I don't even remember how the resources are set at the moment. Let me know if you want to see the entire dmesg output.
msg223002 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-07-14 08:36
> Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB

340 MB to run test_selectors sounds high. What is the value of NUM_FDS?

And what is the result of this command in your vserver?

  $ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
  (1024, 4096)
msg223165 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-07-16 01:25
  rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
  (1024L, 1048576L)

Unfortunately the buildbot box is offline at the moment and it may be a bit before I can get it back, so I can't compare the results above with that VM.
msg223181 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-07-16 08:20
> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
> (1024L, 1048576L)

Oh, 1 million files is much bigger than 4 thousand files (4096).

The test should only test FD_SETSIZE + 10 files; the problem is getting FD_SETSIZE:

  # A scalable implementation should have no problem with more than
  # FD_SETSIZE file descriptors. Since we don't know the value, we just
  # try to set the soft RLIMIT_NOFILE to the hard RLIMIT_NOFILE ceiling.

For example, on my Linux FD_SETSIZE is 1024, whereas the hard limit of RLIMIT_NOFILE is 4096.

  /usr/include/linux/posix_types.h:#define __FD_SETSIZE 1024

Maybe we can simply expose the FD_SETSIZE constant in the select module? The constant is useful when you use select.select(), which is still heavily used on Windows.
msg223563 - (view)	Author: Charles-François Natali (neologix) *	Date: 2014-07-21 07:08
>> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
>> (1024L, 1048576L)
>
> Oh, 1 million files is much bigger than 4 thousand files (4096).
>
> The test should only test FD_SETSIZE + 10 files; the problem is getting FD_SETSIZE:

We could cap it to, let's say, 2**16: it's larger than any possible FD_SETSIZE (which is usually low, since fd_set is often allocated on the stack and select() doesn't scale well beyond that anyway).

But I don't see anything wrong with the test; it's really the buildbot setting which is to blame: I expect other tests to fail with such a low max virtual memory.
msg223571 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-07-21 10:58
That is the only test that fails for lack of memory. And it's not the buildbot, it's my development virtual machine. Having the test suite be killed when I do a full test run is...rather annoying.
msg223573 - (view)	Author: Charles-François Natali (neologix) *	Date: 2014-07-21 11:17
Alright, I'll cap the value then (no need to expose FD_SETSIZE).
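The cap proposed in this thread could look roughly like the following. This is a sketch under stated assumptions, not necessarily the committed change: the names FD_CAP and pick_num_fds are hypothetical, the value 2**16 is the one suggested in the discussion, and the resource module is assumed to be available (Unix only).

```python
import resource

# Cap suggested in the discussion; assumed larger than any FD_SETSIZE.
FD_CAP = 2**16


def pick_num_fds(hard, cap=FD_CAP):
    """Choose how many descriptors to use, given the hard RLIMIT_NOFILE.

    The hard limit is used as-is when it is small, and capped when it
    is very large or unlimited, so memory use stays bounded.
    """
    if hard == resource.RLIM_INFINITY:
        return cap
    return min(hard, cap)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("would use", pick_num_fds(hard), "file descriptors")
```

With the limits reported earlier in the thread, a hard limit of 4096 is left unchanged, while a hard limit of 1048576 is reduced to 65536.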
msg223691 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-07-22 20:30
New changeset 7238c6a05ca6 by Charles-François Natali in branch '3.4': Issue #21901: Cap the maximum number of file descriptors to use for the test. http://hg.python.org/cpython/rev/7238c6a05ca6 New changeset 89665cc05592 by Charles-François Natali in branch 'default': Issue #21901: Cap the maximum number of file descriptors to use for the test. http://hg.python.org/cpython/rev/89665cc05592
msg223696 - (view)	Author: Charles-François Natali (neologix) *	Date: 2014-07-22 20:52
Sorry for the delay, should be fixed now.
msg224088 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2014-07-26 21:56
Test passes for me now, thanks.

History
Date	User	Action	Args
2022-04-11 14:58:05	admin	set	github: 66100
2014-07-26 21:56:32	r.david.murray	set	messages: +
2014-07-22 20:52:49	neologix	set	status: open -> closed; resolution: fixed; messages: +; stage: resolved
2014-07-22 20:30:30	python-dev	set	nosy: + python-dev; messages: +
2014-07-21 11:17:39	neologix	set	messages: +
2014-07-21 10:58:45	r.david.murray	set	messages: +
2014-07-21 07:08:37	neologix	set	messages: +
2014-07-16 08:20:13	vstinner	set	messages: +
2014-07-16 01:25:53	r.david.murray	set	messages: +
2014-07-14 08:36:32	vstinner	set	messages: +
2014-07-13 16:29:05	r.david.murray	set	messages: +
2014-07-07 22:55:22	vstinner	set	messages: +
2014-07-02 06:50:57	neologix	set	messages: +
2014-07-02 00:20:54	vstinner	set	messages: +
2014-07-01 22:47:45	r.david.murray	set	title: test_selectors.PollSelectorTestCase.test_above_fd_setsize killed by shell -> test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell
2014-07-01 22:40:03	r.david.murray	create