Issue 21901: test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell

Created on 2014-07-01 22:40 by r.david.murray, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (14)
msg222059 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-01 22:40
On one particular Linux vserver virtual machine (which is unfortunately my development platform for Python), test.test_selectors.PollSelectorTestCase.test_above_fd_setsize fails with the following message:
zsh: killed
and at that point the test suite stops running, regardless of whether or not I started it with -j.
As far as I can tell, the configuration of this vserver is the same as the one my buildbots run on, but they are on different host machines, so there could be some differences I'm not remembering. On the buildbots, the test gets skipped with the message 'FD limit reached'.
Anyone have any clues how to debug this?
msg222062 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-02 00:20
The test changes the maximum number of open files. What is the limit in your shell? You can try modifying the test to add print(soft, hard) after getrlimit().
On Fedora 20:
$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)
The test tries to raise the soft limit (1024) to the hard limit (4096).
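For anyone following along, here is a minimal sketch of the soft/hard limit dance described above. It is not the test's actual code; the helper name raise_soft_nofile_to_hard is made up for illustration. It only uses resource.getrlimit() and resource.setrlimit(), and falls back to the old soft limit if the kernel refuses the change.

```python
import resource


def raise_soft_nofile_to_hard():
    """Try to raise the soft RLIMIT_NOFILE to the hard limit.

    Hypothetical helper for illustration only.
    Returns (old_soft, hard, new_soft).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    except (OSError, ValueError):
        # The platform refused the change; keep the old soft limit.
        return soft, hard, soft
    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard, new_soft


if __name__ == "__main__":
    old_soft, hard, new_soft = raise_soft_nofile_to_hard()
    print(old_soft, hard, new_soft)
    # Put the original soft limit back so later code is not affected.
    resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

Printing the three values shows how far the process is allowed to go; a very large hard limit means a test sized from it will try to open that many descriptors.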
msg222075 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-02 06:50
There's probably a special mechanism due to vserver which makes the kernel kill the process instead of failing with EPERM, but it's really surprising.
What happens if you try the following:
$ python -c "from resource import *; _, hard = getrlimit(RLIMIT_NOFILE); setrlimit(RLIMIT_NOFILE, (hard, hard))"
You could run the process under strace to see what's going on: you'll likely just see the reception of a signal though. Maybe "dmesg" would show interesting logs.
msg222534 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-07 22:55
ping?
msg222951 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-13 16:29
The python command just returns.
The dmesg was a good call:
python invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0
python cpuset=pydev mems_allowed=0
[...]
Out of memory: kill process python(28623:#112) score 85200 or a child
Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB
I *thought* I had this virtual server configured with the same resources as I do the buildbots, but I could be wrong. It's been quite some time since I set both of them up, and I don't even remember how the resources are set at the moment.
Let me know if you want to see the entire dmesg output.
msg223002 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-14 08:36
> Killed process python(28623:#112) vsz:340800kB, anon-rss:330764kB, file-rss:3864kB
340 MB to run test_selectors sounds high. What is the value of NUM_FDS?
And what is the result of this command in your vserver?
$ python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 4096)
msg223165 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-16 01:25
rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024L, 1048576L)
Unfortunately the buildbot box is offline at the moment and it may be a bit before I can get it back, so I can't compare the results above with that VM.
msg223181 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-07-16 08:20
> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
> (1024L, 1048576L)
Oh, 1 million files is much bigger than 4 thousand files (4096).
The test should only test FD_SETSIZE + 10 files; the problem is getting FD_SETSIZE:
# A scalable implementation should have no problem with more than
# FD_SETSIZE file descriptors. Since we don't know the value, we just
# try to set the soft RLIMIT_NOFILE to the hard RLIMIT_NOFILE ceiling.
For example, on my Linux FD_SETSIZE is 1024, whereas the hard limit of RLIMIT_NOFILE is 4096.
/usr/include/linux/posix_types.h:#define __FD_SETSIZE 1024
Maybe we can simply expose the FD_SETSIZE constant in the select module? The constant is useful when you use select.select(), which is still heavily used on Windows.
msg223563 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-21 07:08
>> rdmurray@pydev:~/python/p34>python -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
>> (1024L, 1048576L)
>
> Oh, 1 million files is much bigger than 4 thousand files (4096).
>
> The test should only test FD_SETSIZE + 10 files, the problem is to get FD_SETSIZE:
We could cap it to, let's say, 2**16: it's larger than any possible FD_SETSIZE (which is usually low, since fd_set is often allocated on the stack and select() doesn't scale well beyond that anyway).
But I don't see anything wrong with the test, it's really the buildbot setting which is to blame: I expect other tests to fail with such a low max virtual memory.
msg223571 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-21 10:58
That is the only test that fails for lack of memory. And it's not the buildbot, it's my development virtual machine. Having the test suite be killed when I do a full test run is...rather annoying.
msg223573 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-21 11:17
Alright, I'll cap the value then (no need to expose FD_SETSIZE).
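A rough sketch of what capping could look like, assuming the 2**16 ceiling suggested earlier in the thread. The names FD_CAP and pick_num_fds are hypothetical and this is not necessarily what the committed changeset does; the handling of an unlimited hard limit is also an assumption added for the sketch.

```python
import resource

# Cap suggested in the discussion; larger than any plausible FD_SETSIZE.
FD_CAP = 2**16


def pick_num_fds(hard, cap=FD_CAP):
    """Return how many file descriptors a test should try to use.

    Hypothetical helper: 'hard' is the hard RLIMIT_NOFILE value.
    """
    if hard == resource.RLIM_INFINITY:
        # Assumption: treat an unlimited hard limit as "use the cap".
        return cap
    return min(hard, cap)


if __name__ == "__main__":
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(pick_num_fds(hard))
```

With the limits reported above, a hard limit of 1048576 would be cut down to 65536, while a hard limit of 4096 would be used unchanged.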
msg223691 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-07-22 20:30
New changeset 7238c6a05ca6 by Charles-François Natali in branch '3.4': Issue #21901: Cap the maximum number of file descriptors to use for the test. http://hg.python.org/cpython/rev/7238c6a05ca6 New changeset 89665cc05592 by Charles-François Natali in branch 'default': Issue #21901: Cap the maximum number of file descriptors to use for the test. http://hg.python.org/cpython/rev/89665cc05592
msg223696 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-22 20:52
Sorry for the delay, should be fixed now.
msg224088 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-26 21:56
Test passes for me now, thanks.
History
Date User Action Args
2022-04-11 14:58:05 admin set github: 66100
2014-07-26 21:56:32 r.david.murray set messages: +
2014-07-22 20:52:49 neologix set status: open -> closed; resolution: fixed; messages: +; stage: resolved
2014-07-22 20:30:30 python-dev set nosy: + python-dev; messages: +
2014-07-21 11:17:39 neologix set messages: +
2014-07-21 10:58:45 r.david.murray set messages: +
2014-07-21 07:08:37 neologix set messages: +
2014-07-16 08:20:13 vstinner set messages: +
2014-07-16 01:25:53 r.david.murray set messages: +
2014-07-14 08:36:32 vstinner set messages: +
2014-07-13 16:29:05 r.david.murray set messages: +
2014-07-07 22:55:22 vstinner set messages: +
2014-07-02 06:50:57 neologix set messages: +
2014-07-02 00:20:54 vstinner set messages: +
2014-07-01 22:47:45 r.david.murray set title: test_selectors.PollSelectorTestCase.test_above_fd_setsize killed by shell -> test_selectors.PollSelectorTestCase.test_above_fd_setsize reported killed by shell
2014-07-01 22:40:03 r.david.murray create