Issue 1183: race in SocketServer.ForkingMixIn.collect_children (original) (raw)

Created on 2007-09-20 19:35 by dripton, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg56065 - (view) Author: David Ripton (dripton) Date: 2007-09-20 19:35
CentOS Linux 5, Python 2.4.3 (but code appears unchanged in 2.5 and trunk, so I don't believe this bug has already been fixed) We have an xmlrpc server that subclasses DocXMLRPCServer.DocXMLRPCServer and SocketServer.ForkingMixIn. Under load, it sometimes crashes with an error in SocketServer.ForkingMixIn.collect_children The bug is that collect_children calls os.waitpid with pid 0, so it waits for any child. But then it assumes that the pid found was in the list self.active_children, and attempts to remove it from that list without a try block. However, another call to collect_children could have already removed it, so we get "ValueError: list.remove(x): x not in list" The fix is just adding a try/except block around the attempt to remove pid from self.active children. diff -u SocketServer.py /tmp/SocketServer.py --- SocketServer.py 2007-08-27 10:52:24.000000000 -0400 +++ /tmp/SocketServer.py 2007-09-20 15:34:00.000000000 -0400 @@ -421,7 +421,10 @@ except os.error: pid = None if not pid: break - self.active_children.remove(pid) + try: + self.active_children.remove(pid) + except ValueError: + pass def process_request(self, request, client_address): """Fork a new subprocess to process the request."""
msg56892 - (view) Author: Ralf Schmitt (schmir) Date: 2007-10-28 19:22
I've had the exact same error - but only when I used a subclass of XMLRPCServer, which installed signal handlers for SIGCHLD (which then called collect_children). Does your code install such a signal handler? (I found mine somewhere on the web). Do you start any subprocesses?
msg56903 - (view) Author: David Ripton (dripton) Date: 2007-10-29 12:59
No signal handler. Yes, we run subprocesses. I don't believe either is necessary to trigger the race condition, though.
msg62042 - (view) Author: David Ripton (dripton) Date: 2008-02-04 15:20
Just noticed that this is a partial duplicate of issue 1540386.
msg63100 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008-02-28 18:23
Hmm. I think the race can only happen if you call collect_children() concurrently from multiple threads or from a signal handler. The waidpid(0) bug (which affected anyone who spawned subprocesses from anything other than ForkingMixIn) is partly fixed by r61106, but I don't intend to make ForkingMixIn thread- or signal-safe. Let me know if this is enough for you. :)
msg63124 - (view) Author: David Ripton (dripton) Date: 2008-02-29 03:24
The "if pid not in self.active_children: continue" check that was added in r61106 appears to fix the bug, so I'm happy. Thanks.
History
Date User Action Args
2022-04-11 14:56:27 admin set github: 45524
2008-02-29 05:26:30 jyasskin set status: pending -> closed
2008-02-29 03:24:28 dripton set messages: +
2008-02-28 18:23:08 jyasskin set status: open -> pendingresolution: fixedmessages: + nosy: + jyasskinversions: + Python 2.6
2008-02-04 15:20:42 dripton set messages: + versions: + Python 2.5
2007-10-29 12:59:31 dripton set messages: +
2007-10-28 19:22:15 schmir set nosy: + schmirmessages: +
2007-09-21 02:16:16 jafo set priority: normalkeywords: + patch
2007-09-20 19:35:42 dripton create