Issue 11891: Poll call in multiprocessing/forking.py is not thread safe. Results in "OSError: [Errno 10] No child processes" exceptions.

Background:

I'm using multiprocessing not to run jobs in parallel, but to run functions in a different process space so they can be done as a different user. I am thus using multiprocessing in a multithreaded (Linux) application.

Problem:

In multiprocessing/forking.py, the poll() function is not thread safe. If multiple threads call poll(), two back-to-back calls to os.waitpid() can be made on the same PID: the first call reaps the child, and the second then fails with "No child processes". This happens frequently when multiprocessing's _cleanup() function is called.

    Traceback (most recent call last):
      File "/opt/scyld/foo.py", line 178, in call
        pool = Pool(processes=1)
      File "/opt/scyld/python/2.6.5/lib/python2.6/multiprocessing/__init__.py", line 227, in Pool
        return Pool(processes, initializer, initargs)
      File "/opt/scyld/python/2.6.5/lib/python2.6/multiprocessing/pool.py", line 104, in __init__
        w.start()
      File "/opt/scyld/python/2.6.5/lib/python2.6/multiprocessing/process.py", line 99, in start
        _cleanup()
      File "/opt/scyld/python/2.6.5/lib/python2.6/multiprocessing/process.py", line 53, in _cleanup
        if p._popen.poll() is not None:
      File "/opt/scyld/python/2.6.5/lib/python2.6/multiprocessing/forking.py", line 106, in poll
        pid, sts = os.waitpid(self.pid, flag)
    OSError: [Errno 10] No child processes
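The underlying failure can be reproduced without multiprocessing or threads at all. The following is a minimal sketch, assuming Linux and written in Python 3 syntax rather than the 2.6 shown in the traceback; `double_wait` is a hypothetical helper name, not part of any library. It reaps a child once and then calls os.waitpid() on the same PID a second time, which is what the losing thread in the race ends up doing.

```python
import errno
import os


def double_wait():
    """Fork a child, reap it, then try to reap it a second time.

    Returns the errno of the second os.waitpid() call, or None if
    that call unexpectedly succeeds.
    """
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    os.waitpid(pid, 0)  # first call blocks until the child exits and reaps it
    try:
        os.waitpid(pid, 0)  # second call: the child is already gone
    except OSError as e:
        return e.errno
    return None


second_errno = double_wait()
print(errno.errorcode.get(second_errno))
```

On Linux the second call is expected to fail with ECHILD, which is the errno 10 reported in the traceback.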

Suggested Fix:

Wrap the os.waitpid() call in a try/except block that catches OSError with errno 10 (ECHILD) and, in that event, returns whatever returncode is currently available. The one potential problem this introduces is that if someone calls os.waitpid() on that PID directly, without going through forking.py, self.returncode will never be set to a non-None value. If you're using the multiprocessing module to create processes, however, you should also be using it to clean up after them.

I've attached a test file.

Here's how I changed poll() in multiprocessing/forking.py:

    def poll(self, flag=os.WNOHANG):
        if self.returncode is None:
            try:
                pid, sts = os.waitpid(self.pid, flag)
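            except OSError as e:
                # ECHILD (errno 10 on Linux): the child has already been reaped,
                # most likely by another thread. Requires `import errno`.
                if e.errno == errno.ECHILD: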
                    return self.returncode
                else:
                    raise
            if pid == self.pid:
                if os.WIFSIGNALED(sts):
                    self.returncode = -os.WTERMSIG(sts)
                else:
                    assert os.WIFEXITED(sts)
                    self.returncode = os.WEXITSTATUS(sts)
        return self.returncode
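To show the patched poll() under contention, here is a self-contained sketch in Python 3 syntax, assuming Linux. `ChildHandle` is a hypothetical stand-in for the Popen object in forking.py, not the real class, and the exit code, sleep time, and thread count are arbitrary. Several threads block in os.waitpid() on the same PID; one of them collects the exit status and the others hit ECHILD, which poll() now tolerates.

```python
import errno
import os
import threading
import time


class ChildHandle:
    """Hypothetical stand-in for the Popen object in forking.py."""

    def __init__(self, exit_code):
        self.returncode = None
        self.pid = os.fork()
        if self.pid == 0:
            time.sleep(0.2)  # child: stay alive long enough for the threads to start
            os._exit(exit_code)

    def poll(self, flag=os.WNOHANG):
        if self.returncode is None:
            try:
                pid, sts = os.waitpid(self.pid, flag)
            except OSError as e:
                if e.errno == errno.ECHILD:
                    # Child already reaped elsewhere: report what we have so far.
                    return self.returncode
                raise
            if pid == self.pid:
                if os.WIFSIGNALED(sts):
                    self.returncode = -os.WTERMSIG(sts)
                else:
                    self.returncode = os.WEXITSTATUS(sts)
        return self.returncode


handle = ChildHandle(exit_code=3)  # fork before any threads are started
errors = []


def worker():
    try:
        handle.poll(0)  # flag=0 makes waitpid block until the child exits
    except OSError as e:
        errors.append(e)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(handle.returncode, errors)
```

A thread that loses the race may see None from its own poll() call if the winner has not yet stored the status, but once the winning thread has returned, later polls return the stored returncode without calling os.waitpid() again.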