[Python-Dev] Usage of the multiprocessing API and object lifetime
Victor Stinner vstinner at redhat.com
Tue Dec 11 09:21:31 EST 2018
Hi,
tzickel reported a reference cycle bug in multiprocessing which keeps threads and processes alive:
https://bugs.python.org/issue34172
He wrote a fix which has been merged into the 3.6, 3.7 and master branches. But Pablo Galindo noticed that the fix breaks the following code (he added "I found the weird code in the example in several projects."):
import multiprocessing

def the_test():
    print("Begin")
    for x in multiprocessing.Pool().imap(int,
            ["4", "3"]):
        print(x)
    print("End")

the_test()
Pablo proposed to add a strong reference to the Pool from multiprocessing iterators: https://bugs.python.org/issue35378
I blocked his pull request because I see this change as a risk of new reference cycles. Since we are close to the 3.6 and 3.7 releases, I decided to revert the multiprocessing fix instead.
Pablo's issue35378 evolved into adding a weak reference to the Pool in iterators, to detect when the Pool has been destroyed and, if possible, raise an exception from the iterator.
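The weak-reference idea can be illustrated with a minimal sketch. ToyPool and ToyIterator are hypothetical stand-ins invented for this example; they are not the multiprocessing classes and not the code from issue35378. The iterator holds only a weak reference to its pool, so it cannot keep the pool alive, but it can notice that the pool is gone and raise instead of hanging.

```python
import weakref


class ToyPool:
    """Hypothetical stand-in for a pool; not the multiprocessing implementation."""

    def imap(self, func, iterable):
        return ToyIterator(self, func, iterable)


class ToyIterator:
    """Hypothetical iterator that checks whether its pool is still alive."""

    def __init__(self, pool, func, iterable):
        # Weak reference: the iterator does not keep the pool alive,
        # so no reference cycle is created between the two objects.
        self._pool_ref = weakref.ref(pool)
        self._func = func
        self._items = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        # The weak reference returns None once the pool has been destroyed.
        if self._pool_ref() is None:
            raise RuntimeError("pool was destroyed while iterating")
        return self._func(next(self._items))
```

With this toy, iterating over ToyPool().imap(int, ["4", "3"]) raises RuntimeError on CPython, because the temporary pool is destroyed as soon as imap() returns. When exactly the pool is destroyed depends on the garbage collector of the Python implementation in use.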
Then a discussion started on how the multiprocessing API is supposed to be used and about the lifetime of multiprocessing objects.
I would prefer to make the multiprocessing API stricter: Python shouldn't try to guess how long an object is going to live. The API user has to explicitly release resources.
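As a sketch of what explicit release could look like for the example above: the pool is kept in a named variable for the whole iteration, then closed and joined. The function name parse_all and its pool_factory parameter are assumptions made for this illustration; close(), terminate() and join() are the existing Pool methods.

```python
import multiprocessing


def parse_all(values, pool_factory=multiprocessing.Pool):
    """Convert strings to ints in a pool whose lifetime is managed explicitly.

    `pool_factory` is a hypothetical parameter added for this sketch so that
    another pool class with the same interface can be substituted.
    """
    pool = pool_factory()      # named reference: the pool outlives the iteration
    try:
        results = list(pool.imap(int, values))
        pool.close()           # no more tasks will be submitted
    except BaseException:
        pool.terminate()       # stop the workers immediately on error
        raise
    finally:
        pool.join()            # wait for the workers to exit
    return results
```

With the default multiprocessing.Pool, the call should sit under an `if __name__ == "__main__":` guard, as usual for multiprocessing code, for example print(parse_all(["4", "3"])).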
tzickel noted that the documentation says:
"When the pool object is garbage collected terminate() will be called immediately."
And that multiprocessing relies on the garbage collector to release resources, especially through the multiprocessing.util.Finalize tool:
class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, ...):
        ...
        if obj is not None:
            self._weakref = weakref.ref(obj, self)
        else:
            assert exitpriority is not None
        ...
        _finalizer_registry[self._key] = self
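The mechanism this excerpt relies on is the callback argument of weakref.ref(): the callback runs when the referenced object is about to be destroyed. The sketch below shows only that mechanism; Resource, track and the log list are hypothetical names for this illustration, not part of multiprocessing.util.

```python
import weakref


class Resource:
    """Hypothetical object that owns something needing cleanup."""


def track(resource, log):
    """Return a weak reference whose callback records the object's collection."""

    def on_collect(ref):
        # The callback receives the (now dead) weak reference, not the object.
        log.append("released")

    # The caller must keep the returned weak reference alive: if the weak
    # reference itself is destroyed first, the callback is never called.
    return weakref.ref(resource, on_collect)
```

On CPython the callback typically fires as soon as the last strong reference to the object goes away. Other implementations may run it much later, so cleanup triggered this way has no guaranteed timing.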
I propose to start emitting ResourceWarning in Python 3.8 when objects are not released explicitly. I wrote a first change to emit ResourceWarning in the Pool object:
https://bugs.python.org/issue35424
https://github.com/python/cpython/pull/10974
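The general pattern of warning about a resource that was never released can be sketched as follows. ToyResource is a hypothetical class written for this illustration; it is not the change in the pull request above. The object remembers whether close() was called and emits ResourceWarning from its destructor otherwise.

```python
import warnings


class ToyResource:
    """Hypothetical resource; not multiprocessing.Pool."""

    def __init__(self):
        self._closed = False

    def close(self):
        # Explicit release: after this, destruction is silent.
        self._closed = True

    def __del__(self):
        if not self._closed:
            # `source` lets tracemalloc-aware tooling report where the
            # object was allocated.
            warnings.warn(f"unclosed resource {self!r}",
                          ResourceWarning, source=self)
```

ResourceWarning is ignored by the default warning filters, so such a warning is only visible when warnings are enabled, for example with "python -X dev" or "python -W default".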
By the way, I'm surprised that "with pool:" doesn't release all resources. An additional "pool.join()" is needed to ensure that all resources are released. It's a little bit surprising to have to emit a ResourceWarning if join() has not been called, even if the code uses "with pool:".
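A sketch of the pattern described here, assuming the current behaviour in which leaving the "with" block calls terminate() but does not wait for the workers. The function name run_in_pool and its pool_factory parameter are assumptions made for this illustration.

```python
import multiprocessing


def run_in_pool(values, pool_factory=multiprocessing.Pool):
    """Convert strings to ints, then wait for the pool's workers to exit.

    `pool_factory` is a hypothetical parameter added for this sketch so that
    another pool class with the same interface can be substituted.
    """
    with pool_factory() as pool:
        # map() blocks until every result is available.
        results = pool.map(int, values)
    # Leaving the block terminated the pool; join() then waits for the
    # workers to actually exit.
    pool.join()
    return results
```

The join() call is valid after the block because join() only requires that close() or terminate() has been called first.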
I don't know the multiprocessing API well, so I'm not sure in which direction we should go: make a best effort to support strange legacy code with "implicit object lifetime", or become stricter in Python 3.8?
From a technical point of view, I would prefer to become stricter. Relying on the garbage collector means that the code is going to behave badly on PyPy which uses a different garbage collector implementation :-(
Victor
Night gathers, and now my watch begins. It shall not end until my death.