[2.7] bpo-34110: Fix module load problem of cPickle in case of multithreading by sangongs · Pull Request #8276 · python/cpython (original) (raw)

I now think you're right, so posting the argument where I was attempting to convince you I was right, but ended up convincing myself that you're right instead, and I had just misunderstood your proposal.

So +1 from me.


The import system adds the sys.modules entry before starting the module execution, but doesn't add the attribute on the parent package until the module execution finishes:

$ mkdir -p _example
$ echo "import _example; print(dir(_example))" > _example/parent_keys.py
$ python3 -c "import _example.parent_keys; print(dir(_example))"
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'parent_keys']

In current Python 2.7 code, the pickle module can access _example.parent_keys while it is being imported. If the keys it needs are already there, that's a good thing. If they're not there, it's a problem.

Illustrative example where I believe being unable to access a partially imported module could end up being problematic:

parent_pkg.module_a.py:

import pickle
import module_b
class ExampleClass(object):
    pass
# This works because module_a.ExampleClass already exists
module_b.send_data(pickle.dumps(ExampleClass()))

module_b.py:

import pickle
_received_objects = []

def send_data(data):
    _received_objects.append(pickle.loads(data))

As written, that code would be fine regardless of how the locking works, since the unpickle call happens with the import lock for module_a held no matter what.

In Python 2.7, it also works even if the unpickling happens in another thread, since pickle doesn't try to acquire the import lock, and the required module attributes are available before the potential unpickle is invoked.

In Python 3.7, there's technically a chance to deadlock with another thread, but only if module_a specifically is waiting for that thread to finish (which seems spectacularly unlikely).

With a global import lock, though, there would be many more ways to end up in a situation where the pickle.loads call ran with module_a was still being imported (so it fell through to the "try to acquire the import lock" code).

However, the key to your proposal is that the cases like my hypothetical one above won't fall through to the import lock, because the attributes they need will already be there. It's only the motivating cases that currently raise AttributeError that will risk deadlocking (but probably won't deadlock in practice, since they probably won't be implicitly waiting for any other threads during the import).