bpo-30891: importlib _find_and_load() try/except by vstinner · Pull Request #2665 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation21 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

vstinner

importlib _find_and_load() now uses a "try/except KeyError" block to
check if the imported module is not sys.modules, rather than checking
for "name is sys.modules" and then get it with sys.modules[name].

This change should prevent a race condition.

@mention-bot

@vstinner

I already proposed this change in my previous PR #2646, but @serhiy-storchaka complained that it would make import() slower.

But @ncoghlan also noticed that previously, the whole function was protected by the global imp lock, whereas now "module = sys.modules[name]" is not protected by any lock and so there is a risk of a race condition.

I agree, that's why I wrote this PR.

See also discussion at PR #2646.

@vstinner

If @serhiy-storchaka still considers that performance is more important that atomicity in this function, we can just move "module = sys.modules[name]" into the "with _ModuleLockManager(name):" block.

@serhiy-storchaka

In common case _find_and_load() is called when name not in sys.modules (this is checked in C code). This will cause raising KeyError. Catching an exception in Python code is slow. Therefore this change will make slower the common case.

You can use sys.modules.get(name, sentinel). This is atomic as well as sys.modules[name], but faster in common case (name not in sys.modules).

ncoghlan

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anything this will make the "already loaded" case a tiny bit faster (since it skips the double name lookup), while making initial loads marginally slower (but likely imperceptibly slower relative to the overheads of actually reading files and directories from disk).

So +1 from me for going with the conventional solution of switching to relying on exception handling to help ensure consistency of state.

@serhiy-storchaka

The "already loaded" case already is made significantly faster by handling in C code. This code is executed in "still not loaded" case.

@ncoghlan

@serhiy-storchaka The module lock means that in the presence of concurrent attempts at loading the module, the 2nd and subsequent threads will follow the "no KeyError" path.

It occurs to me that this could use a comment after the with statement explaining the need for that path, though.

There's also the micro-optimisation option you suggested on the original PR: creating a _NEEDS_LOADING sentinel, and using sys.modules.get(name, _NEEDS_LOADING) instead of the try/except.

@serhiy-storchaka

A race condition is exceptional. It happens only when two or more threads simultaneously import the same module first time. Once the module is imported, all subsequent imports go by fast path in C code and don't have a race condition.

This PR fixes even more rare case. When at the same time one thread unloads a module while other thread imports it (this case is similar to bpo-30876). I agree that this case is worth to be fixed, but not at the cost of more common cases.

@vstinner

You can use sys.modules.get(name, sentinel).

I don't want to add a sentinel object. I just moved "module = sys.modules[name]" in the "with _ModuleLockManager(name):" block.

ncoghlan

if name not in sys.modules:
return _find_and_load_unlocked(name, import_)
module = sys.modules[name]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the branch that I think could use a comment to say "Another thread updated sys.modules while this one was waiting for the import lock"

ncoghlan

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note suggesting a clarifying comment around when the fallback path executes, but either way I'm fine with merging this approach.

@vstinner

Sorry, I don't understand which solution should be implemented:

@serhiy-storchaka

Both (1) and (3) are safe. (1) and (3) are equally safe if ignore some corner cases (non-string keys in sys.modules which raise KeyError on comparison). In normal case (3) is slightly faster than (1).

@ncoghlan

The "name in" + "sys.modules[name]" case is fine if you're holding the module lock the entire time (hence why I switched my review to approved), but I prefer the sys.modules.get approach with a sentinel since it is more obviously self-consistent and avoids the overhead of a Python level exception handler.

@vstinner

Ok, let's try sys.modules.get().

serhiy-storchaka

@@ -957,13 +954,16 @@ def _find_and_load_unlocked(name, import_):
return module
_SENTINEL = object()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nick suggested the name _NEEDS_LOADING and it looks better to me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@vstinner

Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition.

Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead.

serhiy-storchaka

@serhiy-storchaka

Since this fixes an observable behavior, I think it needs a NEWS entry.

@vstinner

Since this fixes an observable behavior, I think it needs a NEWS entry.

Can you please elaborate? How replacing "name not in sys.modules" with "sys.modules.get(name, _NEEDS_LOADING)" changes the behaviour? sys.modules is a dummy dict, so there is no behaviour change, no?

The main change is that it fixes a race condition, but this commit is the 3rd one in http://bugs.python.org/issue30891 and the first commit was made by you and already contains a NEWS item:

- bpo-30814: Fixed a race condition when import a submodule from a package. 

It's enough to summarize the 3 commits, no?

@ncoghlan

Right, I think the original NEWS entry is sufficient to cover all the changes - we're just iterating on the exact details of how that fix works.

@bedevere-bot

vstinner added a commit that referenced this pull request

Jul 21, 2017

@vstinner

Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition.

Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead. (cherry picked from commit e72b135)

@serhiy-storchaka

It's enough to summarize the 3 commits, no?

Agree.

Labels

type-bug

An unexpected behavior, bug, or error