bpo-30891: importlib _find_and_load() try/except by vstinner · Pull Request #2665 · python/cpython (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation21 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
importlib _find_and_load() now uses a "try/except KeyError" block to
check if the imported module is not sys.modules, rather than checking
for "name is sys.modules" and then get it with sys.modules[name].
This change should prevent a race condition.
I already proposed this change in my previous PR #2646, but @serhiy-storchaka complained that it would make import() slower.
But @ncoghlan also noticed that previously, the whole function was protected by the global imp lock, whereas now "module = sys.modules[name]" is not protected by any lock and so there is a risk of a race condition.
I agree, that's why I wrote this PR.
See also discussion at PR #2646.
If @serhiy-storchaka still considers that performance is more important that atomicity in this function, we can just move "module = sys.modules[name]" into the "with _ModuleLockManager(name):" block.
In common case _find_and_load()
is called when name not in sys.modules (this is checked in C code). This will cause raising KeyError. Catching an exception in Python code is slow. Therefore this change will make slower the common case.
You can use sys.modules.get(name, sentinel)
. This is atomic as well as sys.modules[name]
, but faster in common case (name not in sys.modules).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If anything this will make the "already loaded" case a tiny bit faster (since it skips the double name lookup), while making initial loads marginally slower (but likely imperceptibly slower relative to the overheads of actually reading files and directories from disk).
So +1 from me for going with the conventional solution of switching to relying on exception handling to help ensure consistency of state.
The "already loaded" case already is made significantly faster by handling in C code. This code is executed in "still not loaded" case.
@serhiy-storchaka The module lock means that in the presence of concurrent attempts at loading the module, the 2nd and subsequent threads will follow the "no KeyError" path.
It occurs to me that this could use a comment after the with statement explaining the need for that path, though.
There's also the micro-optimisation option you suggested on the original PR: creating a _NEEDS_LOADING
sentinel, and using sys.modules.get(name, _NEEDS_LOADING)
instead of the try/except.
A race condition is exceptional. It happens only when two or more threads simultaneously import the same module first time. Once the module is imported, all subsequent imports go by fast path in C code and don't have a race condition.
This PR fixes even more rare case. When at the same time one thread unloads a module while other thread imports it (this case is similar to bpo-30876). I agree that this case is worth to be fixed, but not at the cost of more common cases.
You can use sys.modules.get(name, sentinel).
I don't want to add a sentinel object. I just moved "module = sys.modules[name]" in the "with _ModuleLockManager(name):" block.
if name not in sys.modules: |
---|
return _find_and_load_unlocked(name, import_) |
module = sys.modules[name] |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the branch that I think could use a comment to say "Another thread updated sys.modules while this one was waiting for the import lock"
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note suggesting a clarifying comment around when the fallback path executes, but either way I'm fine with merging this approach.
Sorry, I don't understand which solution should be implemented:
- "try/except KeyError" is too slow according to @serhiy-storchaka
- "name in" followed by "module = sys.modules[name"] is unsafe accordng to @ncoghlan
- Is sys.modules.get(name, sentinel) the right fix for this issue? Would it be fast enough?
Both (1) and (3) are safe. (1) and (3) are equally safe if ignore some corner cases (non-string keys in sys.modules which raise KeyError on comparison). In normal case (3) is slightly faster than (1).
The "name in" + "sys.modules[name]" case is fine if you're holding the module lock the entire time (hence why I switched my review to approved
), but I prefer the sys.modules.get
approach with a sentinel since it is more obviously self-consistent and avoids the overhead of a Python level exception handler.
Ok, let's try sys.modules.get().
@@ -957,13 +954,16 @@ def _find_and_load_unlocked(name, import_): |
---|
return module |
_SENTINEL = object() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nick suggested the name _NEEDS_LOADING
and it looks better to me.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition.
Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead.
Since this fixes an observable behavior, I think it needs a NEWS entry.
Since this fixes an observable behavior, I think it needs a NEWS entry.
Can you please elaborate? How replacing "name not in sys.modules" with "sys.modules.get(name, _NEEDS_LOADING)" changes the behaviour? sys.modules is a dummy dict, so there is no behaviour change, no?
The main change is that it fixes a race condition, but this commit is the 3rd one in http://bugs.python.org/issue30891 and the first commit was made by you and already contains a NEWS item:
- bpo-30814: Fixed a race condition when import a submodule from a package.
It's enough to summarize the 3 commits, no?
Right, I think the original NEWS entry is sufficient to cover all the changes - we're just iterating on the exact details of how that fix works.
vstinner added a commit that referenced this pull request
Use sys.modules.get() in the "with _ModuleLockManager(name):" block to protect the dictionary key with the module lock and use an atomic get to prevent race condition.
Remove also _bootstrap._POPULATE since it was unused (_bootstrap_external now has its own _POPULATE object), add a new _SENTINEL object instead. (cherry picked from commit e72b135)
It's enough to summarize the 3 commits, no?
Agree.
Labels
An unexpected behavior, bug, or error