Message 404619 - Python tracker
On Wed, Oct 20, 2021 at 6:11 PM Barry A. Warsaw <report@bugs.python.org> wrote:
> I guess a question to answer then is whether we philosophically want the module attributes to be equivalent to the spec attributes. And by equivalent, I mean enforced to be exactly so, and thus a proxy. To me, the duplication is a wart that we should migrate away from so there’s only one place for these attributes, and that should be the spec.
>
> Here is the mapping we currently describe in the docs:
>
> mod.__name__ === spec.name
> mod.__package__ === spec.parent
> mod.__loader__ === spec.loader
> mod.__file__ === spec.origin
> mod.__path__ === spec.submodule_search_locations
> mod.__cached__ === spec.cached
>
> But right now, they don’t have to stay in sync, and I don’t think it’s reasonable to put the onus on the user to keep them in sync, because it’s unclear what code uses which attribute. Okay, so you can just set them both to be safe, but then you can’t do that with spec.parent/__package__.
Currently any of the module attrs can be different than the spec. In two cases they can legitimately be different: __name__ (with __main__) and __file__ (with frozen stdlib modules). For the rest, they should be in sync.
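To make the mapping and the drift concrete, here is a minimal sketch (not part of the original message). The module name, the path, and the helper names attr_pairs() and out_of_sync() are made up for illustration; only the importlib.util calls are real API.

```python
import importlib.util

# Hypothetical name and path. The file does not need to exist, because
# module_from_spec() creates the module object without executing it.
spec = importlib.util.spec_from_file_location(
    "demo_pkg.demo_mod", "/tmp/demo_pkg/demo_mod.py"
)
mod = importlib.util.module_from_spec(spec)


def attr_pairs(module):
    """Pair each module attr with the spec attr the docs map it to."""
    s = module.__spec__
    return {
        "__name__": (getattr(module, "__name__", None), s.name),
        "__package__": (getattr(module, "__package__", None), s.parent),
        "__loader__": (getattr(module, "__loader__", None), s.loader),
        "__file__": (getattr(module, "__file__", None), s.origin),
        "__path__": (
            getattr(module, "__path__", None),
            s.submodule_search_locations,
        ),
        "__cached__": (getattr(module, "__cached__", None), s.cached),
    }


def out_of_sync(module):
    """Return the module attr names whose value differs from the spec."""
    return [name for name, (m, s) in attr_pairs(module).items() if m != s]


before = out_of_sync(mod)

# Nothing stops user code from changing only one side of the mapping.
mod.__file__ = "/somewhere/else.py"
after = out_of_sync(mod)

print(before)
print(after)
```

After the assignment, __file__ shows up in the out-of-sync list while the spec still carries the original origin, which is the kind of silent divergence being discussed.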
Treating the spec as the single source of truth makes sense. My only concern has been that you can no longer determine how a module was originally imported once the spec is changed. However, I just realized that you can always run importlib.util.find_spec() to reproduce that info (with some minor caveats). So now I'm less concerned about that. :)
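As a rough illustration of one such caveat (a sketch, not from the original message): importlib.util.find_spec() returns the existing __spec__ of a module that is already in sys.modules, so getting a freshly computed spec means asking the finders directly. The helper name refind_spec() is hypothetical, and it assumes a top-level module (path=None).

```python
import importlib.util
import json
import sys


def refind_spec(name, path=None):
    """Ask each meta path finder for a fresh spec, bypassing sys.modules.

    Hypothetical helper: for a submodule, `path` would need to be the
    parent package's __path__.
    """
    for finder in sys.meta_path:
        find = getattr(finder, "find_spec", None)
        if find is None:
            continue
        found = find(name, path)
        if found is not None:
            return found
    return None


# For an already-imported module, find_spec() hands back the live spec,
# including any modifications made to it after import.
live = importlib.util.find_spec("json")

# Going through the finders recomputes the spec instead.
fresh = refind_spec("json")

print(live is json.__spec__)
print(fresh.name, fresh.origin)
```

Even the recomputed spec may not match what the module was originally loaded with, for example if sys.path changed in the meantime.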
Notably, users have forever(?) been able to modify all of the module attrs, with impact on the import system: __package__ and __path__ affecting later imports, and the rest affecting reload.
FWIW, an "advantage" of the module attrs is that they can be set in the module code. The same is true for the corresponding spec attrs but with just enough indirection to require more intent.
Regardless, the idea of post-import modifications to modules/specs has always made me uncomfortable. As a user I'd expect an alternative that feels less like a (non-obvious) low-level hack.
====
To me here are the important questions:
1. when does code ever modify the module attrs (or spec) and why?
2. should we distinguish the roles of the module attrs and spec (how-module-was-loaded vs. how-module-will-reload vs. how-module-impacts-other-imports)?
3. would it make sense to store spec modifications separately from the spec (e.g. on the module)?
4. which attrs should be deprecated?
5. should any module attrs (the ones that don't get eventually removed) be read-only? What about spec attrs?
6. would it be better to provide importlib.util.* helpers to address those needs, instead of having folks modify the module/spec directly?
My take:
1. that would be nice to know :)
2. that depends on what matters in practice. My gut says the distinctions aren't important enough to do anything about it, except where there are legitimate differences between the module and spec.
Currently the module attrs cover all three roles. The spec only covers how-module-was-loaded (but is used as a fallback for the other two roles in most cases).
Those two special cases, with __name__ and __file__ being out of sync, are meaningful only for introspection, rather than affecting the import machinery. (In the case of frozen modules that have __file__ set, note that spec.has_location is False.) I'm not sure how these fit in with the different roles.
Advantages to keeping the spec exclusively how-module-was-imported:
- it's what I'd expect; having to call importlib.util.find_spec() isn't the obvious thing
- the loader can modify the spec, so importlib.util.find_spec() won't necessarily match
Neither of those appears important enough to warrant keeping the status quo. The disadvantages seem heavier: maintenance costs and user confusion from (unnecessarily) having multiple sources of truth.
3. probably not, though it depends on (2)
However, if all those module attrs become read-only then we would need to figure out where to store __name__ and __file__ in those special cases.
4. everything except __name__ and __file__ (and probably __path__)
5. for modules, yes; for the spec, only if we stick with the one role
On modules I'd expect all of them to become properties regardless, with most of them becoming read-only eventually:
getter:
- proxy the corresponding spec attr
- a deprecation warning if it isn't an attr that needs to stay
setter:
- proxy the corresponding spec attr
- a deprecation warning for now on all attrs
- a deprecation error later on all attrs
- an AttributeError even later (do not make it a data-only descriptor)
A bonus advantage of properties is that they would reduce clutter on the module dict.
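The getter/setter behavior listed above could look roughly like the following for the "deprecation warning for now" stage. This is only a sketch under assumptions: the class name SpecProxyModule, the factory _spec_proxy(), and its keep flag are all hypothetical, and nothing here reflects an actual or planned CPython implementation.

```python
import types
import warnings
from importlib.machinery import ModuleSpec


def _spec_proxy(spec_attr, *, keep):
    """Build a property that forwards a module attr to __spec__.<spec_attr>.

    keep=True marks an attr that is meant to stay, so reads do not warn.
    """

    def getter(self):
        if not keep:
            warnings.warn(
                f"read __spec__.{spec_attr} instead",
                DeprecationWarning,
                stacklevel=2,
            )
        return getattr(self.__spec__, spec_attr)

    def setter(self, value):
        # Every write warns for now; a later stage would raise instead.
        warnings.warn(
            f"set __spec__.{spec_attr} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(self.__spec__, spec_attr, value)

    return property(getter, setter)


class SpecProxyModule(types.ModuleType):
    """Hypothetical module type whose attrs proxy the spec."""

    __file__ = _spec_proxy("origin", keep=True)
    __loader__ = _spec_proxy("loader", keep=False)


# Hypothetical spec, used only to exercise the proxy.
spec = ModuleSpec("demo", loader=None, origin="/tmp/demo.py")
mod = SpecProxyModule("demo")
mod.__spec__ = spec

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first_read = mod.__file__          # kept attr: no warning on read
    warnings_after_read = len(caught)
    mod.__file__ = "/tmp/other.py"     # write: warns and updates the spec
    _ = mod.__loader__                 # attr slated for removal: read warns
    categories = [w.category for w in caught]
```

Because the properties live on the class, the proxied names no longer need an entry of their own in the module's namespace, which is the clutter reduction mentioned above. A proxy for __package__ would need different handling on the setter side, since spec.parent is read-only.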
What about __path__? We'll probably keep it as a traditional indicator that the module is a package. However, do we make it a read-only proxy of spec.submodule_search_locations? (We already use a path proxy for namespace packages.)
6. it probably isn't worth it.
Due to the extra indirection, modifying the spec seems like a more deliberate (non-accidental or confused) action than changing the module attrs. That's probably enough "help". However, in cases where multiple attrs together have specific meaning, such helpers might be helpful for users.
====
Regarding __file__ being different from spec.origin, it might be worth revisiting the question of "origin" vs. "location" on the spec. Note that, in the case of frozen stdlib modules, spec.has_location is False even though __file__ is set. That smells fishy to me.
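For reference, a small sketch of how origin and has_location relate on a spec (not from the original message; the spec names and path are hypothetical). A spec built directly with ModuleSpec has has_location False unless told otherwise, while spec_from_file_location() marks the origin as a loadable location.

```python
import importlib.util
from importlib.machinery import ModuleSpec

# Hypothetical spec that only records where the module "came from".
frozen_like = ModuleSpec("demo_frozen", loader=None, origin="frozen")

# Hypothetical file-backed spec; the file itself does not need to exist.
file_backed = importlib.util.spec_from_file_location(
    "demo_file", "/tmp/demo_file.py"
)

# module_from_spec() only copies origin into __file__ when the spec says
# the origin is a location.
frozen_mod = importlib.util.module_from_spec(frozen_like)
file_mod = importlib.util.module_from_spec(file_backed)

print(frozen_like.origin, frozen_like.has_location, hasattr(frozen_mod, "__file__"))
print(file_backed.has_location, file_mod.__file__ == file_backed.origin)
```

So with the normal machinery, __file__ follows from has_location being True; a module that has __file__ set while its spec reports has_location as False got that attribute some other way, which is the mismatch pointed out above.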