[Python-Dev] PEP 420 - dynamic path computation is missing rationale
PJ Eby pje at telecommunity.com
Mon May 21 20:08:16 CEST 2012
On Mon, May 21, 2012 at 9:55 AM, Guido van Rossum <guido at python.org> wrote:
> Ah, I see. But I disagree that this is a reasonable constraint on sys.path. The magic path object of a toplevel namespace module should know it is a toplevel module, and explicitly refetch sys.path rather than just keeping around a copy.
That's fine by me - the class could actually be defined to take a module name and attribute (e.g. 'sys', 'path' or 'foo', '__path__'), and then there'd be no need to special-case anything: it would behave exactly the same way for subpackages and top-level packages.
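For illustration, here's a rough sketch of what such a path object could look like. The class name, parameter names, and the find_portions hook are all invented for this example, not taken from the PEP 420 reference implementation:

    import sys

    class DynamicPath:
        """Sketch of a path object that re-reads its parent path by module
        name and attribute on every iteration, so later changes to sys.path
        (or a parent package's __path__) are picked up automatically."""

        def __init__(self, parent_module, parent_attr, find_portions):
            # e.g. ('sys', 'path') for a top-level package,
            #      ('foo', '__path__') for a subpackage of foo
            self._parent_module = parent_module
            self._parent_attr = parent_attr
            self._find_portions = find_portions  # callable: parent path -> portion dirs
            self._last_parent = None
            self._portions = []

        def _recompute(self):
            parent = list(getattr(sys.modules[self._parent_module], self._parent_attr))
            if parent != self._last_parent:
                self._last_parent = parent
                self._portions = self._find_portions(parent)
            return self._portions

        def __iter__(self):
            return iter(self._recompute())

        def __repr__(self):
            # deliberately not list-like, per the point above
            return 'DynamicPath(%r)' % (self._recompute(),)

The find_portions callable stands in for whatever the import machinery would use to locate directories for the package; the point is only that nothing here keeps a cached copy of sys.path.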
> This leaves the magic path objects for namespace modules, which I could live with, as long as their repr was not the same as a list, and assuming a good rationale is given. Although I'd still prefer plain lists here as well; I'd like to be able to manually construct a namespace package and force its directories to be a specific set of directories that I happen to know about, regardless of whether they are related to sys.path or not. And I'd like to know that my setup in that case would not be disturbed by changes to sys.path.
To do that, you just assign to __path__, the same as now, a la __path__ = pkgutil.extend_path(). The auto-updating is in the initially-assigned __path__ object, not the module object or some sort of generalized magic.
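(For reference, that's the long-standing stdlib idiom, placed in each portion's __init__.py:)

    # foo/__init__.py in each portion of the 'foo' namespace, pre-PEP-420 style
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)   # scans sys.path once, at import time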
> I'd like to hear more about this from Philip -- is that feature actually widely used?
Well, it's built into setuptools, so yes. ;-) It gets used any time a dynamically specified dependency is used that might contain a namespace package. This means, for example, that every setup script out there using "setup.py test", every project using certain paste.deploy features... it's really difficult to spell out the scope of things that are using this, in the context of setuptools and distribute, because there are an immense number of ways to indirectly rely on it.
This doesn't mean that the feature can't continue to be implemented inside setuptools' dynamic dependency system, but the code to do it in setuptools is MUCH more complicated than the PEP 420 code, and doesn't work if you manually add something to sys.path without asking setuptools to do it. It's also somewhat timing-sensitive, depending on when and whether you import 'site' and pkg_resources, and whether you are mixing eggs and non-eggs in your namespace packages.
In short, the implementation is a huge mess that the PEP 420 approach would vastly simplify.
But... that wasn't the original reason why I proposed it. The original reason was simply that it makes namespace packages act more like the equivalents do in other languages. While being able to override __path__ can be considered a feature of Python, its being static by default is NOT a feature, in the same way that requiring an __init__.py is not really a feature.
The principle of least surprise says (at least IMO) that if you add a directory to sys.path, you should be able to import stuff from it. Having that work or not depend on whether you already imported part of a namespace package earlier is both surprising and confusing. (More on this below.)
> What would a package have to do if the feature didn't exist?
Continue to depend on setuptools to do it for them, or use some hypothetical update API... but that's not really the right question. ;-)
The right question is, what happens to package users if the feature didn't exist?
And the answer to that question is, "you must call this hypothetical update API every time you change sys.path, because otherwise your imports might break, depending on whether or not some other package imported something from a namespace before you changed sys.path".
And of course, you also need to make sure that any third-party code you use does this too, if it adds something to sys.path for you.
And if you're writing cross-Python-version code, you need to check to make sure whether the API is actually available.
And if you're someone helping Python newbies, you need to add this to your list of debugging questions for import-related problems.
And remember: if you forget to do this, it might not break now. It'll break later, when you add that other plugin or update that random module that dynamically decides to import something that just happens to be in a namespace package, so be prepared for it to break your application in the field, when an end-user is using it with a collection of plugins that you haven't tested together, or in the same import sequence...
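Here's a small, self-contained illustration of that failure mode on a PEP 420-style Python; the package and module names are invented for the demo. With dynamic path computation both imports succeed; with a __path__ frozen at first import, the last line is exactly the ImportError that shows up later, in the field:

    import os, sys, tempfile, importlib

    def make_portion(pkg, mod):
        # create <tmpdir>/<pkg>/<mod>.py with no __init__.py
        root = tempfile.mkdtemp()
        pkg_dir = os.path.join(root, pkg)
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, mod + '.py'), 'w') as f:
            f.write('VALUE = %r\n' % mod)
        return root

    sys.path.append(make_portion('demo_ns', 'first'))
    import demo_ns.first                 # demo_ns.__path__ is computed here

    sys.path.append(make_portion('demo_ns', 'second'))
    importlib.invalidate_caches()        # precaution only; the new dir isn't cached yet
    import demo_ns.second                # needs demo_ns.__path__ to track sys.path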
The people using setuptools won't have these problems, but new Python users will, as people begin using a PEP 420 that lacks this feature.
The key scope question, I think, is: "How often do programs change sys.path at runtime, and what have they imported up to that point?" (Because for the other part of the scope, I think it's a fairly safe bet that namespace packages are going to become even more popular than they are now, once PEP 420 is in place.)
But the key API/usability question is: "What's the One Obvious Way to add/change what's importable?"
And I believe the answer to that question is, "change sys.path", not "change sys.path, and then import some other module to call another API to say, 'yes, I really meant to update sys.path, thank you very much.'"
(Especially since NOT requiring that extra API isn't going to break any existing code.)
> I'd really much rather not have this feature, which reeks of too much magic to me. (An area where Philip and I often disagree. :-)
My take on it is that it only SEEMS like magic, because we're used to a static __path__. But other languages don't have a per-package __path__ in the first place, so there's nothing to "automatically update", and so it's not magic at all that other subpackages/modules can be found when the system path changes!
So, under the PEP 420 approach, it's a static __path__ that's really the weird special case, and should be considered so. (After all, __path__ is and was primarily an implementation optimization and compatibility hack, rather than a user-facing "feature" of the import system.)
For example, when would you want to explicitly spell out a namespace package's __path__, and restrict it from seeing sys.path changes? I've not seen anybody ask for this feature in the context of setuptools; it's only ever been bug reports about when the more complicated implementation fails to detect an update.
So, to wrap up:
The primary rationale for the feature is that "least surprise" for a new user to Python is that adding to sys.path should allow importing a portion of a namespace, whether or not you've already imported some other thing in that namespace. Symmetry with other languages and with other Python features (e.g. changing the working directory in an interactive interpreter) suggests it, and the removal of a similar timing dependency from PEP 402 (preventing direct import of a namespace-only package unless you imported a subpackage first) suggests that the same type of timing dependency should be removed here, too. (Note, for example, that I may not know that importing baz.spam indirectly causes some part of foo.wiz to be imported, and that if I then add another directory to sys.path containing a foo.* portion, my code will no longer work when I try to import foo.ham. This is much more "magical" behavior, in least-surprise terms!)
The constraints on sys.path and package __path__ objects can and should be removed, by making the dynamic path objects refer to a module and attribute, instead of directly referencing parent path objects. Code that currently manipulates __path__ will not break, because such code will not be using PEP 420 namespace packages anyway, and so __path__ will still be a plain list. (Even so, the most common __path__ manipulation idiom is "__path__ = pkgutil.extend_path(...)".)
Namespace packages are a widely used feature of setuptools, and AFAIK nobody has ever asked to stop dynamic additions to a namespace __path__, but a wide assortment of things people do with setuptools rely on dynamic additions under the hood. Providing the feature in PEP 420 gives a migration path away from setuptools, at least for this one feature. (Specifically, it does away with the need to use declare_namespace(), and the need to do all sys.path manipulation via setuptools' requirements API.)
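Concretely, it's this per-portion boilerplate (roughly; details vary by project) that goes away:

    # foo/__init__.py in every portion of the 'foo' namespace, setuptools-style:
    __import__('pkg_resources').declare_namespace(__name__)

    # plus, in that portion's setup.py:
    from setuptools import setup
    setup(
        name='foo.bar',
        packages=['foo', 'foo.bar'],
        namespace_packages=['foo'],   # tells setuptools that 'foo' is shared
    )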
Self-contained packages (those with __init__.py) and fixed __path__ lists can and should be considered the "magic" or "special case" parts of importing in Python 3, even though we're accustomed to them being central import concepts in Python 2. Modules and namespace packages can and should be the default case from an instructional POV, and sys.path updating should reflect this. (That is, future tutorials should introduce modules, then namespace packages, and finally self-contained packages with __init__ and __path__, because the idea of a namespace package doesn't depend on __path__ existing in the first place; it's essentially only a historical accident that self-contained packages were implemented in Python first.)