[Python-Dev] Better support for consuming vendored packages

Gregory Szorc gregory.szorc at gmail.com
Thu Mar 22 12:58:07 EDT 2018


I'd like to start a discussion around practices for vendoring package dependencies. I'm not sure python-dev is the appropriate venue for this discussion. If not, please point me to one and I'll gladly take it there.

I'll start with a problem statement.

Not all consumers of Python packages wish to consume Python packages in the common pip install <package> + import <package> manner. Some Python applications may wish to vendor Python package dependencies such that known compatible versions are always available.

For example, a Python application targeting a general audience may not wish to expose the existence of Python nor want its users to be concerned about Python packaging. This is good for the application because it reduces complexity and the surface area of things that can go wrong.

But at the same time, Python applications need to be aware that the Python environment may contain more than just the Python standard library and whatever Python packages are provided by that application. If using the system Python executable, other system packages may have installed Python packages in the system site-packages and those packages would be visible to your application. A user could pip install a package and that would be in the Python environment used by your application. In short, unless your application distributes its own copy of Python, all bets are off with regards to what packages are installed. (And even then advanced users could muck with the bundled Python, but let's ignore that edge case.)

In short, import X is often the wild west. For applications that want to "just work" without requiring end users to manage Python packages, import X is dangerous because X could come from anywhere and be anything - possibly even a separate code base providing the same package name!

Since Python applications may not want to burden users with Python packaging, they may vendor Python package dependencies such that a known compatible version is always available. In most cases, a Python application can insert itself into sys.path to ensure its copies of packages are picked up first. This works a lot of the time. But the strategy can fall apart.
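The sys.path insertion strategy can be sketched as follows. The "_vendor" directory name and layout here are hypothetical, not part of any standard:

```python
import os
import sys

# Hypothetical layout: the application ships its vendored packages in a
# "_vendor" directory next to its entry-point module. Prepending (not
# appending) that directory to sys.path makes the bundled copies shadow
# same-named packages elsewhere on the path -- but only until other code
# reorders sys.path, or a conflicting name was already imported.
vendor_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "_vendor")
sys.path.insert(0, vendor_dir)
```

Note this only shadows future imports; a package already present in sys.modules under the same name wins regardless.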

Some Python applications support loading plugins or extensions. When user-provided code can be executed, that code could have dependencies on additional Python packages. Or that custom code could perform sys.path modifications to provide its own package dependencies. What this means is that import X from the perspective of the main application becomes dangerous again. You want to pick up the packages that you provided, but you just can't be sure that those packages will actually be picked up. And to complicate matters even more, an extension may wish to use a different version of a package from what you distribute. e.g. it may want to adopt the latest version before you have ported to it, or it may want to keep using an old version because it hasn't ported off it yet. So now you have the requirement that multiple versions of a package be available. In Python's shared module namespace, that means having separate package names.

A partial solution to this quagmire is using relative - not absolute - imports. e.g. say you have a package named "knights." It has a dependency on a 3rd party package named "shrubbery." Let's assume you distribute your application with a copy of "shrubbery" which is installed at some packages root, alongside "knights:"

/
/knights/__init__.py
/knights/ni.py
/shrubbery/__init__.py

If from knights.ni you import shrubbery, you /could/ get the copy of "shrubbery" distributed by your application. Or you could pick up some other random copy that is also installed somewhere in sys.path.

Whereas if you vendor "shrubbery" into your package, e.g.

/
/knights/__init__.py
/knights/ni.py
/knights/vendored/__init__.py
/knights/vendored/shrubbery/__init__.py

Then, when knights.ni does from .vendored import shrubbery, you are guaranteed to get your local copy of the "shrubbery" package.
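A self-contained demonstration of that layout, built in a temporary directory; the package and module names ("knights", "ni", "shrubbery") are the hypothetical ones from the example above:

```python
import os
import sys
import tempfile
import textwrap

# Build the vendored layout from the example in a temp directory.
root = tempfile.mkdtemp()

def write(relpath, body=""):
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(textwrap.dedent(body))

write("knights/__init__.py")
write("knights/vendored/__init__.py")
write("knights/vendored/shrubbery/__init__.py", "NAME = 'vendored shrubbery'\n")
# knights/ni.py uses a package-relative import, so it can only ever see
# the copy shipped inside knights.vendored, never one from elsewhere on
# sys.path.
write("knights/ni.py", """\
    from .vendored import shrubbery

    def which_shrubbery():
        return shrubbery.NAME
""")

sys.path.insert(0, root)
from knights import ni
print(ni.which_shrubbery())  # -> vendored shrubbery
```

Even with another "shrubbery" earlier on sys.path, the relative import would still resolve to the copy under knights.vendored.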

This reliable behavior is highly desired by Python applications.

But there are problems.

What we've done is effectively rename the "shrubbery" package to "knights.vendored.shrubbery." If a module inside that package attempts an import shrubbery.x, this could fail because "shrubbery" is no longer the package name. Or worse, it could pick up a separate copy of "shrubbery" somewhere else in sys.path and you could have a Frankenstein package pulling its code from multiple installs. So for this to work, all package-local imports must use relative imports, e.g. from . import x.
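The renaming hazard can be demonstrated directly. Here a vendored copy of "shrubbery" contains a module whose self-import is absolute, and the import fails because "shrubbery" is no longer a top-level name. The names (including the "helpers" module) are hypothetical:

```python
import os
import sys
import tempfile

# Build a vendored "shrubbery" whose internal import is absolute.
root = tempfile.mkdtemp()

def write(relpath, body=""):
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(body)

write("knights/__init__.py")
write("knights/vendored/__init__.py")
write("knights/vendored/shrubbery/__init__.py")
write("knights/vendored/shrubbery/helpers.py")
# Absolute self-import: only correct when "shrubbery" is top-level.
write("knights/vendored/shrubbery/util.py", "import shrubbery.helpers\n")

sys.path.insert(0, root)
try:
    import knights.vendored.shrubbery.util
    failed = False
except ImportError:  # no top-level "shrubbery" exists
    failed = True
print("absolute self-import under vendored name failed:", failed)
```

Had util.py instead used "from . import helpers", the import would have succeeded under either name.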

The takeaway is that packages using relative imports for their own modules are much more flexible and therefore friendly to downstream consumers that may wish to vendor them under different names. Packages using relative imports can be dropped in and used, often without source modifications. This is a big deal, as downstream consumers don't want to be modifying/forking packages they don't maintain. Because of the advantages of relative imports, I've individually reached the conclusion that relative imports within packages should be considered a best practice. I would encourage the Python community to discuss adopting that practice more formally (perhaps as a PEP or something).

But package-local relative imports aren't a cure-all. There is a major problem with nested dependencies. e.g. if "shrubbery" depends on the "herring" package. There's no reasonable way of telling "shrubbery" that "herring" is actually provided by "knights.vendored." You might be tempted to convert non package-local imports to relative. e.g. from .. import herring. But the importer doesn't allow relative imports outside the current top-level package and this would break classic installs where "shrubbery" and "herring" are proper top-level packages and not sub-packages in e.g. a "vendored" sub-package. For cases where this occurs, the easiest recourse today is to rewrite imported source code to use relative imports. That's annoying, but it works.
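The rewriting recourse can be sketched roughly as below. This is a deliberately crude, illustrative regex-based approach (real vendoring tools tend to rewrite at the AST level); the VENDORED set, the relative-prefix depth, and the handled import forms are all assumptions of the sketch:

```python
import re

# Top-level names we have vendored (hypothetical set for this sketch).
VENDORED = {"shrubbery", "herring"}

def rewrite_imports(source, dots=".."):
    """Rewrite absolute imports of vendored packages as relative imports.

    "from herring import x" -> "from ..herring import x"
    "import herring"        -> "from .. import herring"

    Dotted "import a.b" forms and aliasing are left alone; handling them
    robustly requires AST-level rewriting, not a regex.
    """
    def repl(match):
        kind, name = match.group(1), match.group(2)
        if name.split(".")[0] not in VENDORED:
            return match.group(0)  # not a vendored package; leave untouched
        if kind == "from":
            return "from %s%s import" % (dots, name)
        if "." in name:
            return match.group(0)  # dotted "import a.b": too hard here
        return "from %s import %s" % (dots, name)

    pattern = re.compile(r"^(from|import)[ \t]+([\w.]+)(?:[ \t]+import)?", re.M)
    return pattern.sub(repl, source)

print(rewrite_imports("import herring\nfrom shrubbery.x import y\nimport os\n"))
```

The default ".." prefix assumes the rewritten file sits one level below the vendored root (e.g. inside knights/vendored/shrubbery/); the depth would need adjusting per file.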

In summary, some Python applications may want to vendor and distribute Python package dependencies. Reliance on absolute imports is dangerous because the global Python environment is effectively undefined from the perspective of the application. The safest thing to do is use relative imports from within the application. But because many packages don't use relative imports themselves, vendoring a package can require rewriting source code so imports are relative. And even if relative imports are used within that package, relative imports can't be used for other top-level packages. So source code rewriting is required to handle these. If you vendor your Python package dependencies, your world often consists of a lot of pain. It's better to absorb that pain than inflict it on the end-users of your application (who shouldn't need to care about Python packaging). But this is a pain that Python application developers must deal with. And I feel that pain undermines the health of the Python ecosystem because it makes Python a less attractive platform for standalone applications.

I would very much welcome a discussion and any ideas on improving the Python package dependency problem for standalone Python applications. I think encouraging the use of relative imports within packages is a solid first step. But it obviously isn't a complete solution.

Gregory


