[Python-Dev] Status of packaging in 3.3

Alex Clark aclark at aclark.net
Fri Jun 22 15:13:18 CEST 2012


Hi,

On 6/22/12 1:05 AM, Nick Coghlan wrote:

On Fri, Jun 22, 2012 at 10:01 AM, Donald Stufft <donald.stufft at gmail.com> wrote:

The idea I'm hoping for is to stop worrying about one implementation over another and hoping to create a common format that all the tools can agree upon and create/install.

Right, and this is where it encouraged me to see in the Bento docs that David had cribbed from RPM in this regard (although I don't believe he has cribbed enough).

A packaging system really needs to cope with two very different levels of packaging:

1. Source distributions (e.g. SRPMs). To get from this to useful software requires developer tools.
2. "Binary" distributions (e.g. RPMs). To get from this to useful software mainly requires a "file copy" utility (well, that and an archive decompressor).

An SRPM is just a SPEC file and source tarball. That's it. To get from that to an installed product, you have a bunch of additional "BuildRequires" dependencies, along with %build and %install scripts and a %files definition that define what will be packaged up and included in the binary RPM. The exact nature of the metadata format doesn't really matter, what matters is that it's a documented standard that multiple tools can read.

An RPM includes files that actually get installed on the target system. An RPM can be arch specific (if they include built binary bits) or "noarch" if they're platform neutral.

distutils really only plays at the SRPM level - there is no defined OS neutral RPM equivalent. That's why I brought up the bdist_simple discussion earlier in the thread - if we can agree on a standard bdist_simple format, then we can more cleanly decouple the "build" step from the "install" step.

I think one of the key things to learn from the SPEC file format is the configuration language it used for the various build phases:

sh (technically, any shell on the system, but almost everyone just uses the default system shell)

This is why you can integrate whatever build system you like with it: so long as you can invoke the build from the shell, then you can use it to make your RPM.

Now, there's an obvious problem with this: it's completely useless from a cross-platform building point of view. Isn't it a shame there's no language we could use that would let us invoke build systems in a cross platform way? Oh, wait...

So here's some sheer pie-in-the-sky speculation. If people like elements of this idea enough to run with it, great. If not... oh well:

- I believe the "egg" term has way too much negative baggage (courtesy of easy_install), and find the full term Distribution to be too easily confused with "Linux distribution". However, "Python dist" is unambiguous (since the more typical abbreviation for an aggregate distribution is "distro"). Thus, I attempt to systematically refer to the objects used to distribute Python software from developers to users as "dists". In practice, this terminology is already used in many places (distutils, sdist, bdist_msi, bdist_rpm, the .dist-info format in PEP 376 etc). Thus, Python software is distributed as dists (either sdists or bdists), which may in turn be converted to distro packages (e.g. SRPMs and RPMs) for deployment to particular environments.

+0.5. There is definitely a problem with the term "egg", but I don't think negative baggage is it.

Rather, I think "egg" is just plain too confusing, and perhaps too "cutesy", too. A blurb from the internet[1]:

"An egg is a bundle that contains all the package data. In the ideal case, an egg is a zip-compressed file with all the necessary package files. But in some cases, setuptools decides (or is told by switches) that a package should not be zip-compressed. In those cases, an egg is simply an uncompressed subdirectory, but with the same contents. The single file version is handy for transporting, and saves a little bit of disk space, but an egg directory is functionally and organizationally identical."
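To make the "functionally identical" claim in that blurb concrete, here is a minimal sketch using only the standard library. It does not build a real setuptools egg (there is no EGG-INFO metadata); it only shows that Python can import the same module from a directory or from a zip archive placed on sys.path. The module name hello_dist and the helper load_greeting are made up for the illustration.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def load_greeting(path_entry):
    """Import the hypothetical module `hello_dist` from a single sys.path entry."""
    sys.modules.pop("hello_dist", None)  # forget any earlier import of the module
    importlib.invalidate_caches()
    sys.path.insert(0, str(path_entry))
    try:
        return importlib.import_module("hello_dist").GREETING
    finally:
        sys.path.remove(str(path_entry))


with tempfile.TemporaryDirectory() as tmp:
    # Directory form: a plain folder containing the module.
    unpacked = Path(tmp) / "unpacked"
    unpacked.mkdir()
    (unpacked / "hello_dist.py").write_text("GREETING = 'hi'\n")

    # Single-file form: the same module stored inside a zip archive.
    zipped = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(zipped, "w") as archive:
        archive.write(unpacked / "hello_dist.py", "hello_dist.py")

    from_dir = load_greeting(unpacked)
    from_zip = load_greeting(zipped)
    print(from_dir == from_zip)
```

Either form ends up as just another sys.path entry, which is why the blurb can treat the two layouts as interchangeable.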

Compared to the definitions of package and distribution I posted earlier in this thread, the confusion is:

So to avoid this confusion I've personally stopped using the term "egg" in favor of "package". (Outside a computer context, everyone knows a package is something "with stuff in it") But as Donald said, what we are all talking about is technically called a "distribution". ("Honey, a distribution arrived for you in the mail today!" :-))

I love that Nick is thinking "outside the box" re: terminology, but I'm not 100% convinced the new term should be "dist". Rather I propose:

I believe this is the most "human" thing we can do[2].

Alex

[1] http://www.ibm.com/developerworks/linux/library/l-cppeak3/index.html

[2] http://python-for-humans.heroku.com

- I reject setup.cfg, as I believe ini-style configuration files are not appropriate for a metadata format that needs to include file listings and code fragments

- I reject bento.info, as I think if we accept yet-another-custom-configuration-file-format into the standard library instead of just using YAML, we're even crazier than is already apparent

- I shall use "dist.yaml" as my proposed name for my "I wish I could define packages like this" format (and yes, that means adding yaml support to the standard library is part of the wish)

- many of the details below will be flawed, but I want to give a clear idea for how a concept like this might work in practice

- we need to define a clear set of build phases, and then design the dist metadata format accordingly. For example:
    - source
        - uses a "source" section in dist.yaml
        - "source/install" maps source files directly to desired install locations
            - essentially what the setup.cfg Resources section tries to do
            - used for pure Python code, documentation, etc
            - See below for example
        - "source/files" defines a list of extra files to be included
        - "source/exclude" defines the list of files to be excluded
        - "source/run" defines a Python fragment to be executed
        - serves a similar purpose to the "files" section in setup.cfg
        - creates a temporary directory (and sets it as the working directory)
        - dist.yaml is copied to the temporary directory
        - all files to be installed are copied to the temporary directory
        - all extra files are copied to the temporary directory
        - the Python fragment in "source/run" is executed (which can thus easily add more files)
        - if sdist archive creation is requested, entire contents of temporary directory are included
    - build
        - uses a "build" section in dist.yaml
        - "build/install" maps built files to desired install locations
            - like source/install, but for build artifacts
            - compiled C extensions, .pyc and .pyo files, etc would all go here
        - "build/run" defines a Python fragment to be executed
        - "build/files" defines the list of files to be included
        - "build/exclude" defines the list of files to be excluded
        - "build/requires" defines extra dependencies not needed at runtime
        - starting environment is a source directory that is either:
            - preexisting (e.g. to allow building in-place in the source tree)
            - created by running source first
            - created by unpacking an sdist archive
        - the Python fragment in "build/run" is executed to trigger the build
        - if the build succeeds (i.e. doesn't throw an exception)
            - create a temporary directory
            - copy dist.yaml
            - copy all specified files
            - this is the easiest way to exclude build artifacts from the distribution, while still keeping them around to enable incremental builds
        - if bdist_simple archive creation is requested, entire contents of temporary directory are included
        - other bdist formats (such as bdist_rpm) will have their own rules for getting from the bdist_simple format to the platform specific format
    - install
        - uses an "install" section in dist.yaml
        - "install/pre" defines a Python fragment to be executed before copying files
        - "install/post" defines a Python fragment to be executed after copying files
        - starting environment is a bdist_simple directory that is either:
            - preexisting (e.g. to allow creation by system packaging tools)
            - created by running build first
            - created by unpacking a bdist_simple archive
        - end result is a fully installed and usable piece of software
    - test
        - uses a "test" section in dist.yaml
        - "test/run" defines a Python fragment to be executed to start the tests
        - "test/requires" defines extra dependencies needed to run the test suite

- Example "source/install" based on http://alexis.notmyidea.org/distutils2/setupcfg.html#complete-example (my YAML may be a bit dodgy).
    - With this scheme, module installation is just another install category.
    - A solution for easily installing entire subtrees is desirable. I propose the recursive glob ** syntax for that purpose.
    - Unlike setup.cfg, every category would have an "-excluded" counterpart to filter unwanted files. Explicit is better than implicit.

    source:
      install:
        modules:
          example.py
          examplepkg/*.py
          examplepkg/**/*.py
          examplepkg/resource.txt
        doc:
          README
          doc/*
        doc-excluded:
          doc/man
        man:
          doc/man
        scripts:
          # Directory details are stripped automatically
          scripts/LAUNCH
          scripts/*.{sh,bat}
          # But subdirectories can be made explicit
          extras/:
            scripts/extras/*.{sh,bat}

- the goal of a dist.yaml syntax would be to be explicit and comprehensive. If this gets too verbose, then the solution would be dist.yaml generators that are less expressive, but also reduce the necessary boilerplate.

- a typical "sdist" will now just be an archive consisting of:
    - the project's dist.yaml file
    - all files created by the "source" phase

- the "bdist_simple" format will just be an archive consisting of:
    - the project's dist.yaml file
    - all files created by the "build" phase

- the source and build run hooks and install pre and post hooks become the way you integrate with arbitrary build systems. No fancy command or compiler system or anything like that, you just import whatever you need and call it with the appropriate arguments. To other tools, they will just be opaque chunks of text, but to the build system, they're executable pieces of Python code, just as RPM includes executable scripts.

Cheers, Nick.
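For illustration only, the "source" phase Nick describes could be sketched roughly as follows. Everything here is an assumption rather than part of the proposal: the DIST dict stands in for a parsed dist.yaml (YAML is not in the standard library), the helper names expand, resolve and run_source_phase are invented, "-excluded" is interpreted as "subtract the files matching these globs", and the step that copies dist.yaml itself into the temporary directory is omitted.

```python
import glob
import os
import shutil
import tempfile

# Hypothetical stand-in for a parsed dist.yaml "source" section.
DIST = {
    "source": {
        "install": {
            "modules": ["example.py", "examplepkg/**/*.py"],
            "doc": ["README", "doc/**/*"],
            "doc-excluded": ["doc/man/*"],
        },
        # Python fragment run inside the staging directory ("source/run").
        "run": "with open('GENERATED.txt', 'w') as f:\n    f.write('made by hook')\n",
    }
}


def expand(patterns, root):
    """Return root-relative paths of the files matching any of the glob patterns."""
    matches = set()
    for pattern in patterns:
        full = os.path.join(glob.escape(root), pattern)
        for hit in glob.glob(full, recursive=True):  # recursive=True enables **
            if os.path.isfile(hit):
                matches.add(os.path.relpath(hit, root))
    return matches


def resolve(category, install, root):
    """Files for one install category minus its '-excluded' counterpart."""
    included = expand(install.get(category, []), root)
    excluded = expand(install.get(category + "-excluded", []), root)
    return sorted(included - excluded)


def run_source_phase(dist, project_root):
    """Copy the resolved files to a temporary directory, then run the hook there."""
    install = dist["source"]["install"]
    staging = tempfile.mkdtemp(prefix="dist-source-")
    for category in install:
        if category.endswith("-excluded"):
            continue
        for rel in resolve(category, install, project_root):
            target = os.path.join(staging, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(os.path.join(project_root, rel), target)
    previous = os.getcwd()
    os.chdir(staging)  # the hook runs with the staging directory as cwd
    try:
        exec(dist["source"].get("run", ""), {"__name__": "__dist_hook__"})
    finally:
        os.chdir(previous)
    return staging


def staged_files(staging):
    """List every staged file as a sorted, '/'-separated relative path."""
    found = []
    for dirpath, _dirs, files in os.walk(staging):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), staging)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Build a throwaway project tree and run the phase against it.
project = tempfile.mkdtemp(prefix="dist-project-")
for rel in ["example.py", "README", "examplepkg/__init__.py",
            "examplepkg/sub/mod.py", "doc/index.txt", "doc/man/tool.1"]:
    path = os.path.join(project, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write("placeholder\n")

staging = run_source_phase(DIST, project)
print(staged_files(staging))
```

The sketch avoids the {sh,bat} patterns from the example above because Python's glob module does not perform shell-style brace expansion; a real tool would need its own handling for those.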

-- Alex Clark · http://pythonpackages.com


