[Python-Dev] Status of packaging in 3.3

Alex Clark aclark at aclark.net
Fri Jun 22 15:13:18 CEST 2012


Hi,

On 6/22/12 1:05 AM, Nick Coghlan wrote:
> On Fri, Jun 22, 2012 at 10:01 AM, Donald Stufft <donald.stufft at gmail.com> wrote:
>> The idea I'm hoping for is to stop worrying about one implementation
>> over another and to create a common format that all the tools can
>> agree upon and create/install.
>
> Right, and this is where it encouraged me to see in the Bento docs
> that David had cribbed from RPM in this regard (although I don't
> believe he has cribbed *enough*).
>
> A packaging system really needs to cope with two very different levels
> of packaging:
> 1. Source distributions (e.g. SRPMs). To get from this to useful
> software requires developer tools.
> 2. "Binary" distributions (e.g. RPMs). To get from this to useful
> software mainly requires a "file copy" utility (well, that and an
> archive decompressor).
>
> An SRPM is *just* a SPEC file and source tarball. That's it. To get
> from that to an installed product, you have a bunch of additional
> "BuildRequires" dependencies, along with %build and %install scripts
> and a %files definition that define what will be packaged up and
> included in the binary RPM. The exact nature of the metadata format
> doesn't really matter, what matters is that it's a documented standard
> that multiple tools can read.
>
> An RPM includes files that actually get installed on the target
> system. An RPM can be arch specific (if it includes built binary
> bits) or "noarch" if it is platform neutral.
>
> distutils really only plays at the SRPM level - there is no defined OS
> neutral RPM equivalent. That's why I brought up the bdist_simple
> discussion earlier in the thread - if we can agree on a standard
> bdist_simple format, then we can more cleanly decouple the "build"
> step from the "install" step.
>
> I think one of the key things to learn from the SPEC file format is
> the configuration language it uses for the various build phases: sh
> (technically, any shell on the system, but almost everyone just uses
> the default system shell).
>
> This is why you can integrate whatever build system you like with it:
> so long as you can invoke the build from the shell, then you can use
> it to make your RPM.
>
> Now, there's an obvious problem with this: it's completely useless
> from a *cross-platform* building point of view. Isn't it a shame
> there's no language we could use that would let us invoke build
> systems in a cross platform way? Oh, wait...
>
> So here's some sheer pie-in-the-sky speculation. If people like
> elements of this idea enough to run with it, great. If not... oh well:
>
> - I believe the "egg" term has way too much negative baggage (courtesy
> of easy_install), and find the full term Distribution to be too easily
> confused with "Linux distribution". However, "Python dist" is
> unambiguous (since the more typical abbreviation for an aggregate
> distribution is "distro"). Thus, I attempt to systematically refer to
> the objects used to distribute Python software from developers to
> users as "dists". In practice, this terminology is already used in
> many places (distutils, sdist, bdist_msi, bdist_rpm, the .dist-info
> format in PEP 376 etc). Thus, Python software is distributed as dists
> (either sdists or bdists), which may in turn be converted to distro
> packages (e.g. SRPMs and RPMs) for deployment to particular
> environments.


+0.5. There is definitely a problem with the term "egg", but I don't 
think negative baggage is it.

Rather, I think "egg" is just plain too confusing, and perhaps too 
"cutesy" as well. A blurb from the internet[1]:


"An egg is a bundle that contains all the package data. In the ideal 
case, an egg is a zip-compressed file with all the necessary package 
files. But in some cases, setuptools decides (or is told by switches) 
that a package should not be zip-compressed. In those cases, an egg is 
simply an uncompressed subdirectory, but with the same contents. The 
single file version is handy for transporting, and saves a little bit of 
disk space, but an egg directory is functionally and organizationally 
identical."


Compared to the definitions of package and distribution I posted earlier 
in this thread, the confusion is:

- A package is one or more modules inside another module, and a 
distribution is a compressed archive of those modules, but an egg can be 
either or both.

- The blurb author presumably uses the term "package data" to cover 
package modules, package data proper (i.e. resources such as templates), 
and package metadata all at once.
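
The blurb's claim that the zipped and unzipped forms behave the same 
rests on Python's import system accepting both directories and zip 
archives on sys.path. A minimal sketch of that behaviour follows. It 
uses a plain zip file rather than a real egg (no EGG-INFO metadata), and 
the module names from_dir and from_zip are made up for illustration:

```python
import importlib
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()

# Directory form: a plain folder containing one module.
dir_form = os.path.join(workdir, "demo_dir")
os.mkdir(dir_form)
with open(os.path.join(dir_form, "from_dir.py"), "w") as f:
    f.write("ORIGIN = 'directory'\n")

# Zipped form: an archive containing one module.
zip_form = os.path.join(workdir, "demo.zip")
with zipfile.ZipFile(zip_form, "w") as archive:
    archive.writestr("from_zip.py", "ORIGIN = 'zip'\n")

# Both forms are put on sys.path in exactly the same way.
sys.path[:0] = [dir_form, zip_form]
importlib.invalidate_caches()

import from_dir
import from_zip

print(from_dir.ORIGIN, from_zip.ORIGIN)
```

To the importing code the two are indistinguishable, which is part of 
why a single word ended up covering both.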

So to avoid this confusion I've personally stopped using the term "egg" 
in favor of "package". (Outside a computer context, everyone knows a 
package is something "with stuff in it".) But as Donald said, what we are 
all talking about is technically called a "distribution". ("Honey, a 
distribution arrived for you in the mail today!" :-))

I love that Nick is thinking "outside the box" re: terminology, but I'm 
not 100% convinced the new term should be "dist". Rather I propose:

- Change the definition of package to: a module (or modules) plus 
package data and package metadata inside another module.

- Refer to source dists as "source packages", i.e. packages containing 
source code.

- Refer to binary dists as "binary packages", i.e. packages containing 
byte code and executables.


I believe this is the most "human" thing we can do[2].



Alex



[1] http://www.ibm.com/developerworks/linux/library/l-cppeak3/index.html

[2] http://python-for-humans.heroku.com



>
> - I reject setup.cfg, as I believe ini-style configuration files are
> not appropriate for a metadata format that needs to include file
> listings and code fragments
>
> - I reject bento.info, as I think if we accept
> yet-another-custom-configuration-file-format into the standard library
> instead of just using YAML, we're even crazier than is already
> apparent
>
> - I shall use "dist.yaml" as my proposed name for my "I wish I could
> define packages like this" format (and yes, that means adding yaml
> support to the standard library is part of the wish)
>
> - many of the details below will be flawed, but I want to give a clear
> idea for how a concept like this might work in practice
>
> - we need to define a clear set of build phases, and then design the
> dist metadata format accordingly. For example:
>      - source
>          - uses a "source" section in dist.yaml
>          - "source/install" maps source files directly to desired
> install locations
>             - essentially what the setup.cfg Resources section tries to do
>             - used for pure Python code, documentation, etc
>             - See below for example
>          - "source/files" defines a list of extra files to be included
>          - "source/exclude" defines the list of files to be excluded
>          - "source/run" defines a Python fragment to be executed
>          - serves a similar purpose to the "files" section in setup.cfg
>          - creates a temporary directory (and sets it as the working directory)
>          - dist.yaml is copied to the temporary directory
>          - all files to be installed are copied to the temporary directory
>          - all extra files are copied to the temporary directory
>          - the Python fragment in "source/run" is executed (which can
> thus easily add more files)
>          - if sdist archive creation is requested, entire contents of
> temporary directory are included
>      - build
>          - uses a "build" section in dist.yaml
>          - "build/install" maps built files to desired install locations
>             - like source/install, but for build artifacts
>             - compiled C extensions, .pyc and .pyo files, etc would all go here
>          - "build/run" defines a Python fragment to be executed
>          - "build/files" defines the list of files to be included
>          - "build/exclude" defines the list of files to be excluded
>          - "build/requires" defines extra dependencies not needed at runtime
>          - starting environment is a source directory that is either:
>            - preexisting (e.g. to allow building in-place in the source tree)
>            - created by running source first
>            - created by unpacking an sdist archive
>          - the Python fragment in "build/run" is executed to trigger the build
>          - if the build succeeds (i.e. doesn't throw an exception)
>            - create a temporary directory
>            - copy dist.yaml
>            - copy all specified files
>            - this is the easiest way to exclude build artifacts from
> the distribution, while still keeping them around to enable
> incremental builds
>          - if bdist_simple archive creation is requested, entire
> contents of temporary directory are included
>          - other bdist formats (such as bdist_rpm) will have their own
> rules for getting from the bdist_simple format to the platform
> specific format
>      - install
>          - uses an "install" section in dist.yaml
>          - "install/pre" defines a Python fragment to be executed
> before copying files
>          - "install/post" defines a Python fragment to be executed
> after copying files
>          - starting environment is a bdist_simple directory that is either:
>            - preexisting (e.g. to allow creation by system packaging tools)
>            - created by running build first
>            - created by unpacking a bdist_simple archive
>          - end result is a fully installed and usable piece of software
>      - test
>          - uses a "test" section in dist.yaml
>          - "test/run" defines a Python fragment to be executed to start the tests
>          - "test/requires" defines extra dependencies needed to run the
> test suite
>
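
The "source" phase steps listed above could be sketched roughly as 
follows. This is only an illustration under stated assumptions: the 
function name run_source_phase and its parameters are hypothetical, the 
file list is passed in directly instead of being read from dist.yaml 
(there is no YAML parser in the standard library), and a gzipped tarball 
is assumed for the sdist archive:

```python
import os
import shutil
import tarfile
import tempfile


def run_source_phase(project_dir, files, run_fragment="", sdist_path=None):
    """Stage a source tree in a temporary directory (hypothetical helper)."""
    staging = tempfile.mkdtemp(prefix="dist-source-")
    # dist.yaml is copied first, then every listed file, keeping the layout.
    for rel in ["dist.yaml"] + list(files):
        dest = os.path.join(staging, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(project_dir, rel), dest)
    # The "source/run" fragment executes with the staging directory as the
    # working directory, so it can easily add more files.
    old_cwd = os.getcwd()
    os.chdir(staging)
    try:
        exec(run_fragment, {"__name__": "__dist_source__"})
    finally:
        os.chdir(old_cwd)
    # If an sdist is requested, archive the entire staging directory.
    if sdist_path is not None:
        with tarfile.open(sdist_path, "w:gz") as archive:
            archive.add(staging, arcname=".")
    return staging
```

The build phase would follow the same shape, with the fragment run in 
the source directory and the copy into a fresh temporary directory 
happening only after the fragment finishes without an exception.
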
> - Example "source/install" based on
> http://alexis.notmyidea.org/distutils2/setupcfg.html#complete-example
> (my YAML may be a bit dodgy).
>    - With this scheme, module installation is just another install category.
>    - A solution for easily installing entire subtrees is desirable. I
> propose the recursive glob ** syntax for that purpose.
>    - Unlike setup.cfg, every category would have an "-excluded"
> counterpart to filter unwanted files. Explicit is better than
> implicit.
>
>      source:
>        install:
>          modules:
>            example.py
>            example_pkg/*.py
>            example_pkg/**/*.py
>            example_pkg/resource.txt
>          doc:
>            README
>            doc/*
>          doc-excluded:
>            doc/man
>          man:
>            doc/man
>          scripts:
>            # Directory details are stripped automatically
>            scripts/LAUNCH
>            scripts/*.{sh,bat}
>            # But subdirectories can be made explicit
>            extras/:
>                scripts/extras/*.{sh,bat}
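
One way a tool might resolve an install category and its "-excluded" 
counterpart is sketched below. The helper name resolve_category is 
hypothetical, and the matching is delegated to the standard glob module 
with recursive=True, which understands "**" but not brace alternatives 
such as {sh,bat}, so those are not handled here:

```python
import glob
import os


def resolve_category(root, patterns, excluded=()):
    """Expand install patterns relative to root, minus the excluded ones."""
    def expand(pattern_list):
        found = set()
        for pattern in pattern_list:
            full = os.path.join(glob.escape(root), pattern)
            for path in glob.glob(full, recursive=True):
                found.add(os.path.relpath(path, root).replace(os.sep, "/"))
        return found

    dropped = expand(excluded)
    kept = set()
    for rel in expand(patterns):
        # Skip a path if it, or any directory above it, was excluded.
        parts = rel.split("/")
        prefixes = {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}
        if not prefixes & dropped:
            kept.add(rel)
    return sorted(kept)
```

With this sketch, the "doc" category from the example would be resolved 
as resolve_category(root, ["README", "doc/*"], ["doc/man"]), leaving 
doc/man to be picked up by the "man" category instead.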
>
> - the goal of a dist.yaml syntax would be to be *explicit* and
> *comprehensive*. If this gets too verbose, then the solution would be
> dist.yaml generators that are less expressive, but also reduce the
> necessary boilerplate.
>
> - a typical "sdist" will now just be an archive consisting of:
>      - the project's dist.yaml file
>      - all files created by the "source" phase
>
> - the "bdist_simple" format will just be an archive consisting of:
>      - the project's dist.yaml file
>      - all files created by the "build" phase
>
> - the source and build run hooks and install pre and post hooks become
> the way you integrate with arbitrary build systems. No fancy command
> or compiler system or anything like that, you just import whatever you
> need and call it with the appropriate arguments. To other tools, they
> will just be opaque chunks of text, but to the build system, they're
> executable pieces of Python code, just as RPM includes executable
> scripts.
>
> Cheers,
> Nick.
>
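
To make the hook idea concrete, here is a rough sketch of how a build 
tool might execute one of those opaque chunks of text. Everything in it 
is an assumption for illustration: run_hook is a made-up name, a plain 
dict stands in for a parsed dist.yaml, and the "build system" is just a 
child Python process that prints a line:

```python
import sys


def run_hook(metadata, hook_path):
    """Execute the Python fragment stored at e.g. "build/run", if present.

    Returns True when the fragment finishes without raising.
    """
    section, key = hook_path.split("/")
    fragment = metadata.get(section, {}).get(key)
    if fragment is None:
        return True  # nothing to run counts as success
    try:
        exec(fragment, {"__name__": "__dist_hook__"})
    except Exception as exc:
        print("{} failed: {!r}".format(hook_path, exc), file=sys.stderr)
        return False
    return True


# A plain dict stands in for a parsed dist.yaml (hypothetical content).
metadata = {
    "build": {
        "run": (
            "import subprocess, sys\n"
            "subprocess.check_call([sys.executable, '-c', 'print(\"building\")'])\n"
        ),
    },
}

if __name__ == "__main__":
    print(run_hook(metadata, "build/run"))
```

The fragment imports whatever it needs and calls it, so swapping in 
make, scons or anything else only changes the text stored in the 
metadata, not the tool that runs it.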


-- 
Alex Clark · http://pythonpackages.com
