[Distutils] formencode as .egg in Debian ??

Phillip J. Eby pje at telecommunity.com
Wed Nov 23 02:03:34 CET 2005


At 01:21 AM 11/23/2005 +0100, Martin v. Löwis wrote:
>Phillip J. Eby wrote:
>>>>>Debian should provide the packages, but not as eggs.
>>>>
>>>>
>>>>For packages that only operate as eggs, and/or require their 
>>>>dependencies as eggs, you are stating a contradiction in terms.  Eggs 
>>>>are not merely a distribution format, any more than Java .jar files are.
>>>
>>>So I should say
>>>
>>>"Debian should not provide eggs, period", since what Debian provides
>>>are packages, and eggs are not?
>>
>>I don't understand you.
>
>This is getting difficult: I don't actually know what "a contradiction
>in terms" is. You seemed to be saying that eggs are not a distribution
>format.

They are not a distribution format.  There are in fact three physical 
formats that an egg can take (if we ignore .egg-link files, which are 
really needed only to work around the absence of symlinks on Windows).  In 
principle, there could be many others.

I suspect that part of the confusion stems from my preference for using 
"package" to refer only to a Python package (the thing you import), and not 
to a distribution.  However, Debian calls distributions 
"packages", so some confusion is perhaps inevitable.  What's more, it 
appears that the Debian policy calls for the Debian package to be named for 
the contained Python package, regardless of whether that's the name of the 
distribution.

An "egg" is a "distribution" of a "project" that is importable and can 
carry both standardized and individualized metadata that can be read by the 
pkg_resources module.  There are various distribution *formats* in which an 
"egg" may be physically manifested, but the "egg" itself is a logical 
concept, not a physical one.  It is therefore, as I said, "not merely a 
distribution format".  Is that any clearer?

The "contradiction in terms" was that I took your meaning of "package" to 
be the same as my term "project" - i.e., a functional collection of Python 
resources.  Projects that *are* eggs can't be provided "but not as 
eggs".  They *are* eggs, so not providing them as eggs means not providing 
them at all.

In contrast, projects that are not built with setuptools aren't inherently 
eggs, but you can certainly make eggs out of them.  For these projects, you 
*do* have the choice to provide them "not as eggs", but then they are also 
of no use to the projects that need eggs.

As we've already briefly discussed, in the simplest form a project can be 
made into an egg just by adding an appropriately-named .egg-info/PKG-INFO 
file.
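
A minimal sketch of that .egg-info approach, under stated assumptions: the 
project name and version are made up, and the lookup uses the standard 
library's importlib.metadata as a stand-in for pkg_resources, since both 
read the same PKG-INFO file.  It is an illustration, not how setuptools 
itself writes the metadata.

```python
import importlib.metadata
import os
import tempfile


def make_egg_info(site_dir, name, version):
    """Write a minimal <name>.egg-info/PKG-INFO under site_dir."""
    info_dir = os.path.join(site_dir, f"{name}.egg-info")
    os.makedirs(info_dir)
    pkg_info = os.path.join(info_dir, "PKG-INFO")
    with open(pkg_info, "w", encoding="utf-8") as f:
        # Only the bare minimum of metadata fields (illustrative values).
        f.write(f"Metadata-Version: 1.0\nName: {name}\nVersion: {version}\n")
    return pkg_info


def find_projects(site_dir):
    """Map project name -> version for every distribution found in site_dir."""
    dists = importlib.metadata.distributions(path=[site_dir])
    return {d.metadata["Name"]: d.version for d in dists}


with tempfile.TemporaryDirectory() as site_dir:
    make_egg_info(site_dir, "FormEncode", "0.4")  # hypothetical version
    found = find_projects(site_dir)

print(found)
```

The point of the sketch is that no special archive format is involved: the 
metadata directory alone is enough for the project to be discoverable.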


>  If so, Debian should not distribute them.

This is what I don't understand, as it has nothing to do with whether or 
not an egg is a distribution format, at least not that I can see.  My 
statement was that eggs are not merely a distribution format; they are a 
logical concept that can be physically packaged in various ways, and if 
it's necessary to invent yet another physical layout, well, we can do that 
too.


>If eggs are,
>in fact, a distribution format: what is the contradiction then?
>I would still claim that Debian should not distribute them, but
>instead distribute policy-conforming Debian packages instead.

Which would be the same as saying you wouldn't distribute, say, setuptools 
itself.  Setuptools is an egg, and can't function except as an egg, because 
it is more than a Python package.  Again, an "egg" is some specific release 
of a project and its introspectable metadata.


>>I still don't understand you.  If a package subclasses a distutils 
>>command, is it no longer a distutils setup?
>
>It is not a distutils setup because it does not invoke
>distutils.core.setup.

Now I really don't understand you.  Line 43 of setuptools/__init__.py reads:

     setup = distutils.core.setup

So, how is it not invoking distutils.core.setup?


>>What if it bundles a library module that includes a subclass of a 
>>distutils command?  Where, precisely, do you draw the line between a 
>>"distutils setup" and something else?
>
>Extending distutils is fine. An extension is a feature that, if not
>invoked, has no effect. easy_setup changes install in a way that
>has an effect.

So do all the packages that rework install_data to be more to their liking 
- and there are quite a lot of them, as I discovered when I began testing 
easy_install.


>>Nothing except performance considerations prevents you having a separate 
>>.pth file for each and every egg
>
>That is not true. Usability also suffers if sys.path becomes long.

How?  I don't understand this.  Someone using eggs rarely has reason to 
manually manipulate sys.path unless they are adding some kind of plugin 
directory to it.  If they want to know what package version they are using, 
pkg_resources provides a superior API for querying it; I can say e.g. 
'require("TurboGears")' and receive back a list of all the eggs that 
compose or are required by TurboGears, along with their locations.  (Or 
conversely, receive a DistributionNotFound or VersionConflict error 
explaining what's missing or what was already imported that's a different 
version than the one needed.)
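
To make the query behavior described above concrete, here is a toy model 
of it.  This is not pkg_resources code: the require() function, the two 
exception classes, and the catalog (names, versions, paths) are all 
stand-ins invented for illustration, and the conflict rule is simplified 
to "an already-active version differs from the installed one".

```python
class DistributionNotFound(Exception):
    """Stand-in: a required project is not installed at all."""


class VersionConflict(Exception):
    """Stand-in: an already-active version differs from the one needed."""


def require(name, installed, active=None):
    """Return [(name, version, location), ...] for name and all it requires.

    installed: {name: (version, location, [dependency names])}
    active:    {name: version} for projects that were already imported
    """
    active = active or {}
    resolved, seen = [], set()

    def visit(project):
        if project in seen:
            return
        seen.add(project)
        if project not in installed:
            raise DistributionNotFound(project)
        version, location, deps = installed[project]
        if project in active and active[project] != version:
            raise VersionConflict(
                f"{project} {active[project]} is active, {version} is needed"
            )
        resolved.append((project, version, location))
        for dep in deps:
            visit(dep)

    visit(name)
    return resolved


# Made-up catalog: versions and locations are illustrative only.
CATALOG = {
    "TurboGears": ("0.8", "/eggs/TurboGears-0.8.egg", ["FormEncode", "CherryPy"]),
    "FormEncode": ("0.4", "/eggs/FormEncode-0.4.egg", []),
    "CherryPy": ("2.1", "/eggs/CherryPy-2.1.egg", []),
}

for entry in require("TurboGears", CATALOG):
    print(entry)
```

Nothing in this lookup depends on the length or order of sys.path; the 
caller asks by project name and gets back locations or a specific error.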


> >> but in a way unfriendly to dpkg
>>I don't understand you here.  Are you saying that it's not possible for 
>>dpkg to do a post-install or uninstall operation like adding or removing 
>>a line from a file?
>
>That is certainly possible - but currently, each maintainer would have
>to come up with his own solution. This is more tedious to do than it
>could be.

easy_deb implements this, so it seems to me it would be a simple matter of 
running easy_deb to produce the .deb from the .egg.  (Caveat: I have not 
used easy_deb, but its author assures me that it is able to handle the .pth 
manipulation in a sane way.)
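
For illustration only, the post-install and uninstall steps in question 
amount to something like the following sketch.  This is not easy_deb's 
implementation (which I have not read); the helper names, the .pth file 
name, and the egg path are all hypothetical.

```python
import os
import tempfile


def read_entries(pth_file):
    """Return the non-blank lines of pth_file, or [] if it does not exist."""
    if not os.path.exists(pth_file):
        return []
    with open(pth_file, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def write_entries(pth_file, entries):
    """Rewrite pth_file with one entry per line."""
    with open(pth_file, "w", encoding="utf-8") as f:
        f.writelines(entry + "\n" for entry in entries)


def add_pth_entry(pth_file, entry):
    """Post-install step: list entry in pth_file unless it is already there."""
    entries = read_entries(pth_file)
    if entry not in entries:
        entries.append(entry)
        write_entries(pth_file, entries)


def remove_pth_entry(pth_file, entry):
    """Uninstall step: drop entry from pth_file, leaving other lines alone."""
    write_entries(pth_file, [e for e in read_entries(pth_file) if e != entry])


with tempfile.TemporaryDirectory() as site_dir:
    pth = os.path.join(site_dir, "debian-eggs.pth")  # hypothetical file name
    add_pth_entry(pth, "./FormEncode-0.4.egg")
    add_pth_entry(pth, "./FormEncode-0.4.egg")  # second call is a no-op
    after_install = read_entries(pth)
    remove_pth_entry(pth, "./FormEncode-0.4.egg")
    after_remove = read_entries(pth)

print(after_install, after_remove)
```

Both operations are idempotent, which is the property a maintainer script 
needs if it may be re-run on upgrade.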


>>Of course, this creates additional work for package maintainers that 
>>wouldn't be present with setuptools' normal .egg file/directory 
>>distributions, and my assumption was that the maintainers would prefer to 
>>be able to ignore such issues and get the benefit of dependencies defined 
>>by the upstream developers.  Eggs keep each project in its own little 
>>bubble, where it can't overwrite anything else and can be uninstalled 
>>without removing any overlapping parts.
>
>I don't see how the maintainer could use the dependency information
>in the egg files. Debian policy is that the .deb files need to
>define proper dependencies, so the maintainer has to lookup
>and edit the dependency information *anyway*. Using the egg
>package name is of limited, help, either, because Debian policy
>mandates a certain naming scheme for packages, giving the
>FormEncode package a name of python2.4-formencode.

What I would suggest here is having a namespace (e.g. pyegg2.4-whatever) 
for naming packages based on their PyPI names, so that there can be an 
automated relationship between setuptools dependencies and Debian 
ones.  This doesn't work for existing Debian packages, of course, but it 
seems to me that they could in fact have the same contents as their pyegg 
cousin; both could simply use the .egg-info approach.  (easy_deb uses 
python-pypi-whatever, which seems a bit long to me, but then, it's also 
implemented and my pyegg2.4 idea isn't.)
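
As a sketch of what such an automated mapping could look like: the 
debian_name() function below is hypothetical, and its normalization 
(lowercasing, and replacing characters outside a-z, 0-9, "+" and "." with 
"-") is my assumption rather than anything easy_deb or Debian tooling is 
known to do.

```python
import re


def debian_name(project, scheme="pyegg", pyver="2.4"):
    """Derive a Debian package name from a PyPI project name.

    scheme "pyegg" gives the pyegg<version>-<name> form suggested above;
    scheme "pypi" gives the python-pypi-<name> form that easy_deb uses.
    """
    # Assumed normalization: lowercase, runs of other characters become "-".
    slug = re.sub(r"[^a-z0-9+.]+", "-", project.lower()).strip("-")
    if scheme == "pyegg":
        return f"pyegg{pyver}-{slug}"
    if scheme == "pypi":
        return f"python-pypi-{slug}"
    raise ValueError(f"unknown naming scheme: {scheme!r}")


print(debian_name("FormEncode"))                 # pyegg2.4-formencode
print(debian_name("FormEncode", scheme="pypi"))  # python-pypi-formencode
```

Because the mapping is a pure function of the project name, a setuptools 
dependency on "FormEncode" can be translated into a Debian dependency 
without a hand-maintained lookup table.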

Anyway, I don't see any obvious reasons why this can't be an automated 
process, even for the system library dependencies.  easy_deb even has a 
simple configuration file that can augment the setuptools-style 
dependencies with explicit Debian dependencies.  There's also nothing 
stopping us from defining a way to add Debian dependency information to 
setup(); in fact setuptools encourages this by offering an extensible 
system to allow distutils extensions to offer and validate new setup() 
keywords and use them to generate additional metadata in the egg.  This 
would make it possible to push back Debian dependency information to 
upstream maintainers, if this were desired.
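
A rough sketch of the validation half of such an extension follows.  The 
debian_depends keyword and the module path in the comment are 
hypothetical; the (dist, attr, value) validator signature and the 
"distutils.setup_keywords" entry point group are the parts that come from 
setuptools.  Nothing like this exists yet, so treat it as a shape, not a 
proposal for exact names.

```python
try:
    from distutils.errors import DistutilsSetupError
except ImportError:  # distutils may be missing on newer Python releases
    class DistutilsSetupError(Exception):
        """Fallback stand-in so the sketch still runs without distutils."""


def check_debian_depends(dist, attr, value):
    """Validate a hypothetical debian_depends=[...] keyword given to setup()."""
    ok = isinstance(value, (list, tuple)) and all(
        isinstance(item, str) for item in value
    )
    if not ok:
        raise DistutilsSetupError(
            f"{attr!r} must be a list of strings (got {value!r})"
        )


# An extension would advertise the validator roughly like this in its own
# setup() call (hypothetical module path "debdeps.checks"):
#
#     entry_points={
#         "distutils.setup_keywords": [
#             "debian_depends = debdeps.checks:check_debian_depends",
#         ],
#     }

check_debian_depends(None, "debian_depends", ["libxml2", "libc6"])
```

An upstream project could then write debian_depends=[...] in its setup() 
call, and a companion metadata writer could record the list in the egg's 
metadata for a tool such as easy_deb to pick up.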


