[Distutils] formencode as .egg in Debian ??

Phillip J. Eby pje at telecommunity.com
Wed Nov 23 18:18:07 CET 2005


At 11:08 AM 11/23/2005 +0100, Martin v. Löwis wrote:
>As for terminology, you seem to suggest to use "distribution" where
>Debian uses "package". So "Debian package" would become "Debian
>distribution".

No, I'm fine with "Debian package"; I was using "distribution" in the sense 
of "distutils distribution", such that you can have a "Debian package" of a 
"distutils distribution".  The issue is that a "Python package" is not 1:1 
with either a "Debian package" or a "distutils distribution".  An "egg" is 
a "distutils distribution" that may or may not contain "Python packages", 
but also contains "egg metadata" which is specific to the "distribution", 
not to any individual Python module or Python package contained within that 
distribution.


>I'll try to use "project" in your sense and "package" in the
>Python sense whenever I can.

Great - and let's use "Debian package" to mean the thing that manages the 
installation of a project containing packages.  :)


>Phillip J. Eby wrote:
>>An "egg" is a "distribution" of a "project" that is importable and can 
>>carry both standardized and individualized metadata that can be read by 
>>the pkg_resources module.  There are various distribution *formats* in 
>>which an "egg" may be physically manifested, but the "egg" itself is a 
>>logical concept, not a physical one.  It is therefore, as I said, "not 
>>merely a distribution format".  Is that any clearer?
>
>Yes. When I said "an egg", I meant "a zipfile with a .egg extension,
>or a directory with a .egg extension". In response to
>
># [...] who will quite simply need eggs for many packages.
># If Debian doesn't provide them, the users will be forced to obtain
># them elsewhere.
>
>I meant
>
>"Debian should provide the distributions, but not as .egg files";
>it should provide the distribution as a deb file. So users are provided
>with the project, but in a form that is not one of the three forms
>an egg could have.

I was referring to how the distribution is *installed*.  You don't use 
things directly from a deb file, they have to be installed on the 
system.  When you install an egg, you must use one of the three forms, or 
the system as a whole will not function.  Eggs that depend on the egg will 
not be able to find it, nor use any plugins it contains.  Eggs that define 
a plugin system of their own will usually define self-plugins in their own 
metadata, as this is considered good style as well as being more 
convenient.  Such eggs will not function without their *own* metadata 
installed.  (Setuptools is an example of this, and I believe Trac 1.0 will 
be similar; some of the Paste projects may be using this already, too.)

So, when I say it is a contradiction in terms to install an egg in a 
non-egg form, I mean that it is nonsensical to say that you have installed 
it, because it will be unusable (by other eggs), nonfunctional (by itself), 
or both.


>>The "contradiction in terms" was that I took your meaning of "package" to 
>>be the same as my term "project" - i.e., a functional collection of 
>>Python resources.  Projects that *are* eggs, can't be provided "but not 
>>as eggs".  They *are* eggs, so not providing them as eggs means not 
>>providing them at all.
>
>I would expect that you can "unegg" a project.

For projects that make use of eggs, you expect wrong.  Try it with 
setuptools, and you will find that it is unable to even run its own tests, 
because the "test" command is registered via an entry point.  Entry points 
are just one kind of project metadata that can be registered; other 
projects like Trac and SQLObject have their own kinds of metadata as 
well.  None of this metadata is accessible without the EGG-INFO or 
.egg-info directory; removing it is like removing the JavaBean metadata or 
the deployment descriptors from Java jars, rendering the jar useless in 
many contexts, despite the fact that all the "code" remains.

The only projects that can be "unegged", then, are ones that no egg project 
depends on, and which do not themselves depend on any eggs.  The number of 
projects that are not depended on by other projects will be smaller and 
smaller over time, as will the number that do not depend on other eggs.
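
As a rough illustration of that rule (a sketch only, not anything setuptools
provides), the projects that could be "unegged" can be computed from a
dependency map.  The map below is hypothetical apart from the kid ->
elementtree dependency, which shows up in the pkg_resources output later in
this message; "standalone" is an invented project name.

```python
def uneggable(depends_on):
    """Return projects that no egg depends on and that depend on no eggs.

    depends_on maps each project name to the set of egg projects it requires.
    This only applies the two criteria from the paragraph above; it does not
    detect projects that rely on their *own* metadata.
    """
    depended_upon = set()
    for deps in depends_on.values():
        depended_upon.update(deps)
    return sorted(
        name
        for name, deps in depends_on.items()
        if not deps and name not in depended_upon
    )


# Hypothetical dependency map (only kid -> elementtree comes from this thread).
EXAMPLE = {
    "kid": {"elementtree"},
    "elementtree": set(),
    "standalone": set(),
}

print(uneggable(EXAMPLE))
```

Every new dependency edge added to the map can only shrink the result, which
is the sense in which the "uneggable" set gets smaller over time.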

Hm, that reminds me.  One of the newer setuptools features for egg projects 
is automatic script generation using entry points.  A developer can 
designate a function in some module as the implementation for a script, and 
a platform-appropriate script to invoke that function is automatically 
generated during installation.  (On Windows, an .exe is created alongside 
a .py or .pyw; on all other platforms it's a simple #!python script with 
no extension.)

However, these generated scripts contain only a couple of lines that invoke 
the function via the project's entry point table - which is part of its egg 
metadata.  So, if you remove the metadata, any scripts of this type that 
are installed by the project will fail to operate as well.  Since there is 
no script in the original source, you would have to manually copy 
information from the project's setup.py in order to create scripts with 
equivalent functionality.
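
A minimal sketch of why such scripts break, using a simplified stand-in for
the lookup that pkg_resources performs.  Everything named here is
hypothetical: the resolve_script_function() helper, the hello_mod module,
and the hello.egg-info directory.  Real generated scripts go through
pkg_resources rather than parsing entry_points.txt themselves.

```python
import configparser
import importlib
import os
import sys
import tempfile


def resolve_script_function(egg_info_dir, group, name):
    """Look up 'module:attr' in entry_points.txt and return the callable."""
    path = os.path.join(egg_info_dir, "entry_points.txt")
    if not os.path.exists(path):
        raise LookupError("no entry point metadata in %s" % egg_info_dir)
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.read(path)
    module_name, attr = parser.get(group, name).split(":")
    importlib.invalidate_caches()
    return getattr(importlib.import_module(module_name), attr)


def demo():
    """Run the lookup with the metadata present, then with it removed."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical project: one module plus its .egg-info metadata.
        with open(os.path.join(tmp, "hello_mod.py"), "w") as f:
            f.write("def main():\n    return 'hello from entry point'\n")
        egg_info = os.path.join(tmp, "hello.egg-info")
        os.mkdir(egg_info)
        entry_points = os.path.join(egg_info, "entry_points.txt")
        with open(entry_points, "w") as f:
            f.write("[console_scripts]\nhello = hello_mod:main\n")

        sys.path.insert(0, tmp)
        try:
            func = resolve_script_function(egg_info, "console_scripts", "hello")
            with_metadata = func()
            # "Unegg" the project: the code stays, the metadata goes.
            os.remove(entry_points)
            try:
                resolve_script_function(egg_info, "console_scripts", "hello")
                without_metadata = "ok"
            except LookupError:
                without_metadata = "failed"
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("hello_mod", None)
        return with_metadata, without_metadata


print(demo())
```

The module is still importable after the metadata is deleted; what is lost
is the mapping from the script name to the function it should call.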

In essence, trying to work around the absence of egg metadata is a 
bottomless pit, because over time there will be an ever-increasing amount 
of functionality in the field that is based on the use of metadata.


>You can distribute the
>project as a collection of Python modules, not as a collection of
>Python resources. The Debian developer could (and I was suggesting
>he should) just ignore the entire egg structure, and distribute
>the code of the library only.

Sure, just like you could delete the metadata files and directories from 
jar files, if you had some policy that required it.  However, this wouldn't 
make any more sense than what you're proposing here.  The projects would be 
unusable by other projects and/or nonfunctional in themselves, just like eggs 
stripped of their metadata.


>>>  If so, Debian should not distribute them.
>>
>>This is what I don't understand, as it has nothing to do with whether or 
>>not it is a distribution format, at least not that I can see.  My statement was 
>>that eggs are not merely a distribution format; they are a logical 
>>concept that can be physically packaged in various ways, and if it's 
>>necessary to invent yet another physical layout, well, we can do that too.
>
>Yes, but this logical concept is in the way of Debian 
>packages/distributions (at least if done naively by the Debian
>developer). This is what started the entire discussion: Matthias
>Urlichs complained that Bob Tanner included the egg structure
>in the formencode Debian package/distribution.

It's in the way of not changing the policy, sure.  However, the policy's 
restriction in this case is not providing any functional benefit to 
anyone.  Eggs, on the other hand, are a functional technical construct with 
actual usefulness in the field.  To choose the policy over your users' 
needs in this case is like choosing to eat the restaurant's menu because 
the food in the pictures is more neatly arranged than the food on your 
plate.  :)


>The specific initial complaints were:
>- you can't use it with a simple "import formencode",
>- pydoc does not work on "eggs".

These are both incorrect.  First, if you install a .pth file (as easy_deb 
does, and as any distutils distribution that uses extra_path does), a plain 
"import formencode" works, so the first complaint is moot.  Second, pydoc 
works fine on all varieties of eggs, with a single 
exception: it does not work with zipped packages - the modules in the 
package can be documented, but not the parent package itself.  This is a 
clear and obvious bug in pydoc (failure to update for PEP 302), and it is 
easily fixed.  Nonetheless, it is trivially avoided by using either the 
unzipped or .egg-info installation formats.

(Detail: PEP 302 specifically allows strings in a package __path__ to not 
be directories, and it also allows __path__ to be empty.  pydoc assumes 
that it is non-empty and that its first element is a directory.)
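
To make the detail concrete, here is a hedged sketch of the kind of check a
tool would need before treating __path__ as a list of directories.  The
helper name first_package_dir() is made up for illustration, and this is not
the actual pydoc code or its fix.

```python
import os
import types


def first_package_dir(module):
    """Return the first __path__ entry that is a real directory, else None.

    PEP 302 allows __path__ to be empty and allows its entries to be
    strings that are not directories (e.g. a location inside a zipfile),
    so neither assumption is safe.
    """
    path = getattr(module, "__path__", None)
    if not path:
        return None
    for entry in path:
        if isinstance(entry, str) and os.path.isdir(entry):
            return entry
    return None


# Hypothetical package object whose __path__ points inside a zipped egg.
zipped = types.ModuleType("zipped_pkg")
zipped.__path__ = ["/no/such/dir/zipped_pkg-1.0-py2.4.egg/zipped_pkg"]

# Hypothetical package object with an empty __path__.
empty = types.ModuleType("empty_pkg")
empty.__path__ = []

print(first_package_dir(zipped), first_package_dir(empty))
```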


>I would add the complaint:
>- it increases sys.path for no good reason.

That is true only for the two .egg forms, not for the .egg-info form.

The "no good reason" part is an interesting opinion, although in my view it 
is rather narrow-minded.  Being able to support multi-version importing is 
a very good reason indeed, as is avoiding the need for a platform-specific 
package management tool in order to manage Python projects.

Of course, you can safely ignore these points if you are looking at it 
strictly from the point of view of a package management tool that doesn't 
support installing multiple versions of things.  You are blocked from these 
eminently "good reasons", however, by something that has nothing to do with 
eggs, so putting the "no good reason" on eggs is inappropriate.  There are 
quite good reasons; you are simply blocked from taking advantage of them by 
the limitations of your chosen packaging tool.

In any case, this complaint is moot in the case of the .egg-info form, 
since it does not affect the length of sys.path.


>>Which would be the same as saying you wouldn't distribute, say, 
>>setuptools itself.  Setuptools is an egg, and can't function except as an 
>>egg, because it is more than a Python package.  Again, an "egg" is some 
>>specific release of a project and its introspectable metadata.
>
>I could rewrite setuptools to function as a regular Python package.
>After a shallow inspection, there aren't many places where it really
>needs the pkg_resources functionalities for itself - I could only
>identify the part that locates cli.exe. As this is used on Windows
>only, a Debian port of setuptools could simply ignore this code.

Your "shallow inspection" is just that.  Try this experiment.  Delete the 
"setuptools.egg-info" directory, and then try to run "setup.py test" or 
"setup.py bdist_egg".  After you figure out how to fix that, and install 
your setuptools in a "non-egg" form, I encourage you to try to build and 
install SQLObject and buildutils, or any other package that adds setup 
commands to setuptools, and see whether those commands work when the 
provider is lacking its metadata.  For an encore, see if you can figure out 
how to get PasteDeploy configuration files to work - they're a format that 
allows users to deploy arbitrary WSGI applications as long as they're 
importable... and installed as an egg, with egg metadata.

Eggs (and their metadata) exist because they provide functionality that is 
not practical to provide without them, and the scope of the deployed 
functionality that relies on the metadata is increasing rather quickly.


>If "setup.py install" does other things, like editing an
>existing file, it is not so easy anymore.

I'm thinking that perhaps I should add an option like 
'--single-version-externally-managed' to the install command so that you 
can indicate that you are installing for the sake of an external package 
manager that will manage conflicts and uninstallation needs.  This would 
then allow installation using the .egg-info form and no .pth files.

The only issues remaining then are namespace packages and other 
inter-project overlaps, which of course you have to deal with 
now.  (Example: the PyDispatcher and RuleDispatch projects both contain a 
'dispatch' package, with unrelated contents.)


>>>That is not true. Usability also suffers if sys.path becomes long.
>>
>>How?  I don't understand this.
>
>People will often inspect sys.path to understand where Python
>is looking for their code.

As I pointed out, eggs give you much better information on this.  For example:

python -c "import pkg_resources; print pkg_resources.require('kid')"

[kid 0.7a 
(c:\cygwin\home\pje\chandlerstuff\chandler\release\bin\lib\site-packages\kid-0.7a-py2.4.egg), 
elementtree 1.2.6 
(c:\cygwin\home\pje\chandlerstuff\chandler\release\bin\lib\site-packages\elementtree-1.2.6-py2.4.egg)]

I get the versions along with the paths, and the versions and paths of all 
dependencies.  This information is not available in a cross-platform way 
without eggs.  (And again, I mean the logical egg, not the .egg format; the 
above command would've listed any projects in .egg-info format as well as 
.egg files and directories.)


>>What I would suggest here is having a namespace (e.g. pyegg2.4-whatever) 
>>for naming packages based on their PyPI names, so that there can be an 
>>automated relationship between setuptools dependencies and Debian ones.
>
>That would be a policy change (I think). Whether it would be agreeable,
>I have no idea.

I understand that, on both points.  I was simply suggesting it would be 
useful, not trying to debate what the policy currently is.


>>Anyway, I don't see any obvious reasons why this can't be an automated 
>>process, even for the system library dependencies.  easy_deb even has a 
>>simple configuration file that can augment the setuptools-style 
>>dependencies with explicit Debian dependencies.
>
>Debian policy currently seems to require that the dependencies are
>provided as plain text in a patch to the upstream sources(*). So the
>idea certainly is that dependencies are managed by the developer,
>not automatically.

I'm only interested in what's helpful or useful to Debian developers and 
users, not what the current policy is.  Policies tend to adapt to fit 
things that are useful, or else they become more of a drawback than a 
benefit.  I mention these things because they may allow the process and 
policy to be improved, to everyone's benefit.

If the policy doesn't change, however, then it should suffice to use 
.egg-info format to allow the distribution of egg projects as Debian 
packages conforming to the existing policy, assuming the policy does not 
prohibit including non-package directories in site-packages.  The fact that 
.egg-info packaging may inconvenience packagers is a pain caused by the 
policy, however, not by eggs.  I do intend, though, to update setuptools 
and easy_install to make using .egg-info form easier, and I will probably 
also fix it so that running e.g. bdist_rpm on a setuptools-based package 
will produce an .egg-info format egg wrapped in an RPM.

I remain concerned about how such packages will work with namespace 
packages, since namespace packages mean that two different distributions 
may be supplying the same __init__.py files, and some package managers may 
not be able to deal with two system packages (e.g. Debian packages, RPMs, 
etc.) supplying the same file, even if it has identical contents in each 
system package.
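
The overlap can be detected mechanically.  The following is a sketch under
stated assumptions: the system package names and file lists are invented for
illustration, and shared_files() is a hypothetical helper, not part of
setuptools, easy_deb, or any package manager.

```python
from collections import defaultdict


def shared_files(manifests):
    """Map each file path to the system packages that supply it,
    keeping only paths supplied by more than one package."""
    owners = defaultdict(list)
    for package, files in manifests.items():
        for path in files:
            owners[path].append(package)
    return {
        path: sorted(packages)
        for path, packages in owners.items()
        if len(packages) > 1
    }


# Hypothetical manifests: two system packages that each contribute a
# subpackage to the same namespace package "example".
MANIFESTS = {
    "python-example-foo": ["example/__init__.py", "example/foo/__init__.py"],
    "python-example-bar": ["example/__init__.py", "example/bar/__init__.py"],
}

print(shared_files(MANIFESTS))
```

Here the only shared path is the namespace package's own __init__.py, which
is exactly the file a package manager may refuse to install twice.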


