[Distutils] formencode as .egg in Debian ??

Fri Nov 25 07:33:05 CET 2005

At 12:54 PM 11/25/2005 +1100, David Arnold wrote:
>So, if a system package, shipped by the upstream developer as an egg, is
>"unpacked" into a directory structure, and its metadata is maintained
>in a .egg-info file somewhere in sys.path, non-system eggs will have all
>they need to operate correctly?

Yes, with a few clarifications.  The internal structure of an egg, let's 
say foobar-1.2-py23.egg, would look something like:

     foobar/
            __init__.py
            baz.py
            # plus .pyc files, etc.

     EGG-INFO/
              PKG-INFO  # distutils metadata like description/version
              requires.txt   # optional and required dependencies
              # plus other metadata files, either setuptools-defined or
              # project specific

If you unpack this as-is, but rename EGG-INFO to foobar.egg-info (today) or 
foobar-1.2.egg-info (when I release 0.6a9 of setuptools), and the whole 
tree above is in a directory on sys.path, this egg is good to go.
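
As a minimal sketch of the unpacked layout described above, the snippet below builds the tree in a scratch directory and reads the metadata back.  It uses the standard library's importlib.metadata (a much later module) as a stand-in for the egg runtime, and the helper names unpack_egg_layout and installed_versions are hypothetical, not part of setuptools.

```python
import tempfile
from importlib import metadata  # stdlib stand-in for the egg runtime
from pathlib import Path


def unpack_egg_layout(root, name="foobar", version="1.2"):
    """Create a package directory plus a NAME-VERSION.egg-info/PKG-INFO next to it."""
    root = Path(root)
    package = root / name
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "baz.py").write_text("ANSWER = 42\n")

    # This is the EGG-INFO directory, renamed so it can sit beside the package.
    info = root / f"{name}-{version}.egg-info"
    info.mkdir()
    (info / "PKG-INFO").write_text(
        f"Metadata-Version: 1.0\nName: {name}\nVersion: {version}\n"
    )
    return info


def installed_versions(directory):
    """Map project name to version for the metadata found in one directory."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions(path=[str(directory)])
    }


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    unpack_egg_layout(scratch)
    print(installed_versions(scratch))
```

The point of the sketch is only that a name and version sitting next to the package are enough for a tool to discover what is installed in that directory.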

I would like to clarify the phrase "shipped as an egg", though.  To me, 
that would mean that the developer is distributing a binary .egg file, and 
I'm assuming that Debian is primarily interested in *source* packages, 
being a Free Software distribution.  (A binary .egg doesn't have to contain 
source code at all; you can specifically build it with the source stripped 
if you desire.)  The plan for setuptools 0.6a9 is to provide an option to 
"setup.py install" that will basically install the layout described above, 
with the correctly named .egg-info directory automatically 
created.  (Normally, the whole tree above is instead nested in an .egg file 
or directory.)

I think I should also clarify that whether the upstream developer sets out 
to package their project as an egg or not, it's possible to create an 
.egg-info directory and PKG-INFO file to identify that distribution, using 
setuptools' "easy_install" program and the source distribution.  So if the 
developer of 'foobar' did not choose to create an egg or use setuptools, 
this doesn't stop a developer who wants to *use* foobar from simply running 
easy_install to create an .egg file for it.  So, this is what I mean when I 
say there's no such thing as a non-egg package for an egg 
developer.  Someone who depends on a package can simply say they depend on 
it, and when they build their package, they'll get eggs for their 
dependencies as a side effect.

>So that's another goal of eggs?  To provide information to a package
>maintainer to assist in determining if it's the user's PYTHONPATH or
>.pth files that are causing a bug?

More specifically, what versions of what packages they're *actually* using, 
as opposed to what they think they installed or have on their 
system.  PYTHONPATH and .pth files can of course be a factor in that, but 
so can people simply thinking they installed something, or not knowing that 
a bug is fixed in a particular version.  Part of it too is finding out 
whether they're reporting a regression or are just still using a version 
with a bug that has since been fixed.  On the TurboGears mailing list, it 
has often happened that TurboGears users flush out a bug in a dependency, 
which then gets fixed, and later a new TurboGears user reports the same 
problem.  Their error message then makes it obvious whether or not they 
upgraded.

I realize this is stuff you guys probably do all day for system packages, 
but eggs make the support job easier upstream too.

>I can see that this is *nice*; I'd debate "need".  But I'm happy to
>accept that for egg-based stuff, this is a nice feature.

Well, need is relative.  A project like TurboGears "needs" this, because 
otherwise it would be uneconomical to provide the current level of support 
on as many platforms.  So, one project's "nice to have" may be another 
project's lifeblood, depending on available resources.  Eggs have also made 
it easier for the authors of TurboGears' dependencies to assist in 
support.  For me, I'm glad that these features have helped to make 
something like TurboGears possible and practical.

>I'm not going to try to assert "Unix values" here.  My observation is
>that historically, Unix has installed things into one of a couple of
>directory hierarchies (/usr, /usr/local, /opt).  Within those
>hierarchies, there has been scope for only one version of any given
>thing.

Um, sure.  Not sure what this has to do with the present discussion.  As a 
practical matter, only *one* version of an egg can be *active* (i.e. 
importable) on sys.path within a given process anyway.  It's also clearly 
not going to be the case on a Debian system that somebody would have 
multiple versions of something living in /usr/lib, although they might do 
it for /usr/local or in a user-private directory.
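
To illustrate the "only one active version" point, here is a rough sketch that walks a search path in order and reports the first directory holding metadata for a project.  It assumes the NAME-VERSION.egg-info layout and again borrows importlib.metadata as a stand-in; active_distribution is a hypothetical helper, not the egg runtime's own lookup.

```python
from importlib import metadata


def active_distribution(project, search_path):
    """Return (version, directory) for the first directory on search_path
    that holds metadata for project, or None if no directory does."""
    for directory in search_path:
        for dist in metadata.distributions(name=project, path=[directory]):
            return dist.version, directory
    return None
```

With a system directory listed first and a user-private directory second, the system copy is the one reported; reversing the order reverses the answer, but either way only one version is picked.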

So, I think I may have lost the train of thought on this point.  I was 
under the impression that the consensus of the Debian-Python folks so far 
was that, of the egg formats, the "single version externally managed" one 
using .egg-info directories was preferred, since it is basically the same 
as your current layout.  (It's also convenient for me to implement, because 
it's basically the same as the format already used by the "setup.py 
develop" command for temporarily adding a project's source checkout to 
sys.path.)

>   Phillip> And we'd like all this to cleanly work with any
>   Phillip> locally-installed non-Debian eggs that might be in the mix,
>   Phillip> since we need to do development, beta testing, etc.
>
>   >>  And non-egg packages as well, right?
>
>   Phillip> There isn't any such thing, from an egg developer's
>   Phillip> perspective.
>
>Really?  So if I use one egg, everything has to be an egg?

I'm not sure I follow you.  If I'm an egg developer, and I want to use 
other Python packages in my project, I add their project names and versions 
to my setup.py, and then I get them installed for free.  If an .egg-info on 
sys.path indicates that the project I want is already on my system, then 
the tools don't go hunting on PyPI and the runtime doesn't gripe about 
missing dependencies.
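
A hedged sketch of what such a setup.py might contain.  install_requires is the setuptools keyword for declaring dependencies; the project name, package name, and version constraint below are made up for illustration, and the call is wrapped in a function only so the sketch can be loaded without running an install.

```python
# Sketch of a setup.py; every name and version below is a placeholder.
SETUP_ARGS = dict(
    name="myproject",          # hypothetical project name
    version="0.1",
    packages=["myproject"],    # hypothetical package directory
    # Project names, optionally with version constraints.  The projects
    # listed here do not have to be distributed as eggs upstream.
    install_requires=["FormEncode", "elementtree>=1.2"],
)


def main():
    # Imported here so the sketch loads even where setuptools is absent.
    from setuptools import setup

    setup(**SETUP_ARGS)

# A real setup.py would call main() (or setup(...)) at module level.
```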

Note again that the dependencies *don't* need to be distributed as 
eggs.  They can be distributed as source, eggs, .exe installers (Windows 
only), or Subversion URLs, as long as either PyPI has a usable link or I 
supply one in my project configuration.  These dependencies' authors 
don't even need to have heard of the concept of eggs; they just need a 
reasonably standard Python distutils package with a setup.py.

Thus, if I'm developing an egg, yes, all my dependencies have to be eggs, 
but this doesn't imply that I'm pushing eggification upstream.  It just 
means that I can install their package as an egg locally, which essentially 
amounts to adding the PKG-INFO file in either an EGG-INFO or .egg-info 
directory.  (The distutils normally generate this PKG-INFO file as part of 
creating a source distribution, so it's not even an egg-specific file format.)

So, projects using setuptools get to take advantage of most any project 
using distutils, and the upstream projects are modified only by adding the 
egg-info, in order to allow the tools and runtime to know when a dependency 
has already been satisfied.

While I don't advocate changing all Debian Python packages to add this 
metadata, I do suggest it's a practical way to deal with certain dependency 
issues.  For example, TurboGears depends on ElementTree, which is not 
packaged as an egg by its author.  (I think that Kid, which is also an 
egg-packaged TurboGears dependency, may depend on ElementTree as 
well.)  Anyway, the quickest way to get all this stuff working without a 
lot of hacks to the dependency metadata would be to install an .egg-info 
marker with the ElementTree package, so that the egg tools and runtime on 
any user's machine will simply know what version of ElementTree is present, 
and be happy.

I know - you can think of other ways to deal with this.  However, most of 
the ways that have been suggested to date fail in the use case where a user 
has been using the Debian package, and Kevin moves to requiring a new 
version of ElementTree or some other dependency, perhaps a new SVN revision 
that hasn't been released -- foobar-1.3.dev-r4262, let's say.  (Setuptools 
users can have their builds tagged with a repository revision 
number.)  This release of foobar isn't going to be in Debian unless you're 
tracking Subversion revisions of experimental projects daily - and maybe 
you are, I don't know.  The point is that when the Debian package no longer 
satisfies the dependency, the egg tools move smoothly to downloading the 
new version and installing it wherever the user has configured their 
development environment to put it, say their ~/pydev directory.  So now 
we've segued smoothly into "multiple versions" being installed, but the 
"system version" is still intact.

A month later, a stable package is released and I upgrade my Debian 
install.  This is a later version than the development version I have in 
~/pydev, so the egg tools switch back to the Debian version as the 
preferred one, unless I have a .pth file specifically requesting the 
~/pydev version as the active version for the other work I'm doing.  (And 
even then they'll still prefer the Debian version if I don't have a ~/pydev 
version that satisfies something's dependency.)

These transitions can only be so seamless if the Debian-installed version 
of foobar includes the egg-info marker so that the tools know what version 
is sitting in /usr/lib, as opposed to the version(s) I have hanging in my 
~/pydev.
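
The transitions just described can be sketched roughly as follows.  This is a toy model under stated assumptions: version_key is a deliberately simplified ordering (dotted numbers with an optional ".dev-rN" suffix that sorts before the plain release), not setuptools' real version parser, and best_candidate is a hypothetical helper.

```python
import re


def version_key(version):
    """Simplified sort key: handles only dotted numbers, optionally followed
    by a '.dev-rN' tag.  A dev build sorts before the release of that number."""
    release, sep, dev = version.partition(".dev")
    numbers = tuple(int(part) for part in release.split("."))
    if sep:
        digits = re.sub(r"\D", "", dev)
        return (numbers, 0, int(digits) if digits else 0)
    return (numbers, 1, 0)


def best_candidate(candidates, minimum):
    """Pick the (version, location) pair with the highest version that is at
    least minimum; return None if nothing installed satisfies it."""
    floor = version_key(minimum)
    usable = [c for c in candidates if version_key(c[0]) >= floor]
    if not usable:
        return None  # this is where the tools would go and download
    return max(usable, key=lambda c: version_key(c[0]))
```

With only ("1.2", "/usr/lib") known and a requirement of 1.3.dev-r4262, the result is None, so a download into ~/pydev would follow.  Once /usr/lib holds 1.3, that entry wins over the dev build in ~/pydev.  None of this works unless the /usr/lib version is discoverable in the first place, which is the job of the egg-info marker.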

>   Phillip> Any distutils package can be made into an egg, because all of
>   Phillip> the metadata needed is supplied by the standard distutils
>   Phillip> setup script.  So, if you have the source, you can make it an
>   Phillip> egg.
>
>What if I don't have the source (or setup.py) ?

What do you have instead?  There really aren't many formats for shipping 
binary Python packages.  The only ones provided by the distutils are 
bdist_dumb, bdist_wininst, and bdist_rpm.  It seems to me that all of these 
formats except bdist_dumb include enough metadata to be able to get the 
project name and version, which is all you need to create enough metadata 
to make a usable egg.  The "easy_install" tool actually supports turning 
bdist_wininst packages into eggs directly.  I'm not sure if you could do it 
with a bdist_dumb.  A bdist_rpm probably has most of what you need just in 
the filename alone, at least if you're doing it manually.  (Distutils-built 
distributions' filenames are too ambiguously formatted for automated 
parsing, alas, even though a human reader can usually tell what they mean.)

Anyway, all you need to make a non-egg package into an egg is its project 
name and version number.  If you have those two things, you can make a 
PKG-INFO file, and that's all you need for today's egg runtime.  For 0.6a9, 
you won't even need to put the data in a file, just the filename.
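
For illustration, a PKG-INFO carrying nothing but a name and a version can be produced and read back with the standard library alone, since the file uses RFC 822 style headers.  The function names here are hypothetical, and the example project and version are placeholders.

```python
from email import message_from_string


def minimal_pkg_info(name, version):
    """Return the text of a PKG-INFO file holding only a name and a version."""
    return f"Metadata-Version: 1.0\nName: {name}\nVersion: {version}\n"


def read_pkg_info(text):
    """Parse PKG-INFO text and return (name, version)."""
    message = message_from_string(text)
    return message["Name"], message["Version"]


if __name__ == "__main__":
    text = minimal_pkg_info("elementtree", "1.2.6")  # placeholder values
    print(text)
    print(read_pkg_info(text))
```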

>Accepting that there will be parallel (I hesitate to say "competing")
>systems, and that keeping them in sync is both hard and necessary seems
>to be the open issue.

I think this may actually be an illusion, perhaps brought about by 
preconceptions based on experiences with other packaging systems.  All we 
need is that:

1. For Debian packages of setuptools-using packages (i.e., projects like 
FormEncode that explicitly set themselves up to be eggs), all the included 
metadata is installed in an .egg-info directory alongside the 
package.  This is nothing more than including all the package's required 
contents, so there's no "parallel" anything going on here.

2. For Debian packages of non-setuptools packages that are a dependency of 
a setuptools-using package, add an empty .egg-info file named for the 
dependency's project name and version number, as specified in its setup.py 
name/version options.  This is just a simple addition to the packaging, and 
again doesn't seem to create any "parallel" anything.  You do not need to 
go back and repackage every single Debian-Python package unless you feel 
that that's a more efficient way to handle it.  You can simply add the 
.egg-info on an as-needed basis, when you package setuptools-using projects.
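
Step 2 above might look something like this in a packaging script.  It is a sketch under assumptions: the helper names are hypothetical, and the filename parsing assumes any hyphen in the project name has already been escaped (setuptools writes such names with underscores), so the first "-" separates name from version.

```python
import tempfile
from pathlib import Path

SUFFIX = ".egg-info"


def add_marker(directory, name, version):
    """Drop an empty NAME-VERSION.egg-info file beside an installed package."""
    marker = Path(directory) / f"{name}-{version}{SUFFIX}"
    marker.touch()
    return marker


def read_marker(marker):
    """Recover (name, version) from the marker's filename alone."""
    stem = Path(marker).name[: -len(SUFFIX)]
    name, _, version = stem.partition("-")
    return name, version


if __name__ == "__main__":
    site_dir = tempfile.mkdtemp()  # stands in for the package's install directory
    marker = add_marker(site_dir, "elementtree", "1.2.6")  # placeholder values
    print(marker.name, read_marker(marker))
```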

Now, there is the separate issue of whether you want to create a separate 
pyegg or python-pypi namespace for these packages, so that you can keep a 
closer match between package names and PyPI project names.  That's for you 
guys to decide, as that's a matter of policy and process.  But I don't see 
anything forcing you to make such a split, so again I don't get the 
"parallel" part.