[Distutils] formencode as .egg in Debian ??
Phillip J. Eby
pje at telecommunity.com
Wed Nov 23 21:29:17 CET 2005
At 08:12 PM 11/23/2005 +0100, Matthias Urlichs wrote:
>Hi,
>
>Phillip J. Eby:
> > I'm thinking that perhaps I should add an option like
> > '--single-version-externally-managed' to the install command so that you
> > can indicate that you are installing for the sake of an external package
> > manager that will manage conflicts and uninstallation needs. This would
> > then allow installation using the .egg-info form and no .pth files.
> >
>You might shorten that option a bit. ;-) I agree that this would be a
>good option to have.
I try to use very long names for options that can have damaging effects if
used indiscriminately. A project that's installed the "old-fashioned way"
(which is what this does, apart from adding .egg-info) is hard to uninstall
and may overwrite other projects' files. So, it is only safe to use if the
files are being managed by some external package manager, and it further
only works for a single installed version at a time. So the name is
intended to advertise these facts, and to discourage people who are just
reading the option list from trying it out to see what it does. :)
> > >People will often inspect sys.path to understand where Python
> > >is looking for their code.
> >
> > As I pointed out, eggs give you much better information on this.
>
>The .egg metadata does. That, as you say, is distinct from the idea of
>packaging the .egg as a zip file. Most likely, one that includes .pyc
>files which were byte-compiled with different file paths; that causes no
>problems whatsoever ... until you get obscure ideas like trying to step
>through the code with pdb, or opening it in your editor to insert an
>assertion or a printf, trying to figure out why your code breaks. :-/
This is actually what the .egg-info mode was designed for. That is, doing
development of the project. A setuptools-based project can run "setup.py
develop" to add the project's source directory to sys.path, after
generating an .egg-info directory in the project source if necessary. This
allows you to do all your development right in your source checkout, and of
course all the file paths are just fine, and the egg metadata is available
at runtime. You can then deploy the project as an .egg file or directory.
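As a rough illustration of why this works for debugging, the effect can be simulated by hand. This is a sketch, not a reimplementation of setuptools' `develop` command: a hypothetical checkout containing `demo_project.py` and a `demo_project.egg-info/PKG-INFO` file is put directly on sys.path, so the module's file path is the real source path and the metadata sits right beside it.

```python
import email.parser
import importlib
import os
import sys
import tempfile

# Hypothetical checkout layout: <checkout>/demo_project.py plus
# <checkout>/demo_project.egg-info/PKG-INFO, as a develop step might leave it.
checkout = tempfile.mkdtemp()
with open(os.path.join(checkout, "demo_project.py"), "w") as f:
    f.write("def where():\n    return __file__\n")
egg_info = os.path.join(checkout, "demo_project.egg-info")
os.mkdir(egg_info)
with open(os.path.join(egg_info, "PKG-INFO"), "w") as f:
    f.write("Metadata-Version: 1.0\nName: demo_project\nVersion: 0.1dev\n")

# The essential effect: the checkout itself is on sys.path.
sys.path.insert(0, checkout)
importlib.invalidate_caches()

import demo_project

# Code runs straight from the checkout, so pdb and editors see real paths.
print(demo_project.where())

# The metadata lives next to the code and can be read at runtime.
with open(os.path.join(egg_info, "PKG-INFO")) as f:
    meta = email.parser.Parser().parse(f)
print(meta["Name"], meta["Version"])
```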
(Also, for the .egg directory format, note that easy_install recompiles the
.pyc/.pyo files so their paths *do* point to the .egg contents instead of
the original build paths. The issues with zipfiles and precompiled .pyc
files are orthogonal to anything about setuptools, eggs, etc.; they will
bite you in today's Python no matter what's in the zipfile or who
precompiled the .pyc files. I do have some ideas for fixing both of these
problems in future versions of Python, but they're rather off-topic for all
the lists we are currently talking on.)
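The underlying mechanism is that the file name is baked into the code object at compile time. A minimal sketch with today's standard library (not easy_install's own code): `py_compile.compile()` accepts a `dfile` argument naming the path to record, so recompiling with the final install location fixes what tracebacks and pdb report. The paths are hypothetical, and the 16-byte .pyc header assumes Python 3.7 or later.

```python
import marshal
import os
import py_compile
import tempfile

def recorded_filename(source_path, display_path=None):
    """Byte-compile source_path, recording display_path (or the source path
    itself when None) as the file name, and return what the .pyc stores."""
    pyc_path = py_compile.compile(source_path, dfile=display_path, doraise=True)
    with open(pyc_path, "rb") as f:
        data = f.read()
    # Assumes Python 3.7+, where the .pyc header is 16 bytes.
    code = marshal.loads(data[16:])
    return code.co_filename

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "mod.py")
with open(src, "w") as f:
    f.write("x = 1\n")

# Compiled in place, the .pyc remembers the build location...
print(recorded_filename(src))
# ...unless it is recompiled naming the installed location (hypothetical path).
print(recorded_filename(src, "/opt/eggs/Demo-0.1-py2.4.egg/mod.py"))
```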
>That's not exactly negotiable. Debian has a packaging format which
>resolves generic installation dependencies on its own. Therefore it
>cannot depend on Python-specific .egg metadata. Therefore we need a way
>to translate .egg metadata to Debian metadata.
Yes, that's precisely what I was suggesting would be helpful. As Vincenzo
already mentioned, the egg metadata is a good starting point for defining
the Debian metadata. I'm obviously not proposing changing Debian's
metadata system. Well, maybe it wasn't *obvious* that I wasn't proposing
that, but in any case I'm not. :)
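To make "starting point" concrete: an egg's requires.txt lists the project's dependencies, with optional "extras" under bracketed section headers. A sketch of translating the unconditional part into a Debian Depends value follows. The function name and the package-naming scheme (a `python2.4-` prefix plus the lowercased project name) are hypothetical conventions, not existing Debian policy, and `!=` is rejected because Debian has no direct equivalent.

```python
import re

# Setuptools-style comparison operators mapped to Debian's relation syntax.
DEB_RELATIONS = {">=": ">=", "<=": "<=", "==": "=", ">": ">>", "<": "<<"}

def debian_depends(requires_txt, prefix="python2.4-"):
    """Translate the unconditional part of requires.txt into a Depends value,
    using a hypothetical prefix + lowercased-name package naming scheme."""
    depends = []
    for line in requires_txt.splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # [extra] sections hold optional dependencies; skip them
        if not line or line.startswith("#"):
            continue
        match = re.match(r"([A-Za-z0-9_.\-]+)\s*(.*)", line)
        name, specs = match.group(1), match.group(2)
        package = prefix + name.lower().replace("_", "-")
        if not specs:
            depends.append(package)
            continue
        # "Paste>=0.3,<1.0" becomes two Debian relations on the same package.
        for spec in specs.split(","):
            op_match = re.match(r"\s*(<=|>=|==|!=|<|>)\s*(\S+)\s*$", spec)
            if op_match is None or op_match.group(1) not in DEB_RELATIONS:
                raise ValueError("cannot express %r in Debian syntax" % spec)
            op, version = op_match.groups()
            depends.append("%s (%s %s)" % (package, DEB_RELATIONS[op], version))
    return ", ".join(depends)

print(debian_depends("FormEncode>=0.4\nPyProtocols\n\n[testing]\nnose\n"))
```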
> > I remain concerned about how such packages will work with namespace
> > packages, since namespace packages mean that two different distributions
> > may be supplying the same __init__.py files, and some package managers may
> > not be able to deal with two system packages (e.g. Debian packages, RPMs,
> > etc.) supplying the same file, even if it has identical contents in each
> > system package.
> >
>Debian packaging has a method to explicitly rename a different package's
>file if it conflicts with yours ("dpkg-divert"; it does _not_ depend on
>which package gets installed first). IMHO that's actually superior to
>randomly executing only one of these files, since you are aware that
>there is a conflict (the second package simply doesn't install if you
>don't fix it), and thus can handle it intelligently.
The two kinds of possible conflicts are namespace packages, and
project-level resources.
A namespace package is more like a Java package than a traditional Python
package. A Java package can be split across multiple directories or jar
files; it doesn't have to be all in one place. Thus you can have lots of
jars with org.apache.* classes in them.
Python, however, requires packages to have an __init__.py file, and by
default the entire package is assumed to be in the directory containing the
__init__.py file. However, as of Python 2.3, the 'pkgutil' module was
introduced in the Python standard library which allowed you to create a
Java-style "namespace package", automatically combining package directories
found on different parts of sys.path. So, if in one sys.path directory you
had a 'zope.interface' package, and in another you had a 'zope.publisher'
package, these would be combined, instead of the first one being treated as
if it were all of 'zope.*', and the second being completely
ignored. However, *each* sys.path directory's copy of the zope package
needs its own zope/__init__.py file for this to work.
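A minimal sketch of the pkgutil mechanism just described. The namespace name `nsdemo` stands in for `zope` so the example cannot collide with a real installation, and the two "project" directories are hypothetical; each gets its own copy of the one-line namespace `__init__.py`.

```python
import importlib
import os
import sys
import tempfile

# The one-line body that every copy of the namespace __init__.py contains.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"

def make_project(root, namespace, subpackage):
    """Create root/namespace/__init__.py and root/namespace/subpackage/__init__.py."""
    pkg_dir = os.path.join(root, namespace, subpackage)
    os.makedirs(pkg_dir)
    with open(os.path.join(root, namespace, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(f"NAME = {subpackage!r}\n")

base = tempfile.mkdtemp()
dir_a = os.path.join(base, "project_a")
dir_b = os.path.join(base, "project_b")
make_project(dir_a, "nsdemo", "interface")
make_project(dir_b, "nsdemo", "publisher")

# Two separate sys.path entries, each holding part of the nsdemo package.
sys.path[:0] = [dir_a, dir_b]
importlib.invalidate_caches()

import nsdemo.interface
import nsdemo.publisher  # found in dir_b thanks to extend_path

print(nsdemo.interface.NAME, nsdemo.publisher.NAME)
print(nsdemo.__path__)
```

Without the `extend_path` line, `nsdemo` would be treated as living entirely in `dir_a`, and importing `nsdemo.publisher` would fail.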
So, the issue here is that if you install two projects that contain zope.*
packages into the *same* directory (e.g. site-packages), then there will be
two different zope/__init__.py files installed at the same location, even
though they will have the same content (a short snippet of code to activate
the namespace mechanism via the pkgutil module or via setuptools'
pkg_resources module).
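The collision can be stated mechanically. A sketch with a hypothetical helper that takes each project's installed files (as a package manager's file manifest would list them) and reports every path claimed by more than one project, noting whether the copies are identical. Identical content does not help a package manager that refuses to let two packages own one file.

```python
import hashlib
from collections import defaultdict

def find_collisions(manifests):
    """manifests maps project name -> {installed_path: file_bytes}.
    Return {path: (sorted_projects, all_copies_identical)} for shared paths."""
    claims = defaultdict(dict)
    for project, files in manifests.items():
        for path, content in files.items():
            claims[path][project] = hashlib.sha1(content).hexdigest()
    collisions = {}
    for path, owners in claims.items():
        if len(owners) > 1:
            identical = len(set(owners.values())) == 1
            collisions[path] = (sorted(owners), identical)
    return collisions

init = b"__import__('pkg_resources').declare_namespace(__name__)\n"
manifests = {
    "zope.interface": {"zope/__init__.py": init, "zope/interface/__init__.py": b""},
    "zope.publisher": {"zope/__init__.py": init, "zope/publisher/__init__.py": b""},
}
collisions = find_collisions(manifests)
print(collisions)
```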
To date, there are only a small number of these namespace packages in
existence, but over time they will represent a fairly large number of
*projects*. As I go through the breakup of the PEAK meta-project into
separate components, I expect to have a dozen or so projects contributing
to the peak.* and peak.util.* namespace packages. Ian Bicking's Paste
meta-project has a paste.* namespace package spread out in two or three
subprojects so far. There has been some off-and-on discussion about
whether Zope 3 will move to eggs instead of their own zpkg tool (which has
issues on Windows and Mac OS that eggs do not), and in that case they will
likely have a couple dozen components in zope.* and zope.app.*.
So, for the long-term solution of wrapping Python projects in Debian
packages, the namespace issue needs to be addressed, because renaming each
project's zope/__init__.py or whatever isn't going to work very
well. There has to be one __init__.py file, or else such projects need to
be installed in their own .egg directories or zipfiles to avoid collisions.
The second collision issue with --single-version-externally-managed is
top-level resource collisions. Some existing projects that are not
egg-based manipulate their install_data operation in such a way that they
create files or directories in site-packages directly, rather than inside
their own package data structures. Setuptools neither encourages nor
discourages this, because it doesn't cause any problems for any egg layout
except the .egg-info one -- and the .egg-info one was originally designed
to support development, not deployment. In the development scenario, any
such files are isolated to the source tree, and for deployment the .egg
file or directory keeps each project's contents completely isolated.
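The collision-free alternative is for a project to keep its data files inside its own package and look them up relative to that package. A sketch, swapping in the modern standard library's `importlib.resources` (Python 3.9+) rather than setuptools' own resource API; the package `demo_data` and its `defaults.cfg` file are hypothetical.

```python
import importlib
import importlib.resources
import os
import sys
import tempfile

# Hypothetical project that ships its data *inside* its own package
# (demo_data/defaults.cfg) instead of writing files into site-packages.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_data")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "defaults.cfg"), "w") as f:
    f.write("[main]\ncolor = blue\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# The lookup goes through the package, not a hard-coded site-packages path,
# so nothing else installed in the same directory can clash with it.
config_text = importlib.resources.files("demo_data").joinpath("defaults.cfg").read_text()
print(config_text)
```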
So, what I'm saying is that putting all projects in the same directory (as
all "traditional" Python installations do) has some inherent limitations
with respect to namespace packages and top-level resources, and these
limitations are orthogonal to the question of egg metadata. The .egg
formats were created to solve these problems (including clean upgrades,
multi-version support, and uninstallation in scenarios where a package
manager isn't usable), and so the other features that they enable will be
increasingly popular as well.
In other words, as people make more use of PyPI (because they now really
*can*), more people will put things on PyPI, and the probability of package
name conflicts will increase more rapidly. The natural response will be a
desire to claim uber-project or organizational names (like paste.*, peak.*,
zope.*, etc.) and to put individual projects under sub-package names. (For
example, someone has already argued that I should move RuleDispatch's
'dispatch' package to 'peak.dispatch' rather than keeping the top-level
'dispatch' name all to myself.)
So, I'm just saying that using the --single-version-externally-managed
approach requires that a package manager like Debian grow a way to handle
these namespace packages safely and sanely. One possibility is to create
dummy packages that contain only the __init__.py file for that namespace,
and then have the real packages all depend on the dummy package, while
omitting the __init__.py. So, perhaps each project containing a
peak.util.* subpackage would depend on a 'python2.4-peak.util-namespace'
package, which in turn would depend on a 'python2.4-peak-namespace'
package. It's rather ugly, to say the least, but it would work as long as
upstream developers never put anything in namespace __init__.py files
except for the pkg_resources.declare_namespace() call.
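That condition can be checked mechanically before a project's namespace `__init__.py` is dropped from its Debian package. A conservative sketch using the `ast` module: the accepted forms (an optional docstring plus one of the two common spellings of the `declare_namespace(__name__)` call) are assumptions, and anything else, including pkgutil-based variants, is rejected. Comments are ignored because they never reach the AST.

```python
import ast

# Namespace __init__.py bodies this sketch accepts as "nothing but the call".
ALLOWED = [
    "__import__('pkg_resources').declare_namespace(__name__)",
    "import pkg_resources\npkg_resources.declare_namespace(__name__)",
]

def _statements(source):
    """Return position-free dumps of the module's statements, minus a docstring."""
    body = ast.parse(source).body
    if (body and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        body = body[1:]  # a leading docstring is harmless
    return [ast.dump(node) for node in body]

TEMPLATES = [_statements(src) for src in ALLOWED]

def is_pure_namespace_init(source):
    """True if the file only declares the namespace and does nothing else."""
    return _statements(source) in TEMPLATES

print(is_pure_namespace_init('"""peak namespace"""\n__import__("pkg_resources").declare_namespace(__name__)\n'))
print(is_pure_namespace_init("__import__('pkg_resources').declare_namespace(__name__)\nVERSION = '1.0'\n"))
```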
(By the way, since part of an egg's metadata lists what namespace packages
the project contains code or data for, the generation of these dependencies
can be automated as part of the egg-to-deb conversion process.)
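A sketch of that automation, assuming the egg's namespace_packages.txt lists one namespace package per line and using the hypothetical `python2.4-<namespace>-namespace` naming from the example above. It returns the dummy packages the real package should depend on, plus the dummy-to-parent chain (e.g. peak.util's dummy depending on peak's).

```python
def namespace_dependencies(namespace_packages_txt, prefix="python2.4-"):
    """Return (project_depends, dummies) for a hypothetical egg-to-deb tool.

    project_depends: dummy packages the real package should depend on.
    dummies: {dummy_package: parent_dummy_or_None} for every namespace level."""
    def dummy(ns):
        return "%s%s-namespace" % (prefix, ns)

    namespaces = [line.strip() for line in namespace_packages_txt.splitlines()
                  if line.strip()]
    dummies = {}
    for ns in namespaces:
        parts = ns.split(".")
        for i in range(1, len(parts) + 1):
            level = ".".join(parts[:i])
            parent = ".".join(parts[:i - 1])
            dummies[dummy(level)] = dummy(parent) if parent else None
    # Depending on the innermost namespaces is enough: their dummies
    # depend on the outer ones in turn.
    innermost = [ns for ns in namespaces
                 if not any(other.startswith(ns + ".") for other in namespaces)]
    return sorted(dummy(ns) for ns in innermost), dummies

deps, dummies = namespace_dependencies("peak\npeak.util\n")
print(deps)
print(dummies)
```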
Or, of course, the .egg directory approach can also be used to bypass all
collision issues, but this brings sys.path and .pth files back into the
discussion. On the other hand, it can possibly be assumed that anything in
a namespace package can be used only after a require() (either implicit or
explicit), so maybe the .pth can be dropped for projects with namespace
packages. These are possibilities worth considering, since they avoid the
ugliness of creating dummy packages just to hold namespace __init__.py files.
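The runtime alternative can be pictured with a toy resolver: instead of a .pth file listing every .egg directory, the newest matching egg is put on the path only when code asks for it. This is a hypothetical stand-in for the idea of a require(), not pkg_resources' implementation. It assumes egg directories named `Project-Version-pyX.Y.egg`, and it simplifies version comparison to dotted integers; the egg names below are made up.

```python
import os
import re
import tempfile

# Egg directory names look like Project-Version-pyX.Y.egg (simplified parse).
EGG_NAME = re.compile(r"^(?P<name>[^-]+)-(?P<version>[^-]+)-py(?P<py>[\d.]+)\.egg$")

def _version_key(version):
    # Simplified: only dotted numeric versions such as "0.5.1" are supported.
    return tuple(int(part) for part in version.split("."))

def toy_require(project, egg_root, path):
    """Put the newest <project>-<version>-py*.egg in egg_root at the front of
    path (e.g. sys.path) and return it; adds nothing if it is already there."""
    candidates = []
    for entry in os.listdir(egg_root):
        match = EGG_NAME.match(entry)
        if match and match.group("name") == project:
            candidates.append((_version_key(match.group("version")), entry))
    if not candidates:
        raise LookupError("no egg found for %s" % project)
    newest = os.path.join(egg_root, max(candidates)[1])
    if newest not in path:
        path.insert(0, newest)
    return newest

egg_root = tempfile.mkdtemp()
for name in ("RuleDispatch-0.5-py2.4.egg", "RuleDispatch-0.5.1-py2.4.egg",
             "PyProtocols-1.0-py2.4.egg"):
    os.mkdir(os.path.join(egg_root, name))

search_path = []  # stands in for sys.path
chosen = toy_require("RuleDispatch", egg_root, search_path)
print(os.path.basename(chosen))
```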