[Distutils] setuptools in a cross-compilation packaging environment

Phillip J. Eby pje at telecommunity.com
Fri Oct 7 19:11:58 CEST 2005


At 02:01 PM 10/7/2005 +0200, M.-A. Lemburg wrote:
>Sorry, maybe I wasn't clear: a package builder needs
>to *build* a package (rpm, egg, .tar.gz drop in place
>archive, etc.) without the dependency checks.

bdist_egg simply builds an egg.  Dependency checking is a function of 
*installing* the egg, not building it.


>For the user to be able to turn off the dependency checks
>when installing an egg using an option is also an often
>needed feature.

Yes, and it has been on my to-do list for some time.  However, the majority 
of packages in eggs today don't have any dependencies declared anyway, 
because they're not packages that use setuptools.  So the option, if it 
existed, wouldn't have been very useful until quite recently.  In any case, 
the main refactoring I needed to do before that option could be added is 
done, so I'll probably add it in the next non-bugfix release.


>  rpm often requires this when you want
>to install packages in different order, in automated
>installs or due to conflicts in the way different
>packages name the dependencies. I guess, eggs will
>exhibit the same problems over time.

I'm not sure I follow you here, but in any case there's nothing stopping 
people from installing eggs by just dropping them in a directory on 
sys.path without doing any installation steps at all.  It's only if you 
want the egg to be on sys.path at startup (without manually munging 
PYTHONPATH or a .pth file, or calling require()), or if you want to install 
any scripts, that you need to run easy_install on the egg.


> > There is a simple trick that packagers can use to make their legacy
> > packages work as eggs: build .egg-info directories for them in the
> > sys.path directory where the package resides, so that the necessary
> > metadata is present.  This does not require the use of .pth files, but
> > it does slow down the process of package discovery for things that do
> > use pkg_resources to locate their dependencies.  It also still requires
> > them to repackage existing packages, but doesn't require changing the
> > layout.
>
>Where would you have to put these directories and what
>do they contain ?

You put them in the directory where the unmanaged packages are 
installed.  At minimum, they contain a PKG-INFO file, and if the package 
ordinarily uses setuptools, they should also contain whatever else the 
egg's EGG-INFO directory contained.  The directory name is 
ProjectName.egg-info, where ProjectName is the project's name on PyPI, with 
non-alphanumerics condensed by the pkg_resources.safe_name() function.
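
As a rough sketch of the layout described above, the following creates a 
minimal ProjectName.egg-info directory containing only a PKG-INFO file.  The 
helper names (approx_safe_name, make_egg_info) and the metadata values are 
hypothetical, and the regular expression is only an assumed approximation of 
what pkg_resources.safe_name() does; where pkg_resources is available, the 
real function should be used instead.

```python
import os
import re
import tempfile


def approx_safe_name(name):
    # Assumption: approximates pkg_resources.safe_name() by condensing each
    # run of characters other than letters, digits and '.' into one '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def make_egg_info(site_dir, project, version):
    """Create <site_dir>/<ProjectName>.egg-info/PKG-INFO (hypothetical helper)."""
    egg_info = os.path.join(site_dir, approx_safe_name(project) + ".egg-info")
    os.makedirs(egg_info, exist_ok=True)
    with open(os.path.join(egg_info, "PKG-INFO"), "w") as f:
        # Only the bare minimum of metadata; a real PKG-INFO carries more.
        f.write("Metadata-Version: 1.0\n")
        f.write("Name: %s\n" % project)
        f.write("Version: %s\n" % version)
    return egg_info


if __name__ == "__main__":
    # A temporary directory stands in for the directory on sys.path where
    # the unmanaged package is installed.
    print(make_egg_info(tempfile.mkdtemp(), "Some Legacy_Pkg", "1.0"))
```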


>I must admit that I haven't followed the discussions about
>these .egg-info directories. Is there a good reason not to
>use the already existing PKG-INFO files that distutils builds
>and which are used by PyPI (aka cheeseshop) ?

I don't know if there's such a reason or not, but in any case that's what 
we use as part of the egg-info directories.  However, we *also* allow for 
unlimited metadata resources to be provided in egg-info, as this is what 
allows us to carry things like plugin metadata and scripts in the 
egg.  There are other metadata files listing the C extensions in the 
package, the "namespace packages" that the egg participates in, and so on.


>Hmm, you seem to be making things unnecessarily complicated.

That probably just means you're not familiar with the requirements.  My 
first post here about the issues was about this time last year, discussing 
application plugins and their packaging.  The use of eggs for general 
Python libraries as well as plugins only came into play this January, at 
Bob Ippolito's urging.  So, while there may potentially exist solutions 
that might be somewhat simpler for certain kinds of Python library 
packaging, they don't even begin to address the issues for application 
plugin packaging, which is the raison d'etre of eggs.  Trac, for example, 
lets you simply drop eggs into a plugin directory in order to use them.  At 
some point, Chandler should be allowing this as well, and maybe someday 
Zope will support it too.  It's primarily for these use cases that eggs 
exist; it just so happens that they make a fine way to manage installed 
Python packages as well.


>Why not just rely on the import mechanism and put all
>eggs into a common package, e.g. pythoneggs ?!
>Your EasyInstall script could then modify a file in that
>package called e.g. database.py which includes all the
>necessary information about all the installed packages
>in form of a dictionary.

You completely lost me.  A major feature of eggs is that for an application 
needing plugins, it can simply scan a directory of downloaded eggs and plug 
them into itself.  Having a required installation mechanism other than 
"download the egg and put it here" breaks that.

What's more, putting them in a single package makes it impossible to have 
eggs installed in more than one directory, since packages can't span 
directories, at least not without using setuptools' namespace package 
facility.  And using that facility would mean the runtime would have to 
always get imported whenever you used an egg - which is *not* required 
right now unless you're using a zipped egg with a C extension in it.  And 
even then the runtime only gets imported if you actually try to import the 
C extension.  So, it seems to me your approach creates more I/O overhead 
for using installed packages.

Finally, don't forget that eggs allow simultaneous installation of multiple 
versions of a package.  So, you'd *still* have to have sys.path manipulation.


>This would have the great advantage of allowing introspection
>without too much fuss and reduces the need to search paths,
>directories and so-on which causes a lot of I/O overhead
>and slows down startup times for applications needing
>to check dependency requirements a lot.

And the disadvantage of absolutely requiring install/uninstall steps, which 
is anathema.  Note that with the exception of .egg-info markers (which 
aren't really intended for production use anyway; they're a feature for 
deploying packages under development without needing to build a "real" 
egg), eggs can be fully introspected from their *filename* for dependency 
processing purposes.  So, if the needed eggs are all on sys.path already, 
no additional I/O gets done.  Identifying all the eggs available in a given 
directory is one listdir() operation, but it only happens if a suitable 
package isn't already on sys.path, and the listdir()s happen at most once 
during a given instance of dependency processing.
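
To illustrate the point about filenames, here is a minimal sketch of reading 
project name, version, Python version and platform out of egg filenames with 
a single listdir() call.  The regular expression and the function names 
(parse_egg_filename, eggs_in) are assumptions made for illustration, modeled 
on names of the form "name-version-pyX.Y-platform.egg"; this is not the code 
pkg_resources itself uses.

```python
import os
import re

# Assumption: egg basenames look like name[-version[-pyX.Y[-platform]]],
# with any '-' inside the name or version already escaped.
EGG_NAME = re.compile(
    r"(?P<name>[^-]+)"
    r"(?:-(?P<version>[^-]+)"
    r"(?:-py(?P<pyver>[^-]+)"
    r"(?:-(?P<platform>.+))?)?)?$"
)


def parse_egg_filename(filename):
    """Return a dict of the parts of an egg filename, or None if not an egg."""
    base, ext = os.path.splitext(filename)
    if ext.lower() != ".egg":
        return None
    match = EGG_NAME.match(base)
    return match.groupdict() if match else None


def eggs_in(directory):
    """List (filename, parsed parts) for every egg found by one listdir()."""
    found = []
    for entry in sorted(os.listdir(directory)):
        info = parse_egg_filename(entry)
        if info is not None:
            found.append((entry, info))
    return found
```

Nothing inside the eggs is opened here; everything comes from the directory 
listing alone.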


> >>Please make sure that your eggs catch all possible
> >>Python binary build dimensions:
> >>
> >>* Python version
> >>* Python Unicode variant (UCS2, UCS4)
> >>* OS name
> >>* OS version
> >>* Platform architecture (e.g. 32-bit vs. 64-bit)
> >
> >
> > As far as I know, all of this except the Unicode variant is captured in
> > distutils' get_platform().  And if it's not, it should be, since it
> > affects any other kind of bdist mechanism.
>
>Agreed.
>
>So you use get_platform() for the egg names ?

Yes - except on Mac OS X, which has a changed platform string.


> >>and please also make this scheme extendable, so that
> >>it is easy to add more dimensions should they become
> >>necessary in the future.
> >
> > It's extensible by changing the get_platform() and compatible_platform()
> > functions in pkg_resources.
>
>Ah, that's monkey patching. Isn't there some better way ?

Well, my presumption here is that we're going to get the scheme right for 
Python at large, and make it standard.  Are you saying that some packages 
should have their own scheme?  That's not really workable since in order to 
import the package and use its scheme, we would have to first know that the 
package was compatible!


> > If you have suggestions, please make them known, and let's get them into
> > the distutils in general, not just our own offshoots thereof.
>
>This is what we use:
>
>def py_version(unicode_aware=1, include_patchlevel=0):
>
>[snip]
>The result is a build system that can be used to build
>all binaries for a single platform without getting
>conflicts and binaries that include a proper platform
>string, e.g.
>
>egenix-mxodbc-zopeda-1.0.9.darwin-8.2.0-Power_Macintosh-py2.3_ucs2.zip
>egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs2.zip
>egenix-mxodbc-zopeda-1.0.9.linux-i686-py2.3_ucs4.zip

Eggs put the Python version before the platform, because "pure" eggs that 
don't contain any C code don't include the platform string.  We also don't 
have a UCS flag, but if we did it should be part of the platform string 
rather than the Python version, since "pure" eggs don't care about the UCS 
mode, and even if they did, that'd be a requirement of the package rather 
than the egg itself being platform specific.
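
A small sketch of the ordering described above: the Python version comes 
first, and the platform string is appended only when one is given (that is, 
for eggs containing C code).  The function name egg_basename is hypothetical, 
and the sketch does not escape '-' characters in the project name or version, 
which a real implementation would have to handle.

```python
import sys
import sysconfig


def egg_basename(project, version, py_version=None, platform=None):
    """Compose name-version-pyX.Y[-platform] (hypothetical helper)."""
    if py_version is None:
        # Default to the running interpreter's major.minor version.
        py_version = "%d.%d" % sys.version_info[:2]
    parts = [project, version, "py" + py_version]
    if platform:
        # Platform-specific eggs get the platform string at the very end.
        parts.append(platform)
    return "-".join(parts)


if __name__ == "__main__":
    # "Pure" egg: no platform component.
    print(egg_basename("Example", "1.0") + ".egg")
    # Platform-specific egg, using the stdlib's platform string.
    print(egg_basename("Example", "1.0", platform=sysconfig.get_platform()) + ".egg")
```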


> > A single .pth file is certainly an option, and it's what easy_install
> > itself uses.
>
>Fair enough.
>
>Could this be enforced and maybe also removed
>completely by telling people to add the egg directory to
>PYTHONPATH ?

If by "egg directory" you mean a single .egg directory (or zipfile) for a 
particular package, then yes, for that particular package you could do 
that.  If you mean, can you just put the directory *containing* eggs on 
PYTHONPATH, then the answer is no, if you want the package to be on 
sys.path without any special action taken (like calling 
pkg_resources.require()).
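
The difference can be demonstrated with a small sketch, under the assumption 
that a zipped egg behaves like any other zip archive on sys.path.  The egg 
name, module name and function name below are made up for the demonstration.  
It builds a zip file with an .egg name, then tries importing its module first 
with only the containing directory on sys.path, and then with the egg itself 
on sys.path.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def demo_egg_on_path():
    """Return (importable_via_parent_dir, importable_via_egg_itself)."""
    tmp = tempfile.mkdtemp()
    egg = os.path.join(tmp, "EggDemo-0.1-py3.egg")  # hypothetical egg name
    with zipfile.ZipFile(egg, "w") as zf:
        zf.writestr("egg_demo_mod.py", "VALUE = 42\n")

    def can_import(path_entry):
        sys.path.insert(0, path_entry)
        importlib.invalidate_caches()
        try:
            importlib.import_module("egg_demo_mod")
            return True
        except ImportError:
            return False
        finally:
            # Undo the changes so each attempt starts from a clean state.
            sys.path.remove(path_entry)
            sys.modules.pop("egg_demo_mod", None)

    return can_import(tmp), can_import(egg)


if __name__ == "__main__":
    print(demo_egg_on_path())
```

Only the second attempt should succeed, since the import system looks for 
modules directly inside each sys.path entry and does not look into archives 
that merely sit in a listed directory.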


>Note that the pythonegg package approach would pretty much
>remove the need for these .pth files.

Only in the sense that it would require reinventing them in a different 
form.  :)


