[Distutils] setuptools-0.4a2: Eggs, scripts, and __file__

Ryan Tomayko rtomayko at gmail.com
Tue Jun 14 00:00:56 CEST 2005


On Jun 13, 2005, at 12:21 PM, Phillip J. Eby wrote:
> Are you distributing an application, or a library?  If you're  
> distributing a library, you don't need require() in library code.   
> If it's an application, you can handle your own dependencies by  
> force-installing the eggs in the application script directory using  
> EasyInstall, and then just require() your main package.

Libraries, applications, and libraries with helper scripts.

>> This should give me all of the benefits of
>> eggs when I'm using them and fallback to the old-style manual
>> dependency management otherwise. Does that make sense?
>
> Um, yeah, except I don't think you really need to fall back, just  
> because people have other stuff installed.  The worst that's going  
> to happen is that you're going to force reinstallation of  
> dependencies they already have, just to get them into eggs.  (Or  
> make them create .egg-info directories to tell the system the stuff  
> is already installed.)

That's the problem. I'm trying to figure out a general plan of attack
that Linux/BSD package maintainers can adopt for Python packages that
want to use eggs / setuptools. Here are some numbers on how many Python
packages are included in a couple of different distributions:

On an OS X darwinports box:

     $ port list | grep -e '^py-' | wc -l
     237

On a Fedora 3 box (Core + Extras):

     $ yum list all | grep -e '^py' -e 'python' | wc -l
     78

I don't have any Debian or Gentoo boxes handy, but I imagine they'd
weigh in somewhere around the darwinports number.

None of these packages are currently provided as eggs or with .egg-info
directories when they are installed to site-packages, and they have
complex dependency relationships that are managed by the distribution's
utility (port, yum, apt-get, emerge, etc.). This creates a problem for
these packages because it means that they cannot assume dependencies
will always be egg managed. If they start adding require() calls to
their scripts, they will break under these environments. require() is
an all-or-nothing proposition for distributions, and that means there
will need to be a planned "upgrade" period or something for all
packages.

As a more specific example, I contribute to two packages that are
distributed with Fedora Core: python-urlgrabber and yum. yum depends
on python-urlgrabber and python-elementtree. Now, if I wanted to move
yum to be egg-based and use require(), I would also need to ensure
that all of yum's dependencies are egg-based. When yum and its
dependencies are installed from RPM, they must all be in egg format
(or at least provide .egg-info dirs). If not, the yum script will fail.

So a single package using require() can cause a snowball effect where  
many other packages would need to be upgraded to egg format as well.  
In time, this may be a good thing because it could accelerate  
adoption of eggs but for the time being it makes it really hard to  
use require().
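
A minimal sketch of the dual-mode idea, for illustration only: the
helper name require_or_import is hypothetical and is not part of
setuptools. It assumes pkg_resources exposes require(),
DistributionNotFound, and VersionConflict, and it also tolerates
pkg_resources being absent. When the dependency is not egg-managed, it
simply falls back to a plain import.

```python
import importlib


def require_or_import(requirement, module_name):
    """Activate an egg if one is available, then import the module.

    Returns (module, egg_managed). egg_managed is True only when
    pkg_resources.require() resolved the requirement.
    """
    egg_managed = False
    try:
        import pkg_resources  # ships with setuptools; may be missing
    except ImportError:
        pkg_resources = None

    if pkg_resources is not None:
        try:
            pkg_resources.require(requirement)
            egg_managed = True
        except (pkg_resources.DistributionNotFound,
                pkg_resources.VersionConflict):
            # The dependency is not egg-managed here (for example, it was
            # installed from an RPM without metadata); rely on the plain
            # import below instead.
            pass

    return importlib.import_module(module_name), egg_managed
```

A script could call this once per dependency before its real imports.
The cost is that version requirements are only enforced when the
dependency happens to be egg-managed.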

> Hm.  What if you created .egg directories and symlinked the  
> dependencies into them during your installation process?  Or if  
> there were some way to create the .egg-info directories  
> automatically from packaging system databases, or from inspecting  
> module contents?  Setuptools has some code that can look for the  
> setting of constants or presence of symbols in modules, without  
> importing them.  Perhaps I could extend this somehow so that a  
> transitional package like yours could include additional info in  
> the setup script, that checks for these dependencies and tags them  
> somehow?

Yes, yes. Along those lines.

> Or maybe this could be done by metadata -- you put a legacy.py file  
> in your egg-info, and when processing your egg's dependencies, if  
> pkg_resources can't find a package you need, it would call a  
> function in legacy.py that would check for the dependency using  
> setuptools' inspection facilities, and return a path and a guess at  
> a version number.
>
> How does that sound?

That would solve my problem perfectly.
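
To make the legacy.py idea concrete, here is a rough sketch of what
such a hook might look like. Everything in it is an assumption: the
function name find_legacy, its return shape, and the use of the
standard library's importlib.util.find_spec plus a regular expression
in place of setuptools' own inspection code. It locates a top-level
module without importing it and guesses a version from a literal
__version__ assignment.

```python
import importlib.util
import os
import re

# Matches a simple literal assignment such as: __version__ = "1.0"
_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.M)


def find_legacy(module_name):
    """Locate a top-level module without importing it.

    Returns (path, version_guess), or None if no source or extension
    file is found (built-in and frozen modules are treated as missing).
    version_guess is None when no __version__ string can be read.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin or not os.path.isfile(spec.origin):
        return None

    version_guess = None
    if spec.origin.endswith(".py"):
        # Read the source as text; the module is never executed.
        with open(spec.origin, encoding="utf-8", errors="replace") as f:
            match = _VERSION_RE.search(f.read())
        if match:
            version_guess = match.group(1)

    return os.path.dirname(spec.origin), version_guess
```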

>>  If that's the case, I might be
>> able to make my scripts as simple as::
>>
>>     from pkg_resources import require, find_distributions
>>     if list(find_distributions('MyPackage')):
>>
>
> Don't do this.  find_distributions() yields distributions found in  
> a directory or zipfile; it doesn't take a package name, it takes a  
> sys.path entry.

Ahhh..
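
For reference, a small sketch of calling find_distributions() with
path entries rather than a package name. The wrapper name find_project
is hypothetical. It assumes pkg_resources provides
find_distributions() and safe_name() and that each yielded
distribution has a project_name attribute. It returns an empty list if
pkg_resources cannot be imported.

```python
import sys


def find_project(project_name, path_entries=None):
    """Return distributions named project_name found on the path entries."""
    try:
        import pkg_resources
    except ImportError:
        return []  # no setuptools available; nothing can be egg-managed

    wanted = pkg_resources.safe_name(project_name).lower()
    entries = sys.path if path_entries is None else path_entries
    found = []
    for entry in entries:
        # find_distributions() takes a sys.path-style entry (a directory
        # or zipfile), not a project name.
        for dist in pkg_resources.find_distributions(entry):
            if pkg_resources.safe_name(dist.project_name).lower() == wanted:
                found.append(dist)
    return found
```

With this, find_project("MyPackage") returns a non-empty list only when
some entry on sys.path carries egg metadata for MyPackage.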

> I'm not sure exactly what you're trying to do here.  If you just  
> want to know if your script is running from a development location  
> (and therefore needs to call require() to set up dependencies),  
> couldn't you just check for 'MyPackage.egg-info' in the sys.path[0]  
> (script) directory?
>
> e.g.:
>
>     import sys, os
>     if os.path.isdir(os.path.join(sys.path[0],"MyPackage.egg-info")):
>         from pkg_resources import require
>         require("MyPackage")  # ensures dependencies get processed
>
> If this is what you want, perhaps we can create a standard recipe  
> in pkg_resources, like maybe 'script_package("MyPackage")', that  
> only does the require if you're a development egg and not being run  
> from run_main().
>
>> The downside to this approach is that I would have to be sure to NOT
>> distribute MyPackage.egg-info with RPMs and other packages, which
>> kind of rules out any phased approach to bringing egg based packaging
>> to Fedora's stock RPMs.
>>
>
> I don't know if .egg-info is a good idea for RPMs.  .egg-info is  
> primarily intended for *development*, not deployment, because you  
> can't easily override a package installed with .egg-info in site- 
> packages.  In fact, the only way you can normally override it is to  
> install an egg alongside the script.
>
> My current idea for how RPMs and other packagers should install  
> eggs is just to dump them in site-packages as egg files or  
> directories, and let people use require() or else use EasyInstall  
> to set the active package.

Right. But that's going to require significant lobbying and effort to
get all core Python packages included in a distribution migrated over
to eggs. This is the root of my dilemma. The only realistic approach
seems to be supporting dual egg and non-egg deployment for a little
while. The legacy.py proposal is one method, and I threw out a couple
earlier. But yeah, I think this is the root of the problem.

> Hey, wait a second...  if you can put install/uninstall scripts in  
> packages, couldn't installing or uninstalling an RPM ask  
> EasyInstall to fix up the easyinstall.pth file?  This would let  
> packagers distribute as eggs, but without breaking users'  
> expectations that the package would be available to "just import".   
> If somebody explicitly wants to support multiversion for a package,  
> they can run 'EasyInstall -m PackageName' to reset it to multi- 
> version after installing a new version.

I think that would be great but it assumes modification to a whole  
lot of existing packages. Ideally, I'd like to be able to start using  
eggs and require() in my packages without having to convince everyone  
else to do so just yet. Not that I won't be trying to convince  
people, I just don't want to rely on other packages being eggified in  
Fedora Core, darwinports, etc. for upcoming releases.

> EasyInstall doesn't have everything that's needed to do this yet  
> (no "uninstall" mode), but perhaps it's a good option to add, and  
> then packagers could standardize on this approach.

I'd be happy to advocate to / work with packagers once we get a basic  
set of best practices together. It seems like there are a lot of  
options here - we just need to iron out the details.

Thanks,

Ryan Tomayko
                                  rtomayko at gmail.com
                                  http://naeblis.cx/rtomayko/



