[Python-Dev] [Distutils] Capsule Summary of Some Packaging/Deployment Technology Concerns

Wed Mar 19 01:27:01 CET 2008

Phillip J. Eby wrote:
> At 05:10 PM 3/17/2008 -0500, Jeff Rush wrote:
>   
>
>> 1. Many felt the existing dependency resolver was not correct.  They wanted a
>>     full tree traversal resulting in an intersection of all restrictions,
>>     instead of the first-acceptable-solution approach taken now, which can
>>     result in top-level dependencies not being enforced upon lower levels.
>>     The latter is faster however.  One solution would be to make the
>>     resolver pluggable.
>>     
>
> Patches welcome, on both counts.  Personally, Bob and I originally 
> wanted a full-tree intersection, too, but it turned out to be hairier 
> to implement than it seems at first.  My guess is that none of the 
> people who want it, have actually tried to implement it without a 
> factorial or exponential O().  But that doesn't mean I'll be unhappy 
> if somebody succeeds.  :)
>   

I think we'd make significant progress by just intersecting the 
dependencies we know about as we progress through the dependency tree.  
For example, if A requires B==2 and C==3, and if B requires C>=2,<=4, 
then at the time we install A we'd pick C==3 and also at the time we 
install B we'd pick C==3.   As opposed to the current scheme that would 
choose C==4 for the latter case.   This would allow dependent projects 
(think applications here) to better control the versions of the full set 
of libraries they use.   Things would still fail (like they do now) if 
you ran across dependencies that had no intersection or if you 
encountered a new requirement after the target project was already 
installed.
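
As a rough illustration of the incremental intersection described above, 
here is a minimal sketch.  It is not easy_install's resolver; the index 
contents, the integer version numbers, and the names AVAILABLE, REQUIRES, 
satisfies and resolve are all made up for the example.  Constraints are 
accumulated per project as the tree is walked, and a project is only 
picked from the releases that satisfy everything seen so far.

```python
import operator

# Comparison operators allowed in a requirement such as ("==", 3).
OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}

# Hypothetical index: released versions and per-release requirements.
AVAILABLE = {"A": [1], "B": [1, 2], "C": [2, 3, 4]}
REQUIRES = {
    ("A", 1): {"B": [("==", 2)], "C": [("==", 3)]},
    ("B", 2): {"C": [(">=", 2), ("<=", 4)]},
}


def satisfies(version, constraints):
    """True if version meets every (operator, bound) pair."""
    return all(OPS[op](version, bound) for op, bound in constraints)


def resolve(root):
    """Walk the tree breadth-first, intersecting constraints as we go."""
    constraints = {root: []}   # project -> every constraint seen so far
    chosen = {}                # project -> version picked
    pending = [root]
    while pending:
        name = pending.pop(0)
        if name in chosen:
            continue
        candidates = [v for v in AVAILABLE[name]
                      if satisfies(v, constraints[name])]
        if not candidates:
            # No intersection among the known requirements.
            raise LookupError(f"no release of {name} satisfies {constraints[name]}")
        chosen[name] = max(candidates)
        for dep, reqs in REQUIRES.get((name, chosen[name]), {}).items():
            constraints.setdefault(dep, []).extend(reqs)
            if dep in chosen and not satisfies(chosen[dep], constraints[dep]):
                # A new requirement arrived after dep was already picked.
                raise LookupError(f"{name} needs {dep}{reqs} after {dep} was chosen")
            pending.append(dep)
    return chosen


print(resolve("A"))  # {'A': 1, 'B': 2, 'C': 3}
print(resolve("B"))  # {'B': 2, 'C': 4}
```

Resolving from A keeps C at 3 even when B's wider range is reached, 
whereas resolving B on its own picks the newest C.  Both failure modes 
mentioned above (no intersection, or a conflicting requirement that shows 
up late) surface as a LookupError in this sketch.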

If you really wanted to do a full-tree intersection, it seems to me that 
the problem is detecting all the dependencies without having to spend 
significant time downloading/building in order to find them out.   This 
could be solved by simply extending the cheeseshop interface to export 
the set of requirements outside of the egg / tarball / etc.  We've done 
this for our own egg repository by extracting the appropriate meta-data 
files out of EGG-INFO and putting them into a separate file.  This info is 
also useful for users as it gives them an idea of how much *new* stuff 
is going to be installed (a la yum, apt-get, etc.)
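
A minimal sketch of that kind of extraction, assuming a zipped egg that 
carries its requirements in EGG-INFO/requires.txt.  The function names 
read_requirements and make_demo_egg are hypothetical, and this is not the 
tool we actually use.  Section headers for extras are returned as plain 
lines rather than being interpreted.

```python
import io
import zipfile


def read_requirements(egg_file):
    """Return the non-blank lines of EGG-INFO/requires.txt, or [] if absent."""
    with zipfile.ZipFile(egg_file) as egg:
        try:
            text = egg.read("EGG-INFO/requires.txt").decode("utf-8")
        except KeyError:
            # The egg declares no requirements file.
            return []
    return [line.strip() for line in text.splitlines() if line.strip()]


def make_demo_egg(requirements):
    """Build a tiny in-memory egg so the sketch can be run without downloads."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as egg:
        egg.writestr("EGG-INFO/requires.txt", "\n".join(requirements) + "\n")
    buf.seek(0)
    return buf


print(read_requirements(make_demo_egg(["B==2", "C==3"])))  # ['B==2', 'C==3']
```

An index could publish the returned list next to each distribution so 
that a resolver sees the requirements without downloading or building 
anything.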

> In other words, we attempt to achieve heuristically what's being 
> proposed to do algorithmically.  And my guess is that whatever cases 
> the heuristic is failing at, would probably not be helped by an 
> algorithmic approach either.  But I would welcome some actual data, either way.
>   

With our ETS projects, we've run into problems with the current 
heuristic.  Perhaps we just don't know how to make it work like we want? 

We have a set of projects that we want to be individually installable 
(to the extent that we limit cross-project dependencies) but we also 
want to make it easy to install the complete set.  We use a meta-egg for 
the latter.  Its purpose is only to specify the exact versions of each 
project that have been explicitly tested to work together -- you could 
almost think of it as a source control system tag.  Whereas on the 
individual projects, we explicitly want to ensure that people get the 
latest possible release of each required API so the version requirements 
are wider here.   This setup causes problems whenever we release new 
versions of projects because it seems easy_install ignores the meta-egg 
exact versions when it gets down into a project and comes across a wider 
cross-project dependency.   We ended up having to give up on the ranges 
in the cross-project dependencies and synchronize them to the same 
values in the meta-egg dependencies.   There are numerous side-effects 
of this that we don't like but we haven't found a way around it.

> Again, though, patches are welcome.  :)  (Specifically, for the 
> trunk; I don't see a resolver overhaul as being suitable for the 0.6 
> stable branch.)
>   

We're planning to pursue this (for the above mentioned strategy) as soon 
as we work ourselves out of a bit of a backlog of other things to do.

>> 2. People want a solution for the handling of documentation.  The distutils
>>     module has had commented out sections related to this for several years.
>>     
>
> As with so many other things, this gets tossed around the 
> distutils-sig every now and then.  A couple of times I've thrown out 
> some options for how this might be done, but then the conversation 
> peters out around the time anybody would have to actually do some 
> work on it.  (Me included, since I don't have an itch that needs 
> scratching in this area.)
>
> In particular, if somebody wants to come up with a metadata standard 
> for including documentation in eggs, we've got a boatload of hooks by 
> which it could be done.  Nothing's stopping anybody from proposing a 
> standard and building a tool, here.  (e.g. using the setuptools 
> command hook, .egg-info writer hook, etc.)

Enthought has started an effort (it's currently one of two things in our 
ETSProjectTools project at 
https://svn.enthought.com/svn/enthought/ETSProjectTools/trunk) and we're 
experimenting with our solution before proposing it as a patch.  We'd 
love some more help if anyone wants to participate.

>> 3. A more flexible internal handling of the different types of files is needed.
>>     Currently the code, data, lib, etc. files are aggregated at build time and
>>     people would like them to be kept separate until install/packaging time.
>>     
>
> I don't know what this means, exactly.
>   

A number of projects want to provide various types of files besides code 
in their distributable, and they'd like these to end up in standard 
locations for that type of file.  Think documentation, sample data, web 
templates, configuration settings, etc.   Each of these should be 
treated differently at installation time depending on platform.  On 
*nix, docs should go in /usr/share/doc whereas we might need to create a 
C:\Python2.5\docs on Windows.   With sample data and templates, you 
probably just want it accessible outside of the zipped egg so users can 
easily look at it, add to it, edit it, etc.  Configuration settings 
should be installed with some defaults into a standard configuration 
directory like /etc on *nix, etc.

Basically the issue is that it needs to be easier to include different 
sets of files into an egg for different actions to be taken during 
installation or packaging into an OS-specific distribution format.
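
To make the idea concrete, here is a small sketch of a per-platform 
lookup from file category to install location.  Nothing like this exists 
in setuptools as far as I know; the LAYOUTS table, the category names, 
the per-project subdirectory and the destination function are all 
assumptions for illustration, using only the locations mentioned above.

```python
import ntpath
import posixpath

# Hypothetical table: platform -> file category -> base directory.
# Categories with no entry stay inside the egg.
LAYOUTS = {
    "posix": {"doc": "/usr/share/doc", "config": "/etc"},
    "nt": {"doc": r"C:\Python2.5\docs"},
}


def destination(category, project, platform):
    """Return the install directory for a category, or None to keep it in the egg."""
    base = LAYOUTS.get(platform, {}).get(category)
    if base is None:
        return None
    # Join with the target platform's path rules, not the host's.
    join = ntpath.join if platform == "nt" else posixpath.join
    return join(base, project)


print(destination("doc", "foo", "posix"))  # /usr/share/doc/foo
print(destination("config", "foo", "nt"))  # None
```

An installer or an OS-specific packager could consult the same table, 
which is the point of keeping the file categories separate until that 
stage.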

>>     The other is the use of a single .pth file to control the list of
>>     activated packages.  Those who produce distributions would prefer a magic
>>     directory into which links to distributions could be dropped, similar to
>>     the current best practices for Linux, with /etc/conf.d/, /etc/profile.d/,
>>     /etc/xinetd.d/ and so forth.
>>     
>
> site-packages is that directory, and has been since long before 
> setuptools.  Just drop uniquely-named .pth files there, and you're good to go.
>   

But the docs for easy_install claim that the list of active eggs is 
maintained in easy-install.pth.  Also, if I create my own .pth file, and 
the user tries to update my version to a new one, will the easy_install 
tool modify my .pth file to remove the mention of the old version from 
my sys.path and put the new version in the same .pth file?  Or will it 
now be listed in both places?  Or will it be listed only in easy-install.pth?  
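
For reference, the standard-library side of this (a uniquely named .pth 
file dropped into a site directory) can be sketched as below.  It only 
shows how the site module picks up such a file; it says nothing about 
what easy_install does to that file on upgrade, which is the open 
question.  The directory and file names are made up.

```python
import os
import site
import sys
import tempfile


def demo_pth():
    """Create a site dir with a one-line .pth file and let site process it."""
    root = tempfile.mkdtemp()
    site_dir = os.path.join(root, "site-packages")
    dist_dir = os.path.join(root, "Foo-1.0.egg")   # stand-in for a distribution
    os.makedirs(site_dir)
    os.makedirs(dist_dir)   # .pth entries are ignored unless the path exists

    # Each line of a .pth file names one directory to add to sys.path.
    with open(os.path.join(site_dir, "foo.pth"), "w") as handle:
        handle.write(dist_dir + "\n")

    # addsitedir() adds site_dir itself and processes every *.pth file in it.
    site.addsitedir(site_dir)
    wanted = os.path.abspath(dist_dir)
    return any(os.path.abspath(entry) == wanted for entry in sys.path)


print(demo_pth())
```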

>> 7. Many wanted the ability to install files anywhere in the install tree and
>>     not just under the Python package.  Under distutils this was possible but
>>     it was removed in setuptools for security reasons.
>>     
>
> It wasn't security, it was manageability.  Egg-based installation 
> means containment, (analogous to GNU stow) and therefore portability 
> and disposability of plugins.  (Which again is what eggs were really 
> developed for in the first place.)
>   

Yes, but as you've already pointed out, they've escaped into a larger 
ecosystem and this restriction is a severe limitation -- leading to 
significant frustration.  Especially as projects evolve and want to do 
something more complex than simply install pure Python code.  Here at 
Enthought, we use and ship a number of projects that have extensions and 
thus dynamic libraries that need to either be modified during 
installation to work from the user's installed location, or copied 
elsewhere on the system to avoid the need to modify env variables, 
registries, etc. (which we also can't do via an egg install).  We'd also 
love 
to be able to ship end-user enterprise-scale applications via eggs so 
that bug fixes and updates don't require downloading a monolithic 100MB+ 
installer.  But doing that requires the ability to update desktop icons, 
menus, etc. which we also can't do automatically via an egg.

If you don't want the burden on setuptools to support, much less track, 
all these options, then perhaps it could just support automatic 
execution of a post-install script (and pre-uninstall script if 
uninstallation ever happens) that allows individual project developers 
to do what they need to do?  Let the burden of describing how those 
things happen and how to uninstall/relocate/update them fall to the 
provider of the projects that do them.

Also, IIUC, stow only tries to "contain" the hard files.  It puts links 
in multiple standard locations (for man pages, executables, libraries, 
etc.)   If setuptools supported these options, I don't think there'd be 
any discussion here except for things like "how do I extend the set of 
things the tool supports so that my foo-type files get linked into the 
standard /os/path/to/foo for the X os?"

>>   Custom code can still
>>     be written to do this explicitly but this is not popular.
>>     
>
> No kidding.  :)  Current best practice is to include a script or 
> module in the package that can install other files to a designated 
> location.  Personally, though, I tend to view applications and 
> libraries that target specific install locations to be overreaching 
> their bounds, and stepping into sysadmin territory.  Give me the 
> tools to install the data, don't just dump it somewhere on my system 
> where *you* think it should go, in other words.
>   

I should have read ahead.  This sounds close to what I've been 
describing except that this leads me to picture a script that prompts 
for install locations and allows the user to customize the destinations 
rather than one that assumes everything goes in a standard place.  I'm 
all for this, and the continuation of the ability to install an egg into 
a user-environment vs. a system-environment.  

The only thing missing here is the ability for the installer to 
automatically run that script so that installation isn't a disjointed, 
two-step manual process that a user is prone to forget to complete. 

One of the features of Enthought's Enstaller extension to easy_install 
was that it looks for a post_install.py script in EGG-INFO and if one is 
found, runs it.  I would think that getting this into setuptools would 
be a significant step forward but I believe you previously rejected that 
idea.   We'll take a stab at creating a patch for you if you're more 
receptive to that idea now.  Just let me know.
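
To show how small the hook could be, here is a minimal sketch.  It is 
not Enstaller's actual code; it assumes an unpacked egg directory, and 
the function name run_post_install is hypothetical.

```python
import os
import subprocess
import sys


def run_post_install(egg_dir):
    """Run EGG-INFO/post_install.py if the egg ships one.

    Returns the script's exit status, or None when there is no script.
    """
    script = os.path.join(egg_dir, "EGG-INFO", "post_install.py")
    if not os.path.isfile(script):
        return None
    # Use the interpreter doing the install, with the egg as working directory.
    return subprocess.call([sys.executable, script], cwd=egg_dir)
```

An installer would call run_post_install(path_to_egg) once the egg is in 
place and treat a non-zero status as a failed installation.  A matching 
pre-uninstall hook could follow the same pattern.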

> On the other hand, I've been puzzling over how to handle legitimate 
> post-install features.  On Windows, both wx and pywin32 have a real 
> need to do some actual "install" operations.  Some is just copying 
> files, but pywin32 also has to do some registry stuff.  I don't know 
> how to allow just what's sensible, without opening up a huge can of 
> worms, though.
>   

I think there are lots of situations that are legitimate (projects with 
extensions, projects that want to put icons on the desktop or in menus, 
projects that need to interact with a registry, projects that want to 
put configuration information somewhere other than in a zip file in a 
site-packages dir, etc.)   I think we should worry less about preventing 
developers from shooting themselves in the foot and more about ensuring 
that they can hunt for food for their survival.   We can always tighten 
things down after seeing the use cases that develop, right?

-- Dave