[Python-Dev] [Distutils] Capsule Summary of Some Packaging/Deployment Technology Concerns

Wed Mar 19 02:20:44 CET 2008

We should probably move this off of Python-Dev, as we're getting into 
deep details now...

At 07:27 PM 3/18/2008 -0500, Dave Peterson wrote:
>If you really wanted to do a full-tree intersection, it seems to me 
>that the problem is detecting all the dependencies without having to 
>spend significant time downloading/building in order to find them 
>out.   This could be solved by simply extending the cheeseshop 
>interface to export the set of requirements outside of the egg / 
>tarball / etc.  We've done this for our own egg repository by 
>extracting the appropriate meta-data files out of EGG-INFO and 
>putting it into a separate file.  This info is also useful for users 
>as it gives them an idea of how much *new* stuff is going to be 
>installed (a la yum, apt-get, etc.)

...and now we're more directly competing with them, too.  The 
original idea Bob and I had was to do XML files a la Eclipse feature 
repositories, but then later I realized that for what we were doing, 
HTML was both adequate and already available.  However, I don't see a 
problem in principle with having "header" files available for this 
sort of thing.
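
To make the "header file" idea concrete, here is a minimal sketch of 
reading requirement metadata that has been published separately from 
the egg.  It assumes the header is simply the text of EGG-INFO's 
requires.txt (plain requirement lines, with optional "[extra]" section 
headers); the function name and the sample project names are made up 
for illustration, and nothing here is an existing index API.

```python
def parse_requires(text):
    """Parse requires.txt-style text into {extra_or_None: [requirement, ...]}.

    Requirements listed before any "[extra]" header are stored under None.
    """
    sections = {None: []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


# Hypothetical header contents, as a repository might export them.
SAMPLE = """\
numpy>=1.0
traits==2.0

[docs]
docutils
"""

requirements = parse_requires(SAMPLE)
```

A client could fetch one such small file per candidate project and 
work out the full dependency set before downloading or building 
anything.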

>With our ETS projects, we've run into problems with the current 
>heuristic.  Perhaps we just don't know how to make it work like we want?
>
>We have a set of projects that we want to be individually 
>installable (to the extent that we limit cross-project dependencies) 
>but we also want to make it easy to install the complete set.  We 
>use a meta-egg for the latter.  Its purpose is only to specify the 
>exact versions of each project that have been explicitly tested to 
>work together -- you could almost think of it as a source control system tag.

I would think that as long as that meta-egg specifies *all* the 
required versions (right down to recursive dependencies), then there 
shouldn't be any problem.  Maybe it's me who's not understanding something?

I would think that you could get the appropriate data by running the 
tl.eggdeps tool.
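
As a rough illustration of what "specifies *all* the required 
versions" means, the sketch below walks a dependency graph and reports 
anything reachable from the meta-egg that has no exact pin.  The graph 
and the pins are hypothetical hand-written data; this is not how 
tl.eggdeps or setuptools represent them.

```python
def unpinned_dependencies(pins, depends_on):
    """Return the names reachable from the pinned projects that lack a pin.

    pins:       {project_name: exact_version}
    depends_on: {project_name: [names it requires]}
    """
    missing = set()
    seen = set()
    stack = list(pins)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name not in pins:
            missing.add(name)
        stack.extend(depends_on.get(name, ()))
    return sorted(missing)


# Hypothetical meta-egg pins and dependency graph.
PINS = {"ProjA": "2.0.1", "ProjB": "2.0.1"}
DEPENDS_ON = {"ProjA": ["ProjB", "LibC"], "ProjB": [], "LibC": ["LibD"]}

print(unpinned_dependencies(PINS, DEPENDS_ON))
```

An empty result would mean the meta-egg pins the whole recursive 
closure, which is the case where I'd expect no surprises.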

>A number of projects want to provide various types of files besides 
>code in their distributable, and they'd like these to end up in 
>standard locations for that type of file.  Think documentation, 
>sample data, web templates, configuration settings, etc.   Each of 
>these should be treated differently at installation time depending 
>on platform.  On *nix, docs should go in /usr/share/doc whereas we 
>might need to create a C:\Python2.5\docs on Windows.   With sample 
>data and templates, you probably just want it accessible outside of 
>the zipped egg so users can easily look at it, add to it, edit it, 
>etc.  Configuration settings should be installed with some defaults 
>into a standard configuration directory like /etc on *nix, etc.
>
>Basically the issue is that it needs to be easier to include 
>different sets of files into an egg for different actions to be 
>taken during installation or packaging into an OS-specific distribution format.

Yes, it would be nice to define a metadata standard for including 
installable "datasets" either through copying or symlinking, 
optionally with entry points for running some code, too.  When you 
install an egg, these things could get added to a "post-install 
to-do" list, that you could then read to find out what steps to do, 
or invoke a tool on to actually do some of those steps.
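
Here is one way such a to-do list might look, purely as a sketch.  The 
"datasets" structure, its field names, and the default "copy" action 
are assumptions for illustration; no such metadata standard exists 
yet.

```python
import json


def pending_steps(project, datasets):
    """Turn declared datasets into to-do entries.  Nothing is executed here."""
    return [
        {
            "project": project,
            "kind": d["kind"],
            "source": d["source"],
            "action": d.get("action", "copy"),  # "copy" or "symlink"
            "status": "pending",
        }
        for d in datasets
    ]


# Hypothetical declarations that a project might ship in its metadata.
DATASETS = [
    {"kind": "docs", "source": "docs/"},
    {"kind": "config", "source": "etc/foo.conf", "action": "symlink"},
]

todo = pending_steps("foo_package", DATASETS)
print(json.dumps(todo, indent=2))
```

The point is that the list is plain data: a user can read it, and a 
separate tool can act on it.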

>But the docs for easy_install claim that the list of active eggs is 
>maintained in easy-install.pth.  Also, if I create my own .pth file, 
>and the user tries to update my version to a new one, will the 
>easy_install tool modify my .pth file to remove the mention of the 
>old version from my sys.path and put the new version in the same 
>.pth file?  Or will it now be listed in both places?  Or will it 
>only be in easy-install.pth?

My understanding of the context of the question was that it applied 
to *system* packaging tools, which would be exclusively maintaining 
the .pth entries for the packages they installed.  i.e., a scenario 
with *no* easy-install.pth.  Setuptools will still detect the 
presence of their eggs, regardless of the means by which they're 
added to sys.path.  But it would not *maintain* those .pth files.
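
For reference, this is the plain stdlib mechanism a system packaging 
tool would be relying on: any .pth file in a site directory adds its 
listed paths to sys.path, with no easy-install.pth involved.  The 
directory and file names below are invented for the demonstration, and 
the sketch shows only the site module's behaviour, not setuptools' egg 
detection.

```python
import os
import site
import sys
import tempfile


def demo_pth():
    """Create a throwaway site dir with one .pth file and activate it."""
    site_dir = tempfile.mkdtemp()
    egg_dir = os.path.join(site_dir, "foo_package-1.0.egg")
    os.mkdir(egg_dir)  # site only adds entries that exist on disk

    # A .pth file maintained by some other tool (hypothetical name).
    with open(os.path.join(site_dir, "system-packages.pth"), "w") as f:
        f.write("foo_package-1.0.egg\n")

    # Processes every *.pth file in site_dir and appends to sys.path.
    site.addsitedir(site_dir)
    return os.path.abspath(egg_dir)


egg_path = demo_pth()
print(egg_path in sys.path)
```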

>Yes, but as you've already pointed out, they've escaped into a 
>larger ecosystem and this restriction is a severe limitation -- 
>leading to significant frustration.  Especially as projects evolve 
>and want to do something more complex than simply install pure 
>Python code.  Here at Enthought, we use and ship a number of 
>projects that have extensions and thus dynamic libraries that need 
>to either be modified during installation to work from the user's 
>installed location, or copied elsewhere on the system to avoid the 
>need to modify (which we also can't do via an egg install) env 
>variables, registries, etc.

By the way, there *is* experimental shared library building support 
in setuptools, and I recently heard from Andi Vajda that he was 
successful in using it in his JCC project to make available a C++ 
library for linkage from JCC-built projects.  (I'm also sitting on 
his patch that makes it work...)  I'm not sure that it actually fixes 
the larger problem, e.g., the case where the main project is installed 
by the system and you then build or install an egg yourself.  But I 
think those problems are solvable.

>    We'd also love to be able to ship end-user enterprise-scale 
> applications via eggs so that bug fixes and updates don't require 
> downloading a monolithic 100MB+ installer.  But doing that requires 
> the ability to update desktop icons, menus, etc. which we also 
> can't do automatically via an egg.

Yep...  a good post-install mechanism would be handy for wx and 
pywin32 as well.

>If you don't want the burden on setuptools to support, much less 
>track, all these options, then perhaps it could just support 
>automatic execution of a post-install script (and pre-uninstall 
>script if uninstallation ever happens) that allows individual 
>project developers to do what they need to do?  Let the burden of 
>describing how those things happen and how to 
>uninstall/relocate/update them fall to the provider of the projects 
>that do them.

Yeah, that's what I really *don't* want.  I'd like to enable a more 
trustable mechanism than a blindly-executed script.  I'd rather see a 
standard that makes a developer document more, and have to at least 
*convince* the user that their post-install is worthwhile, even if 
the tool then makes it easy to run.

Better still, I'd rather have those post-install parts done in such a 
way that things like icons, menus, manifests, registry stuff, etc., 
have to get explicitly listed instead of being done programmatically.

>Also, IIUC, stow only tries to "contain" the hard files.  It puts 
>links in multiple standard locations (for man pages, executables, 
>libraries, etc.)   If setuptools supported these options, I don't 
>think there'd be any discussion here except for things like "how do 
>I extend the set of things the tool supports so that my foo-type 
>files get linked into the standard /os/path/to/foo for the X os?"

Yep.  Having that would be a worthwhile thing, I think.  Discussion 
leading to specs is most welcome.
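
As a starting point for that discussion, here is a small sketch of the 
stow-style approach: the real files stay inside one package directory, 
and symlinks are created under a prefix according to a table.  The 
table contents and the function name are assumptions, not an existing 
setuptools feature, and os.symlink may need extra privileges on 
Windows.

```python
import os
import tempfile

# Hypothetical mapping from file kind to location under the prefix.
LINK_MAP = {"bin": "bin", "man": os.path.join("share", "man", "man1")}


def link_into_prefix(package_dir, prefix, link_map=LINK_MAP):
    """Symlink package_dir/<kind>/* into prefix/<target>/; return the links."""
    created = []
    for kind, rel_target in sorted(link_map.items()):
        src_dir = os.path.join(package_dir, kind)
        if not os.path.isdir(src_dir):
            continue  # this package ships no files of that kind
        dest_dir = os.path.join(prefix, rel_target)
        os.makedirs(dest_dir, exist_ok=True)
        for name in sorted(os.listdir(src_dir)):
            link = os.path.join(dest_dir, name)
            os.symlink(os.path.join(src_dir, name), link)
            created.append(link)
    return created
```

Supporting a new "foo-type" file on some OS would then be a matter of 
adding a row to the table rather than writing install code.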

>I should have read ahead.  This sounds close to what I've been 
>describing except that this leads me to picture a script that 
>prompts for install locations and allows the user to customize the 
>destinations rather than one that assumes everything goes in a 
>standard place.  I'm all for this, and the continuation of the 
>ability to install an egg into a user-environment vs. a system-environment.

+1.

>The only thing missing here is the ability for the installer to 
>automatically run that script so that installation isn't a 
>disjointed, two-step manual process that a user is prone to forget 
>to complete.

I don't see a problem with a prompting process, backed by a log file 
that records what post-install steps are pending, finished, or 
explicitly rejected by the user.
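
A minimal sketch of that prompting loop follows.  The log entry 
layout and the three status names mirror the sentence above; the "ask" 
and "run" callables are placeholders for whatever prompt and whatever 
trusted tool would actually be used.

```python
def review_steps(log, ask, run):
    """Prompt for each pending step and record the outcome in the log.

    log: list of dicts with at least a "status" key
    ask: callable(step) -> bool, e.g. a yes/no prompt to the user
    run: callable(step), performs the step once the user has accepted it
    """
    for step in log:
        if step["status"] != "pending":
            continue  # already finished or explicitly rejected earlier
        if ask(step):
            run(step)
            step["status"] = "finished"
        else:
            step["status"] = "rejected"
    return log


# Hypothetical log contents.
log = [
    {"kind": "docs", "status": "pending"},
    {"kind": "icons", "status": "pending"},
    {"kind": "config", "status": "finished"},
]
performed = []
review_steps(log, ask=lambda s: s["kind"] == "docs", run=performed.append)
print([(s["kind"], s["status"]) for s in log])
```

Because rejected steps stay in the log, the tool never has to ask 
twice, and the user can still change their mind later.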

One possibility, by the way, is that we could overload "extras" for 
this purpose.  Entry points (such as those for scripts) can require 
extras; if extras could mean post-install components like docs or 
icons or what-have-you, then trying to run the script could result in 
an error message telling you you need to "easy_install 
foo_package[icons]" or whatever.
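
To illustrate, entry points are already written as 
"name = module:attr [extra1,extra2]".  The sketch below parses that 
form by hand and produces the kind of message I have in mind.  The 
notion of an extra being "installed", and the helper itself, are 
hypothetical; setuptools does not track post-install extras today.

```python
import re

# name = module[:attr] [extra1,extra2]
ENTRY_POINT = re.compile(
    r"^\s*(?P<name>[^=]+?)\s*=\s*"
    r"(?P<target>[\w.]+(?::[\w.]+)?)\s*"
    r"(?:\[(?P<extras>[^\]]*)\])?\s*$"
)


def missing_extras(spec, project, installed_extras):
    """Return the easy_install commands needed before running this entry point."""
    match = ENTRY_POINT.match(spec)
    if match is None:
        raise ValueError("not an entry point spec: %r" % spec)
    wanted = [e.strip() for e in (match.group("extras") or "").split(",") if e.strip()]
    return [
        "easy_install %s[%s]" % (project, extra)
        for extra in wanted
        if extra not in installed_extras
    ]


print(missing_extras("foo-gui = foo.gui:main [icons]", "foo_package", set()))
```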

>One of the features of Enthought's Enstaller extension to 
>easy_install was that it looks for a post_install.py script in 
>EGG-INFO and if one is found, runs it.  I would think that getting 
>this into setuptools would be a significant step forward but I 
>believe you previously rejected that idea.   We'll take a stab at 
>creating a patch for you if you're more receptive to that idea 
>now.  Just let me know.

No -- I'm not happy with a straight-up executable hook for 
post-install steps.  My evaluation of the state of PyPI is that I 
don't trust the community to write non-hazardous setup.py files, let 
alone post-install scripts.  There should be a high technical and 
social barrier to including post-install hooks with arbitrary code.

For example, if there was a required separation between installer 
tools and the things they install, such that any post-install 
operation had to be performed strictly by providing some 
human-readable data that will be passed to a separately-installed 
tool, and there was a high social stigma associated with writing your 
own post-install tool, then that might work.

So, for example, if the community creates an icons and menus 
installer tool for the various platforms, and then anybody can use it 
in their project by adding the right data, then the user doesn't have 
to fully trust arbitrary package authors, only the authors of the 
post-install tools.
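
In code, the separation might look something like the sketch below: 
packages contribute only data, and a dispatcher hands that data to 
tools the user has installed separately, refusing anything it doesn't 
recognize.  The step layout, the tool name "menus", and the 
describe-only handler are all invented for illustration.

```python
def dispatch(steps, trusted_tools):
    """Pass each step's data to a separately-installed tool, if one exists.

    steps:         list of {"tool": name, "data": {...}} supplied by a package
    trusted_tools: {name: callable(data)} installed and trusted by the user
    Returns (results, refused_steps); package-supplied code is never executed.
    """
    results, refused = [], []
    for step in steps:
        tool = trusted_tools.get(step.get("tool"))
        if tool is None:
            refused.append(step)  # unknown tool: leave it for the user to decide
        else:
            results.append(tool(step["data"]))
    return results, refused


def describe_menu_entry(data):
    """Hypothetical tool: only describes what it would do."""
    return "would add menu entry %r -> %s" % (data["label"], data["command"])


STEPS = [
    {"tool": "menus", "data": {"label": "Foo", "command": "foo-gui"}},
    {"tool": "run-script", "data": {"path": "post_install.py"}},
]
results, refused = dispatch(STEPS, {"menus": describe_menu_entry})
print(results, len(refused))
```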

I'm not saying that model is perfect; in fact I can see some 
potential pitfalls.  But once an automatic post-install hole is 
opened it will be *very* hard to close, because it will always be 
*easier* to roll your own crappy post-installer instead of 
contributing to a set of robust cross-project/cross-platform 
tools.  So I'd rather keep this particular "itch" in play and try to 
build up the scratching pressure until some people get together and 
pay attention long enough to solve the problem in a less hacky way.  :)

>>On the other hand, I've been puzzling over how to handle legitimate
>>post-install features.  On Windows, both wx and pywin32 have a real
>>need to do some actual "install" operations.  Some is just copying
>>files, but pywin32 also has to do some registry stuff.  I don't know
>>how to allow just what's sensible, without opening up a huge can of
>>worms, though.
>>
>
>I think there are lots of situations that are legitimate (projects 
>with extensions, projects that want to put icons on the desktop or 
>in menus, projects that need to interact with a registry, projects 
>that want to put configuration information somewhere other than in a 
>zip file in a site-packages dir, etc.)   I think we should worry 
>less about preventing developers from shooting themselves in the foot

It's the users' feet that I'm concerned with.  Some people are 
already paranoid about the fact that PyPI doesn't use SSL and code 
signing, or that easy_install uses the intarwebs at all.  I can just 
see the witch hunt when we start executing arbitrary code.  Unh 
unh.  No way am I letting that happen.  Nope.

>  and more about ensuring that they can hunt for food for their survival.

Right now, if you have a post-install script that's essential, you'll 
just have to convince your users to run it.  Which nicely keeps 
easy_install out of what should be a conversation between developer and user.

Enstaller is a different case - you are presumably installing an 
application, and the user is trusting your installer.  easy_install 
is something else altogether, and is used by other programs such as buildout.

Actually, I wonder whether, instead of trying to enhance setuptools for 
post-install, we should be looking at buildout recipes and 
maybe having a way for setuptools dependencies to point to buildout 
specs.  IIRC, buildout specs can be remotely retrieved from a single URL, too.

>    We can always tighten things down after seeing the usecases that 
> develop, right?

Actually, no, we can't, since backward compatibility would keep us 
from removing the hook, once people rely on it.

I really feel your (and others') pain on this issue, but it's one 
place where the users have to come first, and they need protection 
from the wilds of PyPI.  Distribution and installation issues are not 
first on most developers' minds, so the fact that someone writes a 
great library on PyPI doesn't mean they can write installers worth a 
crap.  Frankly, I wouldn't trust myself to write a correct 
post-installer on the first go -- perhaps *because* I have seen so 
many "simple" things go wrong.