[Distutils] Deployment with setuptools: a basket-of-eggs approach

Mars mfogels at gmail.com
Tue Apr 11 19:02:24 CEST 2006


On 11 Apr 2006 16:31:44 +0200, Iwan Vosloo XXXXX wrote:
>
> Hi Maris,
>
> Ok, I see...
>
> You can thus assume in your environment that the network will always
> be there.
>
> I was wondering whether you've ever looked at something like Debian's
> apt.  (Mentioned here just to learn from it, not to advocate its use.)
> Apt is a wonderful tool for keeping repositories and installing
> packages.  It does not solve all problems - and has the drawback that
> it only allows one version of something on a system (but you can trick
> it by having different package names...).
>

I am not completely familiar with the way that Debian handles package
releases, but I do have experience with managing apt, ebuilds, RPMs,
and source installs.  I do not believe that these systems are an
option, since some of our end-users are on Windows and we do not have
an in-house administrator to handle application upgrades.  Since
end-users will be handling upgrades, and because our users' technical
skills vary widely, we also have to aim for a trivial upgrade process
(which easy_install can do, with a bit of help for runtime data and
future-proofing).

Since our servers are all Fedora Linux, we could try installing Cygwin
on the Windows machines, add facilities to automatically build two
RPMs (one for Fedora, one for Cygwin), and install those.  But it
would be very hard to justify that extra complexity in a shop our size.

One very nice thing about using setuptools and easy_install is that it
keeps the application lifecycle within the Python world.

> The hell you're talking about is something that Debian (and, I suppose,
> other distros) has a lot of experience in managing.  And, for Debian,
> apt is the tool.  (I don't know the others.)  Of course there are also
> a number of conventions and policies that come into play to make it work.
>
> I find it odd that you call upon unit testing.  Is the issue not
> actually integration testing?
>

You are right, it would be integration testing.  With a technical
staff as small as ours (3 people), I was thinking that coding unit
tests for the boundaries between your package and its dependencies
would help ease the workload of upgrading.  I was hoping to ease
adoption of the new system by combining integration testing with a
framework that people are already familiar with.
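
For example, a boundary test might look something like this (a rough
sketch; 'thirdparty' and its parse() function are made-up stand-ins
for whatever dependency we actually upgrade via eggs):

import unittest

# 'thirdparty' is a placeholder for any dependency we install as an egg.
import thirdparty

class ThirdPartyBoundaryTest(unittest.TestCase):
    # Pin down only the behaviour our code actually relies on, so that
    # a bugfix upgrade which changes it fails loudly right here.
    def test_parse_returns_dict(self):
        self.assertEqual(thirdparty.parse('key=value'), {'key': 'value'})

if __name__ == '__main__':
    unittest.main()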

> I think that the only way to deal with the possible complexities of
> many packages and dependencies is to impose restrictions on when and
> how things are released.  For example, all the packages in Debian
> release X are tested to work together well (this is integration
> testing).  So, in Debian, you don't only have packages, you also have
> a set of versioned packages (itself also versioned), which is the
> release of the entire distro.  Any new version of a package, or new
> package that should work with that distro, would need to be tested
> with all the rest of the packages in that release of the distro.
>

That level of testing would require the role of either Project
Librarian or Build Master, neither of which we could justify at a
company our size, especially given the rate at which new projects are
written.

> I suppose disallowing more than one version of a package on a machine
> (like they have done) is one way of simplifying things.  And the
> standard workaround for special packages that need more versions, is
> to include part of the version in the name.  For example "gcc-3.4"
> (version 3.4.2) can be installed alongside "gcc-4.0" (version 4.0.3).
>

Alas, that is not an option with our legacy applications, which mix
library versions on a single system, with the additional requirement
that bug fixes can be applied to the systems easily (without a
sysadmin).

> With your scheme, your repository of eggs is also like a single,
> shared installation of eggs.  And it may be argued that there is a
> difference between putting something in a shared repository (which
> means "it is now officially released") versus installing a package on
> a machine where it is used.  When you install it, you care about other
> localised things too that are not versioned, such as local config or
> user data.  And things like apt include ways and means for you to
> upgrade user data and deal with config sensibly.
>
> It may be that the simplification of making "install" and "release"
> one thing is useful in an environment, I guess.  But in some
> environments the simplification may introduce other hellish issues.
>

Those other hellish issues stem from the vague, situation-specific
part of the problem whose full shape I am still trying to grasp.
It seems to hinge on a handful of critical requirements, such as "two
applications, one system, different versions of libraries", "fast/easy
deployment to all systems", "fast/easy upgrades to all installed
applications", or "mixed operating systems".

One side effect of the 'require() only a single version' policy:  why
wouldn't I just bundle everything anyway?  If I want only a single
version of the library, because that is all that I can guarantee to be
stable, why not enforce the restriction physically?  Then we would
have the situation you speak of, making the distinction between
'installing on the local machine' and 'deploying to the shared
directory'.

I would have to install most applications as a multi-install in order
to meet the requirement of "two applications, one system, different
versions of libraries".  So we get:

# deploy: build the egg and copy it, with its dependencies, into the
# shared repository (-z zip the egg, -m multi-version, -a always copy
# dependencies, -x exclude scripts, -d target directory)
% easy_install -f /opt/eggs -zmaxd /opt/eggs mypackage
# install: pull the egg from the shared repository onto this machine
% easy_install -f /opt/eggs -zmad /some/where mypackage

We now have the Java-style install I outlined in my first email.
(This is the part where I wish that easy_install had an option to
install *everything* in the target directory, instead of trying to use
/usr/bin or whatever for scripts, etc.  We can get around this with a
custom setup.cfg and the "$base" distutils variable.)
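
For instance, something along these lines in setup.cfg might do it (a
sketch, untested; distutils expands "$base" to the install base, and
the directory names are just examples):

[install]
# keep scripts and data under the install base, not /usr/bin etc.
install_scripts = $base/bin
install_data = $base/share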

This satisfies the additional requirements of "multiple operating
systems" and "easy/fast installation".

That leaves us with the requirement for "easy/fast upgrades".  Under
the bundled setup we still have to visit every server in order to push
out a bugfix version of a library.  What is more, we have to visit
every individual application that was installed using the
multi-install option.

One way around this would be to specify a subset of a version number
to which easy_install could be applied.  Something like this:

% easy_install --upgrade mypackage==X.X.Y

We could achieve this by using pkg_resources to get the application's
dependency tree, as well as all installed versions of all
dependencies.  Then it is a simple matter of parsing the version
string for each package and splitting out the 'X.X' part.  Then, for
each package in the tree, we re-run easy_install with this
information:
os.system('easy_install -f /opt/eggs --upgrade mypackage==X.X')

easy_install should then grab and install any bugfix versions as required.
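
Roughly, as a sketch (upgrade_bugfixes() is a made-up helper, the
version-splitting assumes plain dotted version strings, and whether
'==X.X' really matches newer 'X.X.Y' releases depends on setuptools'
matching rules, so a range spec might be needed instead):

import os
import pkg_resources

def upgrade_bugfixes(project, repo='/opt/eggs'):
    # require() activates the project and returns it together with
    # its entire dependency chain.
    for dist in pkg_resources.require(project):
        # Keep only the 'X.X' part of the version string.
        major_minor = '.'.join(dist.version.split('.')[:2])
        # Re-run easy_install pinned to major.minor, so it picks up
        # any newer bugfix release from the shared repository.
        os.system('easy_install -f %s --upgrade "%s==%s"'
                  % (repo, dist.project_name, major_minor))

upgrade_bugfixes('mypackage')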

As for pushing or pulling application bugfixes, this could be handled
by a cron script or scheduled task: one per application, which means
more work to install any given app, because setuptools can't handle
this at the moment.  But it would work!
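
For example (a sketch; the schedule and paths are invented):

# hypothetical crontab entry: pull any bugfix releases nightly at 3am
0 3 * * * easy_install -f /opt/eggs --upgrade -zmad /some/where mypackage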

>
> (Sorry, I'm just thinking aloud, because I am also faced with such
> problems and, by habit, always build things around apt...  So it's
> interesting to see other thoughts.)
>

No problem there; thinking aloud is often why discussing ideas with
people helps so much!

Maris

