[Numpy-discussion] [matplotlib-devel] Announcing toydist, improving distribution and packaging situation

David Cournapeau cournape at gmail.com
Mon Jan 4 02:25:44 EST 2010


On Mon, Jan 4, 2010 at 8:42 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau <cournape at gmail.com> wrote:
>> On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> What I do -- and documented for people in my lab to do -- is set up
>>> one virtualenv in my user account, and use it as my default python. (I
>>> 'activate' it from my login scripts.) The advantage of this is that
>>> easy_install (or pip) just works, without any hassle about permissions
>>> etc.
>>
>> It just works if you happen to be able to build everything from
>> sources. That alone means you ignore the majority of users I intend to
>> target.
>>
>> No other community (except maybe Ruby) pushes those isolated install
>> solutions as general deployment solutions. If it were such a great
>> idea, other people would have picked up those solutions.
>
> AFAICT, R works more-or-less identically (once I convinced it to use a
> per-user library directory); install.packages() builds from source,
> and doesn't automatically pull in and build random C library
> dependencies.

As mentioned by Robert, this is different from the usual virtualenv
approach. Per-user app installation is certainly a useful (and
uncontroversial) feature.

And R does support automatically-built binary installers.

>
> Sure, I'm aware of the opensuse build service, have built third-party
> packages for my projects, etc. It's a good attempt, but also has a lot
> of problems, and when talking about scientific software it's totally
> useless to me :-). First, I don't have root on our compute cluster.

True, non-root install is a problem. Nothing *prevents* dpkg from
running in a non-root environment in principle, as long as the package
itself does not require root, but it is not really supported by the
tools ATM.

> Second, even if I did I'd be very leery about installing third-party
> packages because there is no guarantee that the version numbering will
> be consistent between the third-party repo and the real distro repo --
> suppose that the distro packages 0.1, then the third party packages
> 0.2, then the distro packages 0.3, will upgrades be seamless? What if
> the third party screws up the version numbering at some point? Debian
> has "epochs" to deal with this, but third-parties can't use them and
> maintain compatibility.

Actually, at least with .deb-based distributions, this issue has a
solution: since packages have their own packaging revision in addition
to the upstream version, PPA-built packages can carry their own
revisions without colliding with the distro's.

https://help.launchpad.net/Packaging/PPA/BuildingASourcePackage
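
To make the ordering concrete, here is a small sketch (the version
strings and the PPA suffix are made up for illustration, and it assumes
the python-apt bindings are installed for apt_pkg.version_compare):

    import apt_pkg
    apt_pkg.init()

    # Hypothetical history: the distro ships 0.1-1, a PPA then ships
    # 0.2-0ppa1, and the distro later catches up with 0.3-1.
    history = ["0.1-1", "0.2-0ppa1", "0.3-1"]

    for older, newer in zip(history, history[1:]):
        # version_compare() is negative when its first argument is older,
        # so each step is seen as a normal upgrade by the package manager.
        assert apt_pkg.version_compare(older, newer) < 0
        print("%s -> %s is a clean upgrade" % (older, newer))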

Of course, this assumes a simple versioning scheme in the first place,
instead of the cluster-fck that versioning has become within python
packages (again, the scheme used in python is much more complicated
than everyone else's, and it seems that nobody has ever stopped and
thought for 5 minutes about the consequences, and whether this
complexity was a good idea in the first place).

> What if the person making the third party
> packages is not an expert on these random distros that they don't even
> use?

I think simple rules/conventions + build farms would solve most
issues. The problem is that if you allow total flexibility as input,
automatic and simple solutions become impossible. Certainly, PPA and
the build service provide a much better experience than anything
pypi has ever given me.

> Third, while we shouldn't advocate that people screw up backwards
> compatibility, version skew is a real issue. If I need one version of
> a package and my lab-mate needs another and we have submissions due
> tomorrow, then filing bugs is a great idea but not a solution.

Nothing prevents you from using virtualenv in that case (I may sound
dismissive of those tools, but I am really not. I use them myself.
What I strongly react to is when they are pushed as the de facto,
standard method).

> Fourth,
> even if we had expert maintainers taking care of all these third-party
> packages and all my concerns were answered, I couldn't convince our
> sysadmin of that; he's the one who'd have to clean up if something
> went wrong, and we don't have a big budget for overtime.

I am not advocating using only packaged binary installers. I am
advocating using them as much as possible where it makes sense - on
Windows and Mac OS X in particular.

Toydist also aims at making it easier to build and customize installs.
Although not yet implemented, a --user-like scheme would be quite
simple to add, because the toydist installer internally uses an
autoconf-like description of directories (of which --user is a special
case).
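
To illustrate what I mean by an autoconf-like description (this is a
hypothetical sketch, not toydist's actual internals): every install
path is expressed in terms of a handful of variables, and a --user
scheme is just a different set of defaults:

    import os

    # Autoconf-style path variables: everything is defined relative to
    # $prefix, so switching schemes only means changing the defaults.
    default_scheme = {
        "prefix":  "/usr/local",
        "bindir":  "$prefix/bin",
        "datadir": "$prefix/share",
        "sitedir": "$prefix/lib/python2.6/site-packages",
    }

    # A --user-like scheme is the same description with another prefix.
    user_scheme = dict(default_scheme, prefix=os.path.expanduser("~/.local"))

    def expand(scheme):
        """Substitute $prefix in every path of the scheme."""
        prefix = scheme["prefix"]
        return dict((k, v.replace("$prefix", prefix))
                    for k, v in scheme.items())

    print(expand(user_scheme)["sitedir"])
    # e.g. /home/you/.local/lib/python2.6/site-packages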

If you need sandboxed or customized installs, toydist will not
prevent them. It is certainly my intention to make it possible to use
virtualenv and co (you already can by building eggs, actually). I hope
that by having our own "SciPi", we can actually have a more reliable
approach. For example, the static dependency description + mandated
metadata would make this much easier and more robust, as there would
not be a need to run a setup.py to get the dependencies.
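
As a rough illustration of the difference (the file format below is
invented for the example, it is not toydist's actual syntax): a client
can get at the dependencies by parsing text, instead of executing
arbitrary code:

    # Hypothetical static metadata, as a "SciPi" index could serve it.
    metadata = """\
    name: foo
    version: 0.2
    requires: numpy >= 1.3, scipy
    """

    # No setup.py is executed: the dependencies are read, not computed.
    fields = {}
    for line in metadata.splitlines():
        if line.strip():
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()

    requires = [dep.strip() for dep in fields["requires"].split(",")]
    print(requires)   # ['numpy >= 1.3', 'scipy']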

If you look at hackageDB
(http://hackage.haskell.org/packages/hackage.html), they have a very
simple index structure, which makes it easy to download the index in
its entirety and reuse it locally to avoid any internet access.
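
The same idea would work for a SciPi index. A sketch, assuming the
index is published as a single tarball of per-package metadata files,
roughly like hackage's 00-index.tar.gz (the URL and layout below are
made up, there is no such service yet):

    import tarfile

    # Assume the whole index was fetched once, e.g. from
    # http://scipi.example.org/00-index.tar.gz (hypothetical URL);
    # everything below then works offline, from the local copy.
    index = tarfile.open("00-index.tar.gz")
    for name in index.getnames():
        # one small metadata file per release, e.g. foo/0.2/foo.info
        print(name)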

> Let's be honest -- scientists, on the whole, suck at IT
> infrastructure, and small individual packages are not going to be very
> expertly put together. IMHO any real solution should take this into
> account, keep them sandboxed from the rest of the system, and focus on
> providing the most friendly and seamless sandbox possible.

I agree packages will not always be well put together - but I don't
see why this would be worse than the current situation. I also
strongly disagree about sandboxing as the solution of choice. For
most users, having only one install of most packages is the typical
use-case. Once you start sandboxing, you create artificial barriers
between the sandboxes, and this becomes too complicated for most users
IMHO.

>
> Maybe I was unclear -- proper build directory handling is nice,
> Cython/Pyrex's distutils integration gets it wrong (not their fault,
> distutils is just impossible to do anything sensible with, as you've
> said), and I've never found build directories hard to implement

It is simple if you have a good infrastructure in place (node
abstraction, etc...), but that infrastructure is hard to get right.
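
By "node abstraction" I mean something like the sketch below, which is
greatly simplified compared to what waf or scons actually do: every
file the build system touches is a node tied to a tree, so redirecting
outputs to a build directory is just a matter of which root a node
carries.

    import os

    class Node(object):
        """A file known to the build system, relative to a tree root."""
        def __init__(self, root, relpath):
            self.root = root
            self.relpath = relpath

        def abspath(self):
            return os.path.join(self.root, self.relpath)

        def change_ext(self, ext):
            base = os.path.splitext(self.relpath)[0]
            return Node(self.root, base + ext)

    # A generated .c file naturally lands in the build tree, not next to
    # its .pyx source.
    src = Node("src", "foo/bar.pyx")
    out = Node("build", src.change_ext(".c").relpath)
    print("%s -> %s" % (src.abspath(), out.abspath()))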

>  But what I'm really talking about is
> having a "pre-build" step that integrates properly with the source and
> binary packaging stages, and that's not something waf or scons have
> any particular support for, AFAIK.

Could you explain with a concrete example what a pre-build stage would
look like? I don't think I understand what you want.

cheers,

David


