[Distutils] Handling the binary dependency management problem

Oscar Benjamin oscar.j.benjamin at gmail.com
Wed Dec 4 01:10:53 CET 2013


On 3 December 2013 22:18, Chris Barker <chris.barker at noaa.gov> wrote:
> On Tue, Dec 3, 2013 at 12:48 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> Because it already works for the scientific stack, and if we don't provide
>> any explicit messaging around where conda fits into the distribution
>> picture, users are going to remain confused about it for a long time.
>
> Do we have to have explicit messaging for every useful third-party package
> out there?
>
>> > I'm still confused as to why packages need to share external
>> > dependencies (though I can see why it's nice...) .
>>
>> Because they reference shared external data, communicate through shared
>> memory, or otherwise need compatible memory layouts. It's exactly the same
>> reason all C extensions need to be using the same C runtime as CPython on
>> Windows: because things like file descriptors break if they don't.
>
> OK -- maybe we need a better term than shared external dependencies --
> that makes me think of a shared library. Also, even the scipy stack is
> not as dependent on a shared build environment as we seem to think it
> is -- I don't think there is any reason you can't use the "standard"
> MPL with Gohlke's MKL-built numpy, for instance. And I'm pretty sure
> that even scipy and numpy don't need to share their build environment
> more than any other extension (i.e. they could use different BLAS
> implementations, etc.; numpy version matters, but that's handled by
> the usual dependency handling).

Sorry, I was being vague earlier. The choice of BLAS implementation is
not important, but the Fortran ABI it exposes is:
http://docs.scipy.org/doc/numpy/user/install.html#fortran-abi-mismatch
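
As an aside, for anyone wanting to check this on their own installs
(assuming reasonably standard builds of numpy and scipy), both packages
can report what they were built and linked against, which is where a
g77 vs gfortran mismatch would show up:

    import numpy
    import scipy

    # Prints blas_opt_info/lapack_opt_info etc. for the installed
    # builds; comparing the two shows whether numpy and scipy were
    # linked against the same BLAS/Fortran runtime or not.
    numpy.show_config()
    scipy.show_config()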

MPL (matplotlib, for those unfamiliar with the acronym) depends on the
numpy C API/ABI but not the Fortran ABI. So it would be incompatible
with, say, a pure Python implementation of numpy (or with numpypy), but
it should work fine with any of the numpy binaries currently out there.
(Numpy's C ABI has been unchanged from version 1.0 to 1.7 precisely
because changing it has been too painful to contemplate.)
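
To illustrate what relying on the C ABI via "the usual dependency
handling" amounts to in practice, here is a rough sketch (hypothetical,
not matplotlib's actual code) of the kind of version-range check a
binary package can make at import time -- the ABI itself isn't visible
from pure Python, so a version range is all you can really express:

    from distutils.version import LooseVersion
    import numpy

    # Range within which the numpy C ABI is known to be unchanged
    # (1.0 <= version < 1.8, per the discussion above).
    ABI_COMPATIBLE_RANGE = ("1.0", "1.8")

    installed = LooseVersion(numpy.__version__)
    if not (LooseVersion(ABI_COMPATIBLE_RANGE[0]) <= installed
            < LooseVersion(ABI_COMPATIBLE_RANGE[1])):
        raise ImportError("numpy %s is outside the C-ABI-compatible "
                          "range %s <= version < %s"
                          % ((numpy.__version__,) + ABI_COMPATIBLE_RANGE))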

> The reason Gohlke's repo, Anaconda and Canopy all exist is that it's a
> pain to build some of this stuff, period, not complex compatibility
> issues -- and the real pain goes beyond the standard scipy stack (VTK
> is a killer!)

I agree that the binary compatibility issues are not as complex as some
are making out, but it is a fact that his binaries are sometimes
binary-incompatible with other builds. I have seen examples of it going
wrong, and he gives a clear warning at the top of his downloads page:
http://www.lfd.uci.edu/~gohlke/pythonlibs/

>> but in their enthusiasm, the developers are pitching it as a general
>> purpose packaging solution. It isn't,
>
> It's not? Aside from momentum, and all that, could it not be a replacement
> for pip and wheel?

Conda/binstar could indeed be a replacement for pip, wheel and PyPI. It
currently lacks many packages, but less so than PyPI does if you're
mainly interested in binaries. For me pip+PyPI is a non-starter (as a
complete solution) if I can't use it to install numpy and matplotlib.

>> By contrast, conda already exists, and already works, as it was designed
>> *specifically* to handle the scientific Python stack.
>
> I'm not sure how well it works -- it works for Anaconda, and good
> point about the scientific stack -- but does it work equally well for
> other stacks, or for mixing and matching?

I don't even know how well it works for the "scientific stack". It
didn't work for me! But I definitely know that pip+PyPI doesn't yet
work for me, and working around that has caused me a lot more pain than
it would have been to diagnose and fix the problem I had with conda.
They might even accept a one-line, no-brainer pull request for my fix
in less than 3 months :) https://github.com/pypa/pip/pull/1187

>> This means that one key reason I want to recommend it for the cases where
>> it is a good fit (i.e. the scientific Python stack) is so we can explicitly
>> advise *against* using it in other cases where it will just add complexity
>> without adding value.
>
> I'm actually pretty concerned about this: lately the scipy community has
> defined a core "scipy stack":
>
> http://www.scipy.org/stackspec.html
>
> Along with this is a push to encourage users to just go with a scipy
> distribution to get that "stack":
>
> http://www.scipy.org/install.html
>
> and
>
> http://ipython.org/install.html
>
> I think this is in response to years of pain of each package trying
> to build binaries for various platforms, keeping it all in sync, etc.
> I feel their pain, and "just go with Anaconda or Canopy" is good
> advice for folks who want to get the "stack" up and running as easily
> as possible.

The scientific Python community are rightfully worried about potential
users losing interest in Python because these installation problems hit
every newcomer who wants to use Python. For scientific use, Python just
isn't fully installed until numpy/scipy/matplotlib etc. are. It makes
perfect sense to try to introduce people to Python for scientific work
in a way that minimises (or at least delays) their encounter with the
packaging problems in the Python ecosystem.

> But it does not serve everyone else well -- web developers that need
> MPL for some plotting, scientific users that need a desktop GUI
> toolkit, Python newbies that want IPython, but none of that other
> stuff...
>
> What would serve all those folks well is a "standard build" of packages --
> i.e. built to go with the python.org builds, that can be downloaded with:
>
> pip install the_package.
>
> And I think, with binary wheels, we have the tools to do that.

Yes, but there will never (or at least not any time soon) be a single
universally compatible binary configuration, even for, say, 32-bit
Windows. I think that binary wheels do need more compatibility
information: you mentioned C runtimes, and Fortran ABIs are another
example. But I don't think the solution is for a PEP to enumerate the
possibilities -- although it might be worth having an official stance
on C runtimes for Windows and for POSIX. I think the solution is to
have community-extensible compatibility information.
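
To make that concrete (purely a sketch -- the extra field names below
are invented, not a proposal for actual syntax): PEP 427 wheel
filenames already carry python/abi/platform tags, and the idea would be
for tools to be able to check additional, community-defined
compatibility fields alongside them:

    # PEP 427 names look like: numpy-1.7.1-cp27-none-win32.whl
    def parse_wheel_name(filename):
        name, version, pytag, abitag, plattag = \
            filename[:-len(".whl")].split("-")
        return {"name": name, "version": version,
                "python": pytag, "abi": abitag, "platform": plattag}

    info = parse_wheel_name("numpy-1.7.1-cp27-none-win32.whl")

    # Hypothetical extra compatibility metadata that an installer could
    # match against the local environment before choosing this wheel:
    info["ext_compat"] = {"c_runtime": "msvcr90",
                          "fortran_abi": "gfortran"}
    print(info)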

[snip]
>
> maybe we should just have conda talk to PyPI?
>
> As it stands, one of the POINTS of Anaconda is that it ISN'T the
> standard python.org installer!

Actually I think conda does (or will soon) just invoke pip under the
hood for packages that aren't in the binstar/Anaconda channels but are
on PyPI.
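
Something along these lines, I mean (just a sketch of the fallback
behaviour described above, not conda's actual implementation; the
package name is only a placeholder):

    import subprocess

    def install(package):
        # Try the conda/binstar channels first; fall back to pip for
        # anything they don't carry.
        if subprocess.call(["conda", "install", "--yes", package]) != 0:
            subprocess.call(["pip", "install", package])

    install("some-pypi-only-package")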


Oscar

