[Distutils] Handling the binary dependency management problem

Nick Coghlan ncoghlan at gmail.com
Thu Dec 5 00:31:10 CET 2013


On 5 Dec 2013 07:29, "Ralf Gommers" <ralf.gommers at gmail.com> wrote:
>
> On Wed, Dec 4, 2013 at 11:41 AM, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>>
>> On 4 December 2013 07:40, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> > On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald at stufft.io> wrote:
>> >>
>> >> I’d love to get Wheels to the point where they are more suitable than
>> >> they are for SciPy stuff,
>> >
>> > That would indeed be a good step forward. I'm interested to try to
>> > help get to that point for Numpy and Scipy.
>>
>> Thanks Ralf. Please let me know what you think of the following.
>>
>> >> I’m not sure what the difference is between the current state and what
>> >> they need to be, but if someone spells it out (I’ve only just skimmed
>> >> your last email so perhaps it’s contained in that!) I’ll do the arguing
>> >> for it. I just need someone who actually knows what’s needed to advise
>> >> me :)
>> >
>> > To start with, the SSE stuff. Numpy and scipy are distributed as
>> > "superpack" installers for Windows containing three full builds: no SSE,
>> > SSE2 and SSE3. Plus a script that runs at install time to check which
>> > version to use. These are built with ``paver bdist_superpack``, see
>> > https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS
>> > and CPU selector scripts are under tools/win32build/.
>> >
>> > How do I package those three builds into wheels and get the right one
>> > installed by ``pip install numpy``?
>>
>> This was discussed previously on this list:
>> https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html
>
>
> Thanks, I'll go read that.
>
>> Essentially the current wheel format and specification does not
>> provide a way to do this directly. There are several different
>> possible approaches.
>>
>> One possibility is that the wheel spec can be updated to include a
>> post-install script (I believe this will happen eventually - someone
>> correct me if I'm wrong). Then the numpy for Windows wheel can just do
>> the same as the superpack installer: ship all variants, then
>> delete/rename in a post-install script so that the correct variant is
>> in place after install.
>>
>> Another possibility is that the pip/wheel/PyPI/metadata system can be
>> changed to allow a "variant" field for wheels/sdists. This was also
>> suggested in the same thread by Nick Coghlan:
>> https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
>>
>> The variant field could be used to upload multiple variants e.g.
>> numpy-1.7.1-cp27-cp27m-win32.whl
>> numpy-1.7.1-cp27-cp27m-win32-sse.whl
>> numpy-1.7.1-cp27-cp27m-win32-sse2.whl
>> numpy-1.7.1-cp27-cp27m-win32-sse3.whl
>> then if the user requests 'numpy:sse3' they will get the wheel with
>> sse3 support.
>>
>> Of course how would the user know if their CPU supports SSE3? I know
>> roughly what SSE is but I don't know what level of SSE is available on
>> each of the machines I use. There is a Python script/module in
>> numexpr that can detect this:
>> https://github.com/eleddy/numexpr/blob/master/numexpr/cpuinfo.py
>>
>> When I run that script on this machine I get:
>> $ python cpuinfo.py
>> CPU information: CPUInfoBase__get_nbits=32 getNCPUs=2 has_mmx has_sse2
>> is_32bit is_Core2 is_Intel is_i686
>>
>> So perhaps someone could break that script out of numexpr and release
>> it as a separate package on PyPI.
>
>
> That's similar to what numpy has - actually it's a copy from
> numpy.distutils.cpuinfo
>
>>
>> Then the instructions for installing
>> numpy could be something like
>> """
>> You can install numpy with
>>
>>     $ pip install numpy
>>
>> which will download the default version without any CPU-specific
>> optimisations.
>>
>> If you know what level of SSE support your CPU has then you can
>> download a more optimised numpy with either of:
>>
>>     $ pip install numpy:sse2
>>     $ pip install numpy:sse3
>>
>> To determine whether or not your CPU has SSE2 or SSE3 or no SSE
>> support you can install and run the cpuinfo script. For example on
>> this machine:
>>
>>     $ pip install cpuinfo
>>     $ python -m cpuinfo --sse
>>     This CPU supports the SSE3 instruction set.
>>
>> That means we can install numpy:sse3.
>> """
>
>
> The problem with all of the above is indeed that it's not quite
> automatic. You don't want your user to have to know or care about what SSE
> is. Nor do you want to create a new package just to hack around a pip
> limitation. I like the post-install (or pre-install) option much better.
>
>>
>> Of course it would be a shame to have a solution that is so close to
>> automatic without quite being automatic. Also the problem is that
>> having no SSE support in the default numpy means that lots of people
>> would lose out on optimisations. For example if numpy is installed as
>> a dependency of something else then the user would always end up with
>> the unoptimised no-SSE binary.
>>
>> Another possibility is that numpy could depend on the cpuinfo package
>> so that it gets installed automatically before numpy. Then if the
>> cpuinfo package has a traditional setup.py sdist (not a wheel) it
>> could detect the CPU information at install time and store that in its
>> package metadata. Then pip would be aware of this metadata and could
>> use it to determine which wheel is appropriate.
>>
>> I don't quite know if this would work but perhaps the cpuinfo package could
>> announce that it "Provides" e.g. cpuinfo:sse2. Then a numpy wheel
>> could "Requires" cpuinfo:sse2 or something along these lines. Or
>> perhaps this is better handled by the metadata extensions Nick
>> suggested earlier in this thread.
>>
>> I think it would be good to work out a way of doing this with e.g. a
>> cpuinfo package. Many other packages beyond numpy could make good use
>> of that metadata if it were available. Similarly having an extensible
>> mechanism for selecting wheels based on additional information about
>> the user's system could be used for many more things than just CPU
>> architectures.
>
>
> I agree extensibility is quite important. Whatever scheme you'd think of
> with pre-defined tags will fail the next time anyone has a new idea (random
> example: what if we start shipping parallel sets of binaries that only
> differ in whether they're linked against ATLAS, OpenBLAS or MKL).

Hmm, rather than adding complexity most folks don't need directly to the
base wheel spec, here's a possible "multiwheel" notion - embed multiple
wheels with different names inside the multiwheel, along with a
self-contained selector function for choosing which ones to actually
install on the current system.
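
To make the selector idea concrete, here is a rough sketch of what such a
function might look like. Everything in it is hypothetical: the name
select_wheels, the EMBEDDED table, the embedded wheel names and the feature
strings are all made up for illustration, and nothing like this exists in pip
or the wheel spec today. It assumes the CPU features have already been
detected by something like the cpuinfo script mentioned above.

```python
# Hypothetical sketch only: none of these names exist in pip or the wheel spec.

def select_wheels(embedded, cpu_features):
    """Return the names of the embedded wheels to install on this system.

    embedded: list of (wheel_name, required_features) pairs, ordered from
        most to least optimised; required_features is a set of strings.
    cpu_features: set of feature strings detected on the current machine.
    """
    for name, required in embedded:
        # Pick the first (most optimised) build whose requirements are all met.
        if required <= cpu_features:
            return [name]
    raise RuntimeError("no embedded wheel is compatible with this system")


# Made-up contents of a NumPy multiwheel, mirroring the three superpack builds.
EMBEDDED = [
    ("numpy_sse3", {"sse2", "sse3"}),
    ("numpy_sse2", {"sse2"}),
    ("numpy_nosse", set()),
]
```

With that table, select_wheels(EMBEDDED, {"mmx", "sse2"}) would pick the SSE2
build, and a machine reporting no SSE features would fall back to the plain
one.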

This could be used not only for the NumPy use case, but also to distribute
external dependencies while allowing their installation to be skipped if
they're already present on the target system.
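
As a rough sketch of the "skip it if it's already there" case, a selector
could probe for a shared library before deciding to install the embedded copy.
The function name wheels_to_install and the embedded wheel names are
hypothetical; ctypes.util.find_library is the real stdlib helper, used here
only as one plausible way to do the probe.

```python
import ctypes.util

# Hypothetical sketch only: the function and wheel names are made up.

def wheels_to_install(embedded, find_library=ctypes.util.find_library):
    """Return the embedded wheel names that still need installing.

    embedded: list of (wheel_name, library) pairs, where library is the
        shared library the wheel provides, or None if the wheel should
        always be installed.
    find_library: lookup function returning None when a library is absent.
    """
    selected = []
    for wheel_name, library in embedded:
        if library is not None and find_library(library) is not None:
            # The system already provides this dependency, so skip the copy.
            continue
        selected.append(wheel_name)
    return selected
```

The lookup is passed in as a parameter so that the decision logic can be
exercised without depending on what happens to be installed locally.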

Cheers,
Nick.

>
> Ralf