[Numpy-discussion] Intel random number package

Wed Oct 26 15:49:50 EDT 2016

On Wed, Oct 26, 2016 at 12:41 PM, Warren Weckesser
<warren.weckesser at gmail.com> wrote:
>
>
> On Wed, Oct 26, 2016 at 3:24 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor
>> <jtaylor.debian at googlemail.com> wrote:
>> > On 10/26/2016 06:00 PM, Julian Taylor wrote:
>> >>
>> >> On 10/26/2016 10:59 AM, Ralf Gommers wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
>> >>> <jtaylor.debian at googlemail.com>
>> >>> wrote:
>> >>>
>> >>>     On 26.10.2016 06:34, Charles R Harris wrote:
>> >>>     > Hi All,
>> >>>     >
>> >>>     > There is a proposed random number package PR now up on github:
>> >>>     > <https://github.com/numpy/numpy/pull/8209>. It is from
>> >>>     > oleksandr-pavlyk <https://github.com/oleksandr-pavlyk> and
>> >>>     > implements the random number package using MKL for increased
>> >>>     > speed. I think we are definitely interested in the improved
>> >>>     > speed, but I'm not sure numpy is the best place to put the
>> >>>     > package. I'd welcome any comments on the PR itself, as well as
>> >>>     > any thoughts on the best way to organize or use this work.
>> >>>     > Maybe scikit-random
>> >>>
>> >>>
>> >>> Note that this thread is a continuation of
>> >>>
>> >>> https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html
>> >>>
>> >>>
>> >>>
>> >>>     I'm not a fan of putting code depending on a proprietary library
>> >>>     into numpy.
>> >>>     This should be a standalone package which may provide the same
>> >>>     interface as numpy.
>> >>>
>> >>>
>> >>> I don't really see a problem with that in principle. Numpy can use
>> >>> Intel MKL (and Accelerate) as well if it's available. It needs some
>> >>> thought put into the API though - a ``numpy.random_intel`` module is
>> >>> certainly not what we want.
>> >>>
>> >>
>> >> For me there is a difference between being able to optionally use a
>> >> proprietary library as an alternative to free software libraries if the
>> >> user wishes to do so and offering functionality that only works with
>> >> non-free software.
>> >> We are providing a form of advertisement for them by allowing it
>> >> (hey if you buy this black box that you cannot modify or use freely
>> >> you get this neat numpy feature!).
>> >>
>> >> I prefer for the full functionality of numpy to stay available with a
>> >> stack of community owned software, even if it may be less powerful that
>> >> way.
>> >
>> > But then if this is really just the same random numbers numpy already
>> > provides just faster, it is probably acceptable in principle. I haven't
>> > actually looked at the PR yet.
>>
>> The RNG stream is totally different, so yeah, it can't just be a
>> silent drop-in replacement like BLAS/LAPACK.
>>
>> The patch also adds ~10,000 lines of code; here's an example of what
>> some of it looks like:
>>
>>
>> https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833
>>
>> I don't see how we can realistically commit to maintaining this.
>>
>
>
> FYI:  numpy already maintains code exactly like that:
> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L262-L397
>
> Perhaps the point should be that the numpy devs won't want to maintain two
> nearly identical versions of that code.

Heh, good catch! Okay, if random_intel is a massive copy-paste of
random with modifications applied on top, then that's its own issue...
on the one hand, yeah, we definitely don't want to carry around
massive copy/paste code. OTOH, it suggests that it might be possible
to refactor the code so that common parts are shared, and this would
be a benefit to integrating random and random_intel more closely. (And
this benefit would then have to be weighed against all the other
considerations, like how much sharing there actually was,
maintainability of the remaining random_intel-specific bits, the
desire to keep numpy free-and-open, etc.) Hard to make that call just
from skimming a 10,000-line patch, though...
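
To make "common parts are shared" a bit more concrete, here is a rough
sketch of the shape such a refactoring could take. It is not taken from
the PR or from mtrand: the names (UniformSource, make_mt_source,
make_lcg_source) are hypothetical, Python's stdlib Mersenne Twister
stands in for the existing generator, and a toy LCG stands in for a
second, vendor-specific one. The distribution code is written once
against a "give me a uniform in [0, 1)" callable, and each backend only
supplies that callable.

```python
import math
import random
from typing import Callable

# Hypothetical interface: a backend is anything returning a float in [0, 1).
UniformSource = Callable[[], float]


def exponential(next_uniform: UniformSource, scale: float = 1.0) -> float:
    """Inverse-transform sampling, written once and shared by all backends."""
    # 1 - u lies in (0, 1], so the logarithm is always finite.
    return -scale * math.log(1.0 - next_uniform())


def make_mt_source(seed: int) -> UniformSource:
    """Stand-in for the existing Mersenne Twister based generator."""
    return random.Random(seed).random


def make_lcg_source(seed: int) -> UniformSource:
    """Toy 32-bit LCG standing in for a second, vendor-specific generator."""
    state = seed & 0xFFFFFFFF

    def next_uniform() -> float:
        nonlocal state
        state = (1664525 * state + 1013904223) & 0xFFFFFFFF
        return state / 2**32

    return next_uniform


# Same seed, same distribution code, different underlying streams.
mt = make_mt_source(12345)
lcg = make_lcg_source(12345)
mt_draws = [exponential(mt) for _ in range(5)]
lcg_draws = [exponential(lcg) for _ in range(5)]
```

Note that the two lists differ even with the same seed, which is the
"not a silent drop-in replacement" point from earlier in the thread.
How much of the real code could be split this way is exactly the open
question.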

Oleksandr, or others at Intel: how much possibility do you think there
is for sharing code between random and random_intel?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org