[Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

David Cournapeau cournape at gmail.com
Thu Jun 5 10:24:31 EDT 2014


On Thu, Jun 5, 2014 at 2:51 PM, Charles R Harris <charlesr.harris at gmail.com>
wrote:

>
>
>
> On Thu, Jun 5, 2014 at 6:40 AM, David Cournapeau <cournape at gmail.com>
> wrote:
>
>>
>>
>>
>> On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris <
>> charlesr.harris at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Wed, Jun 4, 2014 at 7:29 PM, Travis Oliphant <travis at continuum.io>
>>> wrote:
>>>
>>>> Believe me, I'm all for incremental changes if they are actually
>>>> possible and don't cost more.  It's also why I've been silent until
>>>> now about anything we are doing being a candidate for a NumPy 2.0.  I
>>>> understand the challenges of getting people to change.  But features
>>>> and solid improvements *will* get people to change --- especially if
>>>> the new library can be used alongside the old one and the transition
>>>> can be done gradually.  Python 3's struggle stems from its lack of
>>>> compelling new features.
>>>>
>>>> At some point there *will* be a NumPy 2.0.   What features go into
>>>> NumPy 2.0, how much backward compatibility is provided, and how much
>>>> porting is needed to move your code from NumPy 1.X to NumPy 2.X is the
>>>> real user question --- not whether the change is characterized as
>>>> "incremental" or a "re-write".     What I call a re-write and what you
>>>> call an "incremental change" are two points on a spectrum and likely
>>>> overlap significantly if we really compared what we are thinking about.
>>>>
>>>> One huge benefit that came out of the Numeric / numarray / NumPy
>>>> transition that we mustn't forget is the extended buffer protocol and
>>>> memoryview objects.  These really do allow multiple array objects to
>>>> co-exist, with each library using the object it prefers, in a way that
>>>> did not exist when that earlier transition happened.  So, we shouldn't
>>>> be afraid of that world.  The existence of easy package managers ---
>>>> which make it simple to update environments to try out new features
>>>> and to run applications on a single system against multiple versions
>>>> of the same library --- is also something that didn't exist before and
>>>> will make any transition easier for users.
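
For concreteness, the kind of interoperability the buffer protocol gives
us looks roughly like this -- a minimal sketch in which a plain bytearray
stands in for memory owned by some other array library, and which assumes
a platform where C int is 32 bits:

    import struct
    import numpy as np

    # Memory owned by "some other library" -- here just a bytearray,
    # which exports the buffer protocol like any well-behaved array type.
    buf = bytearray(struct.pack('4i', 1, 2, 3, 4))

    # numpy consumes that memory zero-copy ...
    a = np.frombuffer(buf, dtype=np.int32)

    # ... and the numpy array can itself be viewed through the standard
    # library's memoryview, so a third consumer sees the very same bytes.
    m = memoryview(a)
    assert list(a) == [1, 2, 3, 4]
    assert m.tobytes() == bytes(buf)   # one block of memory, three views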
>>>>
>>>> One thing I regret about my original work on NumPy is that I didn't
>>>> have the foresight, skill, and understanding to build a more
>>>> extensible and better-designed multiple-dispatch system, so that
>>>> multiple array objects could participate together in an expression
>>>> flow.  The __numpy_ufunc__ mechanism gives enough capability in that
>>>> direction that things may be better now.
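
A minimal sketch of how a third-party array type might plug into that
mechanism follows; the hook name and signature here follow the
__numpy_ufunc__ proposal as discussed at the time and should be read as
an assumption, not a settled API:

    import numpy as np

    class MyArray(object):
        """Toy array type that intercepts ufunc calls instead of being
        silently coerced to ndarray."""

        def __init__(self, data):
            self.data = np.asarray(data)

        def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
            # numpy calls this instead of coercing us; 'i' is our
            # position in 'inputs'.  Handle plain calls ourselves ...
            if method == '__call__':
                args = [x.data if isinstance(x, MyArray) else x
                        for x in inputs]
                return MyArray(ufunc(*args, **kwargs))
            # ... and defer reductions, accumulations, etc. to numpy.
            return NotImplemented

With such a hook in place, np.add(MyArray([1, 2]), 3) would dispatch to
MyArray's own implementation rather than coercing it to an ndarray.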
>>>>
>>>> Ultimately, I don't disagree that NumPy can continue to exist in
>>>> "incremental" change mode (though if you are swapping out whole
>>>> swaths of C code for Cython code, it sounds a lot like a "re-write")
>>>> as long as there are people willing to put the effort into changing
>>>> it.  I think this is actually helped by the existence of other array
>>>> objects that are pushing the feature envelope without the same
>>>> constraints --- in much the same way that the Python standard library
>>>> benefits from many versions of different capabilities being tried out
>>>> before moving into the standard library.
>>>>
>>>> I remain optimistic that things will continue to improve in multiple
>>>> ways --- if a little "messier" than any of us would conceive
>>>> individually.  It *is* great to see all the PRs coming from multiple
>>>> people on NumPy and all the new energy around improving things,
>>>> whether great or small.
>>>>
>>>
>>> @nathaniel IIRC, one of the objections to the missing-values work was
>>> that it changed the underlying array object by adding a couple of
>>> variables to the structure. I'm willing to do that sort of thing, but
>>> it would be good to have general agreement that such changes are
>>> acceptable.
>>>
>>
>>
>> I think changing the ABI for some version of numpy (2.0, whatever) is
>> acceptable. There is little doubt that the ABI will need to change to
>> accommodate a better and more flexible architecture.
>>
>> Changing the C API is trickier: I am not up to date on the changes from
>> the last 2-3 years, but at that time most things could have been
>> changed internally without breaking much, though I did not go far
>> enough to estimate what the performance impact (if any) might be.
>>
>>
>>
>>> As to blaze/dynd, I'd like to steal bits here and there, and maybe in
>>> the long term base numpy on top of it with a compatibility layer. There is
>>> a lot of thought and effort that has gone into those projects and we should
>>> use what we can. As is, I think numpy is good for another five to ten years
>>> and will probably hang on for fifteen, but it will be outdated by the end
>>> of that period. Like great whites, we need to keep swimming just to have
>>> oxygen. Software projects tend to be obligate ram ventilators.
>>>
>>> The Python 3 experience is definitely something we want to avoid. And
>>> while blaze does big data and offers some nice features, I don't know
>>> that it yet offers the more ordinary user compelling reasons to
>>> upgrade, so I'd like to sort of slip it into numpy if possible.
>>>
>>> If we do start moving numpy forward in more radical steps, we should
>>> try to reach some agreement beforehand as to what sorts of changes are
>>> acceptable. For instance, to maintain backward compatibility, is it
>>> sufficient that a recompile will do the job, or do we require binary
>>> compatibility for extensions compiled against earlier releases? Do we
>>> stay with C, or should we support C++ with its advantages of smart
>>> pointers, exception handling, and templates? We will need a certain
>>> amount of flexibility going forward, and we should decide, or at least
>>> discuss, such issues up front.
>>>
>>
>> Last time the C++ discussion came up, no consensus could be reached. I
>> think quite a few radical changes can be made without that consensus
>> already, though others may disagree there.
>>
>> IMO, what is needed most is refactoring the internals to extract the
>> Python C API low level from the rest of the code, as I think that's the
>> main bottleneck to getting more contributors (or getting new core
>> features more quickly).
>>
>>
> What do you mean by "extract the Python C API"?
>

Poor choice of words: I meant extracting the lower-level part of
array/ufunc/etc. from its wrapping in the Python C API (with the idea
that the latter could be done in Cython, modulo improvements in Cython
to manage the binary/code size explosion).

IOW, split numpy into core and core-py (I think dynd benefits a lot from
that, on top of its feature set).
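
As a toy sketch of that layering (all names here are invented for
illustration; the real split would happen at the C level, with the
core-py half being the natural candidate for Cython):

    import numpy as np

    # "core": pure computational kernels over plain buffers -- no
    # Python-object conventions, no refcounting, no exception handling.
    def core_add_int32(dst, a, b, n):
        for k in range(n):
            dst[k] = a[k] + b[k]

    # "core-py": the only layer that touches Python-level array objects
    # (assumes contiguous, same-shape inputs, for brevity).
    def py_add(x, y):
        x = np.ascontiguousarray(x, np.int32)
        y = np.ascontiguousarray(y, np.int32)
        out = np.empty_like(x)
        core_add_int32(out.reshape(-1), x.reshape(-1), y.reshape(-1),
                       x.size)
        return out

    py_add([1, 2], [3, 4])   # -> array([4, 6], dtype=int32)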

David


>
> Chuck
>