[Numpy-discussion] big-bangs versus incremental improvements (was: Re: SciPy 2014 BoF NumPy Participation)

David Cournapeau cournape at gmail.com
Thu Jun 5 08:40:02 EDT 2014


On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris <charlesr.harris at gmail.com>
wrote:

>
> On Wed, Jun 4, 2014 at 7:29 PM, Travis Oliphant <travis at continuum.io>
> wrote:
>
>> Believe me, I'm all for incremental changes if it is actually possible
>> and doesn't actually cost more.  It's also why I've been silent until now
>> about anything we are doing being a candidate for a NumPy 2.0.  I
>> understand the challenges of getting people to change.  But, features and
>> solid improvements *will* get people to change --- especially if their new
>> library can be used along with the old library and the transition can be
>> done gradually. Python 3's struggle is the lack of features.
>>
>> At some point there *will* be a NumPy 2.0.   What features go into NumPy
>> 2.0, how much backward compatibility is provided, and how much porting is
>> needed to move your code from NumPy 1.X to NumPy 2.X is the real user
>> question --- not whether it is characterized as "incremental" change or
>> "re-write".  What I call a re-write and what you call an
>> "incremental-change" are two points on a spectrum and likely overlap
>> significantly if we really compared what we are thinking about.
>>
>> One huge benefit that came out of the numeric / numarray / numpy
>> transition that we mustn't forget about was actually the extended buffer
>> protocol and memory view objects.  This really does allow multiple array
>> objects to co-exist and libraries to use the object that they prefer in a
>> way that did not exist when Numarray / numeric / numpy came out.    So, we
>> shouldn't be afraid of that world.   The existence of easy package managers
>> to update environments to try out new features and have applications on a
>> single system that use multiple versions of the same library is also
>> something that didn't exist before and that will make any transition easier
>> for users.
>>
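
To make the buffer-protocol point concrete, here is a minimal sketch using
only the standard library (array.array and bytearray stand in for "multiple
array objects"; scale_in_place is a hypothetical helper invented for this
illustration, not part of any library). The function never asks which
container owns the memory, only that it exports a writable buffer of C
doubles:

```python
import array


def scale_in_place(buf, factor):
    """Multiply every element of a writable buffer of C doubles by factor.

    Hypothetical helper for illustration: it accepts any object that
    exports the buffer protocol, whatever its concrete type.
    """
    view = memoryview(buf)  # a window onto buf's memory, not a copy
    if view.format != "d":
        raise TypeError("expected a buffer of C doubles")
    for i in range(len(view)):
        view[i] = view[i] * factor


# Container 1: a typed array from the standard library.
owner = array.array("d", [1.0, 2.0, 3.0])
scale_in_place(owner, 2.0)
print(owner.tolist())  # [2.0, 4.0, 6.0]

# Container 2: raw bytes reinterpreted as doubles through a cast view.
raw = bytearray(16)
doubles = memoryview(raw).cast("d")
doubles[0], doubles[1] = 1.5, 2.5
scale_in_place(doubles, 2.0)
print(doubles.tolist())  # [3.0, 5.0]
```

Any array library that exports the same protocol could be passed to the same
function unchanged, which is the kind of co-existence described above.
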
>> One thing I regret about my original work on NumPy is that I didn't have
>> the foresight, skill, and understanding to put more work into a more
>> extensive and better-designed multiple-dispatch system, so that multiple
>> array objects could participate together in an expression flow.  The
>> __numpy_ufunc__ mechanism gives enough capability in that direction that
>> it may be better now.
>>
>> Ultimately, I don't disagree that NumPy can continue to exist in
>> "incremental" change mode ( though if you are swapping out whole swaths of
>> C-code for Cython code --- it sounds a lot like a "re-write") as long as
>> there are people willing to put the effort into changing it.   I think this
>> is actually benefited by the existence of other array objects that are
>> pushing the feature envelope without the constraints --- in much the same
>> way that the Python standard library is benefited by many versions of
>> different capabilities being tried out before moving into the standard
>> library.
>>
>> I remain optimistic that things will continue to improve in multiple ways
>> --- if a little "messier" than any of us would conceive individually.   It
>> *is* great to see all the PRs coming from multiple people on NumPy and all
>> the new energy around improving things whether great or small.
>>
>
> @nathaniel IIRC, one of the objections to the missing values work was that
> it changed the underlying array object by adding a couple of variables to
> the structure. I'm willing to do that sort of thing, but it would be good
> to have general agreement that that is acceptable.
>


I think changing the ABI for some version of numpy (2.0, whatever) is
acceptable. There is little doubt that the ABI will need to change to
accommodate a better and more flexible architecture.
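
A rough sketch of why adding fields to the array structure is an ABI change
even when no source code has to change. The two structs below are
hypothetical, heavily simplified headers invented for this illustration
(they are not NumPy's real PyArrayObject), and the new "mask" field is
assumed to be inserted in the middle of the struct:

```python
import ctypes


class ArrayV1(ctypes.Structure):
    # Hypothetical "old" array header.
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("nd", ctypes.c_int),
        ("flags", ctypes.c_int),
    ]


class ArrayV2(ctypes.Structure):
    # Hypothetical "new" header: one extra pointer inserted after `data`.
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("mask", ctypes.c_void_p),
        ("nd", ctypes.c_int),
        ("flags", ctypes.c_int),
    ]


def layout(struct):
    """Return {field name: byte offset} for a ctypes Structure class."""
    return {name: getattr(struct, name).offset for name, _ in struct._fields_}


def read_with_old_layout(new_obj):
    """Reinterpret a V2 object's bytes using the V1 layout, the way an
    extension compiled against the old header would."""
    return ArrayV1.from_buffer_copy(bytes(new_obj))


new = ArrayV2(data=None, mask=None, nd=3, flags=1)
stale = read_with_old_layout(new)

print(layout(ArrayV1))
print(layout(ArrayV2))
# The old layout looks for `nd` where `mask` now lives, so it reads 0, not 3.
print(new.nd, stale.nd)
```

Source that only ever writes `obj.nd` compiles fine against either header,
so the API is intact; a binary built against the old one silently reads the
wrong bytes until it is recompiled. Appending the new fields at the end
instead would keep the existing offsets but still change the struct's size.
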

Changing the C API is trickier: I am not up to date on the changes of the
last 2-3 years, but back then, most things could have been changed
internally without breaking much, though I did not go far enough to
estimate what the performance impact would be (if any).



> As to blaze/dynd, I'd like to steal bits here and there, and maybe in the
> long term base numpy on top of it with a compatibility layer. There is a
> lot of thought and effort that has gone into those projects and we should
> use what we can. As is, I think numpy is good for another five to ten years
> and will probably hang on for fifteen, but it will be outdated by the end
> of that period. Like great whites, we need to keep swimming just to have
> oxygen. Software projects tend to be obligate ram ventilators.
>
> The Python 3 experience is definitely something we want to avoid. And
> while blaze does big data and offers some nice features, I don't know that
> it offers the more ordinary user compelling reasons to upgrade at this
> time, so I'd like to sort of slip it into numpy if possible.
>
> If we do start moving numpy forward in more radical steps, we should try
> to have some agreement beforehand as to what sort of changes are
> acceptable. For instance, to maintain backward compatibility, is it
> sufficient that a recompile will do the job, or do we require forward
> compatibility for extensions compiled against earlier releases? Do we stay
> with C or should we support C++ code with its advantages of smart pointers,
> exception handling, and templates? We will need a certain amount of
> flexibility going forward and we should decide, or at least discuss, such
> issues up front.
>

Last time the C++ discussion was brought up, no consensus could be reached.
I think quite a few radical changes can already be made without that
consensus, though others may disagree there.

IMO, what is needed the most is refactoring the internals to separate the
low-level Python C API code from the rest of the code, as I think that is
the main bottleneck to getting more contributors (or to getting new core
features more quickly).

David