[Numpy-discussion] SciPy 2014 BoF NumPy Participation

Tue Jun 3 21:26:46 EDT 2014

On Wed, Jun 4, 2014 at 12:33 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
> On Tue, Jun 3, 2014 at 5:08 PM, Kyle Mandli <kyle.mandli at gmail.com> wrote:
>>
>> Hello everyone,
>>
>> As one of the co-chairs in charge of organizing the birds-of-a-feather
>> sessions at the SciPy conference this year, I wanted to solicit through the
>> NumPy list to see if we could get enough interest to hold a NumPy centered
>> BoF this year.  The BoF format would be up to those who would lead the
>> discussion, a couple of ideas used in the past include picking out a few of
>> the lead devs to be on a panel and have a Q&A type of session or an open Q&A
>> with perhaps audience guided list of topics.  I can help facilitate
>> organization of something but we would really like to get something
>> organized this year (last year NumPy was the only major project that was not
>> really represented in the BoF sessions).
>
> I'll be at the conference, but I don't know who else will be there. I feel
> that NumPy has matured to the point where most of the current work is
> cleaning stuff up, making it run faster, and fixing bugs. A topic that I'd
> like to see discussed is where do we go from here. One option to look at is
> Blaze, which looks to have matured a lot in the last year. The problem with
> making it a NumPy replacement is that NumPy has become quite widespread,
> with downloads from PyPI running at about 3 million per year. With that much
> penetration it may be difficult for a new core like Blaze to gain traction.
> So I'd like to also discuss ways to bring the two branches of development
> together at some point and explore what NumPy can do to pave the way. Mind,
> there are definitely things that would be nice to add to NumPy, a better
> type system, missing values, etc., but doing that is difficult given the
> current design.

I won't be at the conference unfortunately (I'm on the wrong continent
and have family commitments then anyway), but I think there's lots of
exciting stuff that can be done in numpy-land.

We absolutely could rewrite the dtype system, and this would
straightforwardly give us excellent support for missing values, units,
categorical data, automatic differentiation, better datetimes, etc.
etc. -- and make numpy much more friendly in general to third-party
extensions.

I'd like to see the ufunc system revisited in the light of all the
things we know now, to make gufuncs more first-class, provide better
support for user-defined types, more flexible loop selection (e.g.
make it possible to implement np.add.reduce(a, type="kahan")), etc.;
one goal would be to convert a lot of ufunc-like functions (np.mean
etc.) into being real ufuncs, and then they'd automatically benefit
from __numpy_ufunc__, which would also massively improve
interoperability with alternative array types like blaze.
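For readers unfamiliar with the "kahan" example: Kahan (compensated) summation carries a running correction term, so the low-order bits lost in each floating-point addition are fed back into the next one. np.add.reduce has no type= keyword for this. The function below is a hypothetical stand-in written as a plain Python loop. It is only a sketch of the arithmetic such a specialised reduction loop would perform, not a proposed API.

```python
import numpy as np


def kahan_sum(a):
    """Compensated (Kahan) summation over all elements of `a`.

    Hypothetical stand-in for the proposed np.add.reduce(a, type="kahan");
    NumPy has no such keyword, so this is a plain Python loop.
    """
    total = 0.0
    comp = 0.0  # running compensation: low-order bits lost so far
    for x in np.asarray(a, dtype=np.float64).ravel():
        y = x - comp            # correct the next term by the previous error
        t = total + y           # big + small: low-order bits of y may be lost
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total
```

Take vals = [1.0] + [1e-16] * 10. A naive left-to-right float loop returns exactly 1.0, because each 1e-16 is below half an ulp of 1.0. kahan_sum instead recovers a result close to 1.000000000000001.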

I'd like to see support for extensible label-based indexing, like pandas.

Internally, I'd like to see the internals migrating out of C and into
Cython -- we have hundreds of lines of code that could be replaced
with a few lines of Cython and no-one would notice. (Combining this
with a cffi cython backend and pypy would be pretty interesting
too...)

I'd like to see sparse ndarrays, with integration into the ufunc
looping machinery so all ufuncs just work. Or even better, I'd like to
see the right hooks added so that anyone can write a sparse ndarray
package using only public APIs, and have all ufuncs just work. (I was
going to put down deferred/loop-fused/out-of-core computation as a
wishlist item too, but if we do it right then this too could be
implemented by anyone without needing to be baked into numpy proper.)
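The __numpy_ufunc__ hook discussed here later shipped in NumPy 1.13 under the name __array_ufunc__. The toy class below is a rough sketch of the kind of public hook that lets a third party write a sparse array without touching NumPy internals. SparseVector is a hypothetical name, not a real package. The class intercepts ufunc calls and keeps the result sparse when a unary ufunc maps zero to zero. Otherwise it densifies its inputs and falls back to the ordinary dense loop.

```python
import numpy as np


class SparseVector:
    """Toy 1-D sparse vector storing only nonzero entries as {index: value}.

    Hypothetical example class, not a real package; relies on the public
    __array_ufunc__ protocol available in NumPy >= 1.13.
    """

    def __init__(self, size, data):
        self.size = size
        # Drop explicit zeros so only nonzero entries are stored.
        self.data = {i: v for i, v in data.items() if v != 0}

    def todense(self):
        out = np.zeros(self.size)
        for i, v in self.data.items():
            out[i] = v
        return out

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain elementwise calls without out=/where= etc. are handled;
        # returning NotImplemented makes NumPy raise a TypeError for the rest.
        if method != "__call__" or kwargs:
            return NotImplemented
        if len(inputs) == 1 and ufunc.nout == 1:
            with np.errstate(all="ignore"):
                zero_stays_zero = ufunc(0.0) == 0
            if zero_stays_zero:
                # Apply the ufunc to stored values only; implicit zeros stay zero.
                return SparseVector(
                    self.size, {i: ufunc(v) for i, v in self.data.items()}
                )
        # Fallback: densify every SparseVector input and use the normal loop.
        dense = [x.todense() if isinstance(x, SparseVector) else x for x in inputs]
        return ufunc(*dense)
```

For example, np.sqrt(SparseVector(5, {1: 4.0, 3: 9.0})) stays sparse. np.add(..., 1.0) returns a dense ndarray, because adding a constant fills in the zeros. A real package would add binary sparse-sparse loops, reductions and so on through the same hook.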

All of these things would take some work and care, but I think they
could all be done incrementally and without breaking backwards
compatibility. Compare to ipython, which -- as Fernando likes to point
out :-) -- went from a little console program to its current
distributed-notebook-skynet-whatever-it-is by merging one working PR
at a time. Certainly these changes would be much easier and less
disruptive than any plan that involves throwing out numpy and starting
over. But they also do help smooth the way for an incremental
transition to a world where numpy is regularly used alongside other
libraries.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org