[Pandas-dev] DyND and pandas [was Rewriting some of internals of pandas in C/C++? / Roadmap]

Tue Jan 12 19:20:15 EST 2016

On Tue, Jan 12, 2016 at 3:20 PM, Irwin Zaid <izaid at continuum.io> wrote:

>
> This discussion doesn't belong on this mailing list, but a couple of
>> brief points.
>>
>
> Wes, if you don't want this discussion on this mailing list then don't say
> things like: "it still feels like a political quagmire leftover from the
> Continuum-Enthought rift in 2011". My email reply to that was simply a
> statement of facts, as this one will also be.
>
> I was approached by Travis and Peter about being a part of Continuum
>> Analytics in late 2011. According to my e-mail records we were having
>> these discussions at least as early as October 2011. The phrase "NumPy
>> 2.0" was spoken in this epoch (referring to
>> -the-project-now-known-as-DyND). So, I have quite a bit of first- and
>> second-hand information from this time period, including many of the
>> details of Mark's Enthought-sponsored NumPy development and the
>> problems that occurred online and offline.
>>
>
> The phrase "NumPy 2.0" means a number of things, and DyND was not one of
> them. Yes, you have some first-hand knowledge,
> but it's not relevant. Even IF it was, a lot of modern DyND also came from
> my massive contribution before I joined Continuum.
>
> Mark will speak up here as well.
>

It's certainly true that the phrase "NumPy 2.0" was spoken a lot during the
formation and early days of Continuum, but that's a term that was used
commonly even before the NumPy 1.6 release. It has long been the vehicle
for discussions about doing big refactoring and breaking changes in NumPy.
The discussions you're referring to were about a mixture of two things: a
NumPy 2.0 developed within the NumPy development process, and
re-conceptualizing NumPy at a higher level towards abstractions that could
be out of core, distributed, etc. The former is represented by emails like
https://mail.scipy.org/pipermail/numpy-discussion/2012-February/060623.html
and work that Continuum sponsored within NumPy. The latter is what became
branded as Blaze.

DyND itself began life as "dynamicndarray," a place to experiment
with some of the ideas I had about how the dtypes could be structured and
how things could work as a C++ library. It was started after all my
involvement with Enthought was completed and before Continuum began, and it
was completely independent of either company. It was not adopted as part of
development at Continuum immediately. I did my best to present a solid case
for how such a thing would fit into Blaze, and the decisions to open source
the code and to include it as a component of Blaze development were later
made in one swoop.

My hope during that time frame was that NumPy's internals could be
refactored in a way that isolated them more from its interface, and then
could begin a faster evolution without breaking that interface. I wanted
NumPy to transition ever so slowly into C++. Even if all of that occurred,
NumPy's evolution would have still been slow, and I knew that, so I saw
DyND as a place to boldly try things, to really experiment with how a
dynamic array programming library could look. We were particularly
careful to avoid recreating the Numeric vs. numarray schism, and
DyND's Python bindings are separate from NumPy but interoperate with it
wherever we found a way to do so.

The idea that DyND should have broad support from multiple companies is
something I strongly agree with, and I think specifically that should
extend to multiple industries. I believe the current development push led
by Irwin is bringing it close to a threshold where it's possible for that
to start happening, and developing it in close co-operation with Pandas
would be amazing for both DyND and Pandas. I'm reading this thread mostly
with hope that this possibility has a good chance of working, and a desire
that any decisions are made with an accurate picture of what DyND is and
aims to become.

-Mark

>
>
>> I applaud Continuum for using R&D budget to build something new and
>> forward thinking that is also permissively licensed open source
>> software. However, it is well known that open source projects driven
>> by for-profit organizations can run into governance problems that
>> place them in conflict with the community. Since DyND is a large
>> project that I would not be comfortable forking (if that were required
>> in the future), building an outside developer and user community is
>> essential if pandas is to consider using it as a hard dependency in
>> the future.
>>
>> The Apache Software Foundation exists for this reason and others, and
>> if you wish to place a community-oriented and merit-based governance
>> structure around DyND to assist with its incubation, the ASF may be
>> worth pursuing. NumFOCUS provides a fiscal sponsorship apparatus but
>> does not really address the governance questions. Whether or not the
>> governance issues are real doesn't really matter; it's about setting
>> people's minds at ease.
>>
>
> Okay, let me state again: The majority of DyND's contributions (as net
> from Mark, myself, and Ian) came without Continuum funding. Just because
> Continuum is funding DyND now does not make it a "Continuum project",
> whatever this means.
>
> Some of your other points are valid, and we'll address them as best we can
> as time goes on. DyND clearly needs a community, but it's a chicken-and-egg
> problem. If you try and build something hard, it takes time and users come
> when things work.
>
> The issue of refactoring Pandas is a different one that I'll add comments
> to in another email.
>
> Irwin