[Pandas-dev] DyND and pandas [was Rewriting some of internals of pandas in C/C++? / Roadmap]

Wes McKinney wesmckinn at gmail.com
Tue Jan 12 19:49:33 EST 2016


On Tue, Jan 12, 2016 at 3:54 PM, Irwin Zaid <izaid at continuum.io> wrote:
> Thanks, Jeff. Let's talk about this.
>
>>
>> So this thread is off-topic, but I believe the gist of what Wes is
>> proposing from a technical point of view for libpandas:
>>
>> - the user facing pandas API will not change (except better perf /
>> copy-on-write etc)
>> - the back-end API should not change much either
>> - c-API for the back-end.
>> - allows swappable / agnostic numpy-like back-ends.
>> - ideally libpandas won't rewrite a completely new dtype system, maybe
>> could co-op datashape / pluribus for extensible dtypes
>
>
> For the most part, I think these are good ideas, but I share many of
> Stephan's concerns. I'd much rather we improve the array ecosystem in
> general and, very specifically, I don't think new dtypes should be added to
> pandas via libpandas.
>
> What I'd really like to see is for Wes and I to collaborate on *something*
> that solves the dtype problem and can be shared across libraries. I think
> Wes and I working together could result in potentially phenomenal things,
> both for pandas and other projects. I believe that the DyND type system is
> pretty close to a solution here, and I think it could be spun out as an
> independent data description system. If for some reason the DyND type system
> is not sufficient, I'd *still* be happy to work together on a solution that
> has nothing to do with DyND.
>

I am happy to collaborate and propagate requirements and ideas
upstream. I absolutely think we should be doing the work necessary to
make DyND a suitable optional backend for pandas right now. The
libpandas refactoring effort will provide a TODO list of array backend
requirements that should help with that.

But: I'm not comfortable with pandas and DyND getting married, so to
speak, right now. Once DyND gains more broad mindshare as a NumPy
replacement, let's re-evaluate as a team and decide whether
maintaining pandas's NumPy-based array backend is worth our time.

That leaves us at a slight impasse about how to fix pandas's data type
woes with NumPy as the internal data container. A lightweight
"pass-through" logical type apparatus (which dispatches to NumPy or
DyND or native pandas code, as needed) is the simplest way to do that.
This is already the way that pandas works (with a hodgepodge of NumPy
data type objects and pandas data type objects weakly proxying for
logical types), but it will be much cleaner / better abstracted. It
also has the benefit of both:

- making array backends "swappable" and
- hiding low-level details of the array backend from the pandas user

I see both of these points as justifications for the implementation
approach. It will also help DyND "cut its teeth" on the pandas unit
test suite and fill in feature gaps (and build a performance test
suite, too), and when it's ready we can "flip the switch".
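
To make the "pass-through" idea above concrete, here is a minimal sketch.
It is illustrative only and is not code from pandas, libpandas, or DyND.
All names in it (LogicalType, Column, TypedArrayBackend, ListBackend,
DEFAULT_BACKENDS) are hypothetical, and the standard-library array module
and plain lists are stand-ins for real backends such as NumPy or DyND.
The user only ever sees the logical type; the first backend that supports
it stores the data.

```python
from array import array


class LogicalType:
    """Backend-independent description of what a column holds (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "LogicalType(%r)" % self.name


INT64 = LogicalType("int64")
STRING = LogicalType("string")


class TypedArrayBackend:
    """Stand-in for a typed array library; handles numeric types only."""

    name = "typed-array"
    _typecodes = {"int64": "q", "float64": "d"}

    def supports(self, ltype):
        return ltype.name in self._typecodes

    def from_values(self, ltype, values):
        return array(self._typecodes[ltype.name], values)

    def to_list(self, data):
        return data.tolist()


class ListBackend:
    """Stand-in for "native" fallback code; stores anything in a list."""

    name = "list"

    def supports(self, ltype):
        return True

    def from_values(self, ltype, values):
        return list(values)

    def to_list(self, data):
        return list(data)


# Preference order: the first backend that supports the logical type wins.
DEFAULT_BACKENDS = (TypedArrayBackend(), ListBackend())


class Column:
    """User-facing container; the backend is an internal detail."""

    def __init__(self, ltype, values, backends=DEFAULT_BACKENDS):
        self.ltype = ltype
        self._backends = backends
        self._backend = next((b for b in backends if b.supports(ltype)), None)
        if self._backend is None:
            raise TypeError("no backend supports %r" % ltype)
        self._data = self._backend.from_values(ltype, values)

    def to_list(self):
        return self._backend.to_list(self._data)

    def take(self, indices):
        # Same user-visible behavior regardless of which backend holds the data.
        values = self.to_list()
        return Column(self.ltype, [values[i] for i in indices], self._backends)


ints = Column(INT64, [10, 20, 30])
strs = Column(STRING, ["a", "b", "c"])
print(ints.ltype, ints.take([2, 0]).to_list())
print(strs.ltype, strs.take([1]).to_list())
```

Passing a different backends tuple to Column is the toy equivalent of
"flipping the switch": the calling code does not change, only the
storage underneath does.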

The logical type abstraction and the choice of array backend are
orthogonal issues for me. The details of NumPy that have "leaked"
through to pandas have harmed its users, so independent of the
DyND-backend discussion I feel that the cleaner abstraction will
improve the library's accessibility and make its users more
productive. To summarize this: it should be enough to "just learn
pandas". I wish I'd done this originally, but early on it seemed
better to cut a few corners and get the software shipped rather than
taking more time to build abstractions. At that time I was "funding"
the project out of my savings account.

- Wes

> Of course, I'm not a pandas developer. But, at the same time, I'm offering
> to do free work here to help pandas.
>
>> If the above are met by a back-end, e.g. numpy, potentially DyND, then
>> that back-end should be allowed
>> (certainly as an optional dep; whether it's required or not can be a choice
>> made down the road).
>>
>> I think during implementation, that Wes will be cognizant of these
>> points, and leave things as wide open as
>> possible w/o going down the road we are currently in (where lots of
>> different APIs are intermixed).
>
>
> If the above is true, that sounds great. Wes, I'd appreciate it if you left
> opinions about Continuum funding DyND out of it -- we've both had our say
> now.
>
> Irwin
>

