[Pandas-dev] Rewriting some of internals of pandas in C/C++? / Roadmap

Mon Jan 11 19:34:00 EST 2016

On Mon, Jan 11, 2016 at 4:19 PM, Jeff Reback <jeffreback at gmail.com> wrote:

>> Seaborn does use Series/DataFrame internally as first class data
>> structures. But for xarray and statsmodels it is the other way around --
>> pandas objects are accepted as input, but coerced into NumPy arrays
>> internally for storage and manipulation. This presents issues for new types
>> with metadata like categorical.
>
>
>
> care to elaborate on the xarray decision to keep data as numpy arrays,
> rather than Series in DataArray? (as you do keep the Index objects intact).
>

Sure -- the main point of xarray is that we need N-dimensional data
structures, so we definitely need to support NumPy as a backend. Xarray
operations are defined in terms of NumPy (or dask) arrays.
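
A minimal sketch of that split, assuming a reasonably recent xarray release (the array contents and the dimension names "x" and "y" are made up for illustration): the data is held as a plain NumPy array, while the labels along a dimension stay a pandas Index.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative 2-D data with labels on the "x" dimension only.
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": ["a", "b"]},
)

# The values come back as an N-dimensional NumPy array ...
print(type(da.values), da.values.shape)

# ... while the labels are kept as a pandas Index object.
print(type(da.indexes["x"]))
```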

In principle, we could store data as a Series, but for the sake of sanity
we would need to convert to NumPy arrays before doing any operations. Duck
typing compatibility is nice in theory, but in practice lots of subtle
issues tend to come up.
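
To illustrate the kind of issue this causes for types with metadata, here is a small sketch using only pandas and NumPy (the "low"/"high" categories are invented for the example). Converting a categorical Series to a NumPy array keeps the values but drops the category list and the ordering.

```python
import numpy as np
import pandas as pd

# A categorical Series with an explicit, ordered set of categories.
s = pd.Series(
    pd.Categorical(["low", "high", "low"],
                   categories=["low", "high"],
                   ordered=True)
)

# Coerce to NumPy, as a library that stores ndarrays internally would.
arr = np.asarray(s)

print(s.dtype)                            # the pandas categorical dtype
print(list(s.cat.categories), s.cat.ordered)
print(type(arr), arr.tolist())            # plain ndarray holding the labels
print(hasattr(arr.dtype, "categories"))   # the metadata does not survive
```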

The alternative is to write our own ndarray abstraction internal to
xarray that could handle special types like Categorical, but I'm pretty
reluctant to do that. It seems like a lot of work, and NumPy is "good
enough" in most cases. And, of course, I'd rather solve those problems
upstream :).

Stephan