[Pandas-dev] Pandas Sprint Recap

Wed Jul 18 14:02:44 EDT 2018

On Wed, 18/07/2018 at 10:16 -0700, Stephan Hoyer wrote:
> [...]
> I certainly find stacking/unstacking useful, but it isn't the only
> way to manipulate multi-dimensional tabular data. I do think R's
> tidyverse shows an alternative viable path. Without having used it
> extensively, it appears to be more consistent and easier to use than
> pandas.

Probably, but (from the limited knowledge I have gathered over the last
few months) it just doesn't seem nearly as powerful.

> For multi-dimensional data analysis, these days I generally prefer to
> use xarray (disclaimer: my project) instead of a pandas.MultiIndex. I
> find it more satisfying to have indexed N-D arrays (in an
> xarray.Dataset) rather than indexed 2D dataframes.

I certainly find xarray the closest thing to a pandas replacement; the
main tradeoff is the single dtype and the waste of memory if dimensions
are not aligned, right?
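
As a rough illustration of the memory point, here is a minimal sketch
using only pandas and numpy (not xarray itself). The grid size and the
three observations are made up for the example; it only shows that a
MultiIndex Series stores the observed cells, while a dense N-D layout
has to allocate every combination of labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 observations on a 100 x 100 grid of (row, col) labels.
index = pd.MultiIndex.from_tuples(
    [(0, 5), (42, 17), (99, 99)], names=["row", "col"]
)
observed = pd.Series([1.0, 2.0, 3.0], index=index)

# Reindexing onto the full product of labels forces a dense layout,
# which is what an N-D array needs; missing cells become NaN.
full = pd.MultiIndex.from_product(
    [range(100), range(100)], names=["row", "col"]
)
dense = observed.reindex(full).to_numpy().reshape(100, 100)

n_stored = len(observed)                # cells kept by the Series
n_dense = dense.size                    # cells allocated by the dense array
n_missing = int(np.isnan(dense).sum())  # cells holding only NaN padding
```

How much this matters in practice depends on how sparse the label
combinations really are.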

> The way that pandas.DataFrame uses an Index for both row and column
> labels makes it in some ways similar to the fixed 2D numpy.matrix,
> which personally I find less useful.

Tastes are tastes, but it is a fact that a 2D DataFrame + MultiIndex
offers possibilities that only nD numpy arrays + a lot of manual effort
would rival.

(That's actually how I used to work before knowing pandas: I would
build my own indexes and use them to access numpy arrays)

Please correct me if I'm wrong, but I think that even a Series with
MultiIndex is quite close, in terms of manipulation abilities (that is,
discarding e.g. efficiency, and cleanness of the API) to an
xarray.Dataset.
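
To make the comparison concrete, here is a small sketch of the kind of
manipulation meant here, treating each MultiIndex level as a
"dimension". The city/year/month labels and the values are invented for
the example, and it makes no claim about matching the xarray API.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D data: 2 cities x 2 years x 3 months.
index = pd.MultiIndex.from_product(
    [["Rome", "Paris"], [2017, 2018], [1, 2, 3]],
    names=["city", "year", "month"],
)
s = pd.Series(np.arange(12, dtype=float), index=index)

# Label-based selection along one "dimension".
rome = s.xs("Rome", level="city")

# Reduction over one "dimension" (mean over months).
by_city_year = s.groupby(level=["city", "year"]).mean()

# Reshape into a 2-D table: move "month" into the columns.
table = s.unstack("month")
```

Here `rome` is indexed by (year, month), `by_city_year` by (city, year),
and `table` has one row per (city, year) and one column per month.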

Pietro