[Pandas-dev] [pydata] Sparse data structures in pandas: refactor - feedback welcome!

Joris Van den Bossche jorisvandenbossche at gmail.com
Sun Nov 18 03:58:59 EST 2018


On Sun, 18 Nov 2018 at 00:10, Pietro Battiston <me at pietrobattiston.it> wrote:

> On Sat, 17/11/2018 at 02:34 +0000, Tom Augspurger wrote:
> > Just to be clear, the current sparse dataframe stores each column
> > independently. There’s no memory saving over a DataFrame of sparse
> > columns.
> >
>
> Oh, I had missed that (and the fact that sparse columns are never
> consolidated!). Thanks,
>
Yes, so storage-wise, a DataFrame with sparse columns and a SparseDataFrame
are identical under the hood. The main difference is that SparseDataFrame
adds some extra functionality (sparse-specific methods, which could be
exposed on a normal DataFrame through a sparse accessor) and some guarantees
(each column is sparse, there is a default fill value for the full
dataframe, ...).
So the question we mostly seek feedback on is whether a normal DataFrame
with sparse columns would suffice in most practical cases, or which
aspects of the current SparseDataFrame would be blockers to switching
to a normal DataFrame with sparse columns.
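
To make the comparison concrete, here is a minimal sketch of what "a normal
DataFrame with sparse columns" could look like. It assumes a recent pandas
release in which `pd.arrays.SparseArray`, `pd.SparseDtype` and the
`DataFrame.sparse` accessor are available; the variable names (`values`, `df`,
`density`, `dense`) are only illustrative, and the API at the time of this
thread may differ.

```python
import numpy as np
import pandas as pd

# Mostly-zero data: 10 non-zero entries out of 1000.
values = np.zeros(1000)
values[::100] = 1.0

# A normal DataFrame in which each column is stored independently as a
# SparseArray (assumption: pd.arrays.SparseArray exists in the installed pandas).
df = pd.DataFrame({
    "a": pd.arrays.SparseArray(values, fill_value=0.0),
    "b": pd.arrays.SparseArray(values * 2, fill_value=0.0),
})

# Every column carries its own sparse dtype; nothing is consolidated.
print(df.dtypes)

# Sparse-specific functionality exposed through an accessor on the
# normal DataFrame (assumption: the DataFrame.sparse accessor is available).
density = df.sparse.density      # fraction of explicitly stored values
dense = df.sparse.to_dense()     # back to ordinary dense columns

print(density)
print(df.memory_usage(deep=True).sum(), dense.memory_usage(deep=True).sum())
```

Note that nothing here enforces the SparseDataFrame guarantees mentioned above:
a dense column can still be added to `df`, and each column keeps its own fill
value rather than one default for the whole frame.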

Joris