[Numpy-discussion] multiple 2d vs n-dimensional arrays

Fri Sep 10 20:33:01 EDT 2010

On Fri, Sep 10, 2010 at 1:40 PM, Adam <adam at superbitbucket.com> wrote:

>  I'm keeping a large number of data points in multiple 2d arrays, for
> example:
>
> class c(object):
>     def __init__(self):
>         self.a = np.zeros((24, 60))
>         self.b = np.zeros((24, 60))
>         ...
>
> After processing the data, I'm serializing these to disk for future
> reference/post-processing.  It's a largish amount of data and is only
> going to get larger.
>
> Would it be more efficient (in terms of memory/disk storage) to use a
> single n-dimensional array:
>
>         self.a = np.zeros((24, 60, 5))
>
> What other advantages (if any) would I gain from storing the data in a
> single array rather than multiple?  The deeper into this project I get,
> the more I am probably going to need to correlate data points from one
> or more of the arrays to one or more of the other arrays.
>
> I think I just answered my own question...
>
>
Adam,

One argument *against* merging all of the data into a single array is that
a single array needs one contiguous block of memory.  Once the data gets
large enough, the OS may have to do more work to find space for one big
array, whereas several smaller arrays may fit more easily.

Of course, this is entirely dependent upon usage patterns, available RAM,
OS, the day of the week, and the color of your socks.  YMMV.
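
For what it's worth, here is a rough sketch of the comparison, assuming
five float64 arrays of shape (24, 60) as in your example.  The names
(separate, stacked, corr) and the random data are made up for
illustration, not taken from your code.  It only shows that the raw data
size is the same either way, and that the single array makes
cross-array correlation a one-liner:

```python
import numpy as np

# Hypothetical stand-in for the five per-instance arrays (self.a, self.b, ...).
rng = np.random.default_rng(0)
separate = [rng.normal(size=(24, 60)) for _ in range(5)]

# The same data as one array of shape (24, 60, 5); stacked[..., i] is table i.
stacked = np.stack(separate, axis=-1)

# Raw data size is the same either way; only the per-array overhead differs.
bytes_separate = sum(a.nbytes for a in separate)
bytes_stacked = stacked.nbytes
print(bytes_separate, bytes_stacked)

# With one array, correlating every table against every other is one call:
# flatten each table into a column, then correlate the columns.
corr = np.corrcoef(stacked.reshape(-1, 5), rowvar=False)
print(corr.shape)
```

So on size alone it is a wash; the single array mostly pays off once you
start correlating across the tables, at the cost of needing one larger
block of memory.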

Ben Root