Efficient python 2-d arrays?

Dan Stromberg drsalists at gmail.com
Mon Jan 17 17:54:20 EST 2011


On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger
<jake.biesinger at gmail.com> wrote:
> Hi all,
>
> Using numpy, I can create large 2-dimensional arrays quite easily.
>>>> import numpy
>>>> mylist = numpy.zeros((100000000,2), dtype=numpy.int32)
>
> Unfortunately, my target audience may not have numpy so I'd prefer not to use it.
>
> Similarly, a list-of-tuples using standard python syntax.
>>>> mylist = [(0,0) for i in xrange(100000000)
>
> but this method uses way too much memory (>4GB for 100 million items, compared to 1.5GB for numpy method).
>
> Since I want to keep the two elements together during a sort, I *can't* use array.array.
>>>> mylist = [array.array('i',xrange(100000000)), array.array('i',xrange(100000000))]
>
> If I knew the size in advance, I could use ctypes arrays.
>>>> from ctypes import *
>>>> class myStruct(Structure):
>>>>     _fields_ = [('x',c_int),('y',c_int)]
>>>> mylist_type = myStruct * 100000000
>>>> mylist = mylist_type()
>
> but I don't know that size (and it can vary between 1 million-200 million), so preallocating doesn't seem to be an option.
>
> Is there a python standard library way of creating *efficient* 2-dimensional lists/arrays, still allowing me to sort and append?
>
> Thanks!
> --
> http://mail.python.org/mailman/listinfo/python-list
>

I recently had need of a two dimensional array also, but I only needed
small ones, so I just used a dictionary indexed by tuples.

If you need to sort a row as an aggregate type, it seems that you
probably either want to:
1) Use a list of lists, where the outer list is the easier one to sort
by - this is analogous to the array of pointers in C - sorting by the
outer would want to compare "elements" by looking at the 0th inner,
then 1st inner, etc.  So if you can organize things this way, things
might be pretty easy.
2) Use a list of instances, where the instances (rows) might just include a list
3) Use array.array with a custom sort so that you can move multiple
things when a more common sort would move "one" (possibly an
aggregate).

If you want some sorting code to start from for #3, I have a variety
of sorts written in Python (pure python or cython, your option, each
automatically derived from the same m4 file for a given sort) at
http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even
the cython timsort provided there doesn't perform quite as well as the
standard library's timsort.

HTH



More information about the Python-list mailing list