Efficient python 2-d arrays?

OAN programming at toomuchcookies.net
Mon Jan 17 19:12:51 EST 2011


Hi,

what about pytables? It's built for big data collections and it doesn't 
clog up the memory.

Am 17.01.2011 23:54, schrieb Dan Stromberg:
> On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger
> <jake.biesinger at gmail.com>  wrote:
>> Hi all,
>>
>> Using numpy, I can create large 2-dimensional arrays quite easily.
>>>>> import numpy
>>>>> mylist = numpy.zeros((100000000,2), dtype=numpy.int32)
>> Unfortunately, my target audience may not have numpy so I'd prefer not to use it.
>>
>> Similarly, a list-of-tuples using standard python syntax.
>>>>> mylist = [(0,0) for i in xrange(100000000)
>> but this method uses way too much memory (>4GB for 100 million items, compared to 1.5GB for numpy method).
>>
>> Since I want to keep the two elements together during a sort, I *can't* use array.array.
>>>>> mylist = [array.array('i',xrange(100000000)), array.array('i',xrange(100000000))]
>> If I knew the size in advance, I could use ctypes arrays.
>>>>> from ctypes import *
>>>>> class myStruct(Structure):
>>>>>      _fields_ = [('x',c_int),('y',c_int)]
>>>>> mylist_type = myStruct * 100000000
>>>>> mylist = mylist_type()
>> but I don't know that size (and it can vary between 1 million-200 million), so preallocating doesn't seem to be an option.
>>
>> Is there a python standard library way of creating *efficient* 2-dimensional lists/arrays, still allowing me to sort and append?
>>
>> Thanks!
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
> I recently had need of a two dimensional array also, but I only needed
> small ones, so I just used a dictionary indexed by tuples.
>
> If you need to sort a row as an aggregate type, it seems that you
> probably either want to:
> 1) Use a list of lists, where the outer list is the easier one to sort
> by - this is analogous to the array of pointers in C - sorting by the
> outer would want to compare "elements" by looking at the 0th inner,
> then 1st inner, etc.  So if you can organize things this way, things
> might be pretty easy.
> 2) Use a list of instances, where the instances (rows) might just include a list
> 3) Use array.array with a custom sort so that you can move multiple
> things when a more common sort would move "one" (possibly an
> aggregate).
>
> If you want some sorting code to start from for #3, I have a variety
> of sorts written in Python (pure python or cython, your option, each
> automatically derived from the same m4 file for a given sort) at
> http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even
> the cython timsort provided there doesn't perform quite as well as the
> standard library's timsort.
>
> HTH




More information about the Python-list mailing list