Efficient python 2-d arrays?

Dan Stromberg drsalists at gmail.com
Thu Jan 20 21:10:35 EST 2011


On Thu, Jan 20, 2011 at 4:07 PM, Jacob Biesinger
<jake.biesinger at gmail.com> wrote:
> On Thu, Jan 20, 2011 at 12:48 PM, Dan Stromberg <drsalists at gmail.com> wrote:
>> On Thu, Jan 20, 2011 at 11:36 AM, Jake Biesinger
>> <jake.biesinger at gmail.com> wrote:
>>>> Thanks for the great suggestions!
>>>
>>> On a related note, is there no way to efficiently sort a python array?
>>>
>>>
>>>>>> x = array('f', xrange(10000000))
>>>>>> x.sort()
>>> ---------------------------------------------------------------------------
>>> AttributeError                            Traceback (most recent call last)
>>>
>>> /home/wbiesing/src/python-sort/<ipython console> in <module>()
>>>
>>> AttributeError: 'array.array' object has no attribute 'sort'
>>>
>>>>>> type(sorted(x))
>>> <type 'list'>
>>>
>>> Seems like a common enough task, but no way to do it efficiently without scipy/numpy.
>>
>> Tap, tap: Is this thing on?  I keep sending messages to this list, but
>> no one seems to "hear" what I'm writing.
>
> Hey Dan, indeed thanks for the code; I pulled it down and played
> around with timsort a while ago. Timsort is indeed a beast. My current
> solution is to do an argsort.  Since I can't roll with your cython
> implementations (pure python *only*), this seems a bit faster than
> sorting the original two lists using a modified pure-python timsort.
> Something along the lines of:
>
> args = sorted(range(len(seq1)), key=seq1.__getitem__)
> seq1 = [seq1[i] for i in args]
> seq2 = [seq2[i] for i in args]
>
> but the method suffers from a fairly high memory usage (big list of
> int's for args).  If this barrier is too high, I will be switching
> over to your pure-python timsort.

Oh, cool.  I guess I'm not a ghost after all.  :)

And an argsort sounds like a nice solution.

>> Anyway, I have a bunch of sorts in python (pure or cython, your
>> option) at http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/sorts/compare/trunk/?root=svn
>>
>> The pure python versions should all work out of the box with an
>> array.array.  I tested the timsort on an array.array, and it worked
>> well.
>
> your pylint runs fail for me, saying you should use disable, not disable-msg.
> pylint 0.23.0,
> astng 0.21.1, common 0.54.0
> Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
> [GCC 4.4.5]

Earlier today I updated the code to deal with newer pylints -
including the disable vs disable-msg thing.  If you "svn update", you
should get these changes.

>> Also, I just made a trivial modification (not checked in! ...but I did
>> test it) to make the cythonized timsort support sorting arrays - just
>> change the "list" type declarations to "object" (or perhaps
>> "array.array") prior to compiling.
>
> I'm not very familiar with cython syntax-- how do you bring in the array module?
> cimport array? import array? I see you need to change cdef list to
> cdef array.array or something like that.  Perhaps you could send a
> diff?

http://stromberg.dnsalias.org/svn/sorts/compare/branches/arrays now
has a version of timsort that expects objects (IOW, about anything,
duck typed) instead of lists.  I tried briefly to declare some of the
types to be array.array, but perhaps Cython doesn't like that - I was
getting compile-time errors from it.

HTH.



More information about the Python-list mailing list