How to make this faster

Fri Jul 5 11:26:20 EDT 2013

On 5 July 2013 15:48, Helmut Jarausch <jarausch at igpm.rwth-aachen.de> wrote:
> On Fri, 05 Jul 2013 12:02:21 +0000, Steven D'Aprano wrote:
>
>> On Fri, 05 Jul 2013 10:53:35 +0000, Helmut Jarausch wrote:
>>
>>> Since I don't do any numerical stuff with the arrays, Numpy doesn't seem
>>> to be a good choice. I think this is an argument to add real arrays to
>>> Python.
>>
>> Guido's time machine strikes again:
>>
>> import array
>>
>>
>> By the way, I'm not exactly sure how you go from "I don't do numerical
>> calculations on numpy arrays" to "therefore Python should have arrays".
>
> I should have been more clear. I meant multi-dimensional arrays (2D, at least)
> Numpy is fine if I do math with matrices (without loops in python).
>
> Given that I don't like to use the old FORTRAN way (when "dynamic" arrays are passed to
> functions) of indexing a 2-d array I would need a MACRO or an INLINED function in Python
> or something like a META-compiler phase     transforming
>
> def access2d(v,i,j,dim1) :  # doesn't work on the l.h.s.
>   return v[i*dim1+j]
>
> access2d(v,i,j,dim1) = 7    # at compile time, please
>
> to
>
> v[i*dim1+j]= 7  # this, by itself, is considered ugly (the FORTRAN way)

The list of lists approach works fine for what you're doing. I don't
think that a[r][c] is that much worse than a[r, c]. It's only when you
want to do something like a[:, c] that it breaks down. In any case,
your algorithm would work better with Python's set/dict/list types
than numpy arrays.

One of the reasons that it's faster to use lists than numpy arrays (as
you found out) is precisely because the N-dimensional array logic
complicates 1-dimensional processing. I've seen discussions in Cython
and numpy about lighter-weight 1-dimensional array types for this
reason.

The other reason that numpy arrays are slower for what you're doing is
that (just like the stdlib array type Steven referred to) they use
homogeneous types in a contiguous buffer and each element is not a
Python object in its own right until you access it with e.g. a[0].
That means that the numpy array has to create a new object every time
you index into it whereas the list can simply return a new reference
to an existing object. You can get the same effect with numpy arrays
by using dtype=object but I'd still expect it to be slower for this.

Oscar