[Numpy-discussion] recarray slow?

wheres pythonmonks wherespythonmonks at gmail.com
Wed Jul 21 15:47:36 EDT 2010


Thank you very much. I had better crack open a NumPy reference manual
instead of relying on my Python "intuition".

On Wed, Jul 21, 2010 at 3:44 PM, Pauli Virtanen <pav at iki.fi> wrote:
> Wed, 21 Jul 2010 15:12:14 -0400, wheres pythonmonks wrote:
>
>> I have a recarray -- the first column is the date.
>>
>> I have the following function to compute the number of unique dates in
>> my data set:
>>
>>
>> def byName():
>>     return len(list(set(d['Date'])))
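To make the function above runnable on its own, here is a minimal sketch that builds a small recarray. The data are hypothetical: dates are assumed to be stored as YYYYMMDD integers in a field named `Date`, alongside an illustrative `Price` field.

```python
import numpy as np

# Hypothetical data: dates stored as YYYYMMDD integers in the 'Date' field.
dates = np.array([20100719, 20100720, 20100720, 20100721, 20100721, 20100721])
prices = np.arange(len(dates), dtype=float)
d = np.rec.fromarrays([dates, prices], names="Date,Price")

def byName():
    # Original approach: box every element into a Python-level object,
    # deduplicate with a set, copy into a list, then count.
    return len(list(set(d['Date'])))

print(byName())  # 3 distinct dates
```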
>
> What this code does is:
>
> 1. d['Date']
>
>   Extract a view of the field containing the dates. This is fast.
>
> 2. set(d['Date'])
>
>   Make copies of each array item, and box them into Python objects.
>   This is slow.
>
>   Insert each of the objects into the set. This is also somewhat slow.
>
> 3. list(set(d['Date']))
>
>   Get each item in the set, and insert them into a new list.
>   This is somewhat slow, and unnecessary if you only want to
>   count.
>
> 4. len(list(set(d['Date'])))
>
>   Count the items in the list.
>
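A small sketch of steps 2 and 3 (the sample dates are hypothetical): each element pulled out of the array becomes a separate boxed scalar object, and the intermediate list is not needed just to count, because a set already supports `len()`.

```python
import numpy as np

dates = np.array([20100719, 20100720, 20100720, 20100721])

# Indexing (or iterating) creates a new boxed NumPy scalar per element.
first = dates[0]
print(type(first))           # a NumPy scalar type, e.g. numpy.int64 on most platforms
print(dates[0] is dates[0])  # False: each access boxes a fresh object

# Step 3 is unnecessary for counting: len() works on the set directly.
print(len(set(dates)))       # 3
```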
> So the slowness arises because the code is copying data around, and
> boxing it into Python objects.
>
> You should try using Numpy functions (these don't re-box the data) to do
> this. http://docs.scipy.org/doc/numpy/reference/routines.set.html
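Following that advice, a minimal sketch of a NumPy-native count using `np.unique` from the set routines, timed against the set-based version. The data size and the range of integer date codes are hypothetical, and the random generator assumes a modern NumPy (`default_rng` needs 1.17 or later); actual timings will vary by machine.

```python
import timeit
import numpy as np

# Hypothetical data: 200,000 integer date codes drawn from 250 distinct values.
rng = np.random.default_rng(0)
codes = rng.integers(20100101, 20100101 + 250, size=200_000)
d = np.rec.fromarrays([codes], names="Date")

def by_set():
    # Boxes every element into a Python object before deduplicating.
    return len(set(d['Date']))

def by_unique():
    # np.unique sorts the raw array and returns the distinct values,
    # without creating a Python object per element.
    return len(np.unique(d['Date']))

for f in (by_set, by_unique):
    print(f.__name__, f(), timeit.timeit(f, number=3))
```

`np.unique` works by sorting, so it costs O(n log n), but the sort runs in compiled code on the unboxed data, which is typically much faster than hashing one Python object per element.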
>
> --
> Pauli Virtanen
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



More information about the NumPy-Discussion mailing list