[Numpy-discussion] recarray slow?

Pauli Virtanen pav at iki.fi
Wed Jul 21 16:49:53 EDT 2010


Wed, 21 Jul 2010 16:22:37 -0400, wheres pythonmonks wrote:
> However: is there an automatic way to convert a named index to a
> position?

It's not really a named index -- it's a field name. Since the fields of 
an array element can be of different size, they cannot be referred to 
with an array index (in the sense that Numpy understands the concept).
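For example, the fields of a structured dtype can have different sizes
(a minimal sketch; the field names and types here are made up):

   >>> import numpy as np
   >>> dt = np.dtype([('Date', 'S10'), ('Value', 'f8')])
   >>> dt['Date'].itemsize, dt['Value'].itemsize
   (10, 8)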

> What about looping over tuples of my recarray:
> 
> for t in d:
>     date = t['Date']
>     ....
> 
> I guess that the above does have to lookup 'Date' each time. 

As Pierre said, you can move the lookups outside the loop:

	for date in d['Date']:
	    ...

If you want to iterate over multiple fields, it may be best to use 
itertools.izip so that you unbox a single element at a time.
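For instance (a rough sketch, assuming the array d has 'Date' and 'Value'
fields):

	from itertools import izip   # on Python 3, the builtin zip is lazy

	dates = d['Date']    # do the field lookups once, outside the loop
	values = d['Value']
	for date, value in izip(dates, values):
	    ...              # work on one (date, value) pair at a time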

However, I'd be quite surprised if the hash lookups actually took a 
significant part of the run time:

1) Python dictionaries are ubiquitous and the implementation appears
   heavily optimized to be fast with strings.

2) The hash of a Python string is cached, and computed only once.

3) String literals are interned, and represented by a single object only:

   >>> 'Date' is 'Date'
   True

   So when running the above Python code, the hash of 'Date' is computed
   exactly once.

4) For small dictionaries containing strings, such as the fields
   dictionary, I'd expect the lookup costs from 1-3) to be dwarfed by
   the overhead involved in making Python function calls (PyArg_*) and
   interpreting the bytecode.

So the usual optimization mantra applies here: measure first :)

Of course, if you measure and show that the expectations 1-4) are 
actually wrong, that's fine.
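
A rough way to measure it (a sketch; the array and field names below are
made up):

	import timeit

	setup = ("import numpy as np; "
	         "d = np.zeros(100000, dtype=[('Date', 'i8'), ('Value', 'f8')])")

	# field lookup inside the loop vs. hoisted out of the loop
	print(timeit.timeit("for t in d: x = t['Date']", setup=setup, number=10))
	print(timeit.timeit("for x in d['Date']: pass", setup=setup, number=10))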

-- 
Pauli Virtanen



