[Numpy-discussion] recarray slow?
Pauli Virtanen
pav at iki.fi
Wed Jul 21 16:49:53 EDT 2010
Wed, 21 Jul 2010 16:22:37 -0400, wheres pythonmonks wrote:
> However: is there an automatic way to convert a named index to a
> position?
It's not really a named index -- it's a field name. Since the fields of
an array element can be of different size, they cannot be referred to
with an array index (in the sense that Numpy understands the concept).
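To see why fields are addressed by name rather than by an array index, it may help to inspect a structured dtype directly. The sketch below uses a hypothetical three-field record (the field names and types are assumptions, not taken from the thread). `dtype.fields` reports each field's own dtype and byte offset, and `dtype.names` gives the field order if an ordinal position is really wanted.

```python
import numpy as np

# Hypothetical record layout: fields of different sizes
dt = np.dtype([('Date', 'S10'), ('Count', 'i4'), ('Price', 'f8')])
d = np.zeros(3, dtype=dt)

# Each field has its own dtype and byte offset inside a record,
# so there is no uniform stride that an integer array index could use
for name in dt.names:
    field_dtype, offset = dt.fields[name][:2]
    print(f"{name}: {field_dtype.itemsize} bytes at offset {offset}")

# Selecting a field by name returns a view with that field's dtype
prices = d['Price']
print(prices.dtype, prices.shape)

# If an ordinal is wanted, the field order is available from dtype.names
pos = dt.names.index('Price')
print(pos, dt.names[pos])
```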
> What about looping over tuples of my recarray:
>
> for t in d:
>     date = t['Date']
>     ....
>
> I guess that the above does have to lookup 'Date' each time.
As Pierre said, you can move the lookup outside the loop:
    for date in d['Date']:
        ...
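A minimal sketch of the difference, assuming a small hypothetical recarray `d` with 'Date' and 'Price' fields: the first loop looks up the field on every record, the second looks it up once and then iterates over the resulting column.

```python
import numpy as np

# Hypothetical data standing in for the recarray in the question
d = np.rec.fromarrays(
    [np.array(['2010-07-19', '2010-07-20', '2010-07-21'], dtype='S10'),
     np.array([1.5, 2.5, 3.5])],
    names='Date,Price')

# Lookup inside the loop: the field name is resolved once per record
dates_per_row = []
for t in d:
    dates_per_row.append(t['Date'])

# Lookup hoisted out of the loop: resolve 'Date' once, then iterate
dates_hoisted = []
for date in d['Date']:
    dates_hoisted.append(date)

print(dates_per_row == dates_hoisted)
```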
If you want to iterate over multiple fields, it may be best to use
itertools.izip (the built-in zip in Python 3), which walks the columns
in step so that you unbox only a single element at a time.
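For several fields, a sketch along the same lines: look each column up once and pair them with `zip`, which in Python 3 is lazy in the same way `itertools.izip` was in Python 2. The data and field names are again hypothetical.

```python
import numpy as np

# Hypothetical structured array with two fields
d = np.array([(b'2010-07-19', 1.5), (b'2010-07-20', 2.5), (b'2010-07-21', 3.5)],
             dtype=[('Date', 'S10'), ('Price', 'f8')])

# Look up each field once, outside the loop ...
dates, prices = d['Date'], d['Price']

# ... then walk the columns in step; zip yields one pair at a time
# (on Python 2 this would be itertools.izip)
total = 0.0
latest = None
for date, price in zip(dates, prices):
    total += price
    latest = date

print(latest, total)
```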
However, I'd be quite surprised if the hash lookups actually took a
significant part of the run time:
1) Python dictionaries are ubiquitous, and the implementation appears to
   be heavily optimized for string keys.
2) The hash of a Python string is cached, so it is computed only once.
3) Identifier-like string literals are interned (in CPython), so each is
   represented by a single object only:
>>> 'Date' is 'Date'
True
So when running the above Python code, the hash of 'Date' is computed
exactly once.
4) For small dictionaries with string keys, such as the fields
   dictionary, I'd expect the lookup cost, given 1-3), to be dwarfed by
   the overhead of making Python function calls (PyArg_*) and of
   interpreting the bytecode.
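Points 2) and 3) can be poked at from the interpreter. A small sketch, assuming CPython (interning and hash caching are implementation details, not language guarantees): a string built at run time is normally a distinct object from the equal literal until it is passed through `sys.intern`, while equal strings always hash equally.

```python
import sys

literal = 'Date'
built = ''.join(['Da', 'te'])  # constructed at run time, not a literal

# Equal strings always have equal hashes, interned or not
same_hash = hash(literal) == hash(built)

# A string built at run time is normally a separate object in CPython ...
same_object_before = built is literal

# ... but sys.intern maps it to the canonical, interned copy
interned = sys.intern(built)
same_object_after = interned is literal

print(same_hash, same_object_before, same_object_after)
```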
So the usual optimization mantra applies here: measure first :)
Of course, if you measure and show that the expectations 1-4) are
actually wrong, that's fine.
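In the spirit of "measure first", here is one way to time the two loop styles with the standard `timeit` module. The array size and field names are assumptions, and the timings will vary by machine and NumPy version, so no results are shown.

```python
import timeit
import numpy as np

# Hypothetical test data: 10,000 records with two fields
n = 10_000
d = np.zeros(n, dtype=[('Date', 'S10'), ('Price', 'f8')])

def per_row_lookup():
    # field name resolved on every iteration
    s = 0.0
    for t in d:
        s += t['Price']
    return s

def hoisted_lookup():
    # field name resolved once, before the loop
    s = 0.0
    for price in d['Price']:
        s += price
    return s

timings = {}
for func in (per_row_lookup, hoisted_lookup):
    # best of 3 repeats, each calling the function 3 times
    timings[func.__name__] = min(timeit.repeat(func, number=3, repeat=3))
    print(f"{func.__name__}: {timings[func.__name__]:.4f} s for 3 runs")
```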
--
Pauli Virtanen
More information about the NumPy-Discussion
mailing list