[Numpy-discussion] Massive differences in numpy vs. numeric string handling
Jeremy Gore
jmgore75 at gmail.com
Wed Apr 12 14:30:05 EDT 2006
In Numeric:
Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,)
Numeric.array(['test','two']) ->
array([[t, e, s, t],
       [t, w, o, ]],'c')
But in numpy:
numpy.array('test') -> array('test', dtype='|S4'); shape = ()
numpy.array('test','S1') -> array('t', dtype='|S1'); shape = ()
In fact, you have to do an extra list cast:
numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); shape = (4,)
to get the desired result. I don't think this is very Pythonic, since
strings are fully indexable and iterable objects. Furthermore,
converting or treating a string as an array of characters is a very
common operation. convertcode.py does not appear to convert this part
of the code correctly either. Also, the repr puts quotes around the
element of the shape () array but not around the elements of the shape
(4,) array, which is inconsistent.
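For comparison, here is a minimal sketch of the same conversion, assuming Python 3 and a recent NumPy release. String handling there differs from the 2006 numpy discussed in this message (for example, a plain `str` now maps to the Unicode 'U' dtype), so the sketch asks for 'S1' explicitly. The extra list cast still works, and `numpy.frombuffer` over the string's encoded bytes avoids building an intermediate list.

```python
import numpy as np

text = "test"

# Casting to a list first splits the string into single characters,
# giving a 1-D array of one-byte strings with shape (4,).
via_list = np.array(list(text), dtype="S1")

# frombuffer reinterprets the encoded bytes directly as 'S1' elements.
# The result is a read-only view onto the bytes object (ASCII assumed).
via_buffer = np.frombuffer(text.encode("ascii"), dtype="S1")

print(via_list.shape, via_buffer.shape)  # (4,) (4,)
print(via_list)  # [b't' b'e' b's' b't']
```

The `frombuffer` route shares memory with the bytes object, so it is cheap for long sequences but must be copied (`.copy()`) before it can be modified.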
I realize the ability to use strings of arbitrary length as array
elements is important in numpy, but there really should be a more
natural way to convert or cast strings to character arrays.
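A helper along these lines could reproduce Numeric's two-dimensional layout for a list of strings. The sketch below assumes a recent NumPy, ASCII-only input, and a hypothetical function name `char_array` (not part of numpy). It relies on NumPy padding shorter strings with NUL bytes to a fixed width, then reinterprets each fixed-width string as one-byte characters with `ndarray.view`.

```python
import numpy as np

def char_array(strings):
    """Hypothetical helper: lay out a list of ASCII strings as a 2-D
    array of single characters, one row per string."""
    # dtype "S" picks a fixed width equal to the longest string;
    # shorter strings are padded with NUL bytes.
    fixed = np.array(strings, dtype="S")
    width = fixed.dtype.itemsize
    # View each fixed-width string as `width` one-byte characters,
    # then fold the flat result back into one row per string.
    return fixed.view("S1").reshape(len(strings), width)

grid = char_array(["test", "two"])
print(grid.shape)        # (2, 4)
print(grid[1].tolist())  # [b't', b'w', b'o', b'']
```

The padding shows up as empty `b''` elements rather than the blank that Numeric's 'c' arrays displayed.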
Also, unlike Numeric.equal on 'c' arrays, numpy.equal cannot compare
'|S1' arrays (or, presumably, other string arrays) for equality, even
though this is a very useful comparison to make.
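As a point of reference, assuming a recent NumPy release, the `==` operator compares string arrays elementwise, so a per-position comparison of two sequences can be sketched as follows. The DNA fragments are made up for illustration, and whether `numpy.equal` itself accepts string dtypes depends on the NumPy version, so the sketch sticks to `==`.

```python
import numpy as np

# Two made-up DNA fragments of equal length, as 1-D 'S1' arrays.
a = np.frombuffer(b"GATTACA", dtype="S1")
b = np.frombuffer(b"GACTATA", dtype="S1")

# Elementwise comparison of 'S1' arrays yields a boolean array.
matches = a == b
print(matches.sum())  # 5 identical positions

# Comparing against a single character broadcasts over the array.
print(np.count_nonzero(a == b"A"))  # 3 occurrences of 'A' in a
```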
For the record, I have used the Numeric (and to a lesser degree the
numarray) module extensively in bioinformatics applications for its
speed and brevity.
Jeremy