[Numpy-discussion] Massive differences in numpy vs. numeric string handling

Tim Hochberg tim.hochberg at cox.net
Wed Apr 12 15:15:05 EDT 2006


Jeremy Gore wrote:

> In Numeric:
>
> Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,)
> Numeric.array(['test','two']) ->
> array([[t, e, s, t],
>        [t, w, o,  ]],'c')
>
> but in numpy:
>
> numpy.array('test') -> array('test', dtype='|S4'); shape = ()
> numpy.array('test','S1') -> array('t', dtype='|S1'); shape = ()
>
> in fact you have to do an extra list cast:
>
> numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1');  
> shape = (4,)

The creation of arrays from Python objects is full of all kinds of weird 
special cases. For numerical arrays this works pretty well, but for 
other sorts of arrays, like strings and, even worse, objects, it's 
impossible to always guess the correct kind of thing to return. I'll 
leave it to the various string array users to battle it out over what's 
the right way to convert strings. However, in the meantime, or if you do 
not prevail in this debate, I suggest you slap an appropriate three-line 
function into your code somewhere.

If all you care about is the interface, use:

    def chararray(astring):
        return numpy.array(list(astring), 'S1')
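
For comparison, here is a small self-contained sketch of how the helper 
behaves next to a bare numpy.array call. It assumes a current numpy 
running on Python 3, where the 'S1' elements come back as one-byte 
bytes objects and the input has to be plain ASCII; the variable names 
are only illustrative.

```python
import numpy

def chararray(astring):
    # Split the string into single characters first, so numpy sees a
    # sequence of four items rather than one four-character string.
    return numpy.array(list(astring), 'S1')

whole = numpy.array('test')   # one string element, zero-dimensional
chars = chararray('test')     # one element per character

print(whole.shape)   # ()
print(chars.shape)   # (4,)
print(chars.dtype)   # |S1
```

The list() call is what makes the difference: it costs one temporary 
Python list, which is the overhead the second version tries to avoid.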

If you are worried about the performance of this, you could use the more 
cryptic, but more efficient:

    def chararray(astring):
        a = numpy.array(astring)
        return numpy.ndarray([len(astring)], 'S1', a.data)
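
One caveat for anyone trying this on a newer setup: the buffer trick 
relies on numpy.array(astring) storing one byte per character. On 
Python 3, numpy stores str input as a unicode array with four bytes per 
character, so the 'S1' view would no longer line up with the characters. 
A rough equivalent, assuming ASCII-only input, is to encode the string 
and wrap the resulting bytes with numpy.frombuffer. The helper name 
below is made up for illustration and is not from the original code.

```python
import numpy

def chararray_from_bytes(astring):
    # Hypothetical helper: encode to ASCII so each character is exactly
    # one byte, then view those bytes as an array of 'S1' elements.
    raw = astring.encode('ascii')
    # frombuffer wraps the bytes object without copying it; because
    # bytes are immutable, the resulting array is read-only.
    return numpy.frombuffer(raw, dtype='S1')

chars = chararray_from_bytes('test')
print(chars.shape)             # (4,)
print(chars.flags.writeable)   # False
```

If a writable array is needed, calling .copy() on the result should do, 
at the cost of the copy this version was meant to avoid.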

Perhaps these will let you sleep at night.

Regards,

-tim



>
> to get the desired result.  I don't think this is very pythonic, as  
> strings are fully indexable and iterable objects.  Furthermore,  
> converting/treating a string as an array of characters is a very  
> common thing.  convertcode.py would not appear to convert this part  
> of the code correctly either.  Also, the use of quotes in the shape  
> () array but not in the shape (4,) array is inconsistent.
>
> I realize the ability to use strings of arbitrary length as array  
> elements is important in numpy, but there really should be a more  
> natural option to convert/cast strings as character arrays.
>
> Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot compare  
> '|S1' arrays or presumably other strings for equality, although this  
> is a very useful comparison to make.
>
> For the record, I have used the Numeric (and to a lesser degree the  
> numarray) module extensively in bioinformatics applications for its  
> speed and brevity.
>
> Jeremy





