[SciPy-dev] chararray array method

Wed Jan 4 14:19:08 EST 2006

Perry Greenfield wrote:

>On Dec 29, 2005, at 5:00 PM, Travis Oliphant wrote:
>
>>So, this is taking a buffer and chopping it into string bits.
>>Currently, the chararray array function does not take a buffer input.
>
>Yes, this is common for us, as we usually create these from tables obtained
>from files where some columns of the tables contain fixed-width strings.
>It would be uncommon for the data buffer to contain only strings, but we
>generally need to create such arrays from data buffers.
Well, the new chararray function actually does support this (it was easy 
enough to just do it).
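
As a rough illustration of the use case Perry describes, here is a minimal
sketch assuming a current NumPy. It uses np.frombuffer rather than the
chararray constructor itself, and the record layout, field names, and sample
bytes are invented for the example.

```python
import numpy as np

# Hypothetical table layout: an 8-byte fixed-width string column followed
# by a little-endian 32-bit integer column.
record = np.dtype([("name", "S8"), ("value", "<i4")])

# Two rows packed back to back, as they might be read from a file.
raw = (b"alpha   " + (1).to_bytes(4, "little")
       + b"beta    " + (2).to_bytes(4, "little"))

table = np.frombuffer(raw, dtype=record)  # views the buffer, no copy
names = table["name"]                     # the string column, dtype 'S8'
values = table["value"]                   # the integer column

# A buffer that holds only strings can be chopped into pieces directly.
chunks = np.frombuffer(b"abcdefghi", dtype="S3")
```

The string column comes out as an ordinary fixed-width string array, so it
could then be wrapped in whatever string-aware class is wanted.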

Right now, chararrays are essentially string and/or unicode ndarrays 
with added methods for rich comparisons and the same methods as string 
and unicode objects.  The implementation is also a nice example of how 
to do broadcasting in Python alone.
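
To give a feel for what broadcasting in pure Python can look like, here is a
small sketch, assuming a current NumPy. It is not the chararray code itself;
the helper name broadcast_equal is made up for illustration.

```python
import numpy as np

def broadcast_equal(a, b):
    """Elementwise string equality with broadcasting, done in Python."""
    a = np.asarray(a)
    b = np.asarray(b)
    # np.broadcast pairs up elements according to the broadcasting rules.
    pairs = np.broadcast(a, b)
    out = np.empty(pairs.shape, dtype=bool)
    # Compare each pair in Python and fill the result in C order.
    out.flat = [x == y for x, y in pairs]
    return out

column = np.array([["a"], ["b"]])   # shape (2, 1)
row = np.array(["a", "b", "c"])     # shape (3,)
result = broadcast_equal(column, row)  # shape (2, 3)
```

The loop runs in the interpreter, so it is slow, but the shape handling is
left entirely to np.broadcast.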

I would like to move the rich comparisons into the ndarray object at 
some point (either by having ufuncs supported for extended types or by 
special-casing the richcompare for string and unicode type ndarrays), 
so that any string or unicode type can use them...

>I suppose this points to the fact that I'm not clear on what different
>roles the string array (and unicode) and character arrays play. In numarray
>it was thought that eventually character arrays would support all the
>string methods (within reason, considering the constraints of fixed size),
>and that made them different enough from numeric arrays. Is this detailed
>anywhere?
The string and unicode arrays are separate data-types for ndarrays.  
They are supported at a fundamental level throughout the code base.  In 
other words, you can have an ndarray of type (string, 30) (i.e. 'S30') or 
(unicode, 45) (i.e. 'U45').  However, because ufuncs do not support 
extended types at this time, and the richcompare for the ndarray 
defaults to using ufuncs, rich comparisons don't work on them. 
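
For concreteness, a short sketch of the two data-types, assuming a current
NumPy (which may behave differently from the code discussed here in other
respects); the sample values are arbitrary.

```python
import numpy as np

# Fixed-width byte strings, 30 bytes per item.
s = np.array(["spam", "eggs"], dtype="S30")

# Fixed-width unicode strings, 45 characters per item.
u = np.array(["spam", "eggs"], dtype="U45")

# The width is part of the data-type, so longer values are truncated
# when they are stored.
short = np.zeros(2, dtype="S3")
short[0] = b"abcdef"
```

Here s.dtype.itemsize is 30, and short[0] ends up holding only the first
three bytes.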

Now, it would be possible to make it so that the ndarray supported the 
string methods for string and unicode arrays, but it also makes sense to 
subclass for that kind of special support, which is what is done now.

>I tried finding it in the latest version of the Guide, but it seems that the
>topic of string arrays isn't discussed a lot. So a brief outline of how
>you see this working might help (e.g., should we really be working on
>enhancing the string array instead of focusing on character arrays?)
My thinking is that we should at least get the rich comparisons working 
for string/unicode arrays (whether to do this by expanding the ufuncs 
or by simply special-casing support for them in the richcompare 
function is the immediate question).  I can see how it would be possible 
(but not trivial) to do it in the ufuncs, which would make the ufunc 
interface more flexible, but maybe too flexible...  I'm not sure I know 
of a use case beyond the comparisons. 

Whether or not we should look at overriding the getattribute function 
to add string and unicode methods for all string/unicode chararrays is 
another question, but that could also be done...

Then again, it is an easy enough thing to wrap a string array in a 
subclass if you really want to call the string methods on all the items 
in the array.  So, what is there now is workable and is essentially the 
same as what was in numarray (I think the numarray string comparison 
functions are faster, though, since they were compiled).
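
A minimal sketch of such a wrapper, assuming a current NumPy and unicode
('U') arrays only. The class name StrMethodArray is hypothetical, and this
is not how chararray is implemented; it only shows the idea of forwarding
str methods to every item.

```python
import numpy as np

class StrMethodArray(np.ndarray):
    """Unicode ndarray that forwards unknown attributes to str methods."""

    def __getattr__(self, name):
        # Only reached when normal ndarray attribute lookup fails.
        if name.startswith("_") or not hasattr(str, name):
            raise AttributeError(name)

        def method(*args, **kwargs):
            # Apply the str method to each item, then restore the shape.
            items = np.asarray(self).ravel().tolist()
            results = [getattr(item, name)(*args, **kwargs) for item in items]
            return np.array(results).reshape(self.shape)

        return method

words = np.array(["ab", "cd"]).view(StrMethodArray)
upper = words.upper()           # a plain ndarray of upper-cased strings
starts = words.startswith("a")  # a plain boolean ndarray
```

Names that ndarray already defines (partition, for example) still resolve to
the ndarray method, which is one reason a dedicated subclass with explicit
methods can be the cleaner choice.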

-Travis