[Numpy-discussion] Extent of unicode types in numpy

Gerard Vermeulen gerard.vermeulen at grenoble.cnrs.fr
Wed Feb 8 01:30:02 EST 2006


On Wed, 08 Feb 2006 01:41:18 -0700
Travis Oliphant <oliphant.travis at ieee.org> wrote:


> >Well, probably I've overlooked something, but I really think that this
> >would be a nice thing to do.
> >  
> >
> There are details in the scalar-array conversions (getitem and setitem) 
> that would have to be implemented but it is possible.  The UCS4 --> 
> UTF-16 encoding is one of the easiest.  It's done in unicodeobject.h in 
> Python, but I'm not sure it's exposed other than going through the 
> interpreter.
> 
> Does this seem like a solution that everyone can live with?
> 

Yes.
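
For reference, the UCS4 --> UTF-16 step mentioned above can be sketched in
a few lines of Python.  This is only an illustration of the standard
surrogate-pair arithmetic, not the code in unicodeobject.h; the function
name ucs4_to_utf16 is made up for this example, and it assumes the input
holds valid code points (it does not reject lone surrogates).

```python
def ucs4_to_utf16(code_points):
    """Convert an iterable of UCS4 code points to a list of UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp < 0x10000:
            # Characters in the Basic Multilingual Plane map to one unit.
            units.append(cp)
        else:
            # Everything above 0xFFFF is split into a surrogate pair:
            # subtract 0x10000, then spread the remaining 20 bits over
            # a high surrogate (top 10 bits) and a low surrogate (low 10 bits).
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


# "a" followed by U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside the BMP.
text = "a\U0001D11E"
units = ucs4_to_utf16(ord(ch) for ch in text)
print([hex(u) for u in units])  # ['0x61', '0xd834', '0xdd1e']
```

The result can be compared against Python's own "utf-16-le" codec to
check the arithmetic.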

The only point that worries me a little is that some problems are limited
by memory or memory bandwidth, and for those cases UCS2 arrays are better
than UCS4 arrays.
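
To put a rough number on it: for text that stays within the Basic
Multilingual Plane, a 4-byte representation takes twice the memory of a
2-byte one.  The sketch below uses Python's standard codecs as a stand-in
for the two storage widths; it is an assumption-laden illustration and
says nothing about how numpy actually lays out its buffers.

```python
# 5000 ASCII characters, all of which fit in a single 2-byte code unit.
text = "numpy" * 1000

ucs2_bytes = len(text.encode("utf-16-le"))  # 2 bytes per BMP character
ucs4_bytes = len(text.encode("utf-32-le"))  # 4 bytes per character

print(ucs2_bytes, ucs4_bytes)  # 10000 20000
```

For a memory-bound or bandwidth-bound problem, that factor of two is the
whole difference.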

I have run into memory problems before, and I don't know whether they will
occur with unicode strings.  Time will tell.

Gerard




