[SciPy-dev] Some Q's vis-a-vis Numpy unicode support

Tue Aug 11 21:03:18 EDT 2009

On Tue, Aug 11, 2009 at 8:41 PM, <josef.pktd at gmail.com> wrote:
> On Tue, Aug 11, 2009 at 7:49 PM, David Goldsmith<d_l_goldsmith at yahoo.com> wrote:
>> OK, may have answered Q1 myself: unless I'm misunderstanding what I'm seeing, what I'm finding is that capitalize() does nothing at all if the chararray is of dtype unicode - correct?  Thanks,
>
>
>>>> b
> chararray(u'\xe9',
>      dtype='<U1')
>>>> b.capitalize()
> chararray(u'\xc9',
>      dtype='<U1')
>
> see http://stackoverflow.com/questions/1006450/capitalizing-non-ascii-words-in-python
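
A minimal Python 3 sketch of the same check, assuming a current NumPy where `np.char.chararray` is still available. It shows that `capitalize()` on a unicode chararray does change the non-ASCII letter. `ascii()` is used only so the output prints even on a console that cannot render "É".

```python
import numpy as np

# Python 3 rendition of the session above: a one-element unicode
# chararray holding U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
b = np.array(['\xe9'], dtype='<U1').view(np.char.chararray)

# capitalize() is applied element by element; the result keeps dtype '<U1'.
c = b.capitalize()

# Prints the escaped form of U+00C9 (LATIN CAPITAL LETTER E WITH ACUTE).
print(ascii(c.tolist()))  # ['\xc9']
```
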
>
>
>
>>
>> DG
>>
>> --- On Tue, 8/11/09, David Goldsmith <d_l_goldsmith at yahoo.com> wrote:
>>
>>> From: David Goldsmith <d_l_goldsmith at yahoo.com>
>>> Subject: Some Q's vis-a-vis Numpy unicode support
>>> To: scipy-dev at scipy.org
>>> Date: Tuesday, August 11, 2009, 4:02 PM
>>> First, a "reality check" question:
>>>
>>> 0) Is Windows (DOS) Terminal capable of rendering unicode?
>
> Not by default (in US English, at least), but the code page
> can be changed, which I have never tried:
>
>>help graftabl
> Enable Windows to display an extended character set in graphics mode.
>
> GRAFTABL [xxx]
> GRAFTABL /STATUS
>
>   xxx      Specifies a code page number.
>   /STATUS  Displays the current code page selected for use with GRAFTABL.
>
>
>
> From a Python session in the Windows command shell (it prints correctly,
> in case your mail client doesn't render it):
>>>> print u'\xe9'
> é
>>>> print u'\xe9'.capitalize()
> É
>>>> u'\xe9'.capitalize()
> u'\xc9'
>>>>
>
>
> but I cannot print any numpy.chararray without getting:
>>>> c= np.array(u'\xe9','<U1')
>>>> print c
> ....
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
> position 0: ordinal not in range(128)
>
> (This is in IDLE, with cp1252, I think.)
>
> These are the usual encode/decode problems with Unicode, which take several
> hours of trial and error and reading docs to figure out.
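
The error above comes from `print` falling back to the ASCII codec. A minimal standard-library sketch (Python 3, not the Python 2.5 used in the thread) of the usual ways around it: check which codec the output stream uses, encode explicitly with a code page that contains the character, or accept a lossy fallback. `cp1252` is assumed here only because that is the code page mentioned above.

```python
import sys

# Which codec will print() use here? This depends on the console or IDE
# (e.g. a Windows code page such as 'cp1252', or 'utf-8').
print(getattr(sys.stdout, 'encoding', None))

s = '\xe9'  # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Encoding explicitly with a code page that contains the character works.
cp = s.capitalize().encode('cp1252')  # b'\xc9'

# The ASCII codec cannot represent it: this is the error quoted above.
try:
    s.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print('ascii codec failed:', exc.reason)

# Lossy fallback: substitute '?' for anything the codec cannot encode.
fallback = s.encode('ascii', errors='replace')  # b'?'
```
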

Actually, this works (in IDLE):

>>> b = np.array([u'\xe9',u'\xe9'],'<U1').view(np.chararray)
>>> print b.encode('cp1252')[0]
é
>>> print b.capitalize().encode('cp1252')[0]
É
>>> print b[0].encode('cp1252')
é
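
The workaround above encodes the array element by element before printing. Here is a sketch of the same idea in Python 3, assuming a current NumPy. `encode()` returns a bytes array, and `np.char.decode` with the same code page round-trips it back to text.

```python
import numpy as np

# Same two-element array as in the session above, in Python 3 syntax.
b = np.array(['\xe9', '\xe9'], dtype='<U1').view(np.char.chararray)

# encode() works element by element and returns a *bytes* array, so the
# console's default codec is never involved.
encoded = b.capitalize().encode('cp1252')

# Decoding with the same code page round-trips back to text.
decoded = np.char.decode(encoded, 'cp1252')
print(encoded.dtype.kind, ascii(decoded.tolist()))
```
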

This looks like a bug. Or is it a known limitation that chararrays
cannot be 0-d?

>>> b0= np.array(u'\xe9','<U1').view(np.chararray)
>>> print b0.encode('cp1252')
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    print b0.encode('cp1252')
  File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 217, in encode
    return self._generalmethod('encode', broadcast(self, encoding, errors))
  File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 162, in _generalmethod
    newarr[:] = res
ValueError: cannot slice a 0-d array
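
Whether or not a given NumPy version still fails on the 0-d case, two sketches that sidestep it are shown below (Python 3, current NumPy assumed). The first pulls out the scalar with `item()` and uses the plain `str.encode`. The second promotes the array to 1-d with `np.atleast_1d`, encodes it, and then takes the single element.

```python
import numpy as np

# 0-d unicode chararray, as in the traceback above.
b0 = np.array('\xe9', dtype='<U1').view(np.char.chararray)
print(b0.ndim)  # 0

# Option 1: extract the Python str and use its own encode().
s = b0.item().encode('cp1252')

# Option 2: reshape to 1-d so the elementwise encode has something to fill.
arr = np.char.encode(np.atleast_1d(b0), 'cp1252')
```
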

>
> Josef
>
>>>
>>> Unless the answer is "No," my real question:
>>>
>>> 1) Does chararray.capitalize() capitalize non-Roman letters
>>> that have different lower-case and upper-case forms (e.g.,
>>> the Greek letters)?  If "yes," are there any exceptions
>>> (e.g., Russian letters)?
>>>
>>> Thanks!
>>>
>>> DG
>>>
>> _______________________________________________
>> Scipy-dev mailing list
>> Scipy-dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>
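
Question 1 asks whether `chararray.capitalize()` handles non-Latin scripts with case, such as Greek and Russian. A quick way to check on a current NumPy is to capitalize a few sample words and compare the result with Python's own `str.capitalize()`. The word list is hypothetical test data, and the sketch assumes Python 3 and `np.char.chararray`.

```python
import numpy as np

# Hypothetical sample words: Greek and Russian (Cyrillic) both have
# distinct lower- and upper-case forms.
words = np.array(['αλφα', 'ωμέγα', 'привет', 'мир']).view(np.char.chararray)

caps = words.capitalize()

# Reference result from Python's own str.capitalize() on each element.
expected = [w.capitalize() for w in words.tolist()]
print(ascii(caps.tolist()))
```
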