[SciPy-dev] Some Q's vis-a-vis Numpy unicode support
josef.pktd at gmail.com
Tue Aug 11 21:03:18 EDT 2009
On Tue, Aug 11, 2009 at 8:41 PM, <josef.pktd at gmail.com> wrote:
> On Tue, Aug 11, 2009 at 7:49 PM, David Goldsmith<d_l_goldsmith at yahoo.com> wrote:
>> OK, may have answered Q1 myself: unless I'm misunderstanding what I'm seeing, what I'm finding is that capitalize() does nothing at all if the chararray is of dtype unicode - correct? Thanks,
>
>
>>>> b
> chararray(u'\xe9',
> dtype='<U1')
>>>> b.capitalize()
> chararray(u'\xc9',
> dtype='<U1')
>
> see http://stackoverflow.com/questions/1006450/capitalizing-non-ascii-words-in-python
>
>
>
>>
>> DG
>>
>> --- On Tue, 8/11/09, David Goldsmith <d_l_goldsmith at yahoo.com> wrote:
>>
>>> From: David Goldsmith <d_l_goldsmith at yahoo.com>
>>> Subject: Some Q's vis-a-vis Numpy unicode support
>>> To: scipy-dev at scipy.org
>>> Date: Tuesday, August 11, 2009, 4:02 PM
>>> First, a "reality check" question:
>>>
>>> 0) Is Windows (DOS) Terminal capable of rendering unicode?
>
> Not by default (in US English at least),
> but the code page number can be changed, which I have never tried.
>
>>help graftabl
> Enable Windows to display an extended character set in graphics mode.
>
> GRAFTABL [xxx]
> GRAFTABL /STATUS
>
> xxx Specifies a code page number.
> /STATUS Displays the current code page selected for use with GRAFTABL.
>
>
>
> from python session in windows command shell (it prints correctly in
> case mail doesn't render it)
>>>> print u'\xe9'
> é
>>>> print u'\xe9'.capitalize()
> É
>>>> u'\xe9'.capitalize()
> u'\xc9'
>>>>
>
>
> but I cannot print any numpy.chararrays without getting
>>>> c= np.array(u'\xe9','<U1')
>>>> print c
> ....
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in
> position 0: ordinal not in range(128)
>
> (this is in Idle, with cp1252 I think)
>
> the usual encode, decode problems with unicode, which take several
> hours of trial and error and reading docs to figure out.
Actually, this works (in IDLE):
>>> b = np.array([u'\xe9',u'\xe9'],'<U1').view(np.chararray)
>>> print b.encode('cp1252')[0]
é
>>> print b.capitalize().encode('cp1252')[0]
É
>>> print b[0].encode('cp1252')
é
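For reference, a minimal sketch of the same check in present-day Python 3 and NumPy (an assumption on my part; the session above is Python 2.5), extended to the Greek and Cyrillic letters asked about in the original question. The sample words are made up for illustration, and np.char.capitalize is used here in place of the chararray method:

```python
import numpy as np

# Hypothetical sample words: French, Greek and Russian, all lower case.
words = np.array(['\xe9t\xe9',
                  '\u03b1\u03b2\u03b3',
                  '\u043f\u0440\u0438\u0432\u0435\u0442'])

# Element-wise capitalize; this relies on Python's own str.capitalize,
# which follows the Unicode case mappings.
capitalized = np.char.capitalize(words)

# Only the first word fits in cp1252, so encode just that one.
encoded = np.char.encode(capitalized[:1], 'cp1252')
```

The first letter of each word comes back in its upper-case form, so at least for these samples the non-Roman letters are handled the same way as the accented Latin one.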
This looks like a bug. Or is it a known limitation that chararrays
cannot be 0-d?
>>> b0= np.array(u'\xe9','<U1').view(np.chararray)
>>> print b0.encode('cp1252')
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    print b0.encode('cp1252')
  File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 217, in encode
    return self._generalmethod('encode', broadcast(self, encoding, errors))
  File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 162, in _generalmethod
    newarr[:] = res
ValueError: cannot slice a 0-d array
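A possible way around the 0-d case, sketched for present-day Python 3 and NumPy (an assumption; I have not checked it against the numpy version in the traceback): promote the scalar array to 1-d before encoding, or pull the Python string out with item() and encode that directly.

```python
import numpy as np

# 0-d unicode array, like b0 above but without the chararray view.
b0 = np.array('\xe9')

# Option 1: promote to a 1-d array so the element-wise encode has
# something to index, then take the single element back out.
b1 = np.atleast_1d(b0)
encoded = np.char.encode(b1, 'cp1252')
first = encoded[0]

# Option 2: extract the Python string and use its own encode method.
direct = b0.item().encode('cp1252')
```

Both routes give the single cp1252 byte for é without slicing a 0-d array.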
>
> Josef
>
>>>
>>> Unless the answer is "No," my real question:
>>>
>>> 1) Does chararray.capitalize() capitalize non-Roman letters
>>> that have different lower-case and upper-case forms (e.g.,
>>> the Greek letters)? If "yes," are there any exceptions
>>> (e.g., Russian letters)?
>>>
>>> Thanks!
>>>
>>> DG
>>>
>>>
>>>
>>>
>>
>>
>>
>