[SciPy-dev] Some Q's vis-a-vis Numpy unicode support

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Aug 11 23:18:14 EDT 2009


On Tue, Aug 11, 2009 at 10:28 PM, David
Goldsmith<d_l_goldsmith at yahoo.com> wrote:
> Thanks, Josef.  This may just be an artifact of working in a DOS Terminal (but your example, though not printing the accented e, did at least print something different for b vs. b.capitalize()), or it may be because I don't know the right encoding to use, but I tried your code w/ what I found on Wikipedia to be the unicode for the Greek letter delta, namely, u'\x03b04', with both 'cp1252' and 'iso8859-7' encoding (the latter being inferred from the same Wikipedia article) and here's what I get:
>
>>>> b = np.array([u'\x03b04',u'\x03b04'],'<U1').view(np.chararray)
>>>> print b.encode('cp1252')[0]
>>>>> print b.capitalize().encode('cp1252')[0]
>>>>> print b.encode('iso8859-7')[0]
>>>>> print b.capitalize().encode('iso8859-7')[0]
>>
> i.e., no difference.  If I'm doing something wrong, please let me know; otherwise, for the purpose of documenting chararray.capitalize() - which is my ultimate goal - is there any rhyme or reason behind which unicode characters capitalize() works on and which it doesn't?
>
> Thanks,
>
> DG
> --- On Tue, 8/11/09, josef.pktd at gmail.com <josef.pktd at gmail.com> wrote:
>
>> actually this works (in Idle)
>>
>> >>> b = np.array([u'\xe9',u'\xe9'],'<U1').view(np.chararray)
>> >>> print b.encode('cp1252')[0]
>> é
>> >>> print b.capitalize().encode('cp1252')[0]
>> É
>> >>> print b[0].encode('cp1252')
>> é
>>
>>
>> this looks like a bug? or is it a known limitation that chararrays
>> cannot be 0-d
>>
>> >>> b0 = np.array(u'\xe9','<U1').view(np.chararray)
>> >>> print b0.encode('cp1252')
>> Traceback (most recent call last):
>>   File "<pyshell#47>", line 1, in <module>
>>     print b0.encode('cp1252')
>>   File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 217, in encode
>>     return self._generalmethod('encode', broadcast(self, encoding, errors))
>>   File "C:\Programs\Python25\Lib\site-packages\numpy\core\defchararray.py", line 162, in _generalmethod
>>     newarr[:] = res
>> ValueError: cannot slice a 0-d array
>>
>>
>> >
>> > Josef
>> >
>> >>>
>> >>> Unless the answer is "No," my real question:
>> >>>
>> >>> 1) Does chararray.capitalize() capitalize non-Roman letters
>> >>> that have different lower-case and upper-case forms (e.g.,
>> >>> the Greek letters)?  If "yes," are there any exceptions
>> >>> (e.g., Russian letters)?

I think yes. The exceptions would be scripts that have no capital
letters at all, e.g. Chinese (Cantonese)?
http://www.isthisthingon.org/unicode/index.phtml?page=03&subpage=B&glyph=03B04
  ??? (found via a Google search for 03B04)

>> >>>
>> >>> Thanks!
>> >>>
>> >>> DG
>> >>>
>> >>>

I have problems finding the correct codes for the characters and
usually need a word processor.

To me it looks like your character is not a Greek delta:

>>> print u'\x03b04'
b04
>>> print u'\u03b04'
ΰ4
>>> print u'\u03b4'
δ

I don't know what u'\x03b04' is, since it doesn't render to anything meaningful.
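
For what it's worth, here is a small sketch of why the three spellings differ. It is written for Python 3 (where plain string literals are already unicode) rather than the Python 2.5 used above, and it relies only on the standard unicodedata module; the describe() helper is a made-up name for illustration. The \x escape consumes exactly two hex digits and \u exactly four, so any extra digits are left over as ordinary characters.

```python
import unicodedata

def describe(s):
    """Return (code point, Unicode name) for every character in s."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<no name>")) for ch in s]

# Raw-string labels show what was typed; the values show what Python parsed.
candidates = [
    (r"\x03b04", "\x03b04"),  # \x takes 2 hex digits: U+0003, then 'b', '0', '4'
    (r"\u03b04", "\u03b04"),  # \u takes 4 hex digits: U+03B0, then '4'
    (r"\u03b4", "\u03b4"),    # exactly 4 hex digits: U+03B4
]

for label, value in candidates:
    print(label, len(value), describe(value))
```

Only the last spelling is a single character, GREEK SMALL LETTER DELTA.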

I managed to get the Greek delta through the HTML code for it, δ, from this page:
http://www.isthisthingon.org/unicode/index.phtml?page=00&subpage=3&hilite=003B4


running this script:


# -*- coding: utf-8 -*-
import numpy as np

sd = u'δ'
print sd

# lower-case delta (U+03B4) and upper-case delta (U+0394)
b = np.array([u'\u03b4',u'\u0394'],'<U1').view(np.chararray)
print b[0]
print repr(b[0])
print b.capitalize()[0]
print repr(b.capitalize()[0])

***********
prints this in my IDLE shell:
>>>
δ
δ
u'\u03b4'
Δ
u'\u0394'

The delta is correctly capitalized.
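
As a rough way to see which characters capitalize() can change at all, here is a sketch in Python 3 using plain str.capitalize(). It assumes that the chararray method simply applies the string method to each element, which is how the NumPy docs describe it; I have not checked every version. The sample characters and the capitalize_report() helper are my own choices for illustration.

```python
import unicodedata

# e-acute, Greek delta, Cyrillic de, and a CJK ideograph (no case at all)
samples = ["\u00e9", "\u03b4", "\u0434", "\u4e2d"]

def capitalize_report(chars):
    """For each single character, return (name, capitalized code point, changed?)."""
    rows = []
    for ch in chars:
        cap = ch.capitalize()
        rows.append((unicodedata.name(ch), "U+%04X" % ord(cap), cap != ch))
    return rows

for row in capitalize_report(samples):
    print(row)
```

Characters that have an upper-case mapping in the Unicode database (Latin, Greek, Cyrillic) change; characters from scripts without case, such as the CJK ideograph, come back unchanged.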


Josef


