str.isnumeric and Cuneiforms

Marco Buttu name.surname at gmail.com
Fri May 18 09:56:12 EDT 2012


On 05/18/2012 02:50 AM, Steven D'Aprano wrote:

>> Is it normal that str.isnumeric() returns False for these Cuneiforms?
>>
>>  '\U00012456'
>>  '\U00012457'
>>  '\U00012432'
>>  '\U00012433'
>>
>> They are all in the Nl category.

> Are you sure about that? Do you have a reference?

I was just playing with Unicode on Python 3.3a:

 >>> from unicodedata import category, name
 >>> from sys import maxunicode

 >>> nl = [chr(c) for c in range(maxunicode + 1)
...       if category(chr(c)).startswith('Nl')]

 >>> numerics = [chr(c) for c in range(maxunicode + 1)
...             if chr(c).isnumeric()]

 >>> for c in sorted(set(nl) - set(numerics)):
...     print(hex(ord(c)), category(c), name(c))
...
0x12432 Nl CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH
0x12433 Nl CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN
0x12456 Nl CUNEIFORM NUMERIC SIGN NIGIDAMIN
0x12457 Nl CUNEIFORM NUMERIC SIGN NIGIDAESH

So they are in the Nl category, yet str.isnumeric() returns False for
them. That seems strange, because other Cuneiform numeric signs are
reported as numeric:

 >>> '\U00012455'.isnumeric(), '\U00012456'.isnumeric()
(True, False)
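
To look at one character at a time, something like the following might
help. It is only a sketch: describe() is a name I made up, not part of
the standard library, and what it prints for the Cuneiform signs depends
on the Unicode database your interpreter was built with, so I am not
quoting any output here. It relies on unicodedata.numeric() accepting a
default value, which avoids the ValueError for characters without a
numeric value.

```python
import unicodedata


def describe(ch):
    """Return (category, isnumeric, numeric value or None) for one character.

    Hypothetical helper for illustration only.
    """
    return (
        unicodedata.category(ch),
        ch.isnumeric(),
        # The second argument is returned instead of raising ValueError
        # when the character has no numeric value in the database.
        unicodedata.numeric(ch, None),
    )


# The Unicode database version bundled with this interpreter; results for
# recently added or corrected characters may differ between versions.
print("Unicode database:", unicodedata.unidata_version)

for ch in ('\U00012455', '\U00012456', '\U00012457',
           '\U00012432', '\U00012433'):
    print(hex(ord(ch)), describe(ch))
```

A character whose third field is None while its category is Nl is one of
the cases in question.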

> It seems to me that they are not:
>
>
> py>  c = '\U00012456'
> py>  import unicodedata
> py>  unicodedata.numeric(c)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> ValueError: not a numeric character

Exactly, that matches what I showed above. Is this behaviour correct?

-- 
Marco


