[New-bugs-announce] [issue10459] missing character names in unicodedata (CJK...)

Vlastimil Brom report at bugs.python.org
Fri Nov 19 15:36:27 CET 2010


New submission from Vlastimil Brom <vlastimil.brom at gmail.com>:

I just noticed an omission of some character names in the unicodedata module.
These are some CJK - Ideographs:

龼 (0x9fbc) - 鿋 (0x9fcb)
(CJK Unified Ideographs [19968-40959] [0x4e00-0x9fff])

𪜀 (0x2a700) - 𫜴 (0x2b734)
(CJK Unified Ideographs Extension C [173824-177983] [0x2a700-0x2b73f])

𫝀 (0x2b740) - 𫠝 (0x2b81d)
(CJK Unified Ideographs Extension D [177984-178207] [0x2b740-0x2b81f])

The names should probably be generated from the code point, e.g. CJK UNIFIED IDEOGRAPH-2A700 etc.
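As a rough illustration of that naming pattern, here is a minimal Python sketch. The helper name expected_cjk_name and the REPORTED_RANGES table are made up for this example; the ranges are simply the three listed above, not the full set of ideograph ranges, and this is not how unicodedata itself is implemented.

```python
# Ranges copied from this report; (first, last) are inclusive.
# Hypothetical table for illustration only.
REPORTED_RANGES = [
    (0x9FBC, 0x9FCB),    # end of CJK Unified Ideographs
    (0x2A700, 0x2B734),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81D),  # CJK Unified Ideographs Extension D
]


def expected_cjk_name(cp):
    """Return the generated name for a code point in one of the
    reported ranges, or None if the code point is outside them."""
    for first, last in REPORTED_RANGES:
        if first <= cp <= last:
            # The name is the fixed prefix plus the uppercase hex code point.
            return "CJK UNIFIED IDEOGRAPH-%X" % cp
    return None
```

With this sketch, expected_cjk_name(0x2A700) gives the name quoted above.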

(Tested with a recompiled unicodedata using Unicode 6.0. With the module built into Python 2.7 (unidata_version: '5.2.0'), only the first two ranges are relevant, as CJK Unified Ideographs Extension D is an addition of Unicode 6.0.)

(Also there are the unprintable ASCII controls, surrogates and private use areas, where the missing names are probably ok.)


I tested with the following rather clumsy code:

# # # # # # # # # # # # # # # 
import unicodedata

# wide_unichr = custom unichr (not shown here) emulating unicode ranges beyond FFFF on a narrow python build
NO_NAME = u"??noname??"
codepoints_missing_char_names = [[-2, -2]]  # dummy entry, so that [-1] always exists
for i in xrange(0x10FFFF + 1):
    char = wide_unichr(i)
    # skip the "C*" categories (controls, surrogates, private use, unassigned)
    if unicodedata.category(char)[:1] != 'C' and unicodedata.name(char, NO_NAME) == NO_NAME:
        if codepoints_missing_char_names[-1][1] == i - 1:
            codepoints_missing_char_names[-1][1] = i  # extend the current run
        else:
            codepoints_missing_char_names.append([i, i])  # start a new run

for first, last in codepoints_missing_char_names[1:]:
    print u"%s (%s) - %s (%s)" % (wide_unichr(first), hex(first), wide_unichr(last), hex(last))
# # # # # # # # # # # # # # # # # # # # # # # # # # 
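
For comparison, the same check might look roughly like this on Python 3, where chr() covers the whole code point range and no wide_unichr helper is needed. This is only a sketch: the function name missing_name_ranges and its lookup parameter are inventions for this example (the parameter just makes it possible to plug in a different name function), and the result depends on the Unicode version the interpreter was built with.

```python
import unicodedata


def missing_name_ranges(start=0, stop=0x110000, lookup=unicodedata.name):
    """Return inclusive (first, last) runs of code points in [start, stop)
    that are outside the "C*" categories but have no character name.

    `lookup` is expected to raise ValueError for unnamed characters,
    as unicodedata.name does when no default is given.
    """
    ranges = []
    for cp in range(start, stop):
        ch = chr(cp)
        # Controls, format characters, surrogates, private use and
        # unassigned code points are expected to have no name.
        if unicodedata.category(ch).startswith("C"):
            continue
        try:
            lookup(ch)
        except ValueError:
            if ranges and ranges[-1][1] == cp - 1:
                ranges[-1][1] = cp       # extend the current run
            else:
                ranges.append([cp, cp])  # start a new run
    return [tuple(r) for r in ranges]
```

Calling missing_name_ranges() without arguments scans everything up to 0x10FFFF; each resulting pair can then be printed with hex(first) and hex(last) as in the code above.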

Unfortunately, I can't provide a fix, as unicodedata involves C code, where my knowledge is near zero.

vbr

----------
messages: 121521
nosy: vbr
priority: normal
severity: normal
status: open
title: missing character names in unicodedata (CJK...)

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10459>
_______________________________________

