regenerating unicodedata for py2.7 using py3 makeunicodedata.py?

Vlastimil Brom vlastimil.brom at gmail.com
Fri Nov 19 09:49:21 EST 2010


2010/11/18 Martin v. Loewis <martin at v.loewis.de>:
>
>> Thanks for the confirmation Martin!
>>
>> Do you think the mentioned omission of the character names of some
>> CJK ranges in unicodedata is intended, or should it be reported to the
>> tracker?
>
> It's certainly a bug. So a bug report would be appreciated, but much
> more so a patch. Ideally, the patch would either be completely
> forward-compatible (should the CJK ranges change in future Unicode
> versions),
> or at least have a safe-guard to detect that the data file is getting
> out of sync with the C implementation.
>
> Regards,
> Martin
>
Thanks,
I just created a bug ticket:
http://bugs.python.org/issue10459

The omissions of character names seem to be:

龼 (0x9fbc) - 鿋 (0x9fcb)
(CJK Unified Ideographs [19968-40959] [0x4e00-0x9fff])

𪜀 (0x2a700) - 𫜴 (0x2b734)
(CJK Unified Ideographs Extension C [173824-177983] [0x2a700-0x2b73f])

𫝀 (0x2b740) - 𫠝 (0x2b81d)
(CJK Unified Ideographs Extension D [177984-178207] [0x2b740-0x2b81f])

(Also the unprintable ASCII controls, Surrogates and Private use area,
where the missing names are probably ok.)
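
For anyone who wants to check their own interpreter, a minimal sketch
along these lines should count the unnamed code points in the three
ranges listed above. It assumes Python 3 (or at least a build where
chr() accepts code points above 0xffff); the helper name unnamed_in is
made up for illustration and is not part of unicodedata.

```python
import unicodedata

# Inclusive code point ranges copied from the list in this message.
RANGES = [
    (0x9FBC, 0x9FCB),    # tail of CJK Unified Ideographs
    (0x2A700, 0x2B734),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81D),  # CJK Unified Ideographs Extension D
]


def unnamed_in(first, last):
    """Return the code points in [first, last] that unicodedata cannot name.

    Hypothetical helper: unicodedata.name() returns the supplied default
    (None here) instead of raising ValueError when no name is known.
    """
    return [cp for cp in range(first, last + 1)
            if unicodedata.name(chr(cp), None) is None]


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for first, last in RANGES:
        missing = unnamed_in(first, last)
        print("0x%x-0x%x: %d of %d code points have no name"
              % (first, last, len(missing), last - first + 1))
```

The counts depend on the interpreter and on the Unicode database it
ships with, so no expected output is given here.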

Unfortunately, I am not able to provide a patch, mainly because
unicodedata is C code.
A while ago I considered writing some unicodedata enhancements in
Python, which would support the ranges and script names, full category
names etc., but so far direct programmatic lookups in the online
Unicode docs, with some simple processing, have worked
sufficiently...
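
As a rough illustration of what such a pure-Python enhancement could
look like for the missing names only, here is a sketch of a lookup
wrapper. It relies on the Unicode naming rule that CJK unified
ideographs are named "CJK UNIFIED IDEOGRAPH-" followed by the code point
in uppercase hex. The function name_with_fallback and the table
MISSING_CJK_RANGES are hypothetical names; the ranges are simply the
three listed above and would need updating for other Unicode versions.

```python
import unicodedata

# Inclusive ranges copied from this message (an assumption, not an
# authoritative list of all CJK unified ideograph ranges).
MISSING_CJK_RANGES = [
    (0x9FBC, 0x9FCB),
    (0x2A700, 0x2B734),
    (0x2B740, 0x2B81D),
]


def name_with_fallback(ch):
    """Return unicodedata.name(ch), deriving the CJK name if it is missing."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        cp = ord(ch)
        for first, last in MISSING_CJK_RANGES:
            if first <= cp <= last:
                # Algorithmic name: fixed prefix plus uppercase hex code point.
                return "CJK UNIFIED IDEOGRAPH-%X" % cp
        # Not in a known range: re-raise the original ValueError.
        raise
```

For example, name_with_fallback(chr(0x2a700)) would give
"CJK UNIFIED IDEOGRAPH-2A700" whether or not the built-in lookup knows
the character, while unnamed controls and surrogates still raise
ValueError.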

Regards,
   Vlastimil Brom


