Missing unicode data?

Klaus Alexander Seistrup klaus at seistrup.dk
Sat Jun 3 04:36:43 EDT 2006


Hi group,

I just came across the following exception:

#v+

$ python
Python 2.4.2 (#2, Sep 30 2005, 21:19:01)
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> u'\N{LATIN LETTER SMALL CAPITAL BARRED B}'
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-38: unknown Unicode character name
>>> unicodedata.name(u'\u1d03')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: no such name
>>> ^D
$ 

#v-

When checking unicodedata.name() against each character listed in the file 
/usr/share/unidata/UnicodeData-4.0.1d1b.txt, which came with the 
console-data package on my Ubuntu Linux installation, a total of 
1226 Unicode characters seem to be missing from the unicodedata 
module (2477 characters when checking against the latest 
database from unicode.org¹).  Is this a deliberate omission?
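
For anyone who wants to repeat the check, it can be sketched roughly as follows. This is a minimal sketch, not the exact script used for the counts above: it is written for Python 3, the helper name missing_names() is made up for illustration, and it assumes the usual semicolon-separated UnicodeData.txt layout (code point in the first field, name in the second). Entries whose name is in angle brackets, such as <control> or range markers, are skipped because they are not real character names.

```python
import unicodedata


def missing_names(lines):
    """Return (code, name) pairs that unicodedata cannot name.

    `lines` is any iterable of UnicodeData.txt-style lines, e.g. an
    open file object. (Hypothetical helper, for illustration only.)
    """
    missing = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue  # blank or malformed line
        code, name = fields[0], fields[1]
        if name.startswith("<"):
            continue  # <control>, range markers etc. carry no real name
        char = chr(int(code, 16))
        # The second argument is the default returned when no name is known.
        if unicodedata.name(char, None) is None:
            missing.append((code, name))
    return missing


# Small in-memory sample instead of the real file, so the sketch runs anywhere.
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
]
print(len(missing_names(sample)), "missing")
print("unicodedata built from Unicode", unicodedata.unidata_version)
```

To run it against a real database, pass an open file instead of the sample, e.g. missing_names(open(path, encoding="ascii")) with path pointing at a local UnicodeData.txt. Comparing unicodedata.unidata_version with the version of that file may show whether the module is simply built from an older release of the database.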

Cheers,
Klaus.

 ¹) http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
-- 
Klaus Alexander Seistrup
SubZeroNet, Copenhagen, Denmark
http://magnetic-ink.dk/


