unicode mystery

Thu Sep 13 05:59:30 EDT 2001

Alex Rice <alex at integretechpub.com> writes:

> However, what I'm seeing coming out of Word and Python for a Greek 'Eta'
> character in Symbol font is:

When you say "out of Word and Python", what do you mean? Word has no
integrated Python support, nor does standard Python provide a Word
module. So I assume that you use the Word automation interface, right?

> It looks like F0 is designated "private use" in unicode space. Does this
> mean that either MSWord or python is doing something incorrect? 

No, neither is doing anything wrong. Microsoft has every right to use
the private use area of Unicode; this is precisely what it was designed
for. Unfortunately, by nature, two different interpretations of the
private use area are likely to be incompatible.

In Word, to interpret a character from the private use area, you have
to know what font family the character is formatted in. If it is from
a symbol font, you have to interpret it according to the symbol
charset, after subtracting 0xF000 from the Unicode numeric value. Note
that it is not safe to assume that all characters from the private use
area are Symbol characters: they might be Wingdings characters as
well, or other characters whose Unicode equivalents Microsoft was too
lazy to find (or which don't have Unicode equivalents).
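
As a rough illustration of the subtraction step, here is a minimal
sketch. The helper name symbol_byte, the SYMBOL_FONTS set and the
0xF000..0xF0FF range check are my assumptions for the example, not
anything Word or Python provides; the font name must come from
wherever you read the character's formatting.

```python
# Hypothetical helper: names and the range check are illustrative assumptions.
SYMBOL_FONTS = {"Symbol"}  # fonts whose private-use characters use the symbol charset


def symbol_byte(char, font_name):
    """Return the symbol-charset code for a private-use character, or None.

    None means "do not treat this as a Symbol character": either the font
    is not a known symbol font or the code point is outside the assumed range.
    """
    code = ord(char)
    if font_name in SYMBOL_FONTS and 0xF000 <= code <= 0xF0FF:
        return code - 0xF000
    return None


# U+F048 formatted in Symbol becomes 0x48, the Symbol position of 'Eta'.
print(hex(symbol_byte("\uf048", "Symbol")))  # 0x48
```

The same code point formatted in, say, Wingdings yields None here, so
the caller is forced to handle that case separately.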

> What are possible solutions to this?

In short, you *must* find out the font in use at the character
position. Then you can apply a Symbol-to-Unicode converter. You may
consider using

http://www.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt

as a starting point, and generating a Python codec for this using
Tools/scripts/gencodec.py.
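
If you would rather not generate a full codec, a small lookup table
may be enough. The following is only a sketch: SAMPLE is a
hand-written stand-in for the mapping file, written in the layout I
assume it uses (Unicode value first, then the Symbol code point, both
in hex, tab-separated, with '#' comments), so please check that
against the real file. The function names are hypothetical.

```python
# Hand-written sample rows in the layout assumed for symbol.txt:
# Unicode value, Symbol code point, then comment fields, tab-separated.
SAMPLE = (
    "# comment lines start with a hash\n"
    "0391\t41\t# GREEK CAPITAL LETTER ALPHA\t# Alpha\n"
    "0397\t48\t# GREEK CAPITAL LETTER ETA\t# Eta\n"
)


def load_table(text):
    """Build a dict mapping Symbol code points to Unicode code points."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        unicode_value = int(fields[0], 16)
        symbol_code = int(fields[1], 16)
        # If a Symbol code appears more than once, keep the first mapping.
        table.setdefault(symbol_code, unicode_value)
    return table


def symbol_to_unicode(text, table):
    """Convert private-use characters of a Symbol-formatted run.

    Characters outside U+F000..U+F0FF, or without a table entry,
    are passed through unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        mapped = table.get(code - 0xF000) if 0xF000 <= code <= 0xF0FF else None
        out.append(chr(mapped) if mapped is not None else ch)
    return "".join(out)


table = load_table(SAMPLE)
print(symbol_to_unicode("\uf048", table) == "\u0397")  # True
```

Apply symbol_to_unicode only to runs that you have established are
formatted in a symbol font; it does not check the font itself.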

Regards,
Martin