unicode mystery

Ignacio Vazquez-Abrams ignacio at openservices.net
Wed Sep 12 15:06:06 EDT 2001


On Wed, 12 Sep 2001, Alex Rice wrote:

> I have some unicode chars- python unicode objects- the source of these
> chars is MSWord docs using the "Symbol" font. Mathematical operators,
> Greek chars, and so forth. unicodedata.name() returns ValueError and the
> unicode numbers don't seem to match the code charts.
>
> For instance, the symbol font character Eta: according to the Unicode
> charts, Eta should be one of
> 1D776 MATHEMATICAL SANS-SERIF BOLD SMALL ETA
> 03B7 GREEK SMALL LETTER ETA
>
> However, what I'm seeing coming out of Word and Python for a Greek 'Eta'
> character in Symbol font is:
>
> `char` => u'\uf068'
> `type(char)` => <type 'unicode'>
> `ord(char)`  => 61544
>
> It looks like F0 is designated "private use" in unicode space. Does this
> mean that either MSWord or python is doing something incorrect? What are
> possible solutions to this?

One option is to write a new Unicode codec, for an encoding called something
like 'MSUnicode', that takes Microsoft's non-standard Unicode encoding and
transcodes it to or from standard UCS-2. Take a look at:

  http://www.python.org/doc/current/lib/module-codecs.html
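
As a rough sketch of the transcoding idea (a plain translation table rather
than a full codec registered through the codecs module), something like the
following might work. It assumes that Word stores Symbol-font characters at
U+F000 plus the font's own byte value, so u'\uf068' is Symbol's 'h', which the
font draws as eta. The table name and function name are made up for
illustration, the table covers only a handful of letters, and the code uses
current Python 3 syntax rather than the u'...' literals shown above.

```python
import unicodedata

# Hypothetical, partial table: private-use code point -> standard code point.
# Assumption: each key is 0xF000 + the Symbol font's byte for that glyph.
SYMBOL_PUA_TO_UNICODE = {
    0xF061: 0x03B1,  # Symbol 'a' -> GREEK SMALL LETTER ALPHA
    0xF062: 0x03B2,  # Symbol 'b' -> GREEK SMALL LETTER BETA
    0xF067: 0x03B3,  # Symbol 'g' -> GREEK SMALL LETTER GAMMA
    0xF064: 0x03B4,  # Symbol 'd' -> GREEK SMALL LETTER DELTA
    0xF068: 0x03B7,  # Symbol 'h' -> GREEK SMALL LETTER ETA
    0xF070: 0x03C0,  # Symbol 'p' -> GREEK SMALL LETTER PI
}


def symbol_to_unicode(text):
    """Replace known Symbol-font private-use characters with standard ones.

    Characters that are not in the table pass through unchanged.
    """
    return text.translate(SYMBOL_PUA_TO_UNICODE)


char = "\uf068"
fixed = symbol_to_unicode(char)
print(hex(ord(fixed)), unicodedata.name(fixed))
# prints: 0x3b7 GREEK SMALL LETTER ETA
```

A real codec would need the complete Symbol mapping and would be registered
with codecs.register(); the table above only shows the shape of the fix.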

-- 
Ignacio Vazquez-Abrams  <ignacio at openservices.net>




