[Python-Dev] Unicode Character Names

Andy Robinson andy@reportlab.com
Thu, 23 Mar 2000 21:54:23 GMT


>Message: 20
>From: "Andrew M. Kuchling" <akuchlin@mems-exchange.org>
>Date: Thu, 23 Mar 2000 15:27:41 -0500 (EST)
>To: "python-dev@python.org" <python-dev@python.org>
>Subject: Re: [Python-Dev] Unicode character names
>
>Paul Prescod writes:
>>The new \N escape interpolates named characters within strings. For
>>example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a
>>unicode smiley face at the end.
>
>Cute idea, and it certainly means you can avoid looking up Unicode
>numbers.  (You can look up names instead. :) )  Note that this means the
>Unicode database is no longer optional if this is done; it has to be
>around at code-parsing time.  Python could import it automatically, as
>exceptions.py is imported.  Christian's work on compressing
>unicodedatabase.c is therefore really important.  (Is Perl 5.6 actually
>dragging around the Unicode database in the binary, or is it read out
>of some external file or data structure?)
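For illustration only, here is how the escape behaves in present-day Python 3, where `\N{...}` was eventually adopted for string literals. This is a sketch, not part of the original discussion. The helper `compiles()` is hypothetical. It shows that an unknown name is rejected when the source is compiled, before anything runs, which is why the name database has to be reachable at parse time.

```python
# Present-day Python 3: \N{...} is resolved when the literal is compiled.
greeting = "Hi! \N{WHITE SMILING FACE}"
print(greeting == "Hi! \u263a")  # True: WHITE SMILING FACE is U+263A

def compiles(source: str) -> bool:
    """Hypothetical helper: does `source` compile as an expression?"""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# An unknown name fails at compile time, before any code runs, so the
# name database must be available to the parser.
print(compiles(r'"\N{WHITE SMILING FACE}"'))      # True
print(compiles(r'"\N{NO SUCH CHARACTER NAME}"'))  # False
```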

I agree - the names are really useful.  If you are doing conversion
work, you often want to know what a character is but don't have a
complete Unicode font handy.  Being able to get the description of a
Unicode character is useful, and so is being able to use the
description as a constructor for it.
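A minimal sketch of both directions, using the `unicodedata` module from today's Python standard library rather than anything available when this message was written. `describe()` is a hypothetical helper of the kind that is handy during conversion work.

```python
import unicodedata

# Character -> description: useful when no installed font can show it.
print(unicodedata.name("\u00e9"))  # LATIN SMALL LETTER E WITH ACUTE

# Description -> character: the name works as a constructor.
smiley = unicodedata.lookup("WHITE SMILING FACE")
print(smiley == "\u263a")  # True

def describe(text: str) -> list:
    """Hypothetical helper: (code point, name) for each character in text.

    Characters without a name (e.g. control codes) get "<unnamed>" instead
    of making unicodedata.name() raise ValueError.
    """
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in text]

print(describe("A\u3042\n"))
# [('U+0041', 'LATIN CAPITAL LETTER A'), ('U+3042', 'HIRAGANA LETTER A'), ('U+000A', '<unnamed>')]
```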

Also, there are some language-specific things that might make it
useful to have the full character descriptions in Christian's
database.  For example, we'll have an (optional, not in the standard
library) Japanese module with functions like
isHalfWidthKatakana(), isFullWidthKatakana() to help normalize things.
Parsing the database and looking for strings in the descriptions is
one way to build this - not the only one, but it might be useful.
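A sketch of the database-scanning approach in modern Python. It assumes that halfwidth katakana are exactly the characters whose names begin with "HALFWIDTH KATAKANA", and that every other character whose name begins with "KATAKANA" counts as full-width. The two predicate names come from the message. Their bodies here are hypothetical and say nothing about how the actual Japanese module was written.

```python
import sys
import unicodedata

# Hypothetical sketch: classify katakana by searching character names.
# Assumption: halfwidth forms are named "HALFWIDTH KATAKANA ..." and all
# other katakana ("KATAKANA LETTER A", ...) are treated as full-width.

def _build_katakana_tables():
    """Scan every code point once and bucket katakana by their names."""
    half, full = set(), set()
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if name.startswith("HALFWIDTH KATAKANA"):
            half.add(ch)
        elif name.startswith("KATAKANA"):
            full.add(ch)
    return frozenset(half), frozenset(full)

_HALFWIDTH, _FULLWIDTH = _build_katakana_tables()

def isHalfWidthKatakana(ch: str) -> bool:
    return ch in _HALFWIDTH

def isFullWidthKatakana(ch: str) -> bool:
    return ch in _FULLWIDTH

print(isHalfWidthKatakana("\uff71"))  # True: HALFWIDTH KATAKANA LETTER A
print(isFullWidthKatakana("\u30a2"))  # True: KATAKANA LETTER A
# NFKC normalization maps the halfwidth form to its full-width equivalent.
print(unicodedata.normalize("NFKC", "\uff71") == "\u30a2")  # True
```

The one-time scan trades some start-up time for constant-time set lookups afterwards. A production module would more likely ship precomputed tables.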

So I'd vote to put the names in at first, and give us a few weeks to see
how useful they are before making a final decision.

- Andy Robinson