[Python-Dev] Unicode character names

M.-A. Lemburg mal@lemburg.com
Thu, 23 Mar 2000 22:07:35 +0100


"Andrew M. Kuchling" wrote:
> 
> Paul Prescod writes:
> >The new \N escape interpolates named characters within strings. For
> >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a
> >unicode smiley face at the end.
> 
> Cute idea, and it certainly means you can avoid looking up Unicode
> numbers.  (You can look up names instead. :) )  Note that this means the
> Unicode database is no longer optional if this is done; it has to be
> around at code-parsing time.  Python could import it automatically, as
> exceptions.py is imported.  Christian's work on compressing
> unicodedatabase.c is therefore really important.  (Is Perl5.6 actually
> dragging around the Unicode database in the binary, or is it read out
> of some external file or data structure?)

Sorry to disappoint you guys, but the Unicode names and comments
are *not* included in the unicodedatabase.c file Christian
is currently working on. The reason is simple: including them
would add huge amounts of string data to the file. So this is
a no-no for the core distribution...

Still, the above is easily possible by inventing a new
encoding, say unicode-with-smileys, which then reads in
a file containing the Unicode names and applies the necessary
magic to decode/encode data as Paul described above.

This would probably make a fun project for someone who wants
to dive into writing codecs.
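
To make the codec idea concrete, here is a rough sketch, not a
finished implementation. It is written against the present-day Python 3
codecs API, which is an assumption on my part, and it uses
unicodedata.lookup() and unicodedata.name() as a stand-in for reading
the names from a separate file. The helper names (_decode, _encode,
_search) are made up for illustration; only the encoding name
unicode-with-smileys comes from the text above.

```python
import codecs
import re
import unicodedata

# Matches escapes of the form \N{CHARACTER NAME}.
_NAME_ESCAPE = re.compile(r"\\N\{([^}]+)\}")


def _decode(data, errors="strict"):
    """Decode ASCII bytes, expanding \\N{...} escapes into characters."""
    text = bytes(data).decode("ascii", errors)
    # Assumption: unicodedata.lookup() replaces the external names file.
    # It raises KeyError for an unknown character name.
    result = _NAME_ESCAPE.sub(lambda m: unicodedata.lookup(m.group(1)), text)
    return result, len(data)


def _encode(text, errors="strict"):
    """Encode to ASCII bytes, writing non-ASCII characters as \\N{...}."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            # unicodedata.name() raises ValueError for unnamed characters.
            parts.append("\\N{%s}" % unicodedata.name(ch))
    return "".join(parts).encode("ascii"), len(text)


def _search(name):
    """Codec search function; newer Pythons pass the name with underscores."""
    if name in ("unicode-with-smileys", "unicode_with_smileys"):
        return codecs.CodecInfo(
            name="unicode-with-smileys", encode=_encode, decode=_decode
        )
    return None


codecs.register(_search)

decoded = b"Hi! \\N{WHITE SMILING FACE}".decode("unicode-with-smileys")
encoded = decoded.encode("unicode-with-smileys")
```

Note that the sketch does not escape literal backslashes, so text that
already contains the characters \N{ will not round-trip cleanly; a real
codec would have to deal with that.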

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/