unicode study with unicodedata module

Xah Lee xah at xahlee.org
Tue Mar 15 07:55:17 EST 2005


How do I get a Unicode character's number (its code point)?

E.g., 03BA for GREEK SMALL LETTER KAPPA (or the same number in decimal).
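The built-in `ord()` returns a character's code point as an integer, which can then be shown in hex or decimal. `chr()` goes the other way. This is a minimal sketch that assumes Python 3, where `str` is already Unicode. In Python 2.4 the same idea applies to `ord(u'\u03ba')`, with `unichr()` for the reverse direction.

```python
import unicodedata

kappa = "\u03ba"              # GREEK SMALL LETTER KAPPA
code_point = ord(kappa)       # the character's number as an int

print(code_point)                  # decimal: 954
print(format(code_point, "04x"))   # hex, zero-padded: 03ba
print("U+%04X" % code_point)       # conventional notation: U+03BA
print(unicodedata.name(kappa))     # GREEK SMALL LETTER KAPPA

# chr() converts the number back into the character
assert chr(code_point) == kappa
```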

 Xah


Xah Lee wrote:
> Python has a nice unicodedata module for working with Unicode.
>
> #-*- coding: utf-8 -*-
> # python
>
> from unicodedata import *
>
> # each unicode char has a unique name.
> # one can use the “lookup” func to find it
>
> mychar=lookup('greek cApital letter sIgma')
> # note letter case doesn't matter
> print mychar.encode('utf-8')
>
> m=lookup('CJK UNIFIED IDEOGRAPH-5929')
> # for some reason, case must be right here.
> print m.encode('utf-8')
>
> # to find a char's name, use the “name” function
> print name(u'天')
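For comparison, here is a sketch of the quoted example for Python 3. In Python 3, `str` is already Unicode, so no `u''` prefix or coding line is needed, and `encode('utf-8')` returns a `bytes` object. This sketch prints code points and UTF-8 bytes rather than raw glyphs so that it runs on any terminal. On a UTF-8 terminal, `print(sigma)` would show the character itself. The variable names are illustrative only.

```python
import unicodedata

# lookup() maps a character name to the character; mixed case is accepted here
sigma = unicodedata.lookup("greek cApital letter sIgma")
print(hex(ord(sigma)))          # 0x3a3
print(sigma.encode("utf-8"))    # b'\xce\xa3'

# CJK unified ideograph names are "CJK UNIFIED IDEOGRAPH-" plus the hex code point
sky = unicodedata.lookup("CJK UNIFIED IDEOGRAPH-5929")
print(hex(ord(sky)))            # 0x5929
print(sky.encode("utf-8"))      # b'\xe5\xa4\xa9'

# name() goes the other way, from character to name
print(unicodedata.name(sky))    # CJK UNIFIED IDEOGRAPH-5929

# The \N{...} string escape performs the same name lookup when the literal is compiled
assert "\N{GREEK CAPITAL LETTER SIGMA}" == sigma
```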
>
> Basically, in Unicode each character has a number of attributes (called
> properties) besides its name. These properties provide the information
> needed to form letters and words, and to process the text of various
> human scripts, e.g. for sorting or capitalization. For example, the
> Latin alphabet has two forms, upper case and lower case. Korean letters
> are stacked together into syllable blocks. Many symbols correspond to
> numbers, and there are also combining forms, used for example to put a
> bar over any letter or character. Also, some writing systems are
> directional. To display these symbols or process them in software, this
> information is needed for each character.
>
> The rest of the functions in unicodedata return these properties.
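The following minimal Python 3 sketch maps the properties described above to unicodedata functions:

* `category()` gives the general category, which includes the upper/lower-case distinction.
* `bidirectional()` gives the direction class.
* `combining()` identifies combining marks such as the macron.
* `decimal()` gives digit values.
* `normalize('NFD', ...)` splits a precomposed Hangul syllable into its stacked jamo.

The sample characters are chosen for illustration and are not from the post.

```python
import unicodedata

# Sample characters chosen for illustration, one per property discussed above
samples = [
    ("A", "upper case letter"),
    ("a", "lower case letter"),
    ("\u0665", "digit in another script"),
    ("\u0304", "combining mark: a bar over the preceding letter"),
    ("\u05d0", "letter from a right-to-left script"),
]

for ch, desc in samples:
    print(
        "U+%04X" % ord(ch),
        unicodedata.name(ch),
        unicodedata.category(ch),       # general category, e.g. Lu, Ll, Nd, Mn, Lo
        unicodedata.bidirectional(ch),  # bidi class, e.g. L, R, AN, NSM
        unicodedata.combining(ch),      # canonical combining class (0 = not combining)
        unicodedata.decimal(ch, None),  # decimal digit value, or None if not a digit
        desc,
        sep=" | ",
    )

# A precomposed Hangul syllable decomposes into its stacked jamo under NFD
han = "\ud55c"  # HANGUL SYLLABLE HAN
jamo = unicodedata.normalize("NFD", han)
for j in jamo:
    print("U+%04X" % ord(j), unicodedata.name(j))
```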
>
> See the unicodedata documentation:
> http://python.org/doc/2.4/lib/module-unicodedata.html
>
> The official word on Unicode character properties:
> http://www.unicode.org/uni2book/ch04.pdf
>
> --
> I don't know the state of Perl's Unicode support. Is there something
> similar?
>
> --
> this post is archived at
> http://xahlee.org/perl-python/unicodedata_module.html
> 
>  Xah
>  xah at xahlee.org
>  http://xahlee.org/PageTwo_dir/more.html



