unicode "table of character" implementation in python

Brian Beck exogen at gmail.com
Tue Aug 22 11:46:49 EDT 2006


Nicolas Pontoizeau wrote:
> I am handling a mixed-language text file encoded in UTF-8. It is
> mainly French, English and Asian languages. I need to detect every
> Asian character in order to enclose it in a special tag for LaTeX.
> Does anybody know if there is a unicode "table of character"
> implementation in python? I mean, I give a character and python replies
> with the language in which the character occurs.

Nicolas, check out the unicodedata module:
http://docs.python.org/lib/module-unicodedata.html

Find "import unicodedata" on this page for how to use it:
http://www.amk.ca/python/howto/unicode
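
As a rough illustration of what unicodedata can do here, this is a minimal
sketch, assuming Python 3 (where str is already Unicode). The helper names
char_name and looks_asian are made up for this example, and the list of
name prefixes is only a heuristic, not a complete list of Asian scripts:

```python
import unicodedata

# Name prefixes treated as "Asian" for this sketch (an assumption, not exhaustive).
ASIAN_PREFIXES = ("CJK", "HIRAGANA", "KATAKANA", "HANGUL")

def char_name(ch):
    """Return the Unicode name of ch, or '' if the character has no name."""
    return unicodedata.name(ch, "")

def looks_asian(ch):
    """Guess from the character's Unicode name whether it is CJK, kana or Hangul."""
    return char_name(ch).startswith(ASIAN_PREFIXES)

if __name__ == "__main__":
    for ch in "e\u00e9\u4e2d\u3042":
        print(repr(ch), char_name(ch), looks_asian(ch))
```

Note that the name tells you the script, not the language: a CJK ideograph
can belong to Chinese or Japanese text alike.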

I'm not sure if it has built-in support for finding which language block a
character is in, but a table like this might help you:
http://www.unicode.org/Public/UNIDATA/Blocks.txt
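
If you go the table route, something along these lines might work. It is
only a sketch, assuming Python 3: SAMPLE_BLOCKS is a short hand-copied
excerpt in the same "start..end; name" format as Blocks.txt (in practice
you would read the whole downloaded file instead), and parse_blocks and
block_of are hypothetical helper names:

```python
import bisect

# A few entries in the Blocks.txt format; the real file lists every block.
SAMPLE_BLOCKS = """\
# start..end; block name
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
3040..309F; Hiragana
30A0..30FF; Katakana
4E00..9FFF; CJK Unified Ideographs
AC00..D7AF; Hangul Syllables
"""

def parse_blocks(text):
    """Parse Blocks.txt-style text into a sorted list of (start, end, name)."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        start, end = span.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    blocks.sort()
    return blocks

def block_of(ch, blocks):
    """Return the name of the block containing ch, or None if none is listed."""
    cp = ord(ch)
    starts = [b[0] for b in blocks]
    i = bisect.bisect_right(starts, cp) - 1  # last block starting at or before cp
    if i >= 0 and blocks[i][0] <= cp <= blocks[i][1]:
        return blocks[i][2]
    return None

if __name__ == "__main__":
    blocks = parse_blocks(SAMPLE_BLOCKS)
    for ch in "e\u00e9\u4e2d\u30ab":
        print(repr(ch), block_of(ch, blocks))
```

Once you have the block name for each character, deciding which blocks
count as "Asian" and wrapping them in your LaTeX tag is up to you.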

-- 
Brian Beck
Adventurer of the First Order


