Unicode support in python

John Roth JohnRoth1 at jhrothjr.com
Fri Oct 20 11:07:21 EDT 2006


sonald wrote:
> Hi,
> I am using Python 2.4.1.
>
> I need to pass Russian text into Python and validate it.
> Can you please guide me on how to make my existing code support
> Russian text?
>
> Is there any module that can be used for unicode support in python?
>
> In case of decimal numbers, how do I handle a "comma as a decimal
> point" within a number?
>
> Currently the existing code is working fine for English text.
> Please help.
>
> Thanks in advance.
>
> regards
> sonal

As both of the other responders have said, the
coding comment at the top of a source file only
affects how the source text itself is read; it
has no effect on data handled at run time. In
particular, you don't even need it to handle
non-English languages, as long as you don't
want to write literals in those languages.
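To make that concrete, here is a minimal sketch of a
pure-ASCII source file, with no coding comment, that
still handles Russian text. The byte string and the
choice of UTF-8 are assumptions for illustration only.
The b"" prefix needs Python 2.6 or later; under 2.4 a
plain string literal is already a byte string.

```python
# No coding comment needed: this source file is pure ASCII.

# Bytes as they might arrive from a file or socket.
# Assumption: the sender encoded the text as UTF-8.
raw = b"\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82"

# Explicit decode: 12 bytes become 6 unicode characters.
text = raw.decode("utf-8")

# A literal written with escapes also needs no coding comment.
expected = u"\u043f\u0440\u0438\u0432\u0435\u0442"
```

The coding comment only becomes necessary once the
Cyrillic characters are typed directly into the
source file instead of being written as escapes.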

What seems to be missing is the notion that
external files are _always_ byte files. Their
contents have to be _explicitly_ decoded into
unicode strings on input, and encoded back to
whatever the external encoding needs to be on
output, each and every time you read or write
a file, or copy string data between byte strings
and unicode strings. There is no good way of
handling this implicitly: you can't simply say
"utf-8" or "iso-8859-whatever" in one place and
expect it to work.

You've got to specify the encoding on each and
every open (codecs.open takes an encoding
argument; the built-in open does not), or else
use the encode and decode string methods. This
is a great motivation for eliminating duplication
and centralizing your I/O code!
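One way to centralize this, sketched under a few
assumptions: the helper names read_text and write_text
and the ENCODING constant are hypothetical, and UTF-8 is
only a guess at what the external files use.

```python
ENCODING = "utf-8"  # assumption: change to match your external files


def read_text(path, encoding=ENCODING):
    """Read raw bytes from path and decode them into a unicode string."""
    f = open(path, "rb")
    try:
        return f.read().decode(encoding)
    finally:
        f.close()


def write_text(path, text, encoding=ENCODING):
    """Encode a unicode string and write the resulting bytes to path."""
    f = open(path, "wb")
    try:
        f.write(text.encode(encoding))
    finally:
        f.close()
```

With every read and write going through these two
functions, the encoding is named in exactly one place,
and the rest of the program only ever sees unicode
strings.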

For your other question: the general terms
are localization and locale. Look up the locale
module in the library reference index. It's a
strange subject which I don't know much about,
but that should get you started.
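As a rough starting point, something like the
following might work for the comma question. The
function name parse_localized_float is hypothetical,
and locale names are platform specific, so whether a
given name is installed on your system is an
assumption you will have to check.

```python
import locale


def parse_localized_float(text, locale_name):
    """Parse text as a float using the decimal point of locale_name.

    Raises locale.Error if that locale is not installed.
    """
    # Remember the current numeric locale so it can be restored.
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        # atof honours the locale's decimal point and grouping.
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)
```

On a system where a Russian locale is installed (the
name might be "ru_RU.UTF-8" on Linux), passing "3,5"
with that name should parse the comma as the decimal
point. Note that setlocale changes process-wide state,
so this is not safe to call from several threads at
once.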

John Roth



