i18n: looking for expertise

Thu Mar 3 09:00:30 EST 2005

klappnase wrote:
> Hello all,
>
> I am trying to internationalize my Tkinter program using gettext and
> encountered various problems, so it looks like it's not a trivial
> task.

Given that you decided to support old Python versions, that's true.
Unicode support has improved gradually. If you choose to target an
old Python version, you are basically dealing with years-old unicode
support.

> After some "research" I made up a few rules for a concept that I hope
> lets me avoid further encoding trouble, but I would feel more
> confident if some of the experts here would have a look at the
> thoughts I made so far and told me if I'm still going wrong somewhere
> (BTW, the program is supposed to run on linux only). So here is what
> I have so far:
>
> 1. use unicode instead of byte strings wherever possible. This can be
> a little tricky, because in some situations I cannot know in advance
> if a certain string is unicode or byte string; I wrote a helper
> module for this which defines convenience methods for fail-safe
> decoding/encoding of strings and a Tkinter.UnicodeVar class which I
> use to convert user input to unicode on the fly (see the code below).

I've never used Tkinter, but I've heard good things about it. Are you
sure it isn't your own code that makes it return byte strings
sometimes? Anyway, your idea is right: make the IO libraries always
return unicode.

> 3. make sure to NEVER mix unicode and byte strings within one
> expression

As a rule of thumb you should convert byte strings into unicode
strings at input and back to byte strings at output. This way
the core of your program will have to deal only with unicode
strings.
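
A minimal sketch of that rule of thumb, assuming UTF-8 at both ends.
The helper names (to_unicode, to_bytes, shout) are hypothetical and
only for illustration. It is written to run on a current Python; on
the Python 2 versions discussed in this thread, `unicode` and `str`
play the roles that `str` and `bytes` play here.

```python
def to_unicode(value, encoding="utf-8"):
    """Decode a byte string; pass unicode strings through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(text, encoding="utf-8"):
    """Encode a unicode string for output."""
    return text.encode(encoding)


def shout(raw_input_bytes, encoding="utf-8"):
    """Toy 'program core' wrapped by one decode and one encode step."""
    text = to_unicode(raw_input_bytes, encoding)  # decode once, at input
    result = text.upper()                         # core sees unicode only
    return to_bytes(result, encoding)             # encode once, at output
```

Because the conversion happens only at the two boundaries, nothing
between them ever has to ask whether a value is unicode or bytes.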

> 4. in order to maintain code readability it's better to risk excess
> decode/encode cycles than having one too few.

I don't think so. Either you need decode/encode or you don't.

> 5. file operations seem to be delicate;

You should be ready to handle unicode errors in file operations, just
as you handle, for example, an ENAMETOOLONG error. Any file system
call that takes a path argument can raise it; I don't think anything
changed here with the introduction of unicode. For example, access can
return 11 error codes (on my linux system); consider a unicode error
to be the twelfth.

> at least I got an error when I
> passed a filename that contains special characters as unicode to
> os.access(), so I guess that whenever I do file operations
> (os.remove(), shutil.copy() ...) the filename should be encoded back
> into system encoding before;

I think Python 2.3 handles that for you (I'm not sure about the
version). If you have to support older versions, you have to do it
yourself.
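
Doing it yourself could look roughly like the sketch below. The
helpers encode_filename and can_read are hypothetical names; the
fallback to UTF-8 when no file system encoding is reported is an
assumption, not something the interpreter guarantees. A name that
cannot be encoded is treated like any other access failure, in line
with the "twelfth error code" view above.

```python
import os
import sys


def encode_filename(name, encoding=None):
    """Return `name` as a byte string in the file system encoding."""
    if isinstance(name, bytes):
        return name  # already encoded, leave untouched
    # Assumption: fall back to UTF-8 if no encoding is reported.
    return name.encode(encoding or sys.getfilesystemencoding() or "utf-8")


def can_read(name):
    """os.access() wrapper that reports an unencodable name as unreadable."""
    try:
        return os.access(encode_filename(name), os.R_OK)
    except UnicodeError:
        return False
```

The same encode_filename() call can be placed in front of os.remove()
or shutil.copy() when the Python version in use does not accept
unicode file names directly.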

> 6. messages that are printed to stdout should be encoded first, too;
> the same with strings I use to call external shell commands.

If you use stdout as a dump device, just install the encoder at the
beginning of your program, something like

sys.stdout = codecs.getwriter(...) ...
sys.stderr = codecs.getwriter(...) ...
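
To show what such a writer does, here is a small sketch that wraps an
in-memory byte stream instead of the real stdout. The UTF-8 encoding,
the "replace" error policy, and the wrap_stream name are assumptions
for illustration; the elided arguments above depend on your terminal.

```python
import codecs
import io


def wrap_stream(byte_stream, encoding="utf-8"):
    """Wrap a byte stream so that unicode written to it gets encoded."""
    # errors="replace" keeps output going when a character cannot be encoded.
    return codecs.getwriter(encoding)(byte_stream, errors="replace")


# Stand-in for the real output stream: on Python 2 that would be
# sys.stdout itself, on Python 3 its byte layer, sys.stdout.buffer.
buffer = io.BytesIO()
out = wrap_stream(buffer)
out.write(u"Gr\xfc\xdfe\n")  # unicode in, encoded bytes land in `buffer`
```

After the wrapper is installed, the rest of the program writes unicode
and never calls encode() on messages itself.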

  Serge.