Internationalization bug?? [Python 2.2.1, RedHat 8.0, Swedish]

Alex Martelli aleax at aleax.it
Sun Oct 13 15:26:36 EDT 2002


Urban Anjar wrote:

> len() was not an issue, just a symptom of my problem. I was playing
> with a useless program to reverse a string and got funny results with åäö.
> Found out that some characters were two-byte and of course it will be
> funny if I reverse the bytes in a two-byte character...
        

I'd code this differently:

    # str is assumed to hold a UTF-8 encoded byte string
    temp = list(unicode(str, 'utf-8'))
    temp.reverse()
    print u''.join(temp).encode('utf-8')

but of course that's not the point -- your life is much simpler
if, internally, you keep all strings in Unicode, and convert them
only if and when needed for input/output purposes.
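
To make the byte-versus-character distinction concrete, here is a minimal sketch. It is written for a current Python rather than 2.2.1 (slicing with [::-1] and the b'' prefix came later), and the names raw, broken, flipped and is_valid_utf8 are mine, purely for illustration:

```python
# The six UTF-8 bytes of a-ring, a-umlaut, o-umlaut (two bytes each).
raw = b'\xc3\xa5\xc3\xa4\xc3\xb6'

# Reversing the bytes puts each continuation byte before its lead byte.
broken = raw[::-1]

# Decode at the input boundary, work on characters, encode on the way out.
text = raw.decode('utf-8')        # three characters
flipped = text[::-1]              # reversed character by character
encoded = flipped.encode('utf-8')


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8 (illustrative helper)."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False
```

Here len(raw) counts bytes while len(text) counts characters, which is the symptom you ran into; broken no longer decodes as UTF-8, while encoded does.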


> I still think this should be built in somehow. I don't want
> to think about how things are coded internally when doing Python
> programming; then I could choose C or assembler or something like
> that... I want to be able to tell kids 'n students that Python is
> easy and straightforward even for Swedish people...

Unfortunately, many different encodings are in use for input and
output of Unicode -- apparently your terminal wants utf-8, mine
wants iso-8859-15, yet another text terminal will want yet another
encoding.  You do have to deal with that, one way or another.  The
simplest way, IMHO, is to use module codecs' EncodedFile to wrap
sys.stdin and sys.stdout appropriately (i.e., with 'utf-8' in your
case, 'iso-8859-15' in mine, and so forth) and only manipulate
Unicode strings internally.
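
A rough sketch of that kind of wrapping follows. Note the substitution: it uses codecs.getwriter and codecs.getreader rather than EncodedFile, since those return stream wrappers that take and give Unicode text directly. The io.BytesIO objects stand in for terminals, and wrap_output and wrap_input are hypothetical helper names, not part of any library:

```python
import codecs
import io


def wrap_output(byte_stream, encoding):
    """Return a writer that accepts Unicode text and emits encoded bytes."""
    return codecs.getwriter(encoding)(byte_stream)


def wrap_input(byte_stream, encoding):
    """Return a reader that decodes bytes from the stream into Unicode text."""
    return codecs.getreader(encoding)(byte_stream)


# The program itself only ever sees this Unicode string.
text = u'\xe5\xe4\xf6'

# A stand-in for a terminal that wants utf-8.
utf8_terminal = io.BytesIO()
wrap_output(utf8_terminal, 'utf-8').write(text)

# A stand-in for a terminal that wants iso-8859-15.
latin9_terminal = io.BytesIO()
wrap_output(latin9_terminal, 'iso-8859-15').write(text)

# Reading back through the matching codec recovers the same Unicode text.
echoed = wrap_input(io.BytesIO(latin9_terminal.getvalue()), 'iso-8859-15').read()
```

The same Unicode string comes out as six bytes for one terminal and three for the other, and the program never has to care which. On a real system the byte stream would be sys.stdout or sys.stdin instead of a BytesIO (on Python 3, their .buffer attribute).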

It makes no difference whether you're French, Swedish, Italian,
German, Danish, or Spanish -- and only very little if you're
Turkish, Estonian, Russian, and so forth; Unicode is the same
for all -- you only need to set the input/output codec however
appropriate for your system, and uniformly manipulate Unicode
strings, only, in your programs.  The wrapping of stdin and
stdout may usefully be performed in sitecustomize.py (create
it for the purpose if you don't have it yet).


Alex



