Internationalization bug?? [Python 2.2.1, RedHat 8.0, Swedish]

Gillou nospam at bigfoot.com
Sat Oct 12 13:38:38 EDT 2002


"Urban Anjar" <urban.anjar at hik.se> wrote in message news:
7d546104.0210120857.3f4e3857 at posting.google.com...
> Hi,
> I have found something that looks like a bug, or at least a not so
> pleasant feature. In Swedish we often use the characters å, ä and ö (a
> with a ring, a with two dots and o with two dots), and I can't get them
> to work properly in Python.
>
> Python 2.2.1 (#1, Aug 30 2002, 12:15:30)
> [GCC 3.2 20020822 (Red Hat Linux Rawhide 3.2-4)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>>
> >>> S = 'abc'
> >>> print S
> abc
> >>> print len(S)
> 3
>
> That is perfectly OK, but...
>
> >>> S = 'åäö'
> >>> print S
> åäö
> >>> print len(S)
> 6
> It seems that every Swedish character occupies 2 bytes,
> and len() returns the number of bytes rather than the number
> of characters...
>
>
> Look at this code snippet:
>
> #!/usr/bin/python
> def rev(S):
>      if  S:
>          return S[-1] + rev(S[:-1])
>      else:
>          return ''
>
> str = 'abcåäö'
> print rev(str)
>
> Running it gives:
> [urban at falcon urban]$ ./rev
> ?äå?cba
>
> I was expecting 'öäåcba'.
>
> Of course I can analyze in detail how the characters are represented
> and make some kind of workaround, but I don't think that is the Python
> way. In assembler or C I have to think about things like that, but do I
> have to do that in Python?
>
> Another example:
>
> >>> L = ['Åke','Ärla','Östen']
> >>> print L
> ['\xc3\x85ke', '\xc3\x84rla', '\xc3\x96sten']
>
> Please let me know if I am doing something wrong, or if you also
> consider this a bug.
>
> There is some noise about Unicode... Does that solve my problems?
> How do I use it?
>
> Sincerely,
> Urban Anjar
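Urban's closing question (does Unicode help, and how?) can be illustrated with a minimal sketch. It assumes Python 3, where `bytes` stands in for a Python 2 byte string and `str` is Unicode text; in Python 2.2 the corresponding decoding step would be `unicode(S, 'utf-8')`. It also assumes the terminal delivers UTF-8, which matches the `'\xc3\x85'`-style bytes in his list output. The `rev()` helper keeps his recursive approach but slices with `s[-1:]` so it works on both bytes and text.

```python
# Sketch (Python 3): bytes model Python 2 byte strings, str models unicode.
def rev(s):
    """Recursively reverse a bytes or str value (same logic as Urban's rev)."""
    if s:
        # s[-1:] keeps the type (bytes stays bytes); s[-1] on bytes is an int
        return s[-1:] + rev(s[:-1])
    return s[:0]  # empty value of the same type

raw = 'abcåäö'.encode('utf-8')  # what a UTF-8 terminal hands to the program
text = raw.decode('utf-8')      # decode once, at the input boundary

print(len(raw))    # 9 bytes: each of å, ä, ö takes two bytes in UTF-8
print(len(text))   # 6 characters
print(rev(text))   # öäåcba

# Reversing the raw bytes splits the two-byte sequences:
print(rev(raw).decode('utf-8', 'replace'))  # \ufffd and ä, å, \ufffd, then cba
```

Reversing the raw bytes separates each lead byte from its continuation byte, which is why two of the three letters come out as `?` in the output above; decoding first makes `len()` and slicing work on characters. Separately, `print L` shows escapes because printing a list uses the `repr()` of each element, whereas `print L[0]` prints the string itself.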

Perhaps you should check the locale settings on your system. Python works
perfectly for me with French 8-bit characters (ISO-8859-1, one byte per
character). AFAIK, the Swedish characters are covered by the same encoding.
In fact, Python 2.1.3 on my BSD box, configured with a French locale, writes
Swedish characters perfectly to the xterm console.
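The locale point can be made concrete with a small sketch, assuming Python 3 and a hypothetical helper `encoded_lengths()`. It compares how many bytes the same three Swedish letters occupy in ISO-8859-1 versus UTF-8; the `'\xc3\x85'` in Urban's output is the UTF-8 encoding of `Å`, which suggests his session was using UTF-8 rather than ISO-8859-1.

```python
import locale

def encoded_lengths(text, encodings=('iso-8859-1', 'utf-8')):
    """Hypothetical helper: return {encoding: byte count} for text."""
    return {enc: len(text.encode(enc)) for enc in encodings}

print(encoded_lengths('åäö'))  # {'iso-8859-1': 3, 'utf-8': 6}

# Which encoding does the current locale select? (varies by system)
print(locale.getpreferredencoding(False))
```

Under an ISO-8859-1 locale each letter is one byte, so byte-oriented `len()` and slicing happen to give the expected answers; under UTF-8 they do not.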

--Gilles

More information about the Python-list mailing list