Internationalization bug?? [Python 2.2.1, RedHat 8.0, Swedish]

Urban Anjar urban.anjar at hik.se
Sun Oct 13 13:52:01 EDT 2002


martin at v.loewis.de (Martin v. Loewis) wrote in message news:<m3of9zjm90.fsf at mira.informatik.hu-berlin.de>...
> urban.anjar at hik.se (Urban Anjar) writes:

> 
> It appears you are using a UTF-8 locale. In UTF-8, every accented
> Latin character takes two bytes; many characters (CJK in particular)
> even take three bytes.
> 
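
To illustrate the byte counts mentioned in the quoted text, here is a minimal sketch. It assumes a current Python 3 interpreter rather than the Python 2.2.1 discussed in this thread, and the sample characters and the name `byte_lengths` are chosen only for illustration.

```python
# Sketch for Python 3 (not the Python 2.2.1 discussed in this thread).
# The sample characters are illustrative; escapes avoid any dependence
# on the encoding of this source file.
samples = {
    "a": "a",              # plain ASCII letter
    "a-ring": "\u00e5",    # å
    "a-umlaut": "\u00e4",  # ä
    "o-umlaut": "\u00f6",  # ö
    "cjk": "\u4e2d",       # a common CJK ideograph
}

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, n in byte_lengths.items():
    print(name, n)
```

The ASCII letter stays at one byte, which is why plain English text looks identical in UTF-8 and in the older single-byte encodings.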

That seems partly right, but something still confuses me:

>>> s = "åäö"
>>> s = unicode(s,"utf-8")
>>> s = rev(s)
>>> print s.encode("utf-8")
öäå
>>>
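
The session does not show how `rev` is defined, so the following is only a sketch of why the decode step matters, assuming `rev` simply reverses a sequence. It is written for Python 3, where text and bytes are separate types; `rev` here is a hypothetical stand-in, not the original function.

```python
# Python 3 sketch; `rev` is a hypothetical stand-in for the function
# used in the session above, assumed to reverse a sequence.
def rev(seq):
    return seq[::-1]

text = "\u00e5\u00e4\u00f6"   # the three letters as a text string
raw = text.encode("utf-8")    # the same letters as six UTF-8 bytes

reversed_text = rev(text)     # reverses three characters

# Reversing the raw bytes splits each two-byte character apart, so the
# result is no longer valid UTF-8.
try:
    rev(raw).decode("utf-8")
    bytes_reverse_ok = True
except UnicodeDecodeError:
    bytes_reverse_ok = False

print(len(text), len(raw), bytes_reverse_ok)
```

Reversing after decoding operates on characters, while reversing before decoding operates on bytes and breaks every multi-byte character.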

This works fine in the Python shell, but when I run it as a script I get an error.

I also find that kind of construct rather ugly. It would be better
to redefine assignment and printing so that we get a consistent
"string type" that is more straightforward to work with.

Python is supposed to be an easy language for kids and beginners, 
isn't it...

[urban at falcon urban]$ ./rev
Traceback (most recent call last):
  File "./rev", line 10, in ?
    s = unicode(s,"utf-8")
UnicodeError: UTF-8 decoding error: invalid data
[urban at falcon urban]$

It seems that I have a conflict between different character encodings.
Cut-and-paste between Emacs and the Python prompt also generates some
crazy characters instead of åäö.
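
One possible cause, which nothing in this post confirms, is that the script file was saved in Latin-1 (ISO-8859-1) while the terminal uses UTF-8. Under that assumption, the following Python 3 sketch reproduces both symptoms: a UTF-8 decoding failure and garbled characters. The helper name `is_valid_utf8` is made up for the example.

```python
# Python 3 sketch. Assumption (not confirmed by the post): the script
# file holds Latin-1 bytes while the terminal speaks UTF-8.
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

text = "\u00e5\u00e4\u00f6"         # the three Swedish letters
as_utf8 = text.encode("utf-8")      # what a UTF-8 terminal sends
as_latin1 = text.encode("latin-1")  # what a Latin-1 file would contain

# UTF-8 bytes decode fine; Latin-1 bytes are rejected as invalid UTF-8.
print(is_valid_utf8(as_utf8), is_valid_utf8(as_latin1))

# The reverse mix-up: UTF-8 bytes displayed as Latin-1 turn three
# letters into six unrelated characters.
mojibake = as_utf8.decode("latin-1")
print(len(text), len(mojibake))
```

If the first check fails for the bytes in the script file, the file and the `"utf-8"` argument passed to `unicode()` disagree about the encoding.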

I run RedHat Linux 8.0 pretty much straight out of the box. The set
command shows:


GDM_LANG=sv_SE.UTF-8
GNOME_DESKTOP_SESSION_ID=Default
GTK_RC_FILES=/etc/gtk/gtkrc:/home/urban/.gtkrc-1.2-gnome2
INPUTRC=/etc/inputrc
LANG=sv_SE.UTF-8
LESSOPEN='|/usr/bin/lesspipe.sh %s'
LINES=24
LOGNAME=urban
LS_COLORS='no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;
5:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:'
MACHTYPE=i686-pc-linux-gnu
MAIL=/var/spool/mail/urban
MAILCHECK=60
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
SUPPORTED=en_US.UTF-8:en_US:en:sv_SE.UTF-8:sv_SE:sv
XMODIFIERS=@im=none
_=/etc/bashrc
i=/etc/profile.d/vim.sh
langfile=/home/urban/.i18n
(I cut away lots of lines that I'm pretty sure are irrelevant.)

Are there any settings that the Python interpreter reads before
running a script? Have I fu*ed up something?

Urban


