Internationalization bug?? [Python 2.2.1, RedHat 8.0, Swedish]

Martin v. Loewis martin at v.loewis.de
Sun Oct 13 14:31:06 EDT 2002


urban.anjar at hik.se (Urban Anjar) writes:

> Works fine in the python shell, but as a script I get an error. 
[...]
> Traceback (most recent call last):
>   File "./rev", line 10, in ?
>     s = unicode(s,"utf-8")
> UnicodeError: UTF-8 decoding error: invalid data

Can you please post the script you are using? Preferably by URL, or by
attaching it uuencoded.

> Seems that I have got a conflict between different coding systems.
> Cut-n-paste between emacs and the python prompt also generate some
> crazy characters instead of åäö.

If that has happened, it appears that you use Emacs to write the
script. What encoding is Emacs using to save the file?

Please understand that unicode(s,"utf-8") is only correct if s is
encoded in UTF-8.
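
To illustrate the point, here is a minimal sketch. It assumes a current
Python 3 interpreter, where the call unicode(s, "utf-8") from your
traceback corresponds to s.decode("utf-8") on a byte string. The helper
name try_decode_utf8 is made up for this example, and Latin-1 is only a
guess at what the file might contain.

```python
# The characters åäö, written as escapes so that the encoding of this
# example itself does not matter.
text = u"\xe5\xe4\xf6"

utf8_bytes = text.encode("utf-8")      # two bytes per character here
latin1_bytes = text.encode("latin-1")  # one byte per character


def try_decode_utf8(data):
    """Return the decoded text, or None if data is not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeError:
        return None


decoded_ok = try_decode_utf8(utf8_bytes)     # round-trips to the text
decoded_bad = try_decode_utf8(latin1_bytes)  # None: not valid UTF-8
```

The Latin-1 bytes fail in the same way as the call in your traceback,
because they do not form valid UTF-8 sequences.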

Note that Emacs does not support cut-n-paste of UTF-8.

> Are there any settings that the Python interpreter reads before
> running a script?

Yes, site.py.

> Have I fu*ed up something?

Most likely. My guess is that Emacs uses a different encoding when
saving the file. Posting the script literally won't help, since your
news reader will again perform modifications. To analyse the problem,
one needs the file at the byte level.
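
One way to look at a file at the byte level is sketched below, again
assuming a current Python 3 interpreter. The function names hex_bytes
and hex_dump_file are invented for this example.

```python
import binascii


def hex_bytes(data):
    """Return the bytes as space-separated two-digit hex values."""
    hexed = binascii.hexlify(data).decode("ascii")
    return " ".join(hexed[i:i + 2] for i in range(0, len(hexed), 2))


def hex_dump_file(path, limit=64):
    """Hex-dump the first `limit` bytes of the file at `path`."""
    with open(path, "rb") as f:  # binary mode: no decoding takes place
        return hex_bytes(f.read(limit))
```

Calling hex_dump_file("./rev") would show the raw bytes of the script.
If å shows up as the single byte e5, the file is most likely Latin-1;
if it shows up as the pair c3 a5, the file is UTF-8.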

Please understand that this is not primarily a problem with Python,
but with the processing of character strings in a computer per se.
Different places in the world use different encodings. If you need
more than 256 characters, things get really difficult, no matter what
you do. Python can do only so much about it if it also wants to
preserve backwards compatibility.

Regards,
Martin


