Strings and Unicode

- madsurfer2000 at hotmail.com
Mon Jul 21 14:01:31 EDT 2003


sjmachin at lexicon.net (John Machin) wrote in message news:<c76ff6fc.0307201651.34040c07 at posting.google.com>...
> madsurfer2000 at hotmail.com (-) wrote in message news:<fef0a228.0307200828.6c171de7 at posting.google.com>...
> > 
> > I would have expected the following:
> > ['param=abc+%E6']
> 
> So would I. See below. However despite the fact that the last
> character in your 'value' shows up as "small ae ligature" in MSIE, we
> would really like to see some code of yours that *minimally* and
> *unambiguously* shows the problem, and can be executed in the minimal
> Python environment (i.e. sans gui, command prompt, again: see below).
> 

Turns out it wasn't Python's fault. I used IDLE, and it saved the
program in UTF-8 format. I didn't think of checking that before,
because I assumed it wouldn't save in that format. When the parameters
were sent to urlencode(), the string was interpreted as 8-bit
characters. The 'æ' character ("small ae ligature") is represented as
2 bytes in UTF-8, so urlencode() encoded them separately.
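
For illustration, here is a minimal sketch of the difference, written
for a current Python 3 (urllib.parse) rather than the Python 2 urllib
used in this thread. The key name 'param' and the value 'abc æ' are
taken from the expected output quoted above; the variable names are
just for this example. The character is written as an escape so the
result does not depend on how this file itself is saved.

```python
from urllib.parse import urlencode

# "\u00e6" is the 'æ' character, spelled as an escape so the encoding
# of the source file cannot affect the outcome.
params = {"param": "abc \u00e6"}

# Encoded as ISO-8859-1 (Latin-1), 'æ' is the single byte 0xE6.
as_latin1 = urlencode(params, encoding="iso-8859-1")

# Encoded as UTF-8, 'æ' is the two bytes 0xC3 0xA6, and each byte is
# percent-escaped on its own.
as_utf8 = urlencode(params, encoding="utf-8")

print(as_latin1)  # param=abc+%E6
print(as_utf8)    # param=abc+%C3%A6
```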

The solution was to open the file in Notepad (!) and save it in a
different format. When I ran the program again, it produced the
expected results.
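
The same conversion could probably be done from Python instead of
Notepad. This is only a sketch in current Python 3, assuming the file
really is valid UTF-8 and contains nothing outside Latin-1; the
function name reencode and its parameters are made up for this
example.

```python
from pathlib import Path


def reencode(path, src="utf-8", dst="iso-8859-1"):
    """Rewrite the file at `path` from the `src` encoding to `dst`.

    Raises UnicodeDecodeError if the file is not valid `src`, and
    UnicodeEncodeError if it holds characters `dst` cannot represent.
    """
    p = Path(path)
    # Work on raw bytes so that line endings are left untouched.
    text = p.read_bytes().decode(src)
    p.write_bytes(text.encode(dst))
```

Calling reencode("myscript.py") (a hypothetical file name) would
overwrite the file in place, so keeping a backup copy first is
sensible.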

Is there any way I can use IDLE, and still save in ISO-Latin1? The
problem is probably related to the input from the keyboard.

It seems that I can edit the file after I have converted it from
UTF-8, and keep the encoding if I use only the lower part of the
character set (the ASCII range). If I press the 'æ' key on the
keyboard, the character is translated to a 2-byte representation, but
the other characters are left alone.
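
A small sketch (current Python 3, with a helper name invented for this
example) of why only the 'æ' key shows the effect: ASCII characters
have the same single byte in both encodings, while 'æ' does not.

```python
def byte_forms(ch):
    """Return (Latin-1 bytes, UTF-8 bytes) for a single character."""
    return ch.encode("iso-8859-1"), ch.encode("utf-8")


# ASCII letters come out identical in both encodings; only the last
# entry, 'æ' (written as an escape), differs.
for ch in ("a", "z", "\u00e6"):
    latin1, utf8 = byte_forms(ch)
    print(repr(ch), latin1, utf8, "same" if latin1 == utf8 else "different")
```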



