Decoding url with country specific chars (like æøå)

Anders R raskpetersen at hotmail.com
Tue Sep 24 09:05:57 EDT 2002


Hi,

I had a frustrating experience while URL-decoding a unicode string
that contains country-specific chars:

The original string (unicode, so it might look strange here):
TestProduct™, opdateret 27/8 Prøver lige en umlaut: ä

The string in encoded form:
TestProduct%26%238482%3B%2C%20opdateret%2027/8%20Pr%C3%B8ver%20lige%20en%20umlaut%3A%20%26%23228%3B

The output I get when I decode it with urllib.unquote_plus(var):
TestProduct™, opdateret 27/8 Pr%C3%B8ver lige en umlaut: ä
In case special chars are escaped:
TestProduct&#8482;, opdateret 27/8 Pr%C3%B8ver lige en umlaut: &#228;

So the 'ø' in the (Danish) word 'Prøver' isn't decoded correctly;
in fact it looks like Python (version 2.1) leaves the '%C3%B8'
completely alone...
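
For comparison, here is a minimal sketch of the same decode in Python 3.
That is an assumption on my part: the post is about Python 2.1, where the
function lives in a different module and may treat unicode input
differently. The sketch percent-decodes to raw bytes first and then
interprets those bytes as UTF-8, which is the step that turns '%C3%B8'
into 'ø'. The variable names are only illustrative.

```python
from urllib.parse import unquote_plus, unquote_to_bytes

# The encoded string from the post, split across two lines for readability.
encoded = ("TestProduct%26%238482%3B%2C%20opdateret%2027/8%20"
           "Pr%C3%B8ver%20lige%20en%20umlaut%3A%20%26%23228%3B")

# Step 1: percent-decode to raw bytes; %C3%B8 becomes the bytes 0xC3 0xB8.
raw = unquote_to_bytes(encoded)

# Step 2: interpret the bytes as UTF-8; 0xC3 0xB8 is the UTF-8 form of 'ø'.
text = raw.decode("utf-8")

# unquote_plus performs both steps at once (UTF-8 is its default encoding).
assert text == unquote_plus(encoded)
print(text)
```

The design point is that a percent-escape describes a byte, not a
character, so a character encoding has to be chosen before the result can
become text.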

Can anyone explain this behaviour?

It seems like unicode *is* supported fine, since the TM sign and the ä
(a with umlaut) are translated fine!
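
One thing the encoded string itself shows: '%26%238482%3B' and
'%26%23228%3B' percent-decode to the literal text '&#8482;' and '&#228;',
which are HTML numeric character references rather than percent-encoded
unicode. A rough sketch in Python 3 (again an assumption, since the post
uses Python 2.1) that resolves them as a separate step with the standard
html module:

```python
import html
from urllib.parse import unquote_plus

encoded = ("TestProduct%26%238482%3B%2C%20opdateret%2027/8%20"
           "Pr%C3%B8ver%20lige%20en%20umlaut%3A%20%26%23228%3B")

# Percent-decoding alone leaves the character references as plain text.
decoded = unquote_plus(encoded)
assert "&#8482;" in decoded and "&#228;" in decoded

# html.unescape turns numeric character references into the characters
# they stand for: &#8482; is U+2122 (TM sign), &#228; is U+00E4 (ä).
final = html.unescape(decoded)
print(final)
```

So the TM sign and the ä only look decoded when something that understands
HTML, such as a browser, displays the output.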

Maybe it's only us Danes getting discriminated against ;-O

Please help :-S

//Rask


