urllib.unquote and unicode

Thu Dec 21 07:12:08 EST 2006

Martin v. Löwis wrote:
> Duncan Booth wrote:
>> The way that uri encoding is supposed to work is that first the input
>> string in unicode is encoded to UTF-8 and then each byte which is not in
>> the permitted range for characters is encoded as % followed by two hex
>> characters. 
> 
> Can you back up this claim ("is supposed to work") by reference to
> a specification (ideally, chapter and verse)?
> 
> In URIs, it is entirely unspecified what the encoding is of non-ASCII
> characters, and whether % escapes denote characters in the first place.

HTML 4.01, appendix B.2.1 ("Non-ASCII characters in URI attribute values"):
http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
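
For illustration, the convention recommended in that section (represent each
character in UTF-8, then escape each byte as %HH) might look roughly like the
sketch below. It assumes Python 3's urllib.parse rather than the Python 2
urllib module this thread is about, and escape_non_ascii is a hypothetical
helper name, not something from the standard library.

```python
from urllib.parse import quote, unquote_to_bytes


def escape_non_ascii(text):
    """Hypothetical helper: UTF-8 first, then %HH per byte."""
    # Step 1: represent each character as one or more UTF-8 bytes.
    raw = text.encode("utf-8")
    # Step 2: escape every byte outside the unreserved set as %HH.
    # safe="" means "/" is escaped too; adjust if path separators must survive.
    return quote(raw, safe="")


escaped = escape_non_ascii("Löwis")
print(escaped)  # L%C3%B6wis

# The reverse direction only works if the receiver also assumes UTF-8:
# undo the %HH escapes to get bytes, then decode those bytes.
restored = unquote_to_bytes(escaped).decode("utf-8")
print(restored)  # Löwis
```

Note that the decoding step bakes in the UTF-8 assumption. A URI produced
under a different convention (say, Latin-1 bytes escaped as %F6) would not
decode this way, which is the ambiguity raised in the quoted text.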

Servus,
   Walter