urllib.unquote and unicode

Tue Dec 19 00:02:58 EST 2006

George Sakkis wrote:
> The following snippet results in different outcome for (at least) the
> last three major releases:
>
> >>> import urllib
> >>> urllib.unquote(u'%94')
>
> # Python 2.3.4
> u'%94'
>
> # Python 2.4.2
> UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0:
> ordinal not in range(128)
>
> # Python 2.5
> u'\x94'
>
> Is the current version the "right" one or is this function supposed to
> change every other week ?

IMHO, none of the results is right. Either a unicode string should be
rejected by raising ValueError, or it should be encoded with the ascii
codec, so that the result is the same as
urllib.unquote(u'%94'.encode('ascii')), that is, '\x94'. You can consider
the current behaviour undefined: passing a random object into some
function can give different outcomes in different Python versions, and
passing unicode to urllib.unquote is no different.
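For illustration, here is a minimal sketch of the two proposed behaviours. It is written in Python 3 terms, which is an assumption beyond this thread: str stands in for 2.x unicode, bytes stands in for the 2.x str, and urllib.parse.unquote_to_bytes stands in for urllib.unquote. The name strict_unquote and its reject_text flag are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def strict_unquote(s, reject_text=False):
    """Hypothetical helper sketching the two behaviours proposed above.

    Python 3 terms: str plays the role of Python 2's unicode,
    bytes plays the role of Python 2's str.
    """
    if isinstance(s, str):
        if reject_text:
            # Option 1: refuse text input outright.
            raise ValueError("expected a byte string, got text")
        # Option 2: encode with ASCII first. Non-ASCII text raises
        # UnicodeEncodeError, which is a subclass of ValueError.
        s = s.encode('ascii')
    # Percent-decoding now operates purely on bytes.
    return unquote_to_bytes(s)
```

With option 2, strict_unquote('%94') gives the same result as strict_unquote(b'%94'), namely b'\x94', which matches the '\x94' that unquoting the ascii-encoded string produces.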

  -- Leo