[Python-bugs-list] [ python-Bugs-534613 ] urllib.unquote() is not idempotent

noreply@sourceforge.net noreply@sourceforge.net
Mon, 25 Mar 2002 23:39:33 -0800


Bugs item #534613, was opened at 2002-03-25 10:32
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=534613&group_id=5470

Category: Python Library
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Ralf Juengling (rjuengling)
Assigned to: Tim Peters (tim_one)
Summary: urllib.unquote() is not idempotent

Initial Comment:
There are URLs s for which

unquote(unquote(s)) != unquote(s)

That is, a single call to unquote() sometimes leaves %-escapes in its result. Example (real world):

s = 
'/r?ck_sm=765ea975&ref=20096&r=http%3A%2F%2Fwww.altavista.com%2Fsites%2Fsearch%2Fmm
_resultframe%3Fq%3Dbike%26type%3DIMG%26url%3Dhttp%253A%252F%252Fvmesa17.u-3mrs.fr
%253A10081%252F%257Em9402001%252Fbike.htm%26title%3Dbike_1.jpg%26isrc%3Dhttp%253A
%252F%252Fthumb-1.image.altavista.com%252Fimage%252F49909424%26src%3Dhttp%253A%25
2F%252Fvmesa17.u-3mrs.fr%253A10081%252F%257Em9402001%252Fbike_1.jpg%26stq%3D50%2
6stype%3Dsimage'

unquote(s) = 
'/r?ck_sm=765ea975&ref=20096&r=http://www.altavista.com/sites/search/mm_resultframe?q=bike
&type=IMG&url=http%3A%2F%2Fvmesa17.u-3mrs.fr%3A10081%2F%7Em9402001%2Fbike.htm&titl
e=bike_1.jpg&isrc=http%3A%2F%2Fthumb-1.image.altavista.com%2Fimage%2F49909424&src=http
%3A%2F%2Fvmesa17.u-3mrs.fr%3A10081%2F%7Em9402001%2Fbike_1.jpg&stq=50&stype=simage'

unquote(unquote(s)) = 
'/r?ck_sm=765ea975&ref=20096&r=http://www.altavista.com/sites/search/mm_resultframe?q=bike
&type=IMG&url=http://vmesa17.u-3mrs.fr:10081/~m9402001/bike.htm&title=bike_1.jpg&isrc=http:
//thumb-1.image.altavista.com/image/49909424&src=http://vmesa17.u-3mrs.fr:10081/~m9402001/
bike_1.jpg&stq=50&stype=simage'
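
The report above can be reproduced on a much shorter string. The following is only a sketch: the example.org and search.example URLs are made up, and it assumes Python 3, where the function is urllib.parse.unquote rather than urllib.unquote. It builds a URL nested inside a URL nested inside a URL, the same structure as the real-world example, and shows that each call to unquote() removes exactly one layer of escaping.

```python
from urllib.parse import quote, unquote

# Hypothetical innermost URL (stands in for the image page in the report).
inner = "http://example.org:10081/~user/bike.htm"

# Layer 1: embed the inner URL as a query value of a search URL.
middle = "http://search.example/results?q=bike&url=" + quote(inner, safe="")

# Layer 2: embed that search URL as a query value of a redirect URL.
# The '%' characters written by layer 1 are themselves escaped as %25 here.
outer = "/r?ref=20096&r=" + quote(middle, safe="")

once = unquote(outer)   # removes layer 2 only
twice = unquote(once)   # removes layer 1 as well

print(outer)
print(once)
print(twice)
```

The second call changes the string again only because the first call correctly left the inner layer of escapes in place.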


----------------------------------------------------------------------

>Comment By: Ralf Juengling (rjuengling)
Date: 2002-03-26 07:39

Message:
Logged In: YES 
user_id=495820

Thanks for clearing matters up; I now agree that it's not a bug.
A pointer to RFC 1738 in the documentation would be a reasonable way to state more precisely what unquote() is 
supposed to do.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-03-25 22:46

Message:
Logged In: YES 
user_id=31435

Python's behavior here is correct.  URL encoding is defined 
in RFC 1738:

http://www.ietf.org/rfc/rfc1738.txt

If you still believe Python's behavior is in error here, 
quote the RFC for justification.

If you ever find a URL decoder that, for example, changes

    %2525

into

    %

instead of into

    %25

it's a broken decoder.
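
The %2525 case can be checked directly. This is a minimal sketch assuming Python 3 (urllib.parse.unquote); unquote_until_stable is a hypothetical helper written only to illustrate what an "idempotent" decoder would have to do, and it is not part of any library.

```python
from urllib.parse import unquote


def unquote_until_stable(s):
    """Hypothetical 'idempotent' decoder: decode repeatedly until nothing changes."""
    while True:
        decoded = unquote(s)
        if decoded == s:
            return s
        s = decoded


correct = unquote("%2525")              # one layer of escaping removed
broken = unquote_until_stable("%2525")  # every layer removed

print(repr(correct))
print(repr(broken))
```

The first result still contains the escape %25 that the sender put there on purpose; the second has collapsed it to a bare percent sign, which is the broken behaviour described above.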

----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2002-03-25 10:45

Message:
Logged In: YES 
user_id=43607

What makes you think that urllib.unquote *should* be idempotent?
The whole purpose of unquote is to remove one level of %-escapes from a URL, and that operation *cannot* be idempotent, because a literal '%' is itself escaped as %25.

I would say "Not a bug".
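
One way to see why it cannot be idempotent is to treat unquote() as the inverse of quote(). The sketch below assumes Python 3 (urllib.parse); the sample strings are made up for illustration. It checks that unquote(quote(text)) returns the original text, and then shows that a second unquote() would corrupt text that happens to look like an escape.

```python
from urllib.parse import quote, unquote

# Arbitrary sample texts, some of which contain '%' on purpose.
samples = ["100% sure", "%41 is the escape for A", "a&b=c", "%25"]

# Decoding exactly once undoes encoding exactly once.
round_trips = [unquote(quote(text)) == text for text in samples]

literal = "%41"                       # the three characters '%', '4', '1'
encoded = quote(literal)              # the '%' is escaped as %25
decoded_once = unquote(encoded)       # back to the original three characters
decoded_twice = unquote(decoded_once) # now misread as the escape for 'A'

print(round_trips)
print(encoded, decoded_once, decoded_twice)
```

If unquote() were idempotent, decoded_once and decoded_twice would have to be equal, and the text "%41" could never be transmitted intact.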

----------------------------------------------------------------------
