urllib.quote fails on Unicode URL

John Nagle nagle at animats.com
Fri May 4 02:15:28 EDT 2007


    The code in urllib.quote fails on Unicode input when it is
called by robotparser.

    That bit of code needs some attention.
    - It still assumes every character is in the range 0 to 255 (i.e. fits
      in one byte), which hasn't been true in Python for a while now.
    - The initialization may not be thread-safe; a table is being initialized
      on first use.  The code is too clever and uncommented.
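A minimal sketch of how both points could be addressed, assuming the input is first encoded as UTF-8 (the convention RFC 3987 uses when mapping non-ASCII IRIs to URIs). Once the string is bytes, every value really is in range(256), so a byte-based safe table is valid; and building that table once at import time avoids lazy first-use initialization. `quote_unicode` is a hypothetical helper written for Python 3, not urllib's actual API.

```python
import string

# Hypothetical helper, not part of urllib. The safe set is built once at
# import time instead of lazily on first call, so there is no
# check-then-initialize step for two threads to race on.
_ALWAYS_SAFE = frozenset(
    (string.ascii_letters + string.digits + "_.-~").encode("ascii")
)


def quote_unicode(text, safe="/"):
    """Percent-escape a possibly non-ASCII string.

    The string is encoded to UTF-8 first, so every value examined is a
    byte in range(256), whatever the original code points were.
    """
    safe_bytes = _ALWAYS_SAFE | frozenset(safe.encode("ascii"))
    parts = []
    for b in text.encode("utf-8"):  # iterating bytes yields ints 0..255
        if b in safe_bytes:
            parts.append(chr(b))
        else:
            parts.append("%%%02X" % b)  # e.g. 0xE2 -> "%E2"
    return "".join(parts)


print(quote_unicode("/DynamicContent/\u201d/mysaved/privacyPref.asp\""))
```

The design choice here is to do the table lookup on bytes rather than characters, which is what makes a fixed 256-entry table safe to use at all.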

"robotparser" was trying to check whether a URL,
"http://www.highbeam.com/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22",
could be accessed, and there are some weird characters in there.  Unicode
URLs are legal, so this is a real bug.
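For reference, the escapes in that URL are UTF-8: %E2%80%9D is the three-byte encoding of U+201D (RIGHT DOUBLE QUOTATION MARK, a typographic closing quote) and %22 is a plain ASCII double quote. The check below is a sketch written for Python 3's urllib.parse, whose quote() encodes str as UTF-8 by default, rather than the Python 2 urllib discussed in this report.

```python
import unicodedata
from urllib.parse import quote, unquote

# Path portion of the URL robotparser was checking.
reported_path = "/DynamicContent/%E2%80%9D/mysaved/privacyPref.asp%22"

# unquote() decodes the percent-escapes as UTF-8 by default.
decoded = unquote(reported_path)
smart_quote = decoded[len("/DynamicContent/")]
print(unicodedata.name(smart_quote))

# Quoting the Unicode string again reproduces the original escapes,
# because quote() encodes it as UTF-8 before escaping.
print(quote(decoded))
```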

Logged as Bug #1712522.

					John Nagle
	



More information about the Python-list mailing list