Unescaping URLs in Python

Lawrence D'Oliveiro ldo at geek-central.gen.new_zealand
Mon Dec 25 00:52:55 EST 2006


In message <hWHjh.25037$Gr2.6406 at newssvr21.news.prodigy.net>, John Nagle
wrote:

> Here's a URL from a link on the home page of a major company.
> 
> <a href="/adsk/servlet/index?siteID=123112&amp;id=1860142">About Us</a>
> 
> What's the appropriate Python function to call to unescape a URL
> which might contain things like that?

Just use any HTML-parsing library. I think the standard Python HTMLParser
will do the trick, provided there aren't any errors in the HTML.
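A minimal sketch of that approach in modern Python 3, where the parser lives in `html.parser` (in the Python 2 of this thread the module was named `HTMLParser`). The `LinkCollector` class is a hypothetical helper, and the markup assumes the raw page source separates the query parameters with `&amp;`. The parser hands attribute values to `handle_starttag` with entity references already decoded.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser has already
        # turned entity references such as &amp; into plain characters.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


html_text = '<a href="/adsk/servlet/index?siteID=123112&amp;id=1860142">About Us</a>'
parser = LinkCollector()
parser.feed(html_text)
parser.close()
print(parser.hrefs)  # ['/adsk/servlet/index?siteID=123112&id=1860142']
```

Using a real parser rather than a regular expression means quoting, entity decoding and tag-name case are handled for you.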

> Will this interfere with the usual "%" type escapes in URLs?

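No. Just think of it as an HTML attribute value; the fact that it's a URL is
a matter for later interpretation, and has nothing to do with the fact that
it comes from an HTML attribute. Decoding the HTML entities turns "&amp;"
into "&" and leaves "%" escapes alone; those are only decoded if and when
you interpret the result as a URL.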
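The two stages can be kept apart explicitly, as in this Python 3 sketch. The attribute value is a made-up example (not from the page above) that contains both an HTML entity and a URL percent-escape. `html.unescape` handles only the HTML layer, and `urllib.parse` handles the URL layer afterwards.

```python
import html
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical attribute value as it appears in the raw HTML: it contains
# an HTML entity (&amp;) and a URL percent-escape (%20).
raw_attr = "/search?q=about%20us&amp;lang=en"

# Stage 1: HTML-level unescaping. Only the entity changes; %20 is untouched.
url = html.unescape(raw_attr)
print(url)  # /search?q=about%20us&lang=en

# Stage 2: URL-level interpretation, done later and separately.
query = urlsplit(url).query
print(parse_qs(query))  # {'q': ['about us'], 'lang': ['en']}
print(unquote(url))     # /search?q=about us&lang=en
```

Running the stages in this order matters: percent-decoding first could produce characters that the HTML step would then misread.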

More information about the Python-list mailing list