Unescaping URLs in Python
Lawrence D'Oliveiro
ldo at geek-central.gen.new_zealand
Mon Dec 25 00:52:55 EST 2006
In message <hWHjh.25037$Gr2.6406 at newssvr21.news.prodigy.net>, John Nagle
wrote:
> Here's a URL from a link on the home page of a major company.
>
> <a href="/adsk/servlet/index?siteID=123112&amp;id=1860142">About Us</a>
>
> What's the appropriate Python function to call to unescape a URL
> which might contain things like that?
Just use any HTML-parsing library. I think the standard Python HTMLParser
will do the trick: it decodes entity references such as &amp; in attribute
values for you, provided there aren't any errors in the HTML.
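
As a rough illustration of that suggestion, here is a minimal sketch using the standard library's HTMLParser. It assumes Python 3, where the class lives in html.parser (in 2006 it was the HTMLParser module); the names LinkCollector and extract_links are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser has already
        # replaced entity references such as &amp; in each value.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the unescaped href of every <a> tag in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="/adsk/servlet/index?siteID=123112&amp;id=1860142">About Us</a>'
    print(extract_links(sample))
```

The href reaches handle_starttag with &amp; already turned into a plain &, so no separate unescaping call is needed on this path.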
> Will this interfere with the usual "%" type escapes in URLs?
No. Just think of it as an HTML attribute value; the fact that it's a URL is
a question of later interpretation, nothing to do with the fact that it
comes from an HTML attribute.
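
To make the two layers concrete, here is a small sketch, again assuming Python 3. The function decode_href and the sample URL are invented for illustration. HTML unescaping is applied first and leaves "%" escapes alone; the URL-level decoding happens afterwards, when the query string is parsed.

```python
from html import unescape
from urllib.parse import urlsplit, parse_qs


def decode_href(raw_attr):
    """Hypothetical helper: undo HTML escaping, then parse the URL query.

    raw_attr is the attribute value exactly as written in the HTML source.
    """
    # Layer 1: HTML entity references (&amp; becomes &). "%20" is untouched.
    url = unescape(raw_attr)
    # Layer 2: URL percent-escapes, decoded while parsing the query string.
    query = parse_qs(urlsplit(url).query)
    return url, query


if __name__ == "__main__":
    url, query = decode_href("/search?q=a%20b&amp;lang=en")
    print(url)
    print(query)
```

Doing the steps in this order matters: percent-decoding first could introduce a literal & or ; that the HTML step would then misread.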