Converting relative URLs to absolute
brueckd at tbye.com
Wed Mar 13 10:20:22 EST 2002
On Tue, 12 Mar 2002, James A Roush wrote:
> Does anyone have any code that, given that absolute URL of a web page, can
> convert all the relative URLs on that page to their absolute equivalent?
Hi James,
You have two separate problems here: finding/replacing the URLs and
converting relatives to absolutes. The second problem is the easier one:
>>> import urlparse
>>> absolute = 'http://www.foo.com/bar/baz/file.avi'
>>> url = '../boink/boom.mpg'
>>> urlparse.urljoin(absolute, url)
'http://www.foo.com/bar/boink/boom.mpg'
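A few other kinds of reference resolve the same way. The snippet below is only a sketch: it assumes Python 3, where urljoin lives in urllib.parse rather than in the urlparse module, and the example.org URLs are made up for illustration.

```python
from urllib.parse import urljoin  # Python 3 home of urlparse.urljoin

base = 'http://www.foo.com/bar/baz/file.avi'

# Illustrative references (not from the original post), one per common form.
examples = [
    'other.avi',               # sibling file in the same directory
    '/top.html',               # path relative to the server root
    '//cdn.example.org/a.js',  # scheme-relative: keeps only the base's scheme
    'http://example.org/x',    # already absolute: returned unchanged
]

resolved = {ref: urljoin(base, ref) for ref in examples}
for ref, url in resolved.items():
    print(ref, '->', url)
# other.avi -> http://www.foo.com/bar/baz/other.avi
# /top.html -> http://www.foo.com/top.html
# //cdn.example.org/a.js -> http://cdn.example.org/a.js
# http://example.org/x -> http://example.org/x
```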
In order to handle the first problem you have to decide which URLs you're
interested in: just hrefs or everything (at least img, embed, object,
param can have URLs too). Also, you need to search the head of the HTML
document for an optional 'base' HTML tag that specifies the new base URL
to use.
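One possible way to approach the finding half is sketched below, assuming Python 3's html.parser and urllib.parse. The names URL_ATTRS, LinkResolver and absolute_links are hypothetical, the tag/attribute table is deliberately incomplete (param is left out because its value is not always a URL), and the code only collects resolved URLs; writing them back into the markup is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, incomplete table: the attribute that holds a URL for each tag.
URL_ATTRS = {
    'a': 'href',
    'link': 'href',
    'img': 'src',
    'embed': 'src',
    'object': 'data',
}

class LinkResolver(HTMLParser):
    """Collect (tag, original, absolute) triples for URL-bearing attributes."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url     # replaced if a <base href=...> tag is seen
        self.seen_base = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)      # HTMLParser lowercases tag and attribute names
        if tag == 'base':
            # Only the first <base> with an href changes the base URL.
            if not self.seen_base and attrs.get('href'):
                self.base = urljoin(self.base, attrs['href'])
                self.seen_base = True
            return
        name = URL_ATTRS.get(tag)
        if name and attrs.get(name):
            original = attrs[name]
            self.links.append((tag, original, urljoin(self.base, original)))

def absolute_links(page_url, html):
    """Return the resolved links found in html, fetched from page_url."""
    parser = LinkResolver(page_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, absolute_links('http://www.foo.com/bar/baz/file.avi', '<a href="../boink/boom.mpg">clip</a>') gives [('a', '../boink/boom.mpg', 'http://www.foo.com/bar/boink/boom.mpg')]. If the page had a base tag in its head, that tag's href would be used as the base instead of the page's own URL.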
-Dave