Converting relative URLs to absolute

brueckd at tbye.com
Wed Mar 13 10:20:22 EST 2002


On Tue, 12 Mar 2002, James A Roush wrote:

> Does anyone have any code that, given that absolute URL of a web page, can
> convert all the relative URLs on that page to their absolute equivalent?

Hi James,

You have two separate problems here: finding/replacing the URLs, and
converting relative URLs to absolute ones. The second problem is the easier one:

>>> import urlparse
>>> absolute = 'http://www.foo.com/bar/baz/file.avi'
>>> url = '../boink/boom.mpg'
>>> urlparse.urljoin(absolute, url)
'http://www.foo.com/bar/boink/boom.mpg'
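The session above uses Python 2's urlparse module. In Python 3 the same
function lives in urllib.parse. A minimal equivalent, with two extra cases
added for illustration (the extra URLs are made up):

```python
from urllib.parse import urljoin  # Python 3 home of the old urlparse.urljoin

absolute = 'http://www.foo.com/bar/baz/file.avi'

# A relative path is resolved against the base URL's directory.
print(urljoin(absolute, '../boink/boom.mpg'))    # http://www.foo.com/bar/boink/boom.mpg
# A root-relative path keeps only the scheme and host of the base.
print(urljoin(absolute, '/index.html'))          # http://www.foo.com/index.html
# An already-absolute URL comes back unchanged.
print(urljoin(absolute, 'http://example.org/x')) # http://example.org/x
```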

To handle the first problem you have to decide which URLs you're
interested in: just the hrefs of links, or everything (img, embed,
object, and param tags, at least, can carry URLs too). You also need to
search the head of the HTML document for an optional BASE tag; if it is
present, its href gives the base URL to resolve relative URLs against
instead of the page's own URL.
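Putting both halves together, here is a rough sketch (Python 3, standard
library only) that re-emits a page with its URL attributes made absolute.
The names absolutize, AbsolutizingParser and URL_ATTRS are hypothetical, and
the tag/attribute list is just one possible choice. It assumes the BASE tag
appears before the URLs it should govern, as it normally does in the head.
It skips param, whose value attribute is only sometimes a URL. The output is
re-serialized, so tag and attribute names come back lowercased and quoted
rather than byte-identical to the input.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical choice of which attributes on which tags hold URLs; extend as needed.
URL_ATTRS = {
    "a": {"href"}, "area": {"href"}, "link": {"href"},
    "img": {"src"}, "script": {"src"}, "embed": {"src"},
    "iframe": {"src"}, "frame": {"src"},
    "object": {"data"}, "form": {"action"},
}


class AbsolutizingParser(HTMLParser):
    """Re-emits HTML with URL-bearing attributes resolved to absolute URLs."""

    def __init__(self, page_url):
        # Keep entity/char refs as separate events so they can be written back verbatim.
        super().__init__(convert_charrefs=False)
        self.base = page_url   # replaced by the first <base href> encountered
        self.seen_base = False
        self.out = []

    def _emit_tag(self, tag, attrs, self_closing):
        parts = []
        for name, value in attrs:
            if value is not None:
                if tag == "base" and name == "href" and not self.seen_base:
                    # The first <base href> (itself possibly relative) sets the new base.
                    self.base = urljoin(self.base, value)
                    self.seen_base = True
                    value = self.base
                elif name in URL_ATTRS.get(tag, ()):
                    value = urljoin(self.base, value.strip())
            # Attribute values arrive unescaped, so re-escape them on output.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        attr_text = "".join(" " + p for p in parts)
        self.out.append(f"<{tag}{attr_text}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, False)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def absolutize(html_text, page_url):
    """Return html_text with URL attributes resolved against page_url (or <base>)."""
    parser = AbsolutizingParser(page_url)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, absolutize('<a href="../x.html">x</a>',
'http://www.foo.com/bar/baz/file.avi') gives
'<a href="http://www.foo.com/bar/x.html">x</a>'. Using a real parser rather
than regular expressions avoids tripping over quoting, comments and script
bodies, at the cost of not reproducing the original markup exactly.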

-Dave


More information about the Python-list mailing list