Converting relative URLs to absolute

James A Roush jar at mminternet.com
Wed Mar 13 13:08:08 EST 2002


In article <23891c90.0203130331.6f030f5b at posting.google.com>, paul at boddie.net 
says...
> Fernando Pérez <fperez528 at yahoo.com> wrote in message news:<a6mpda$79e$1 at peabody.colorado.edu>...
> > James A Roush wrote:
> > 
> > > Does anyone have any code that, given the absolute URL of a web page, can
> > > convert all the relative URLs on that page to their absolute equivalents?
> > 
> > Assuming absolute is a string and relative_list a list of strings, the
> > following comes to mind:
> > 
> > [absolute+'/'+relative for relative in relative_list]
> > 
> > Maybe you wanted something fancier, don't know.
> 
> I suppose it would be nicer or more appropriate to deal with "back
> references" as well as being able to detect "base" elements. For
> example, given the following "base"...
> 
>   http://www.python.org/invented/framework/demo/
> 
> ...and the following URLs...
> 
>   moreinfo.html
>   docs.html
>   ../apps.html
>   ../../stuff.html
>   /index.html
>   http://www.zope.org
> 
> ...one would want to remove certain parts of the "base" before
> concatenating the relative URLs to it. Thus, we would produce these
> absolute URLs:
> 
>   http://www.python.org/invented/framework/demo/moreinfo.html
>   http://www.python.org/invented/framework/demo/docs.html
>   http://www.python.org/invented/framework/apps.html
>   http://www.python.org/invented/stuff.html
>   http://www.python.org/index.html
>   http://www.zope.org
> 
> I'm not so sure that urllib supports such operations, at least not in
> any version of it that I have (from Python 2.0 or 2.1). Instead,
> there are some fairly low-level split operations which aren't especially
> useful in this case. In addition, you might need to use some parser to
> get hold of any "base" elements in the HTML.
> 
> I've written some page-mining tools which help with these kinds of
> activities, and I suppose I should get round to releasing them at some
> point. Let me know if you're interested!
> 
> Paul

This is precisely what I'm looking for.  As someone else pointed out in another
post, urllib should have this but, sadly, does not.
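
For illustration only, here is a minimal sketch of the behaviour Paul
describes. It assumes a modern Python 3, where the standard library provides
urllib.parse.urljoin and html.parser; neither assumption comes from this
thread, and the names LinkCollector and absolutize are hypothetical. It
collects href/src attributes, honours the first "base" element if one is
present, and resolves each link against the resulting base.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href/src values and the first <base href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base":
            # Only the first <base href="..."> is taken into account here.
            if self.base_href is None and attrs.get("href"):
                self.base_href = attrs["href"]
            return
        for name in ("href", "src"):
            value = attrs.get(name)
            if value:
                self.links.append(value)


def absolutize(page_url, html):
    """Return the page's links as absolute URLs, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    # A <base> element, if present, may itself be relative to the page URL.
    if collector.base_href:
        base = urljoin(page_url, collector.base_href)
    else:
        base = page_url
    # urljoin handles "../" back references, "/"-rooted paths, and URLs
    # that are already absolute.
    return [urljoin(base, link) for link in collector.links]
```

Called with Paul's base URL and a page containing his example links, this
should yield the absolute URLs he lists. It is a sketch, not a complete link
rewriter: it does not modify the HTML itself, and it ignores less common
URL-bearing attributes.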
-- 
-----------------------
James A Roush
jar @ mminternet.com
-----------------------


