[Tutor] Beautiful Soup

Shitiz Bansal shitizb at yahoo.com
Wed Nov 29 21:42:29 CET 2006


Thanks, urlparse.urljoin did the trick.
Akash- the problem with directly prefixing url to the link is that the url most of the times contains not just the page address but also parameters and fragments.

Andreas Kostyrka <andreas at kostyrka.org> wrote: * Akash  [061129 20:54]:
> On 11/30/06, Shitiz Bansal  wrote:
> > I am using beautiful soup for extracting links from a web page.
> > Most pages use relative links in their pages which is causing a problem. Is
> > there any library to extract complete links or do i have to parse this
> > myself?
> >
> 
> Beautiful Soup can also extract text which is present on the page. If
> there are no complete links no library can do that for you. But since
> you are reaching a certain web page to extract you already have that
> URL information with you. All you have to do then is to prefix it to
> each extracted URL.
Take a look at urlparse.urljoin from the standard library.

Andreas


 
---------------------------------
Access over 1 million songs - Yahoo! Music Unlimited.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/tutor/attachments/20061129/68e071d6/attachment.html 


More information about the Tutor mailing list