[Tutor] Beautiful Soup
Shitiz Bansal
shitizb at yahoo.com
Wed Nov 29 21:42:29 CET 2006
Thanks, urlparse.urljoin did the trick.
Akash, the problem with directly prefixing the URL to the link is that the URL usually contains not just the page address but also parameters and fragments.
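To illustrate the difference, here is a minimal sketch comparing plain prefixing with urljoin. The page URL and the relative links are made-up examples, not taken from any real site. It uses the Python 3 spelling, urllib.parse.urljoin; in Python 2 the same function is urlparse.urljoin.

```python
# Python 3 import; in Python 2 this function lives in the urlparse module.
from urllib.parse import urljoin

# Hypothetical page URL that carries a query string and a fragment.
page_url = "http://example.com/docs/index.php?page=2#top"

# Hypothetical href values as they might be extracted from that page.
links = [
    "about.html",                  # relative to the page's directory
    "/images/logo.png",            # relative to the site root
    "../up.html",                  # one directory up
    "http://other.example.org/x",  # already absolute
]

# Naive approach: glue the page URL in front of each link.
naive = [page_url + link for link in links]

# urljoin resolves each link against the page URL, dropping the
# page's own file name, query string, and fragment as needed.
resolved = [urljoin(page_url, link) for link in links]

for before, after in zip(naive, resolved):
    print(before, "->", after)
```

With prefixing, the first link becomes http://example.com/docs/index.php?page=2#topabout.html, which is not a usable address, while urljoin gives http://example.com/docs/about.html. Links that are already absolute pass through urljoin unchanged.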
Andreas Kostyrka <andreas at kostyrka.org> wrote:
* Akash [061129 20:54]:
> On 11/30/06, Shitiz Bansal wrote:
> > I am using Beautiful Soup for extracting links from a web page.
> > Most pages use relative links, which is causing a problem. Is
> > there any library to extract complete links, or do I have to parse them
> > myself?
> >
>
> Beautiful Soup can also extract text that is present on the page. If
> there are no complete links, no library can do that for you. But since
> you are fetching a certain web page to extract from, you already have that
> URL information with you. All you have to do then is to prefix it to
> each extracted URL.
Take a look at urlparse.urljoin from the standard library.
Andreas
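As a rough sketch of the whole task (collect the links from a page, then make them absolute), something like the following could work. To keep it self-contained it uses the standard library's html.parser as a stand-in for Beautiful Soup, so it is not the Beautiful Soup API; the class name, helper name, sample HTML, and base URL are all made up for illustration. It is written for Python 3, where urljoin is in urllib.parse.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect <a href> values as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest here.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (or with an empty one).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def absolute_links(html, base_url):
    """Return every <a href> in html, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up page content and page address.
sample_html = (
    '<p><a href="page2.html">next</a> '
    '<a href="/top">top</a> '
    '<a name="x">anchor without href</a></p>'
)
found = absolute_links(sample_html, "http://example.com/a/b/index.html?id=7")
print(found)
```

The base URL passed in should be the address the page was actually fetched from, since that is what the relative links are relative to.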