[Tutor] Beautiful Soup

Andreas Kostyrka andreas at kostyrka.org
Wed Nov 29 21:07:15 CET 2006


* Akash <akashmahajan at gmail.com> [061129 20:54]:
> On 11/30/06, Shitiz Bansal <shitizb at yahoo.com> wrote:
> > I am using Beautiful Soup to extract links from a web page.
> > Most pages use relative links, which is causing a problem. Is
> > there a library that extracts complete links, or do I have to parse
> > them myself?
> >
> 
> Beautiful Soup can also extract text that is present on the page. If
> there are no complete links, no library can do that for you. But since
> you are fetching a specific web page to extract from, you already have
> its URL. All you then have to do is prefix it to
> each extracted URL.
Take a look at urlparse.urljoin from the standard library.
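
A minimal sketch of what that looks like, assuming the hrefs have already
been pulled out of the page (for example with Beautiful Soup). The base URL
and the hrefs below are made-up values for illustration. In Python 3 the
function lives in urllib.parse; in Python 2 it is urlparse.urljoin.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Hypothetical URL of the page the links were extracted from.
base_url = "http://www.example.com/docs/index.html"

# Hypothetical href values as they might appear in the page's <a> tags.
hrefs = [
    "about.html",                     # relative to the current directory
    "../images/logo.png",             # relative to the parent directory
    "/contact",                       # relative to the site root
    "http://other.example.org/page",  # already absolute, left unchanged
]

# urljoin resolves each href against the base URL.
absolute_links = [urljoin(base_url, href) for href in hrefs]

for link in absolute_links:
    print(link)
```

Simply prefixing the page URL would get the "../" and "/contact" cases
wrong, which is why urljoin is the safer choice here.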

Andreas

