how to scrape url out of href

Mike Meyer mwm at mired.org
Sun Jan 1 19:59:10 EST 2006


homepricemaps at gmail.com writes:
> i need to scrape a url out of an href.  it seems that people recommend
> that i use beautiful soup but had some problems.

What problem are you having with BeautifulSoup? It's working fine for
me here.

> does anyone have sample code for scraping the actual url out of an href
> like this one
>
> <a href="http://www.cnn.com" target="_blank">

The following fragment works fine for me:

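        # Look for the text node(s) reading 'Next'.
        linktext = soup.fetchText('Next')
        if not linktext:
            return pages
        else:
            # The URL is the href of the <a> tag enclosing that text.
            url = linktext[0].findParent('a')['href']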


So you probably want something like:

   for anchor in soup.fetch('a', {'target': '_blank'}):
       print anchor['href']
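
If BeautifulSoup itself is what's giving you trouble, the same idea can
be sketched with nothing but the standard library. This is only a rough
sketch, not BeautifulSoup's API: it assumes Python 3, and the names
HrefCollector and extract_hrefs and the sample HTML are made up for
illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self, target=None):
        super().__init__()
        self.target = target  # if set, keep only anchors with this target
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != 'a':
            return
        attrs = dict(attrs)
        if self.target is not None and attrs.get('target') != self.target:
            return
        if attrs.get('href') is not None:
            self.hrefs.append(attrs['href'])


def extract_hrefs(html, target=None):
    """Return the href of every matching <a> tag in the given HTML string."""
    parser = HrefCollector(target)
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Made-up sample input, modelled on the anchor in the question.
sample = ('<a href="http://www.cnn.com" target="_blank">CNN</a> '
          '<a href="/local">here</a>')
print(extract_hrefs(sample, target='_blank'))
```

Leaving out target returns every href in the document, not just the
ones that open in a new window.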


       <mike

-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


