how to scrape url out of href
Mike Meyer
mwm at mired.org
Sun Jan 1 19:59:10 EST 2006
homepricemaps at gmail.com writes:
> i need to scrape a url out of an href. it seems that people recommend
> that i use beautiful soup but had some problems.
What problem are you having with BeautifulSoup? It's working fine for
me here.
> does anyone have sample code for scraping the actual url out of an href
> like this one
>
> <a href="http://www.cnn.com" target="_blank">
The following fragment works fine for me:
    linktext = soup.fetchText('Next')
    if not linktext:
        return pages
    else:
        url = linktext[0].findParent('a')['href']
So you probably want something like:
    for anchor in soup.fetch('a', {'target': '_blank'}):
        print anchor['href']
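If BeautifulSoup itself is what's giving you trouble, roughly the same
thing can be sketched with the standard library alone. This is only an
illustration, not the BeautifulSoup API: it assumes Python 3's
html.parser module, and the names HrefCollector and extract_hrefs are
made up for the example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self, target=None):
        super().__init__()
        self.target = target  # optional filter on the target attribute
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if self.target is not None and attrs.get('target') != self.target:
            return
        if attrs.get('href') is not None:
            self.hrefs.append(attrs['href'])


def extract_hrefs(html, target=None):
    """Return the href of every matching anchor in the given HTML string."""
    parser = HrefCollector(target)
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = '<a href="http://www.cnn.com" target="_blank">'
print(extract_hrefs(sample, target='_blank'))
```

Leaving out the target argument collects the href of every anchor on
the page instead of only the ones that open in a new window.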
<mike
--
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.