Parsing an HTML a tag

George Sakkis gsakkis at rutgers.edu
Sat Sep 24 18:29:34 EDT 2005


"Stephen Prinster" <prinster at mail.com> wrote:
> George wrote:
> > How can I parse an HTML file and collect only the A tags? I have a
> > start for the code but am unable to figure out how to finish it.
> > HTML_parse gets the data from the URL document. Thanks for the help.
>
> Have you tried using Beautiful Soup?
>
> http://www.crummy.com/software/BeautifulSoup/

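I agree; apart from the imports, you can do what you want in two lines:

import urllib
from BeautifulSoup import BeautifulSoup
# assumes url is already defined and that every <a> tag carries an href
hrefs = [link['href'] for link in BeautifulSoup(urllib.urlopen(url)).fetch('a')]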
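
If installing a third-party package is not an option, roughly the same
result can be had from the standard library. The following is only a
sketch, written for Python 3's html.parser module rather than Beautiful
Soup; the names LinkCollector, collect_hrefs and sample are made up for
illustration and do not come from the original question. It skips <a>
tags that have no href instead of failing on them.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def collect_hrefs(html):
    """Return the href values found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical sample markup: one normal link, one named anchor
# without an href, and one link written in upper case.
sample = (
    '<p><a href="http://www.python.org/">Python</a> '
    '<a name="top">anchor</a> '
    '<A HREF="/faq">FAQ</A></p>'
)
print(collect_hrefs(sample))
```

To run it against a live page, the text passed to collect_hrefs would
have to be fetched and decoded first, for example with
urllib.request.urlopen(url).read().decode(), assuming the page's
encoding is compatible with the default.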

George




