[Tutor] BeautifulSoup confusion

Steve Lyskawa steve.mckmps at gmail.com
Fri Apr 10 01:27:22 CEST 2009


I am not a programmer by trade but I've been using Python for 10+ years,
usually for text file conversion and protocol analysis.  I'm having a
problem with Beautiful Soup.  I can get it to scrape all the href links
from a web page, but I am having trouble selecting specific URIs from the
output that Beautiful Soup returns.
What exactly is it returning to me, and what command would I use to find that
out?  Do I have to take each line it gives me and put it into a list before I
can, for example, keep only the URIs containing a certain string, or use
the results to fetch the web page that a URI refers to?

The pseudo code for what I am trying to do:

1. Get all URIs from the web page that contain the string "env.html".
2. Open the web page each URI refers to.
3. Scrape selected information off of that page.

I'm having a problem with step #1.  I can get all the URIs, but I can't seem
to get re.compile to work right.  If I could get it to give me only the URI,
without the tags or the link description, that would be ideal.
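
For reference, step 1 might look roughly like the sketch below.  It is only
an illustration under some assumptions: it uses the newer bs4 package and
Python 3 rather than the BeautifulSoup 3 release current when this was
posted, and the sample HTML and the example.com addresses are made up.  It
also shows one way to check what Beautiful Soup hands back (Tag objects, not
plain strings) and how to pull the bare URI out of each one.

```python
import re

from bs4 import BeautifulSoup  # assumption: the bs4 package is installed

# Hypothetical sample page standing in for the real one.
html = """
<html><body>
<a href="http://example.com/site1/env.html">Site 1 environment</a>
<a href="http://example.com/site1/index.html">Site 1 home</a>
<a href="/site2/env.html">Site 2 environment</a>
<a name="anchor-without-href">No link here</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Passing a compiled regex as the href filter keeps only <a> tags whose
# href attribute contains a match for the pattern.
links = soup.find_all("a", href=re.compile(r"env\.html"))

# type() shows what came back: a list-like result holding Tag objects.
print(type(links), type(links[0]))

# Indexing a Tag by attribute name gives the URI alone, as a string,
# without the surrounding tag or the link description.
env_uris = [link["href"] for link in links]
print(env_uris)
```

Since env_uris is an ordinary list of strings, each entry could then be
passed to whatever is used to fetch pages in step 2.  Relative links such
as the third one would first need to be joined to the page's base address,
for example with urllib.parse.urljoin.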

Thanks for your help.

Steve Lyskawa