[Tutor] Value Error solved. Another question

Mon Feb 14 12:06:38 CET 2005

Ron Nixon wrote:
> Ignore my first posting. Here's what I'm trying to do.
> I want to extract headlines from a newspaper's website
> using this code. It works, but I want to match the
> second group in <h2><a href="(.*)">(.*)</p> and print
> that out.
> Suggestions?
> 
> 
> import urllib, re
> pattern = re.compile("""<h2><a href="(.*)">(.*)</p>""", re.DOTALL)
> page = urllib.urlopen("http://www.startribune.com").read()
> for headline in pattern.findall(page):
>     print headline

I think you want:
for headline, body in pattern.findall(page):
    print body

When a regex has more than one group, pattern.findall() returns a list of tuples, one tuple per
match, with one item per group. Your regex has two groups, so in your code headline is bound to a
two-item tuple. In my code the tuple is unpacked into two names, so you can print just the second
item.
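
To see the unpacking without hitting the network, here is a minimal sketch run against a made-up
HTML string. The sample markup, the non-greedy (.*?) groups, and the closing </a> in the pattern
are assumptions for the illustration, not what the real page contains. print(title) with a single
argument behaves the same on old and new Python versions.

```python
import re

# Hypothetical sample markup standing in for the downloaded page.
page = (
    '<h2><a href="/story/1">First headline</a></h2>\n'
    '<h2><a href="/story/2">Second headline</a></h2>\n'
)

# Two groups: the link target and the link text.
# Non-greedy (.*?) is an assumption so each match stops at the nearest delimiter.
pattern = re.compile(r'<h2><a href="(.*?)">(.*?)</a>', re.DOTALL)

# With two groups, findall() returns a list of 2-tuples, one per match.
matches = pattern.findall(page)

# Unpack each tuple and keep only the second group.
titles = [title for url, title in matches]
for title in titles:
    print(title)
```

If you only ever need the second group, unpacking in the for statement as above is usually the
simplest way to get it.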

PS You might want to look at BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/

Kent
