[Tutor] RE help

Ron Nixon nixonron at yahoo.com
Tue Feb 15 20:59:58 CET 2005


Trying to scrape a newspaper site for articles using
this code, which was done with help from the list:

import urllib, re
pattern = re.compile("""<h[1-2]><a href="/(.*)">(.*).</p>""", re.DOTALL)
page = urllib.urlopen("http://www.startribune.com").read()

for headline, body in pattern.findall(page):
    print body

It should grab articles from this:

<h2><a href="/stories/507/5240764.html">Sid Hartman:
Franchise could be moved</a></h2><p>If Reggie Fowler
and his business partners from New Jersey are approved
to buy the Vikings franchise from Red McCombs, it is
my opinion the franchise remains in danger of
eventually being relocated.</p>

and give me this: Sid Hartman: Franchise could be
moved</a></h2><p>If Reggie Fowler and his business
partners from New Jersey are approved to buy the
Vikings franchise from Red McCombs, it is my opinion
the franchise remains in danger of eventually being
relocated.

Instead it gives me this: <b>Boxerjam</b></a>. from
this:
href="http://www.startribune.com/stories/1559/4773140.html"><b>Boxerjam</b></a>.
</p></div>

I know the re works in other programs I've tried. Is
there something different about re's in Python?
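
A possible explanation, offered as a sketch rather than a confirmed
diagnosis: "(.*)" is greedy, and with re.DOTALL the dot also matches
newlines, so each group can run on to the last possible match on the
page instead of stopping at the first one. The example below rebuilds a
tiny page from the two fragments quoted above. The "<p><a " in front of
the second href and the single-line form of the pattern are assumptions,
since the post does not show them. It compares the greedy pattern with a
non-greedy variant that uses "(.*?)".

```python
import re

# Test page assembled from the two fragments quoted in the post.
# Assumption: the "<p><a " before the second href is made up for
# illustration; the post only shows the text from "href=" onward.
page = (
    '<h2><a href="/stories/507/5240764.html">Sid Hartman: '
    'Franchise could be moved</a></h2><p>If Reggie Fowler '
    'and his business partners from New Jersey are approved '
    'to buy the Vikings franchise from Red McCombs, it is '
    'my opinion the franchise remains in danger of '
    'eventually being relocated.</p>\n'
    '<p><a href="http://www.startribune.com/stories/1559/4773140.html">'
    '<b>Boxerjam</b></a>.\n</p></div>'
)

# Pattern from the post: both groups are greedy.
greedy = re.compile('<h[1-2]><a href="/(.*)">(.*).</p>', re.DOTALL)

# Same pattern with non-greedy groups: each stops at the first
# point where the rest of the pattern can match.
lazy = re.compile('<h[1-2]><a href="/(.*?)">(.*?).</p>', re.DOTALL)

greedy_matches = greedy.findall(page)
lazy_matches = lazy.findall(page)

for headline, body in greedy_matches:
    print(body)   # the Boxerjam fragment near the end of the page

for headline, body in lazy_matches:
    print(body)   # starts with "Sid Hartman: Franchise could be"
```

On this test page the greedy pattern returns a single match whose
second group is the Boxerjam fragment, while the non-greedy variant
returns the Sid Hartman text. Whether the live page behaves the same
way depends on its actual markup.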




		