why does this call to re.findall() loop forever?

Sun Nov 9 18:42:13 EST 2008

james.kirin40 at gmail.com wrote:
> Hi everyone,
> 
> I am using Python's re module to extract some data from html. The
> following code never returns, and I was wondering if someone can
> explain to me why. Is this a problem with my regexp? (I tried really
> hard to find it.)
[snip] html/xml string
> regexp = re.compile("<li class=\"post\".*?<h4 class=\"desc\">"
>                     "<a href=\"(.*?)\" rel=\"nofollow\">(.*?)</a>"
>                     ".*?</div>\s*(?:<p class=\"notes\">(.*?)</p>)?"
>                     ".*?<div class=\"meta\">"
>                     "(?:to ((?:<a class=\"tag\".*?> )+))*"
>                     ".*?<span class=\"date\" title=\"(.*?)\">"
>                     ".*?</span>\s*</div>.*?</li>", re.DOTALL)
> 
> re.findall(regexp, s)

Python has several modules for parsing and working with XML. Do you
not know of them, or is there some reason they won't work?
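For example, since the snipped input is HTML, which is often not well-formed XML, here is a minimal sketch using the standard library's html.parser module (it was called HTMLParser in Python 2). The markup below is hypothetical: the real page was snipped, so it is reconstructed from the class names in your regexp (post, desc, notes, tag, date). PostParser and extract_posts are illustrative names, not part of any library. The parser handles the tags one at a time as it reads them and fills in one dict per post.

```python
from html.parser import HTMLParser

# Hypothetical markup, inferred from the class names in the regexp;
# the real page was snipped from the original message.
SAMPLE = """
<ul>
<li class="post">
  <h4 class="desc"><a href="http://example.com/a" rel="nofollow">First link</a></h4>
  <p class="notes">Some notes</p>
  <div class="meta">to <a class="tag" href="/tag/python">python</a> <a class="tag" href="/tag/regex">regex</a>
    <span class="date" title="2008-11-09T18:42:13Z">1 day ago</span>
  </div>
</li>
<li class="post">
  <h4 class="desc"><a href="http://example.com/b" rel="nofollow">Second link</a></h4>
  <div class="meta"><span class="date" title="2008-11-08T10:00:00Z">2 days ago</span></div>
</li>
</ul>
"""


class PostParser(HTMLParser):
    """Collects one dict per <li class="post"> (illustrative, not a library class)."""

    def __init__(self):
        super().__init__()
        self.posts = []       # finished posts
        self.current = None   # post being built, or None outside a post
        self.in_desc = False  # True while inside <h4 class="desc">
        self.capture = None   # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "post" in classes:
            self.current = {"href": None, "title": "", "notes": "",
                            "tags": [], "date": None}
        elif self.current is None:
            return  # ignore everything outside a post
        elif tag == "h4" and "desc" in classes:
            self.in_desc = True
        elif tag == "a" and self.in_desc:
            self.current["href"] = attrs.get("href")
            self.capture = "title"
        elif tag == "a" and "tag" in classes:
            self.current["tags"].append("")
            self.capture = "tag"
        elif tag == "p" and "notes" in classes:
            self.capture = "notes"
        elif tag == "span" and "date" in classes:
            self.current["date"] = attrs.get("title")

    def handle_endtag(self, tag):
        if self.current is None:
            return
        if tag in ("a", "p"):
            self.capture = None  # stop collecting text
        elif tag == "h4":
            self.in_desc = False
        elif tag == "li":
            self.posts.append(self.current)
            self.current = None

    def handle_data(self, data):
        if self.current is None or self.capture is None:
            return
        if self.capture == "tag":
            self.current["tags"][-1] += data  # text of the latest tag link
        else:
            self.current[self.capture] += data


def extract_posts(html):
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE):
        print(post)
```

Each post comes back as a dict with href, title, notes, tags and date keys; a post without notes or tags just gets empty values, which is what the optional groups in the regexp were trying to handle.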