why does this call to re.findall() loop forever?
Nick Craig-Wood
nick at craig-wood.com
Mon Nov 10 07:29:59 EST 2008
james.kirin40 at gmail.com <james.kirin40 at gmail.com> wrote:
> My apologies, given that Google Groups messes up the formatting, the
> regexp should read
>
> regexp = re.compile("""<li class=\"post\".*?<h4 class=\"desc\"><a
> href=
> \"(.*?)\" rel=\"nofollow\">(.*?)</a>.*?</div>\s*(?:<p class=\"notes
> \">(.*?)</p>)?.*?<div class=\"meta\">(?:to ((?:<a class=\"tag\".*?> )
> +))*.*?<span class=\"date\" title=\"(.*?)\">.*?</span>\s*</div>.*?</
> li>""", re.DOTALL)
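Some regular expressions can't be matched in a reasonable length of
time. I'm not sure whether this is your problem, but it might be: the
nested repeats in your pattern, such as (?:to ((?:...)+))* surrounded
by several .*? under re.DOTALL, are the kind of construct that can make
a backtracking matcher try a huge number of combinations before giving
up. Search for "exponential time regular expression" if you want some
examples, e.g. http://bugs.python.org/issue1515829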
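
To give a feel for how quickly this blows up, here is a minimal sketch
using a textbook pathological pattern, (a+)+$, rather than your regexp.
The pattern, the input strings and the helper name time_failed_match
are all made up for illustration; the timings will vary by machine.

```python
import re
import time

# Textbook pathological pattern (an illustration, not the poster's regexp):
# a repeated group whose body is itself repeated.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Return the seconds taken for a match that is guaranteed to fail."""
    subject = "a" * n + "b"  # the trailing "b" stops "$" from ever matching
    start = time.perf_counter()
    result = PATTERN.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed


# Keep n small: each extra "a" roughly doubles the work.
for n in (10, 14, 18):
    print(n, "chars:", round(time_failed_match(n), 4), "seconds")
```

The match only goes wrong when it fails: the engine has to try every way
of splitting the run of "a"s between the inner and outer repeat before it
can report that there is no match.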
I'd probably attack this problem using BeautifulSoup rather than
regexps!
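
As a rough sketch of the parser-based approach: BeautifulSoup is a
third-party package, so to keep this self-contained the example below
swaps in the standard library's html.parser instead. The sample markup
and the class name PostLinkParser are hypothetical, loosely modelled on
the tags your regexp looks for; real pages will need more care.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, loosely modelled on what the quoted regexp expects.
SAMPLE = """
<ul>
<li class="post">
  <h4 class="desc"><a href="http://example.com/a" rel="nofollow">First link</a></h4>
  <div class="meta">to <a class="tag" href="/tag/python">python</a></div>
</li>
<li class="post">
  <h4 class="desc"><a href="http://example.com/b" rel="nofollow">Second link</a></h4>
</li>
</ul>
"""


class PostLinkParser(HTMLParser):
    """Collect (href, text) for each link inside <h4 class="desc">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_desc = False  # True while inside <h4 class="desc">
        self._href = None      # href of the link currently being read
        self._text = []        # text fragments of that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h4" and attrs.get("class") == "desc":
            self._in_desc = True
        elif tag == "a" and self._in_desc:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        elif tag == "h4":
            self._in_desc = False


parser = PostLinkParser()
parser.feed(SAMPLE)
print(parser.links)
```

The parser walks the document once, so its running time grows with the
size of the page rather than with the number of ways a pattern could
match it.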
--
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick