[Tutor] RegEx query

Sat Dec 17 08:59:55 CET 2005

Hi all,

Using Beautiful Soup and regexes.. I've noticed that all the examples
used regexes like so - anchors = parseTree.fetch("a",
{"href":re.compile("pattern")} )  instead of precompiling the pattern.

Myself, I have the following code -
>>> z = []
>>> x = q.findNext("a", {"href":re.compile(".*?thread/[0-9]*?/.*",
re.IGNORECASE)})

>>> while x:
... 	num = x.findNext("td", "tableColA")
... 	h = (x.contents[0],x.attrMap["href"],num.contents[0])
... 	z.append(h)
... 	x = x.findNext("a",{"href":re.compile(".*?thread/[0-9]*?/.*",
re.IGNORECASE)})
...

This gives me a correct set of results. However, using the following -

>>> z = []
>>> pattern = re.compile(".*?thread/[0-9]*?/.*", re.IGNORECASE)
>>> x = q.findNext("a", {"href":pattern)})

>>> while x:
... 	num = x.findNext("td", "tableColA")
... 	h = (x.contents[0],x.attrMap["href"],num.contents[0])
... 	z.append(h)
... 	x = x.findNext("a",{"href":pattern} )

will only return the first found tag.

Is the regex only evaluated once or similar?

(Also any pointers on how to get negative lookahead matching working
would be great.
the regex (/thread/[0-9]*)(?!\/) still matches "/thread/28606/" and
I'd assumed it wouldn't.

Regards,

Liam Clarke