using re.finditer()

Wed Oct 27 16:35:04 EDT 2004

Erik Johnson wrote:

> pat = r'<td.*?>(.*?)</td>'
> for match in re.finditer(pat, html):
>     print match.group(1)
> 
> 
> The iterator returned seems to work fine to step through items that
> happen to be contained within one line. That is, you can step through
> flat, one-line td's, but if you want to step through tr's, this doesn't
> work (run this code and notice Data 2-2 is not there). finditer() doesn't
> accept a flag like re.DOTALL, as re.match() and re.search() do. It seems a
> shame not to be able to put an otherwise smart design to use.

There was a discussion on python-dev recently concerning "missing arguments"
in re.findall() and re.finditer(), see

http://mail.python.org/pipermail/python-dev/2004-September/048662.html

I think no change was made because there is already an alternative spelling:
compile the pattern with the flag, then call the compiled object's finditer():

r = re.compile(r'<td.*?>(.*?)</td>', re.DOTALL)
for match in r.finditer(html):
    print match.group(1)

(Or two: until Robert Brewer's post I didn't know that flags can also be
embedded in the pattern string itself.)
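
For comparison, here is a minimal sketch of both spellings side by side. It is
written for Python 3 (print() as a function), and the html string is made-up
sample data, not the input from the original post; its second cell contains a
newline so the difference shows up.

```python
import re

# Hypothetical sample input: the second cell spans two lines.
html = "<tr><td>Data 1</td><td>Data\n2</td></tr>"

# Without DOTALL, '.' does not match a newline, so the second cell is missed.
plain = [m.group(1) for m in re.finditer(r'<td.*?>(.*?)</td>', html)]

# Spelling 1: compile with re.DOTALL, then call the pattern's finditer().
compiled = re.compile(r'<td.*?>(.*?)</td>', re.DOTALL)
with_compile = [m.group(1) for m in compiled.finditer(html)]

# Spelling 2: embed the flag at the start of the pattern string as (?s).
with_inline = [m.group(1) for m in re.finditer(r'(?s)<td.*?>(.*?)</td>', html)]

print(plain)
print(with_compile)
print(with_inline)
```

The first list should contain only the one-line cell, while the other two
should contain both cells. In current Python 3 releases re.finditer() itself
also takes a flags argument, so passing re.DOTALL there directly is a third
option.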

Peter