How do I get to all of the groups of an re search?

Fri Jan 10 10:34:30 EST 2003

In article <oe23f-ta3.ln1 at news.lairds.org>,
Kyler Laird  <Kyler at news.Lairds.org> wrote:
			.
			.
			.
>Sure, I can do that but then I have to parse the second group
>again.  In this case it's fairly trivial, but in my application
>there is a lot of junk in between each of the groups.
>
>The text I'm matching is more like this.
>	<a href="foo.html">
>	blah blah blah
>	<img src="fooabc.jpg">
>	blah blah
>	<img src="foocde.jpg">
>	more stuff
>	</a>
>I want [('foo', ['fooabc', 'foocde'])].  I have no problem with
>getting the RE to match everything.  It's just getting to all of
>the matched groups that's stopping me.
>
>If I use the RE you gave, I'll end up with something like this.
>	[('foo', ' blah blah blah <img src="fooabc.jpg"> blah blah <img
>src="foocde.jpg">', 'foocde')]
>That's going to require me to reprocess the second element.  It's
>inefficient and ugly.  Worse, it's not what I expected from the
>description in the documentation.
>
>--kyler

Got it.

1.  Harvey Thomas, in a nearby follow-up (how're
    gateway propagation delays today?) has
    summarized the main point far more aptly than
    anything I wrote:  "You can't return a variable
    number of groups from a regex."  A group that
    sits inside a repetition reports only the last
    text it matched, which is why 'fooabc' goes
    missing from your result.  (But can Perl
    people?  They apparently tried to cram cement
    mixers, kitchen toasters, and turbojets inside
    their REs, so, who knows?)
2.  Oh, what you *really* want is HTML parsing.
    There are serious limits to the applicability
    of REs in that role, as the columnists of
    <URL: http://www.unixreview.com/documents/s=2472/uni1037388368795/ >
    assert.  Get an HTML parser, then be ready to
    tweak it to accept all the junk that roams
    around in the wild.
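
Both points can be sketched roughly as follows.  This is only an
illustration, assuming a current Python 3 and its standard
html.parser module, neither of which is named above; TEXT,
two_pass, stem, and LinkImages are made-up names.  The first part
shows a repeated group keeping only its last match, the second
applies one RE and then a second RE to what the first captured,
and the third collects the same data with a parser.

```python
import os.path
import re
from html.parser import HTMLParser

# Sample input modelled on the text quoted above.
TEXT = '''<a href="foo.html">
blah blah blah
<img src="fooabc.jpg">
blah blah
<img src="foocde.jpg">
more stuff
</a>'''

# Point 1: the second group repeats, but only its last match survives.
m = re.search(r'<a href="(\w+)\.html">(?:.*?<img src="(\w+)\.jpg">)+',
              TEXT, re.S)
print(m.groups())   # ('foo', 'foocde')


def two_pass(text):
    """RE-only workaround: match each <a>...</a>, then scan its body."""
    result = []
    for a in re.finditer(r'<a href="(\w+)\.html">(.*?)</a>', text, re.S):
        images = re.findall(r'<img src="(\w+)\.jpg">', a.group(2))
        result.append((a.group(1), images))
    return result


def stem(name):
    """Drop the extension: 'foo.html' -> 'foo'."""
    return os.path.splitext(name)[0]


class LinkImages(HTMLParser):
    """Point 2: collect (link, [images inside that link]) pairs."""

    def __init__(self):
        super().__init__()
        self.result = []
        self.current = None   # image list of the <a> being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('href'):
            self.current = []
            self.result.append((stem(attrs['href']), self.current))
        elif tag == 'img' and self.current is not None and attrs.get('src'):
            self.current.append(stem(attrs['src']))

    def handle_endtag(self, tag):
        if tag == 'a':
            self.current = None


parser = LinkImages()
parser.feed(TEXT)
print(two_pass(TEXT))
print(parser.result)
```

On this sample both approaches give the requested
[('foo', ['fooabc', 'foocde'])].  The parser version does not care
about attribute order, quoting style, or the junk between tags,
which is where the RE-only version tends to break on real pages.
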
-- 

Cameron Laird <Cameron at Lairds.com>
Business:  http://www.Phaseit.net
Personal:  http://phaseit.net/claird/home.html
