How do I get to *all* of the groups of an re search?

Kyler Laird Kyler at news.Lairds.org
Fri Jan 10 09:41:46 EST 2003


On Fri, Jan 10, 2003 at 08:33:41AM -0500, Cameron Laird wrote:

> >How do I get to the other groups that were matched? 

> Oh, it's matching all the groups.

Yes, I realize that.  I just want to *access* those matched groups.

> Does the code below help
> explain why?

It verifies that the group is being matched, but it doesn't get me
any closer to the missing groups.

> I'm clumsy with REs--I don't immediately see how to achieve
> your desired result.  I can quickly observe that
>   import re
> 
>   text = 'foo foo1 foo2 bar bar1 bar2 bar3'
> 
>   test_re = re.compile('([a-z]+)(( \\1[0-9]+)+)')
> 
>   print(test_re.findall(text))
> yields
>   [('foo', ' foo1 foo2', ' foo2'), ('bar', ' bar1 bar2 bar3', ' bar3')]

Sure, I can do that but then I have to parse the second group
again.  In this case it's fairly trivial, but in my application
there is a lot of junk in between each of the groups.
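
To make the limitation concrete, here is a minimal sketch using the same pattern and sample text quoted above. With Python's standard `re` module, a group inside a repetition keeps only its last iteration, so the earlier ones have to be recovered by a second scan. The variable names (`prefix`, `repeated`, `last_only`, `items`) are illustrative only.

```python
import re

text = 'foo foo1 foo2 bar bar1 bar2 bar3'
test_re = re.compile(r'([a-z]+)(( \1[0-9]+)+)')

m = test_re.search(text)
prefix = m.group(1)      # the leading word
repeated = m.group(2)    # the whole repeated run, as one string
last_only = m.group(3)   # only the final iteration of the inner group

# Recovering every iteration means scanning group 2 a second time.
items = re.findall(re.escape(prefix) + r'[0-9]+', repeated)
print(prefix, items)
```

The match object holds one string per group, which is why group 3 cannot report both `foo1` and `foo2`.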

The text I'm matching is more like this.
	<a href="foo.html">
	blah blah blah
	<img src="fooabc.jpg">
	blah blah
	<img src="foocde.jpg">
	more stuff
	</a>
I want [('foo', ['fooabc', 'foocde'])].  I have no problem with
getting the RE to match everything.  It's just getting to all of
the matched groups that's stopping me.

If I use the RE you gave, I'll end up with something like this.
	[('foo', ' blah blah blah <img src="fooabc.jpg"> blah blah <img src="foocde.jpg">', 'foocde')]
That's going to require me to reprocess the second element.  It's
inefficient and ugly.  Worse, it's not what I expected from the
description in the documentation.
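
For what it's worth, a two-pass sketch that produces the desired shape might look like the following. It does reprocess the body of each link, which is exactly the step objected to above, but the standard `re` module keeps only the last iteration of a repeated group, so a second scan seems hard to avoid. The patterns assume markup shaped like the sample (lowercase names, `.html` and `.jpg` suffixes, double-quoted attributes); `anchor_re`, `img_re`, `stem` and `body` are hypothetical names.

```python
import re

html = '''<a href="foo.html">
blah blah blah
<img src="fooabc.jpg">
blah blah
<img src="foocde.jpg">
more stuff
</a>'''

# Outer pattern: one match per <a>...</a> block, capturing the link stem
# and everything between the tags (DOTALL lets '.' cross newlines).
anchor_re = re.compile(r'<a href="([a-z]+)\.html">(.*?)</a>', re.DOTALL)
# Inner pattern: every image name inside a single block.
img_re = re.compile(r'<img src="([a-z]+)\.jpg">')

result = []
for m in anchor_re.finditer(html):
    stem, body = m.group(1), m.group(2)
    # Keep only images whose name starts with the link stem,
    # mirroring the \1 backreference in the single-pattern version.
    images = [name for name in img_re.findall(body) if name.startswith(stem)]
    result.append((stem, images))

print(result)
```

On the sample text this builds `[('foo', ['fooabc', 'foocde'])]`. The inner `findall` only ever sees the text of one link, so the junk between images never has to be described in the outer pattern.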

--kyler



