How do I get to *all* of the groups of an re search?
Kyler Laird
Kyler at news.Lairds.org
Fri Jan 10 09:41:46 EST 2003
On Fri, Jan 10, 2003 at 08:33:41AM -0500, Cameron Laird wrote:
> >How do I get to the other groups that were matched?
> Oh, it's matching all the groups.
Yes, I realize that. I just want to *access* those matched groups.
> Does the code below help
> explain why?
It verifies that the group is being matched, but it doesn't get me
any closer to the missing groups.
> I'm clumsy with REs--I don't immediately see how to achieve
> your desired result. I can quickly observe that
> import re
>
> text = 'foo foo1 foo2 bar bar1 bar2 bar3'
>
> test_re = re.compile(r'([a-z]+)(( \1[0-9]+)+)')
>
> print(test_re.findall(text))
> yields
> [('foo', ' foo1 foo2', ' foo2'), ('bar', ' bar1 bar2 bar3', ' bar3')]
Sure, I can do that but then I have to parse the second group
again. In this case it's fairly trivial, but in my application
there is a lot of junk in between each of the groups.
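To make the problem concrete, here is a minimal sketch (Python 3 syntax, reusing the pattern quoted above) of what I mean. A group inside a repetition reports only its last iteration, so getting every item means going back over the second group:

```python
import re

text = 'foo foo1 foo2 bar bar1 bar2 bar3'
test_re = re.compile(r'([a-z]+)(( \1[0-9]+)+)')

m = test_re.search(text)

# Group 3 sits inside a repetition; only its final iteration is kept.
last_only = m.group(3)

# Recovering every item means a second pass over group 2.
items = m.group(2).split()

print(repr(last_only), items)
```

Splitting on whitespace only works here because the sample text is so clean.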
The text I'm matching is more like this:
<a href="foo.html">
blah blah blah
<img src="fooabc.jpg">
blah blah
<img src="foocde.jpg">
more stuff
</a>
I want [('foo', ['fooabc', 'foocde'])]. I have no problem with
getting the RE to match everything. It's just getting to all of
the matched groups that's stopping me.
If I use the RE you gave, I'll end up with something like this:
[('foo', ' blah blah blah <img src="fooabc.jpg"> blah blah <img src="foocde.jpg">', 'foocde')]
That's going to require me to reprocess the second element. It's
inefficient and ugly. Worse, it's not what I expected from the
description in the documentation.
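
For reference, the two-pass workaround I'm trying to avoid would look roughly like the sketch below. The names anchor_re, img_re and collect are made up for illustration, and the patterns assume the markup looks exactly like my sample (lowercase names, .html and .jpg suffixes, double-quoted attributes); real pages would need something more forgiving.

```python
import re

html = '''<a href="foo.html">
blah blah blah
<img src="fooabc.jpg">
blah blah
<img src="foocde.jpg">
more stuff
</a>'''

# Pass 1: one match per anchor. Group 1 is the base name, group 2 the body.
# (Hypothetical pattern, shaped after the sample text only.)
anchor_re = re.compile(r'<a href="([a-z]+)\.html">(.*?)</a>', re.DOTALL)

def collect(page):
    """Return [(base, [image names starting with base]), ...]."""
    results = []
    for anchor in anchor_re.finditer(page):
        base, body = anchor.groups()
        # Pass 2: reprocess the body to pull out every matching image name.
        img_re = re.compile(r'<img src="(%s[a-z]+)\.jpg">' % re.escape(base))
        results.append((base, img_re.findall(body)))
    return results

print(collect(html))
```

It gives the shape I want, but only by scanning each anchor body a second time, which is exactly the reprocessing I was hoping the match object would spare me.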
--kyler