How do I get to *all* of the groups of an re search?

Tim Peters tim.one at comcast.net
Sat Jan 11 12:16:02 EST 2003


[Kyler Laird]
>>> Yes, and that surprises me.  It seems so obvious that it [a regexp
>>> group] should return all matched pieces and so arbitrary that it
>>> only returns the last one.

[Tim]

>> It would take potentially unbounded storage to remember all matches,

[Kyler]
> O.k.

>> and would also (at least) complicate the meaning of backreferences
>> (what is \1 supposed to match then?  "a list" of all strings ever
>> matched by group 1?

> Yes, that is what I have suggested.

I don't think so.  You expected that .group(n) would return a list of all
substrings matched by group n, but that's a different question than what the
regular expression component \n should match.  If you mean what you say
here, then \1 as a regular expression component could never match anything
if group 1 matched more than once ("a list" doesn't match "a string").
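
As a minimal sketch of the .group(n) behavior under discussion (the sample
string and variable names are made up for illustration), a group inside a
repetition keeps only its last capture, and re.findall is one common way to
collect every piece instead:

```python
import re

text = 'alpha beta gamma'  # hypothetical sample input

# The group (\w+) is matched once per word, but only the final
# capture survives when the match completes.
m = re.match(r'(?:(\w+)\s*)+', text)
last = m.group(1)          # 'gamma'

# To get every piece, apply the inner pattern repeatedly instead.
all_words = re.findall(r'\w+', text)   # ['alpha', 'beta', 'gamma']

print(last)
print(all_words)
```

This is only one workaround; re.finditer or str.split may fit better
depending on the input.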

>> the catenation of them?

> Lists seem a whole lot more appropriate.  Why would you
> eliminate the possibility of accessing them individually?

I'm not asking about what .group(1) should return, but about what the
backreference \1 should match *while* regexp search is in progress.  I think
you're missing this issue.
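
To illustrate the backreference point, here is a small sketch (the pattern
and test strings are invented for this example).  While matching is in
progress, \1 has to match one string: the text group 1 captured most
recently, not a list of everything it ever matched.

```python
import re

# Group 1 is captured once per "x-" unit; \1 must then repeat
# whatever group 1 captured on its most recent iteration.
pat = re.compile(r'(?:(\w)-)+\1$')

# Group 1 last captured 'c', and the string ends in 'c': match.
m_last = pat.match('a-b-c-c')

# 'a' was captured only on an earlier iteration, so \1 cannot
# use it once later iterations have overwritten the group: no match.
m_first = pat.match('a-b-c-a')

print(m_last.group(1))   # 'c'
print(m_first)           # None
```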

>>> ...
>>> Regardless, do you find it useful?  Can you think of any time
>>> when you want to match a bunch of things and just end up with
>>> the last one?

>> Sure.  For example, finding the last component of a path expression,
>> in order to isolate the file name, is a common application.

> That's readily done using simple REs without depending on this
> behavior.

Of course, but what of it?  As you replied to another poster who pointed out
that there are other ways to do what you want, you don't care, you want what
you want.  Likewise people want their existing code to continue to work,
except possibly even more than you like to argue <wink>.
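
The path example can be sketched both ways.  The path below is hypothetical,
and neither pattern is claimed to be the one either poster had in mind; the
first relies on a repeated group keeping its last capture, the second is a
simple regexp that does not.

```python
import re

path = '/usr/local/lib/python/re.py'   # hypothetical path

# Relies on "last capture wins": the group is matched once per
# path component, and the final one is the file name.
m = re.match(r'(?:/([^/]+))+$', path)
via_last_group = m.group(1)            # 're.py'

# A simple regexp that does not depend on that behavior.
via_simple = re.search(r'[^/]+$', path).group()   # 're.py'

print(via_last_group, via_simple)
```

In real code os.path.basename is usually a better choice than either
regexp.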

> ...
> Yes, I can understand that developers are already programmers
> and have been tainted by other languages.

You misunderstand, then.  The success of Perl5 in promoting a specific
regular expression notation was a major practical triumph, establishing a de
facto standard in an area previously splintered into hundreds of
irritatingly different dialects.  Perl6 will apparently splinter it again,
but sanity was fun while it lasted <0.9 wink>.

> I'm advocating the use of Python for people who have not learned
> other languages.

Then you're doing them a great disservice by pointing them toward regexps at
all.  They're a miserable (difficult to understand, use correctly, and
debug) subsystem for any newbie to tackle.  They survive because they can be
powerful in expert hands, and a casual newbie is best advised to copy an
expert's regexp if they feel they need to muck with regexps at all.  For
that practical newbie use case, uniformity of meaning across languages'
regexp notations is a great help -- it vastly increases the odds that "copy
and paste" coding will work for those without deep understanding of the
subject.

> They won't have the experience to look at a piece of Python
> documentation and say "Oh, but surely they don't mean *that*."

If they also have the humility to read all the docs before presuming they
understand any of it, no problem -- it's documented multiple times.  If you
actually write docs for newbies, you'll know that newbies also space out
easily when too much info is presented.  Repeating all relevant pieces of
info at every point isn't friendly to newbies either.





