How do I get to *all* of the groups of an re search?

Andrew Dalke adalke at mindspring.com
Sat Jan 11 13:15:41 EST 2003


Tim Peters:
>>would also (at least) complicate the meaning of backreferences (what is \1
>>supposed to match then?  "a list" of all strings ever matched by group 1?
> 

Kyler Laird wrote:
> Yes, that is what I have suggested.

Speaking from the implementation side, that is tricky to implement
and susceptible to unexpected slowdowns.

>>Sure.  For example, finding the last component of a path expression, in
>>order to isolate the file name, is a common application.
> 
> 
> That's readily done using simple REs without depending on this
> behavior.

The simple re is

 >>> import re
 >>> pat = re.compile(r"(/([^/]+)*)+")
 >>> pat.match("/home/dalke/tmp/whatever.txt").group(2)
'whatever.txt'
 >>>

If I understand you rightly, you would want to use the same
re to get all the filename matches, e.g., so that 'allmatches(2)'
(or somesuch) would return ['home', 'dalke', 'tmp', 'whatever.txt'].

So you do want your behaviour for all regexps with groups.
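
For comparison, here is a minimal sketch of how that list can be had
from the current re module, by matching one path component at a time
with findall instead of repeating a group.  The path is the one from
the example above; the variable names are only illustrative.

```python
import re

path = "/home/dalke/tmp/whatever.txt"

# Match one "/component" at a time; findall returns the text of the
# single group for every non-overlapping match, in order.
components = re.findall(r"/([^/]+)", path)
print(components)

# The last component is the file name.
filename = components[-1]
print(filename)
```

This gets around the limitation rather than removing it: it works
because the repetition is done by findall, not by a repeated group
inside the pattern.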


Again, I see your point in that the interface to regexp results
isn't as powerful as it could be.  I still think your proposed
interface isn't itself powerful enough.

> Yes, I can understand that developers are already programmers
> and have been tainted by other languages.  I'm advocating the
> use of Python for people who have not learned other languages.
> They won't have the experience to look at a piece of Python
> documentation and say "Oh, but surely they don't mean *that*."

In my case, I learned about regexps from taking a theory of
automata course.  I tried to figure it out from the documentation
(this was the documentation for archie, some 10 years ago) but
only managed to understand ".*" and a couple other simple ones.

I learned regexps as a mathematical language first, without its
bindings to any specific language.  I then slowly learned about
all the tricks available in real regexps.  For example, back
references mean regexps are not regular languages, since you
can match a**n b a**n.
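
To make the back-reference point concrete, a small sketch: the pattern
below accepts exactly the strings a**n b a**n, which is not a regular
language.  The helper name is_an_b_an is made up for illustration.

```python
import re

# (a*) captures a run of n a's; \1 then requires the very same run
# again after the b, so both sides must have equal length.
pat = re.compile(r"(a*)b\1")

def is_an_b_an(s):
    """Return True if s is n a's, one b, then n a's (n >= 0)."""
    return pat.fullmatch(s) is not None

print(is_an_b_an("aaabaaa"))  # True: three a's on each side
print(is_an_b_an("aaabaa"))   # False: three on the left, two on the right
print(is_an_b_an("b"))        # True: n = 0
```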

And I most assuredly learned how to use the regexp libraries
by reading the documentation many times and doing a lot of
experimentation.  And by coding in Perl for several years.

So I would have seen the comment in the documentation about how
a repeated group only keeps its last match and would have
experimented with it to get a better grasp of the behaviour.

Now in your case, you want to advocate Python for people who
haven't learned other languages.  In other words, for people
who are not programmers.  This is a new criterion, since you
hadn't mentioned it before.

With that additional information, I'll suggest that regular
expressions are not for the faint of heart, nor for most
new programmers.  E.g., in Cameron's first response he said
"I am clumsy with REs."  Yet Cameron's been programming for
a while in many different languages and writes articles about
it.  Or take me.  I wrote a regexp engine and still had to
fiddle with the grammar a few times to make it work against
your original example.

					Andrew
					dalke at dalkescientific.com




