How do I get to *all* of the groups of an re search?

Kyler Laird Kyler at news.Lairds.org
Fri Jan 10 16:07:39 EST 2003


claird at lairds.com (Cameron Laird) writes:

>1.  Harvey Thomas, in a nearby follow-up (how're
>    gateway propagation delays today?) has
>    summarized the main point far more aptly than
>    anything I wrote:  "You can't return a
>    variable number of groups from a regex." (but
>    can Perl people?  They apparently tried to
>    cram cement mixers, kitchen toasters, and
>    turbojets inside their REs, so, who knows?)

I'm not asking for a variable number of groups.  That would
be awkward and confusing.  I'd be quite happy with a group
being a list when it's appropriate.

As it is, I am resigned to understanding that Python's re
module makes an arbitrary and undocumented decision to return
the last instance of a match for a group.  I'm embarrassed.
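
For illustration, a minimal sketch of the behavior described above.  The
pattern and the input string are made up for this example and are not
the ones from the original problem; the point is only that a group
inside a repeated construct reports its final match.

```python
import re

# Hypothetical input: the capturing group sits inside a repeated
# non-capturing group, so it matches three times in one search.
m = re.search(r"(?:(\w+),?)+", "a,b,c")

whole = m.group(0)   # the entire matched span: 'a,b,c'
last = m.group(1)    # only the final repetition is kept: 'c'
groups = m.groups()  # a one-element tuple: ('c',)

print(whole, last, groups)
```

The earlier repetitions ('a' and 'b') are not reachable through the
match object.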

At the very least, the documentation should be changed to say
that only the last match of a group will be returned.  Better
still would be an explanation of why the last one was chosen
and how that makes Python's behavior more predictable.

>2.  Oh, what you *really* want is HTML parsing.

No, I *really* want the re module to work like it's documented.
What will I do when I encounter a need to do something like this
and it doesn't happen to be related to HTML?

(What is it with Python people always trying to answer questions
that weren't asked?!  I expect to be blamed for breaking the re
module any moment.)

I do realize I shouldn't have included an HTML example.

>    There are serious limits to RE's applicability
>    in that role, as the columnists of
>    <URL: http://www.unixreview.com/documents/s=2472/uni1037388368795/ >
>    assert.

I certainly did not encounter a limitation with REs - I can
define the solution perfectly using an RE.  The problem is just
getting the Python re module to share its results.  Python's
broken re module doesn't make REs any less appropriate.
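
One possible way to get every repetition out anyway, sketched under
assumptions: match the whole repeated span first, then run a second
pattern over just that span.  The helper name all_group_matches and the
sample patterns are hypothetical, and this two-pass approach is a
workaround, not something the re module does for a single group.

```python
import re

def all_group_matches(outer, inner, text):
    """Return every substring that `inner` matches inside the first
    span of `text` matched by `outer` (hypothetical helper)."""
    m = re.search(outer, text)
    if m is None:
        return []
    # Second pass: collect each repetition from the matched span only.
    return re.findall(inner, m.group(0))

# Made-up example: the outer pattern finds the repeated run,
# the inner pattern picks out each item in it.
items = all_group_matches(r"(?:\w+,?)+", r"\w+", "a,b,c")
print(items)
```

The cost is that the repeated part has to be written twice, once inside
the outer pattern and once on its own.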

Seems like someone from regularexpressions.com would be a bit
more sympathetic...

>    Get an HTML parser--then be ready to
>    tweak it to accept all the junk that roams
>    around in the wild.

Exactly.  I think I've thrown up my hands most times I've
attempted to use an HTML parser.  I considered it for this
task, but after thinking about it for a while I decided that an
RE would be far more elegant.

(In fact, I wanted so much to use REs that I developed this as
a CGI script instead of doing it in Zope, where REs are not
supported.  Maybe this is another good reason for using Zope;
I would only have to explain to people why REs can't be used,
rather than why their handling is broken.)

--kyler



