Possible regex match bug (re module)

Randall Hopper aa8vb at vislab.epa.gov
Tue Apr 6 07:26:49 EDT 1999


Tim Peters:
 |>      Re doesn't handle named groups in alternative patterns like it
 |> seems it should.  Given an alternative pattern with a particular group
 |> name in each, it only assigns the match if the group name matches the
 |> last alternative.
 |
 |re should raise an exception here -- it never intended to allow your
 |pattern.  The deal is that symbolic group names are no more than that:
 |names for numbered groups.  Like so:

Thanks for the reply, Tim.  BTW, what's with "re never intended..."?  A
little AI at work in that module?  :-)

Well, to the point: it seems more intuitive to me for a named group that
appears in several alternatives to be assigned a string only by the
alternative that actually matched.  It certainly yields more readable regexes:

     '(---(?P<id>[^-]*)---)|(===(?P<id>[^=]*)===)'
     r"([-=])\1\1(?P<id>((?!\1).)*)\1\1\1"

In which is it more apparent what the patterns are?  Or even how many there
are?
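For anyone who wants to try the two side by side, here is a rough sketch.
It assumes an re that rejects the duplicate group name, which is what Tim
says it should do; the helper names compiles() and find_id() are made up
for illustration.

```python
import re

# First pattern: the same group name "id" in both alternatives.
DUPLICATE_NAMES = '(---(?P<id>[^-]*)---)|(===(?P<id>[^=]*)===)'
# Second pattern: one alternative-free regex using a backreference.
COLLAPSED = r"([-=])\1\1(?P<id>((?!\1).)*)\1\1\1"


def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def find_id(text):
    """Return the text between matching --- or === delimiters, or None."""
    m = re.search(COLLAPSED, text)
    return m.group('id') if m else None
```

With the collapsed form, find_id('---abc---') and find_id('===xyz===') both
go through the single "id" group, and mismatched delimiters such as
'---abc===' don't match at all.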

Also, as I noted, I simplified this example quite a bit so that the re
behavior would be apparent.  The original regex was considerably more
complex.  Basically it was parsing fields from a spreadsheet text import
file, where the fields are delimited by commas, but fields can be single or
double quoted so that commas and spaces can be embedded:

       1,"Brown, Charlie",127.37,Hi

The field match regex for this wouldn't be as simple to collapse into a
single alternative-free regex as you did above, and even assuming it is
possible, the result would be very tough to decipher.  I think that trick
also breaks down whenever the prefixes don't match the suffixes, or the
named groups in each alternative aren't virtually identical.
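To give a feel for the shape of the problem, here is a minimal sketch of
such a field matcher.  This is NOT the original regex (that one isn't
reproduced here); the group names dq, sq, bare and sep and the function
split_fields() are invented for illustration, and it ignores escaped
quotes inside quoted fields.

```python
import re

# One field per match: double-quoted, single-quoted, or bare, followed by
# a comma or the end of the line.  Each alternative needs its own group name.
FIELD = re.compile(r'''
    \s*
    (?: "(?P<dq>[^"]*)"        # double-quoted field, may contain commas
      | '(?P<sq>[^']*)'        # single-quoted field, may contain commas
      | (?P<bare>[^,]*)        # unquoted field, runs up to the next comma
    )
    \s*
    (?P<sep>,|$)               # field separator or end of line
''', re.VERBOSE)


def split_fields(line):
    """Split one line of the import file into a list of field strings."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        if m is None:          # defensive check; bare fields match anything
            raise ValueError('cannot parse field at offset %d' % pos)
        # Exactly one of the three named groups took part in the match.
        for name in ('dq', 'sq', 'bare'):
            if m.group(name) is not None:
                fields.append(m.group(name))
                break
        pos = m.end()
        if not m.group('sep'):  # separator was end-of-line, not a comma
            return fields
```

Note the loop over three names just to find out which alternative fired;
that is exactly the bookkeeping a shared group name would avoid.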

     I'll post the regex later.  I'm not at the box it's sitting on right now.

     BTW, what I ended up doing (again, continuing with the trivial example
regex), was something like this:

     '(---(?P<id1>[^-]*)---)|(===(?P<id2>[^=]*)===)'

     str = id1 or id2

It just seemed to make sense that I should be able to use "id" for both and
just say "str = id".
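Spelled out a bit more fully, the workaround looks roughly like the sketch
below.  The names DELIMITED and extract_id() are hypothetical, and I test
against None rather than using "or" so that an empty match such as
'------' still comes back as an empty string.

```python
import re

# Distinct group names, one per alternative.
DELIMITED = re.compile('(---(?P<id1>[^-]*)---)|(===(?P<id2>[^=]*)===)')


def extract_id(text):
    """Return the delimited id found in text, or None if there is none."""
    m = DELIMITED.search(text)
    if m is None:
        return None
    value = m.group('id1')
    if value is None:          # the first alternative did not take part
        value = m.group('id2')
    return value
```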

Thanks,

Randall



