New implementation of re module

Mike tutufan at gmail.com
Wed Jul 29 13:21:39 EDT 2009


On Jul 29, 10:45 am, MRAB <pyt... at mrabarnett.plus.com> wrote:
> Mike wrote:
> > - findall/finditer doesn't find overlapping matches.  Sometimes you
> > really *do* want to know all possible matches, even if they overlap.
>
> Perhaps by adding "overlapped=True"?

Something like that would be great, yes.
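
For illustration, here is a minimal sketch of the usual workaround with
the existing re module: wrap the pattern in a zero-width lookahead so
that each match consumes no text.  The helper name overlapping_matches
is hypothetical, not part of re, and the "overlapped=True" flag
discussed above is only a proposal in this thread.  Note that this
reports at most one match per starting position, not every possible
match.

```python
import re

def overlapping_matches(pattern, text):
    """Yield (start, matched_text) for matches that may overlap.

    Hypothetical helper: wraps `pattern` in a lookahead with a capturing
    group, so the scan advances one character at a time instead of
    jumping past each match.
    """
    lookahead = re.compile('(?=(%s))' % pattern)
    for m in lookahead.finditer(text):
        # group(1) is the outer group added by the wrapper above
        yield m.start(), m.group(1)

plain = re.findall('aba', 'ababa')                      # non-overlapping scan
overlapped = list(overlapping_matches('aba', 'ababa'))  # includes the overlap
```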


> > - split won't split on empty patterns, e.g. empty lookahead patterns.

> Already addressed (see issue2636 for the full details).

Glad to hear it.


> > - Repeated subgroup match information is not available.  That is, for
> > a match like this
>
> >     re.match('(.){3}', 'xyz')
>
> > there's no way to discover that the subgroup first matched 'x', then
> > matched 'y', and finally matched 'z'.  Here is one past proposal
> > (mine), perhaps over-complex, to address this problem:
>
> >    http://mail.python.org/pipermail/python-dev/2004-August/047238.html
>
> Yikes! I think I'll let you code that... :-)

I agree that that document looks a little scary--maybe I was trying to
bite off too much at once.

My intuition, though, is that the basic idea should be fairly simple
to implement, at least for a depth-first matcher.  The repeated match
subgroups are already being discovered; it's just that they're not
being saved, so there's no way to report them once a complete
match is found.  If some trail of breadcrumbs were pushed onto a stack
during the DFS, it could be traced at the end.  And the whole thing
might not even be that expensive to do.
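
The breadcrumb idea can be sketched with a toy matcher.  This is not
how the re module works internally; it is a hand-written depth-first
search for the pattern (alt1|alt2|...)* anchored to the whole string,
and the names match_repeated and trail are made up for the example.
Each time the group matches, its span is pushed onto a stack; on
backtracking it is popped again, so whatever is left on the stack when
the search succeeds is the list of repeated subgroup matches.

```python
def match_repeated(alternatives, text):
    """Toy DFS matcher for (alt1|alt2|...)* over the whole of `text`.

    Returns a list of (start, end) spans, one per repetition of the
    group, or None if there is no match.
    """
    trail = []  # breadcrumb stack: one span per repetition so far

    def dfs(pos):
        if pos == len(text):
            return True  # consumed everything: complete match
        for alt in alternatives:
            # skip empty alternatives so the search always advances
            if alt and text.startswith(alt, pos):
                trail.append((pos, pos + len(alt)))  # push breadcrumb
                if dfs(pos + len(alt)):
                    return True
                trail.pop()  # dead end: backtrack and drop the breadcrumb
        return False

    return list(trail) if dfs(0) else None

# 'a' is tried first at position 0, fails later, and is backtracked over.
spans = match_repeated(['a', 'ab', 'c'], 'abc')
```

The extra cost here is one push or pop per group match, and the memory
used grows with the number of repetitions in the final match.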

The hardest parts about this, in my mind, are figuring out how to
report the repeated matches out in a useful form (hence all that
detail in the proposal), and getting users to understand that using
this feature *could* suck up a lot of memory, if they're not careful.

As always, it's possible that my intuition is totally wrong.  Plus I'm
not sure how this would work out in the breadth-first case.

Details aside, I would really, really, really like to have a way to
get at the repeated subgroup matches.  I write a lot of code that
would be one-liners if this capability existed.  Plus, it just plain
burns me that Python is discovering this information but impudently
refuses to tell me what it's found!  ;-)
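
To make the gap concrete: with the current re module, a repeated group
keeps only its last repetition.  The sketch below shows that, plus a
rough workaround that re-scans the matched text with the inner pattern.
The helper repeated_captures is hypothetical, and it only gives the
right answer when re-scanning splits the text the same way the
original match did, which is not true of patterns in general.

```python
import re

# Only the final repetition survives in the match object.
last_only = re.match(r'(.){3}', 'xyz').group(1)

def repeated_captures(inner, count, text):
    """Hypothetical workaround: match (inner){count}, then re-scan.

    Returns the list of pieces matched by `inner`, or None if the
    overall pattern does not match at the start of `text`.
    """
    outer = re.match('(?:%s){%d}' % (inner, count), text)
    if outer is None:
        return None
    # Second pass over the matched text to recover each repetition.
    return [m.group(0) for m in re.finditer(inner, outer.group(0))]

pieces = repeated_captures('.', 3, 'xyz')
```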




