New implementation of re module

MRAB python at mrabarnett.plus.com
Wed Jul 29 11:45:55 EDT 2009


Mike wrote:
> On Jul 27, 11:34 am, MRAB <pyt... at mrabarnett.plus.com> wrote:
>> I've been working on a new implementation of the re module.
> 
> Fabulous!
> 
> If you're extending/changing the interface, there are a couple of sore
> points in the current implementation I'd love to see addressed:
> 
> - findall/finditer doesn't find overlapping matches.  Sometimes you
> really *do* want to know all possible matches, even if they overlap.
> This comes up in bioinformatics, for example.
> 
Perhaps by adding "overlapped=True"?
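
Until something like that exists, overlapping matches can be emulated
with the current re module by restarting the search one character after
the start of each match. A minimal sketch; the helper name
find_overlapping is hypothetical and not part of re, and it reports at
most one match per start position rather than every possible match:

```python
import re

def find_overlapping(pattern, text):
    """Yield match objects for pattern in text, allowing overlaps.

    Hypothetical helper, not part of the re module. It yields at most
    one match per starting position.
    """
    compiled = re.compile(pattern)
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        yield m
        # Resume just past the start of this match, not past its end,
        # so that later matches may overlap it.
        pos = m.start() + 1

matches = [(m.start(), m.group()) for m in find_overlapping(r'ATA', 'ATATATA')]
```

Here matches holds the hits at offsets 0, 2 and 4, whereas
re.findall(r'ATA', 'ATATATA') returns only the two non-overlapping ones.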

> - split won't split on empty patterns, e.g. empty lookahead patterns.
> This means that it can't be used for a whole class of interesting
> cases.  This has been discussed previously:
> 
>     http://bugs.python.org/issue3262
>     http://bugs.python.org/issue852532
>     http://bugs.python.org/issue988761
> 
Already addressed (see issue2636 for the full details).
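
To illustrate the kind of case meant here, the sketch below splits a
string at every position that is followed by a capital letter, using an
empty lookahead. It assumes Python 3.7 or later, where the standard
re.split accepts patterns that can match the empty string; older
versions do not split at such positions.

```python
import re

# Zero-width lookahead: matches the empty string before each capital.
# Assumes Python 3.7+, where re.split splits on empty matches.
parts = re.split(r'(?=[A-Z])', 'splitOnCapitals')
```

With that assumption, parts is ['split', 'On', 'Capitals'].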

> - It'd be nice to have a version of split that generates the parts
> (one by one) rather than returning the whole list.
> 
Hmm, re.splititer() perhaps.
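
Something along those lines can already be approximated on top of
re.finditer. This is only a sketch of the idea: the function below is a
stand-in and not an existing re function, and unlike re.split it
ignores capture groups and has no maxsplit argument.

```python
import re

def splititer(pattern, text):
    """Lazily yield the pieces of text between matches of pattern.

    Stand-in for the proposed re.splititer(); simplified: captured
    groups are not included in the output and there is no maxsplit.
    """
    last = 0
    for m in re.finditer(pattern, text):
        yield text[last:m.start()]
        last = m.end()
    # Whatever follows the final match (possibly an empty string).
    yield text[last:]

pieces = splititer(r',\s*', 'a, b,c')
first = next(pieces)    # only the first piece has been computed so far
rest = list(pieces)     # consume the remaining pieces
```

Because it is a generator, a caller can stop after the first few pieces
without scanning the rest of the string.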

> - Repeated subgroup match information is not available.  That is, for
> a match like this
> 
>     re.match('(.){3}', 'xyz')
> 
> there's no way to discover that the subgroup first matched 'x', then
> matched 'y', and finally matched 'z'.  Here is one past proposal
> (mine), perhaps over-complex, to address this problem:
> 
>     http://mail.python.org/pipermail/python-dev/2004-August/047238.html
> 
Yikes! I think I'll let you code that... :-)
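
For reference, the limitation looks like this with the current re
module: group 1 keeps only its final repetition. The second half of the
sketch is a workaround, not a general solution. It re-scans the matched
text with the group's own sub-pattern, which only gives the right
answer when that sub-pattern matches the same way in isolation, as it
does for this simple case.

```python
import re

m = re.match(r'(.){3}', 'xyz')
last_only = m.group(1)      # only the last repetition of the group survives

# Workaround for this simple pattern: apply the group's sub-pattern ('.')
# repeatedly to the text that the whole match covered.
captures = [s.group() for s in re.finditer(r'.', m.group(0))]
```

Here last_only is 'z', while captures recovers 'x', 'y' and 'z' in order.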


