New implementation of re module

Mike tutufan at gmail.com
Wed Jul 29 11:24:52 EDT 2009


On Jul 27, 11:34 am, MRAB <pyt... at mrabarnett.plus.com> wrote:
> I've been working on a new implementation of the re module.

Fabulous!

If you're extending/changing the interface, there are a couple of sore
points in the current implementation I'd love to see addressed:

- findall/finditer don't find overlapping matches.  Sometimes you
really *do* want to know all possible matches, even if they overlap.
This comes up in bioinformatics, for example.

- split won't split on empty (zero-width) matches, e.g. a pattern that
is nothing but a lookahead.  This means that it can't be used for a
whole class of interesting cases.  This has been discussed previously:

    http://bugs.python.org/issue3262
    http://bugs.python.org/issue852532
    http://bugs.python.org/issue988761

- It'd be nice to have a version of split that generates the parts
(one by one) rather than returning the whole list.

- Repeated subgroup match information is not available.  That is, for
a match like this

    re.match('(.){3}', 'xyz')

there's no way to discover that the subgroup first matched 'x', then
matched 'y', and finally matched 'z'.  Here is one past proposal
(mine), perhaps over-complex, to address this problem:

    http://mail.python.org/pipermail/python-dev/2004-August/047238.html
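
For what it's worth, the first three points can be roughly approximated
on top of finditer today.  The sketch below is only that: the helper
names overlapping() and iter_split() are made up for illustration and
are not part of re.  It assumes the pattern is passed as a plain string
with no inline global flags, and overlapping() reports at most one
match per starting position, so it is not truly "all possible matches".
I don't know of a comparable trick for the repeated-subgroup case; the
last line just shows that only the final repetition is kept.

```python
import re

def overlapping(pattern, string):
    """Yield (start, text) for matches of pattern, overlapping ones included."""
    # A capturing lookahead consumes no text, so the scan moves on one
    # character at a time instead of jumping past each match.
    wrapped = re.compile('(?=(%s))' % pattern)
    for m in wrapped.finditer(string):
        yield m.start(1), m.group(1)

def iter_split(pattern, string):
    """Lazily yield the pieces of string that lie between matches of pattern.

    Zero-width matches count as split points, except at the very start
    or end of the string, where they would only produce an empty piece.
    """
    pos = 0
    for m in re.finditer(pattern, string):
        if m.start() == m.end() and m.start() in (0, len(string)):
            continue
        yield string[pos:m.start()]
        pos = m.end()
    yield string[pos:]

print(list(overlapping('ATA', 'ATATATA')))
# [(0, 'ATA'), (2, 'ATA'), (4, 'ATA')]
print(list(iter_split('(?=[A-Z])', 'CamelCaseWord')))
# ['Camel', 'Case', 'Word']
print(re.match('(.){3}', 'xyz').group(1))
# z  (the 'x' and 'y' captures are gone)
```

Having these built in would still be much nicer, since the wrapper
shifts any group numbers in the original pattern by one.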

Mike