[Python-ideas] regex module - proper implementation of alternation?

Eli Bendersky eliben at gmail.com
Tue Jul 16 05:10:06 CEST 2013


Since the 'regex' module is a candidate for inclusion into the stdlib, I
figured this would be a good place to ask.

While discussing something related in pss (https://github.com/eliben/pss),
Ben Hoyt brought to my attention that the implementation of alternation
(foo|bar) in Python's default regex module (the SRE implementation) is very
inefficient. And indeed, looking there it seems that | gets translated to
an opcode that simply means going over all the alternatives in a loop
trying to match each. This is not how a proper regex engine should
implement alternation!

A common advice given to Python programmers is to combine their regexes
into a single one with |. This makes code faster, but it turns out that
it's far from its full potential because the combination doesn't go full
way to the DFA as it's supposed to.

A question about 'regex' - is it implemented properly there?

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130715/1ad344bc/attachment.html>


More information about the Python-ideas mailing list