[Python-Dev] Regular expressions, Unicode etc.

Nick Maclaren nmm1 at cus.cam.ac.uk
Fri Aug 10 10:28:58 CEST 2007


James Y Knight <foom at fuhm.net> wrote:
>
> > Firstly, things like backreferences are an absolute no-no.  They
> > are not regular, and REs with them in cannot be converted to DFAs.
> > That could be 'solved' by a parser that kicked out such constructions,
> > but it would get screams from many users.
> 
> People keep saying things like this as if GNU grep and tcl's regular  
> expression matchers didn't exist.
> See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example.

PCRE also has a breadth-first engine, but it does not convert the
NFA to a DFA (its author is a close colleague of mine).  Those
engines won't do the conversion, either, and I am prepared to bet
that I could produce a pattern that would either run very slowly
or expose the semantics differences in most of them.

I did NOT say that there were not, alternative, approaches.  What
I said was correct - you cannot convert such extended expressions
to DFAs.  You can convert them to things that are sort of NFA/DFA
hybrids, which might or might not be a good way to proceed.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


More information about the Python-Dev mailing list