Regex engines

Andrew Kuchling akuchlin at mems-exchange.org
Wed Jun 21 15:44:33 EDT 2000


[Posted only to c.l.tcl and comp.lang.python; comp.lang.perl doesn't exist.]

Paul Duffin <pduffin at hursley.ibm.com> writes:
> front end to a C++ program. Note that is Tcl, not Tk. And someone posted
> don't use Tcl, use Python because it has a Tk binding and "Tcl is crap".

Ummm... If you mean the thread "C++ and Tcl", that person was writing
about Perl, not Python.

> I would love to see (and this is not my idea) all of our communities
> working together to produce really simple building blocks which can be
> used by all of us. e.g.
> 	Regular expression engine.

Unfortunately, all the regex engines seem inextricably tied to their
respective languages.

* Python 1.5 uses a hacked version of PCRE.  PCRE is a very good
library, but it doesn't handle Unicode, though its author has said
he's thinking about adding UTF-8 support.  I think PHP and some Scheme
interpreters also have bindings for PCRE.

* Python 1.6 will use a new regex engine called SRE, which handles both
8-bit and UTF-16 strings.  The compiler is still written in Python at
this point, though the engine itself is in C.

* Perl 5's regex engine is unreadable code, and disentangling it from
the rest of Perl seems impossible.

* Tcl uses a Unicode-aware engine written by Henry Spencer, but it's
not available as a standalone library.  At the Ottawa Linux conference
last July I asked about releasing it, but almost a year later I have
seen no sign of a standalone release.

* Ruby seems to use a modified version of one of the GNU regex
libraries; I can't tell if it's GNU rx or regex.
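
To make the SRE point above concrete, here is a rough sketch of one
pattern applied to both an 8-bit string and a Unicode string.  It
assumes a present-day Python 3, whose re module descends from SRE, and
not the 1.6 API described above; the names WORD_8BIT, WORD_UNICODE and
find_words are made up for illustration.

```python
import re

# Same pattern text, compiled once per string type.
# Assumption: modern Python 3, where bytes are the 8-bit strings
# and str is the Unicode string type.
WORD_8BIT = re.compile(rb"[a-z]+\d*")
WORD_UNICODE = re.compile(r"[a-z]+\d*")


def find_words(text):
    """Return all matches, using the compiled pattern that fits the input type."""
    pattern = WORD_8BIT if isinstance(text, bytes) else WORD_UNICODE
    return pattern.findall(text)


# Hypothetical sample input, once as bytes and once as str.
byte_hits = find_words(b"tcl8 perl5 python16")
text_hits = find_words("tcl8 perl5 python16")
```

The engine is shared, but a pattern compiled from bytes can only be
applied to bytes, and likewise for str, hence the two compiled objects.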

Unifying this situation seems impossible; maybe when SRE's compiler is
translated to C, or if Spencer's library is released in a standalone
version...

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Thank you for letting me borrow your objects.
  -- Ute Lemper in concert, March 13, 1997


