Regex engines

Mo mo at nospam.com
Wed Jun 21 19:12:17 EDT 2000


Andrew Kuchling wrote:
> 
> [Posted only to c.l.tcl and comp.lang.python; comp.lang.perl doesn't exist.]
> 
> Paul Duffin <pduffin at hursley.ibm.com> writes:
> > front end to a C++ program. Note that this is Tcl, not Tk. And someone posted:
> > don't use Tcl, use Python because it has a Tk binding and "Tcl is crap".
> 
> Ummm... If you mean the thread "C++ and Tcl", that person was writing
> about Perl, not Python.
> 
> > I would love to see (and this is not my idea) all of our communities
> > working together to produce really simple building blocks which can be
> > used by all of us. e.g.
> >       Regular expression engine.
> 
> Unfortunately all the regex engines seem inextricably tied in to their
> respective languages.
> 
> * Python 1.5 uses a hacked version of PCRE, which is a very
> good library, but it doesn't handle Unicode, though its author has said
> he's thinking about adding UTF-8 support.  I think PHP and some Scheme
> interpreters also have bindings for PCRE.
> 
> * Python 1.6 will use a new regex engine called SRE to handle both
> 8-bit and UTF-16 strings; the compiler is still written in Python at
> this point, though the engine itself is in C.
> 
> * Perl 5's regex engine is unreadable code, and disentangling it from
> the rest of Perl seems impossible.
> 
> * Tcl uses a Unicode-aware engine written by Henry Spencer, but it's
> not available as a standalone library.  At the Ottawa Linux conference
> last July I asked about releasing it, but I have seen no sign of it
> almost a year later.
> 
> * Ruby seems to use a modified version of one of the GNU regex
> libraries; I can't tell if it's GNU rx or regex.
> 
> Unifying this situation seems impossible; maybe when SRE's compiler is
> translated to C, or if Spencer's library is released in a standalone
> version...


Did you know that there is a standalone version of Henry Spencer's regexp
engine ported to Java? It is included in the Jacl interpreter (Tcl in Java),
but it is just a package, so you should be able to use it anywhere.
In fact, it is just two files, Regexp.java and Regsub.java. You should
check it out.

Mo DeJong
Red Hat Inc


