[Python-Dev] Why Foo is better than Baz

Andrew M. Kuchling akuchlin at cnri.reston.va.us
Mon May 3 17:14:15 CEST 1999


Fredrik Lundh writes:
>-- regexps: has anyone compared the new unicode-aware
>regexp package in Tcl with pcre?

	I looked at it a bit when Tcl 8.1 was in beta; it derives from
Henry Spencer's 1998-vintage code, which seems to try to do a lot of
optimization and analysis.  It may even compile DFAs instead of NFAs
when possible, though it's hard for me to be sure.  This might give it
a substantial speed advantage over engines that do less analysis, but
I haven't benchmarked it.  The code is easy to read but difficult to
understand, because the theory underlying the analysis isn't explained
in the comments; one feels there should be an accompanying paper to
explain how everything works.  That's why I'm not sure whether it
really is producing DFAs for some expressions.
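
	To illustrate why compiling to a DFA might pay off, here is a
minimal sketch of the textbook subset construction.  It is not taken
from Spencer's code, and the toy NFA for (a|b)*abb and all the names
in it are made up for the example.  Simulating the NFA directly means
tracking a set of live states for every input character; the DFA does
that set arithmetic once at compile time, so matching becomes one
table lookup per character.

```python
# Hypothetical toy NFA for (a|b)*abb; illustrative only, not from Tcl.
# Maps (state, character) -> set of possible next states.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START = 0
NFA_ACCEPT = {3}


def nfa_match(text):
    """Simulate the NFA directly, tracking a set of live states."""
    current = {NFA_START}
    for ch in text:
        following = set()
        for state in current:
            following |= NFA.get((state, ch), set())
        current = following
    return bool(current & NFA_ACCEPT)


def build_dfa(alphabet="ab"):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = frozenset({NFA_START})
    table = {}
    seen = {start}
    pending = [start]
    while pending:
        subset = pending.pop()
        for ch in alphabet:
            target = frozenset(
                t for s in subset for t in NFA.get((s, ch), set())
            )
            table[(subset, ch)] = target
            if target not in seen:
                seen.add(target)
                pending.append(target)
    accepting = {s for s in seen if s & NFA_ACCEPT}
    return start, table, accepting


def dfa_match(text, dfa):
    """Run the precomputed DFA: one dictionary lookup per character."""
    start, table, accepting = dfa
    state = start
    for ch in text:
        state = table.get((state, ch))
        if state is None:  # character outside the DFA's alphabet
            return False
    return state in accepting
```

	Calling build_dfa() once and then dfa_match(text, dfa) for each
input gives the same answers as nfa_match(text).  The catch is that
the number of DFA states can grow exponentially in the worst case,
which may be why an engine would only do this "when possible".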

	Tcl seems to represent everything as UTF-8 internally, so
there's only one regex engine.  The code is scattered over more
files:

amarok generic>ls re*.[ch]
regc_color.c    regc_locale.c   regcustom.h     regerrs.h       regfree.c
regc_cvec.c     regc_nfa.c      rege_dfa.c      regex.h         regfronts.c
regc_lex.c      regcomp.c       regerror.c      regexec.c       regguts.h
amarok generic>wc -l re*.[ch]
     742 regc_color.c
     170 regc_cvec.c
    1010 regc_lex.c
     781 regc_locale.c
    1528 regc_nfa.c
    2124 regcomp.c
      85 regcustom.h
     627 rege_dfa.c
      82 regerror.c
      18 regerrs.h
     308 regex.h
     952 regexec.c
      25 regfree.c
      56 regfronts.c
     388 regguts.h
    8896 total
amarok generic>

	This would be an issue for using it with Python, since all
these files would wind up scattered around the Modules directory.  For
comparison, pypcre.c is around 4700 lines of code.

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Things need not have happened to be true. Tales and dreams are the
shadow-truths that will endure when mere facts are dust and ashes, and forgot.
    -- Neil Gaiman, _Sandman_ #19: _A Midsummer Night's Dream_




