How to make regexes faster? (Python v. OmniMark)

Andrew Dalke dalke at dalkescientific.com
Sat Apr 20 05:16:51 EDT 2002


[Donn Cave]
> By the way, I'll second Johannes Stiehler's recommendation of
> MxTextTools.  Definitely appropriate for SGML parsing, and much
> better than regexps for extensive parsing in my opinion - not
> just in terms of speed, but I suspect a more powerful way to
> describe text patterns than regexps.

Uncle Tim:
> Yes, it is.  "More convenient" is arguable, though -- there's a steep
> learning curve, but then people often forget how hard it was to learn
> regexp syntax and pragmatics too.

Let's see.  Regexps.  I started learning them because of archie
searches.  I treated them as modified versions of file globs, and
slowly picked up a few other things over a couple months.

I then took a theory of computer languages course, which went into
DFAs, NFAs, and several other As.  I then had to learn how Perl did
it, which was different from grep, which was different from ....

So, figure three to four months, on and off.

Now, mxTextTools.  Three attempts.  First two failures took a week,
thereabouts.  The final success took another couple weeks to get a
working regexp parser (yes, on top of mxTextTools) going.  Let's
call it three weeks.  And this was after knowing a lot about how
those As work.

But I gotta say, mxTextTools is amazingly fast and flexible.  On
one test it was only 50% slower than fgrep for a simple text search.
(Not quite a direct comparison because the Python code was doing
other things as well.)

                    Andrew
                    dalke at dalkescientific.com





