How to make regexes faster? (Python v. OmniMark)

Fri Apr 19 22:06:10 EDT 2002

claird at starbase.neosoft.com (Cameron Laird) wrote in message news:<48F07A11B1D3ADCA.2691EAF64BEB7B96.D086A2379D2D43D4 at lp.airnews.net>...
> In article <3CBF9168.AAE0FC46 at engcorp.com>,
> Peter Hansen  <peter at engcorp.com> wrote:
> >"Frederick H. Bartlett" wrote:
> >> 
> >> I was recently introduced to OmniMark. One of our exercises was to take
> >> a plain text file of Hamlet and convert it to SGML.
> >> 
> >> So I did it in Python, too. But the best time I could get from Python
> >> was .57 sec, while OmniMark came in at .20 sec. What's the most
> >> efficient technique for Pythonesque regex-based text processing?
> >
> >Hmmm... how fast do you need it to be?  Sounds to me like 0.57 seconds
> >is pretty darned fast.
> >
> >Do you have specific goals, or are you just on a search for 
> >something faster?  Remember, "better is the enemy of good"
> >and the grass is always greener.
> >
> >(See http://www.seds.org/~chrisl/akin.html )
> >
> >-Peter
> 
> I was counting on Peter to write this.
> 
> Because it's correct, of course.  Moreover, you should
> know that Peter has payroll responsibilities.  He writes
> from a more practical perspective than other acquaintances
> might afford you.

Another practical perspective:

When people mention excessive run-time to me, I ask them to rank the
severity of their problem on this scale:
1. Job doesn't finish before they get back to their desk with a fresh
mug of coffee.
2. Job either has exclusive access to the database or has an excessive
impact on on-line response time, and doesn't finish by sun-up.
3. Job doesn't finish before the weekly cold back-up is due.