Python and regexp efficiency.. again.. :)

M.-A. Lemburg mal at lemburg.com
Fri Dec 10 18:31:30 EST 1999


Markus Stenberg wrote:
> 
> I'm actually looking for a bit of a ClueBrick(tm) regarding a system I have
> been writing for a while now. The basic idea of The System(tm) is to monitor
> log files of various systems' different components (syslog, ..).
> 
> Writing the toy in Python was a very straightforward and quick
> process. Problems started to surface when I asked myself the question,
> "is this fast enough?". As background, I intend to use the tool to
> monitor services that now and then spam their logs fairly _heavily_, and
> therefore N megabytes of logs/day would be expected.
> 
> Problem:
>         A ~900k log file (~10k lines) takes roughly 16 seconds to process,
>         after optimization, with roughly 150 different things to match
>         (combined into one massive regexp). The initial version took half
>         a minute.
> 
> Question:
>         Can this be optimized further?

Hmm, have you tried mxTextTools? As an example, the HTML parser
included with the package as a demo can easily handle 900kB of
HTML per second on a K6/266 machine.

There are some nice tools available to help build the needed
Tagging Tables. More information is available on my Python Pages,
including pointers to those tools.
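
For comparison, the combined-regexp setup described in the question
can be sketched with the standard re module. This is only a rough
sketch and not Markus's actual code: the rule names, the patterns and
the helper functions classify() and scan() are made up for
illustration. It compiles one alternation up front, gives each rule a
named group, and uses the match object's lastgroup attribute to find
out which rule fired.

```python
import re

# Hypothetical rules (name -> pattern); a real setup would have ~150 of these.
RULES = {
    "ssh_fail": r"sshd\[\d+\]: Failed password for (?:invalid user )?\S+",
    "disk_full": r"No space left on device",
    "su_root": r"su: .* to root",
}

# One big alternation with a named group per rule, compiled once up front
# rather than once per log line.
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in RULES.items())
)


def classify(line):
    """Return the name of the rule that matched the line, or None."""
    match = COMBINED.search(line)
    return match.lastgroup if match else None


def scan(lines):
    """Count how many lines hit each rule; unmatched lines are skipped."""
    counts = {}
    for line in lines:
        name = classify(line)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Timing scan() over a representative log file gives a baseline to
compare against before moving the matching to another tool.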

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    21 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/




