Python and regexp efficiency.. again.. :)
M.-A. Lemburg
mal at lemburg.com
Fri Dec 10 18:31:30 EST 1999
Markus Stenberg wrote:
>
> I'm actually looking for a bit of a ClueBrick(tm) regarding a system I have
> been writing for a while now. The basic idea of The System(tm) is to monitor
> the log files of various systems' different components (syslog, ..).
>
> Writing the toy in Python was a very straightforward and quick
> process. Problems started to surface when I asked myself the question,
> "is this fast enough?". As background, I intended to use the tool to
> monitor services that now and then spam their logs fairly _heavily_, and
> therefore N megabytes of logs/day would be expected.
>
> Problem:
> A ~900k log file (~10k lines) takes roughly 16 seconds to process,
> after optimization, with roughly 150 different patterns to match
> (combined into one massive regexp). The initial version took half a
> minute.
>
> Question:
> Can this be optimized further?
Hmm, have you tried mxTextTools? For comparison, the HTML parser
that ships with it as an example can easily handle 900kB of HTML
per second on a K6/266 machine.
There are some nice tools available to help build the needed
Tagging Tables. More information is available on my Python Pages,
including pointers to those tools.
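
For reference, the "one massive regexp" setup described in the question
might look roughly like the sketch below, using only the standard re
module (not mxTextTools). The rule names and patterns are hypothetical
and only stand in for the ~150 real ones; the helper names classify()
and scan() are likewise made up for illustration. It can serve as a
baseline to time against before trying a Tagging Table.

```python
import re

# Hypothetical rule names and patterns, for illustration only.
RULES = {
    "ssh_fail": r"sshd\[\d+\]: Failed password for (?:invalid user )?\S+",
    "disk_full": r"No space left on device",
    "su_root": r"su: .* to root",
}

# Build one alternation with a named group per rule and compile it once,
# outside the per-line loop. Inner groups are kept non-capturing so that
# match.lastgroup always names the rule that matched.
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in RULES.items())
)


def classify(line):
    """Return the name of the matching rule, or None if nothing matches."""
    match = COMBINED.search(line)
    return match.lastgroup if match else None


def scan(lines):
    """Count how many lines matched each rule."""
    counts = {}
    for line in lines:
        name = classify(line)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Calling scan(open(path)) on a log file (path being whatever file is
monitored) returns a dict of per-rule hit counts. Whether this is fast
enough for a given set of patterns is something only a measurement on
real logs can tell.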
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 21 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/