[Tutor] Logfile Manipulation
Stephen Nelson-Smith
sanelson at gmail.com
Mon Nov 9 06:41:12 CET 2009
I've got a large amount of data in the form of 3 Apache and 3 Varnish
logfiles from 3 different machines. They are rotated at 0400. The
logfiles are pretty big - maybe 6 GB per server, uncompressed.
I've got to produce a combined logfile for 0000-2359 for a given day,
with a bit of filtering (removing lines based on a text match, plus a
bit of substitution).
I've inherited a nasty shell script that does this but it is very slow
and not clean to read or understand.
I'd like to reimplement this in Python.
Initial questions:
* How does Python compare in performance to shell, awk, etc. in a big
pipeline? The shell script kills the CPU.
* What's the best way to extract the data for a given time, e.g. 0000 -
2359 yesterday?
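To make the second question concrete, here is a rough sketch of one
possible approach, not a finished implementation. It assumes every line
carries an Apache-style timestamp such as [08/Nov/2009:23:59:59 +0100]
and that each input stream is already in time order. The names
parse_time, lines_for_day and combine, and the drop/substitutions
parameters, are made up for illustration.

```python
import heapq
import re
from datetime import date, datetime

# Assumption: timestamps look like [08/Nov/2009:23:59:59 +0100].
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def parse_time(line):
    """Return the line's timestamp as an aware datetime, or None if absent."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def lines_for_day(lines, day):
    """Yield (timestamp, line) pairs whose logged date equals `day`."""
    for line in lines:
        stamp = parse_time(line)
        if stamp is not None and stamp.date() == day:
            yield stamp, line


def combine(streams, day, drop=(), substitutions=()):
    """Lazily merge time-ordered streams into one ordered stream for `day`.

    Lines containing any string in `drop` are skipped; each (old, new)
    pair in `substitutions` is applied as a plain text replacement.
    """
    filtered = (lines_for_day(stream, day) for stream in streams)
    # heapq.merge reads one line per stream at a time, so memory use
    # stays small even for multi-gigabyte inputs.
    for _, line in heapq.merge(*filtered):
        if any(text in line for text in drop):
            continue
        for old, new in substitutions:
            line = line.replace(old, new)
        yield line
```

Open file objects can be passed directly as the streams, since they
iterate line by line. Because rotation happens at 0400, one day's data
for a server spans two consecutive files, so each stream would probably
be an itertools.chain of those two files in order.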
Any advice or experiences?
S.
--
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com