[Tutor] parsing sendmail logs

Alan Gauld alan.gauld at btinternet.com
Tue Jul 15 21:58:36 CEST 2008


"Monika Jisswel" <monjissvel at googlemail.com> wrote

> To tell the truth, I never thought about the "additional overhead of
> getting the input/output data transferred", because the subprocess
> itself will contain the (bash) pipe to redirect output to the next
> utility used, not the Python subprocess.PIPE pipe, so it will be like
> one subprocess with each utility piping stdout to the next as if run
> from the shell.

If you run a pipeline chain from within subprocess, every part of the
chain will be a separate process, and that's a lot of overhead. That's
why admins tend to prefer writing utilities in Perl rather than bash
these days. Also, for the pipeline to work, every element must work
with text, which may not be the native data type, so we have to
convert to/from ints or floats etc.
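To make that concrete, here is a minimal sketch of running a two-stage
pipeline from Python - note that grep and wc each become a separate OS
process, exactly as under bash. (The log file name and contents are
invented for illustration, and it assumes grep and wc are on the PATH.)

```python
import os
import subprocess
import tempfile

# Write a small sample "log" so the pipeline has something to chew on.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("error: disk full\nok\nerror: timeout\n")
    logname = f.name

# Shell-style pipeline: grep and wc are each a separate OS process.
p1 = subprocess.Popen(["grep", "error", logname], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["wc", "-l"], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let grep receive SIGPIPE if wc exits early
count = int(p2.communicate()[0].split()[0])
os.unlink(logname)
print(count)  # two lines matched
```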

But mostly I was thinking about the I/O to/from the Python program.
If the subprocess or pipeline is feeding data into Python, it will
usually be much more efficient to store the data directly in Python
variables than to write it to stdout and read it back from stdin
(which is what happens in the pipeline).
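For comparison, the same grep-plus-wc job kept entirely inside Python
needs no extra processes and no text marshalled through stdout/stdin
(the sample data below stands in for a log file):

```python
# Count matching lines natively in Python: no child processes,
# no round trip through stdout/stdin.
sample = "error: disk full\nok\nerror: timeout\n"
count = sum(1 for line in sample.splitlines() if "error" in line)
print(count)  # 2
```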

> I have to say that I have seen awk, grep, sort and wc work on files
> of hundreds of megabytes in a matter of 1 or 2 seconds... why would
> I replace such fast tools?

awk is not very fast. It is an interpreted language in the old sense
of the term: it literally interprets line by line. I have not compared
awk to Python directly, but I have compared it to Perl, which is
around 3 times faster in my experience, and more if you run awk from
Perl rather than doing the equivalent in Perl. Now Python is usually
a little bit slower than Perl - especially when using regex - but not
that much slower, so I'd expect Python to be about 2 times faster
than awk. (Not sure how nawk or gawk compare; they may be compiled
to byte code like perl/python.)

But as you say, even awk is fast enough for normal-sized use, so
unless you are processing large numbers of files, spawning an awk
process is probably not a killer; it just seems redundant given that
Python is just as capable at processing text and much better at
processing collections.
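For instance, a typical awk one-liner such as summing the second
column of a file translates directly into a few lines of Python
(the data below is made up for illustration):

```python
# Rough Python equivalent of: awk '{ total += $2 } END { print total }'
data = """alice 120
bob 45
carol 300"""

# Split each line into whitespace-separated fields and sum field 2.
total = sum(int(line.split()[1]) for line in data.splitlines())
print(total)  # 465
```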

> Alan, do you think Python can beat awk in speed when it comes to
> replacing text? I have always wanted to know!

It would be interesting to try, but I'd expect it to be significantly
faster, yes.
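One way to check would be a quick timeit run along these lines
(results will vary by machine; the text and pattern are invented, and
for a fixed string plain str.replace avoids the regex machinery that
re.sub brings in):

```python
import re
import timeit

text = "the quick brown fox " * 10000

# Time a plain string replacement against a regex substitution.
t_str = timeit.timeit(lambda: text.replace("fox", "cat"), number=100)
t_re = timeit.timeit(lambda: re.sub("fox", "cat", text), number=100)

# Both routes must of course produce the same result.
same = text.replace("fox", "cat") == re.sub("fox", "cat", text)
print(same)
```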

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld 
