speed problems

Martin Maney maney at pobox.com
Sat Jun 12 21:15:32 EDT 2004


Hans-Peter Jansen <hpj at urpla.net> wrote:
>      if logfile.endswith('.gz'):
>        #ifd, lfd = os.popen2("%s %s" % (gzip, logfile))
>        #XXX: cheating
>        ifd, lfd = os.popen2("%s %s | grep INFECTED" % (gzip, logfile))
>      elif logfile.endswith('.bz2'):
>        #ifd, lfd = os.popen2("%s %s" % (bzip2, logfile))
>        #XXX: cheating
>        ifd, lfd = os.popen2("%s %s | grep INFECTED" % (bzip2, logfile))
>      else:
>        # uncompressed
>        lfd = open(logfile, "r")

Why stop there?  You've left it on the verge of collapsing into the
fully reduced (and regularized) form:

  if logfile.endswith('.gz'):
    cat_command = 'zcat'
  elif logfile.endswith('.bz2'):
    cat_command = 'bzcat'
  else:
    cat_command = 'cat'
  ifd, lfd = os.popen2("%s %s | grep INFECTED" % (cat_command, logfile))

(for that matter, is there some reason to use popen2 and the
unnecessary ifd?)
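
A minimal sketch of the read-only variant that question hints at, assuming
plain os.popen is enough because nothing is ever written to the child's
stdin.  The helper names pick_cat_command and open_filtered are made up for
illustration, shlex.quote is added here only to guard against odd file
names, and zcat, bzcat, cat and grep are assumed to be on the PATH:

```python
import os
import shlex


def pick_cat_command(logfile):
    """Choose the (de)compressing 'cat' from the file extension."""
    if logfile.endswith('.gz'):
        return 'zcat'
    elif logfile.endswith('.bz2'):
        return 'bzcat'
    return 'cat'


def open_filtered(logfile, pattern='INFECTED'):
    """Return a read-only pipe yielding only the lines that match pattern.

    Hypothetical helper: os.popen gives back just the child's stdout,
    so there is no unused stdin handle to carry around.
    """
    cmd = "%s %s | grep %s" % (
        pick_cat_command(logfile),
        shlex.quote(logfile),   # protect spaces and shell metacharacters
        shlex.quote(pattern),
    )
    return os.popen(cmd)
```

Usage would be along the lines of "for line in open_filtered(logfile): ...",
closing the returned pipe when done.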

I've found it advantageous to preprocess large inputs with grep - the
tens of MB of squid logs that are skimmed by a useful little CGI script
really benefited from that!  Python's raw I/O may be as good as
anything, but for line by line parsing where a majority of the (many)
lines are discarded, a grep prefilter is a big win.
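
To make the prefilter idea concrete, here is a rough sketch for the
uncompressed case: grep throws away the uninteresting lines and Python only
parses the survivors.  It uses the subprocess module rather than the popen
calls quoted above, and the names grep_prefilter and count_by_first_field,
as well as the "first field is the key" log layout, are assumptions made up
for the example, not anything from the original script:

```python
import subprocess


def grep_prefilter(path, needle):
    """Yield only the lines of path that contain needle, selected by grep."""
    # -F treats needle as a fixed string, not a regex; "--" ends option
    # parsing so a needle starting with "-" is not taken as a flag.
    proc = subprocess.Popen(
        ['grep', '-F', '--', needle, path],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip('\n')
    finally:
        proc.stdout.close()
        proc.wait()


def count_by_first_field(path, needle):
    """Count matching lines per first whitespace-separated field."""
    counts = {}
    for line in grep_prefilter(path, needle):
        fields = line.split(None, 1)
        if not fields:          # skip lines that are only whitespace
            continue
        counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts
```

The point of the design is that the Python loop body runs once per matching
line instead of once per input line; how much that buys depends on how many
lines get discarded.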

Which may or may not bring us back to... well, it's not a corollary to
Steve Lamb's guideline for using shell scripts, though it's clearly
related.  Maybe it's the contrapositive.  Never get so wrapped up in
using that Python hammer that you forget about those carefully honed
specialized tools.  Use that one line of shell to the best effect! 
<grin>

-- 
Anyone who calls economics the dismal science
has never been exposed to educationist theories
at any length.  An hour or two is a surfeit.


