speed problems

^ axel at axel.truedestiny.net
Thu Jun 3 10:33:58 EDT 2004


Hi group,

I became interested in Python a while ago and have just converted a simple
Perl script to Python. The script is very simple: it generates a list of
viruses found in some maillog files for further processing.
I've found that there's a huge difference in execution time between the two
scripts, in favor of Perl, and I can't pinpoint what's going wrong:
perl runs:
0.07 real         0.05 user         0.01 sys
0.07 real         0.05 user         0.01 sys
0.07 real         0.04 user         0.02 sys
python runs:
0.27 real         0.23 user         0.03 sys
0.28 real         0.21 user         0.05 sys
0.27 real         0.19 user         0.06 sys

This was measured with a small uncompressed logfile (1.4M). The difference
grows much bigger whenever the logs need to be uncompressed.

Here are both scripts. Could you please have a look and tell me where I
should look for optimizations?

perl:
my (@maillogs) = (
                   "/home/logs/maillog",      "/home/logs/maillog.0.gz",
                   "/home/logs/maillog.1.gz", "/home/logs/maillog.2.gz",
                   "/home/logs/maillog.3.gz",
                 );

my ($gzip) = "/usr/bin/gzip";
my ($bzip2)= "/usr/bin/bzip2";

my ($total) = 0.0;
my (%virstat);

foreach my $logfile (@maillogs)
{
  if ( -f $logfile )
  {
    # is it compressed?
    if ( $logfile =~ /\.[bg]z2?$/ )
    {
      if ( !open LF, "$gzip -cd $logfile|" )
      {
        open LF, "$bzip2 -cd $logfile|" or
          die "unable to uncompress '$logfile'\n";
      }
    }
    else
    {
      open LF, "<$logfile" or die "couldn't open '$logfile'\n";
    }

    while (<LF>)
    {
      if (/INFECTED/)
      {
        # we need only the virus name
        $_ =~ s/.*INFECTED.*\((.*)\).*/$1/g;
        # if multiple viruses were found
        if (/, /)
        {
          # split them
          my (@vir) = split /, /, $_;
          foreach my $v (@vir)
          {
            chomp $v;
            $virstat{$v}++;
            $total++;
          }
        }
        else
        {
          chomp;
          $virstat{$_}++;
          $total++;
        }
      }
    }
    close LF;
  }
#  else
#  {
#    print STDERR "'$logfile' doesn't exist, skipping it.\n";
#  }
}

foreach my $v (sort keys %virstat)
{
  my $p = ($virstat{$v}/$total)*100;
  $p = sprintf "%s:\t%5.2f%%", $v, $p;
  print "$p\n";
}
#---end of perl script ---

python:
import os
import string
import re

maillogs = [
             "/home/logs/maillog",      "/home/logs/maillog.0.gz",
             "/home/logs/maillog.1.gz", "/home/logs/maillog.2.gz",
             "/home/logs/maillog.3.gz"
           ]
virstat={}
total=0.0         # keep this float

for logfile in maillogs:
  if os.path.isfile( logfile ):
    # is it compressed?
    if logfile[-3:] == '.gz':
      import gzip
      lf = gzip.GzipFile( logfile, "r" )
    else:
      if logfile[-4:] == '.bz2':
        import bz2
        lf = bz2.BZ2File( logfile, "r" )
      else:
        # uncompressed
        lf = open( logfile, "r" )

    for line in lf.readlines():
      if string.count( line, "INFECTED" ):
        vname = re.compile( "INFECTED \((.*)\)" ).search( line ).group(1)
        if string.count( vname, ", " ):
          for vnam in string.split( vname, ", " ):
            if vnam not in virstat:
              virstat[vnam] = 1
            else:
              virstat[vnam] += 1
            total += 1
        else:
          if vname not in virstat:
            virstat[vname] = 1
          else:
            virstat[vname] += 1
          total += 1
    lf.close()
#  else:
#    print "logfile '%s' doesn't exist, skipping it." % logfile

for vname in virstat.keys():
  p = (virstat[vname]/total)*100
  print "%s:    %5.2f%%" % (vname,  p)
#--- End of python script ---
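
One direction that might be worth checking, as a sketch only and not a measured fix: the loop above calls re.compile() for every matching line and reads each file fully into memory with readlines(). The version below compiles the pattern once and iterates over the lines lazily. It is written for current Python 3; the function name count_viruses, the constant INFECTED_RE, and the sample log lines are hypothetical and only there to make the example self-contained. Whether this closes the gap to the Perl timings is an assumption that would need to be timed on the real logs.

```python
import re
from collections import Counter

# Compiled once, instead of once per matching line.
# Same pattern as in the script above (greedy match up to the last ")").
INFECTED_RE = re.compile(r"INFECTED \((.*)\)")


def count_viruses(lines):
    """Count virus names in an iterable of log lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Cheap substring test first, so the regex only runs on candidates.
        if "INFECTED" not in line:
            continue
        match = INFECTED_RE.search(line)
        if match is None:
            continue
        # split() also handles the single-name case, so no separate branch.
        for name in match.group(1).split(", "):
            counts[name] += 1
    return counts


# Hypothetical sample lines; a real run would pass an open file object
# (plain, gzip or bz2) instead, which yields lines one at a time.
sample = [
    "Jun  3 10:00:01 host amavis[1]: INFECTED (Worm.A, Worm.B)\n",
    "Jun  3 10:00:02 host amavis[2]: INFECTED (Worm.A)\n",
    "Jun  3 10:00:03 host amavis[3]: Passed CLEAN\n",
]

counts = count_viruses(sample)
total = sum(counts.values())
for name in sorted(counts):
    print("%s:\t%5.2f%%" % (name, 100.0 * counts[name] / total))
```

Passing the file object itself to count_viruses() avoids building the full list of lines that readlines() creates, which may matter more for the larger compressed logs than for the 1.4M test file.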


Thanks for any help you can provide,
Kind regards,

Axel




