speed problems
axel at axel.truedestiny.net
Thu Jun 3 10:33:58 EDT 2004
Hi group,
I became interested in Python a while ago and have just converted a simple
Perl script to Python. The script is very simple: it generates a list of
the viruses found in some maillog files for further processing.
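The core of both scripts is pulling the virus names out of each matching log line. A minimal Python 3 sketch of that step follows. It assumes the names appear as `INFECTED (name1, name2)` with `, ` as the separator; the helper name `extract_viruses` and the sample log line in the usage note are hypothetical, made up for illustration.

```python
import re

# Compiled once, at import time. Captures everything between the parentheses
# that follow "INFECTED " (greedy, like the pattern in the original script).
INFECTED_RE = re.compile(r"INFECTED \((.*)\)")


def extract_viruses(line):
    """Return the virus names reported on one log line, or [] if there are none."""
    match = INFECTED_RE.search(line)
    if match is None:
        return []
    # Several viruses in one message are listed as "name1, name2".
    return match.group(1).split(", ")
```

For example, `extract_viruses("Jun  3 10:00:00 mx scanner: INFECTED (Worm.A, Worm.B)\n")` returns `["Worm.A", "Worm.B"]`. Unlike the Perl substitution `s/.*INFECTED.*\((.*)\).*/$1/`, this pattern (like the one in the Python script) requires the opening parenthesis to follow `INFECTED ` directly.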
I've found that there's a huge difference in execution time between the two
scripts, in favor of Perl, and I can't pinpoint what's going wrong.
perl runs:
0.07 real 0.05 user 0.01 sys
0.07 real 0.05 user 0.01 sys
0.07 real 0.04 user 0.02 sys
python runs:
0.27 real 0.23 user 0.03 sys
0.28 real 0.21 user 0.05 sys
0.27 real 0.19 user 0.06 sys
This was measured with a small uncompressed logfile (1.4 MB). The difference
grows much larger when the compressed logs also have to be decompressed.
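The Perl script decompresses by piping each file through the external gzip or bzip2 binary, while the Python script uses the gzip and bz2 modules in-process. For reference, a Python 3 sketch of the same dispatch on file extension is below. The `open_log` helper is hypothetical, and decoding as Latin-1 is an assumption made so that arbitrary bytes in the logs never raise a decoding error.

```python
import bz2
import gzip


def open_log(path):
    """Open a plain, .gz or .bz2 log file for reading text lines."""
    # The compression type is chosen from the file extension only,
    # as in the original Python script.
    # Assumption: Latin-1 decoding, which accepts any byte sequence.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")
```

Iterating over the returned file object, e.g. `with open_log(path) as lf: for line in lf: ...`, yields one line at a time instead of first building a list of every line as `readlines()` does.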
Here are both scripts. Could you please have a look and tell me where I
should look for optimizations?
perl:
my (@maillogs) = (
    "/home/logs/maillog", "/home/logs/maillog.0.gz",
    "/home/logs/maillog.1.gz", "/home/logs/maillog.2.gz",
    "/home/logs/maillog.3.gz",
);
my ($gzip) = "/usr/bin/gzip";
my ($bzip2)= "/usr/bin/bzip2";
my ($total) = 0.0;
my (%virstat);
foreach my $logfile (@maillogs)
{
    if ( -f $logfile )
    {
        # is it compressed?
        if ( $logfile =~ /\.[bg]z2?$/ )
        {
            if ( !open LF, "$gzip -cd $logfile|" )
            {
                open LF, "$bzip2 -cd $logfile|" or
                    die "unable to uncompress '$logfile'\n";
            }
        }
        else
        {
            open LF, "<$logfile" or die "couldn't open '$logfile'\n";
        }
        while (<LF>)
        {
            if (/INFECTED/)
            {
                # we need only the virus name
                $_ =~ s/.*INFECTED.*\((.*)\).*/$1/g;
                # if multiple viruses found
                if (/, /)
                {
                    # split them
                    my (@vir) = split /, /, $_;
                    foreach my $v (@vir)
                    {
                        chomp $v;
                        $virstat{$v}++;
                        $total++;
                    }
                }
                else
                {
                    chomp;
                    $virstat{$_}++;
                    $total++;
                }
            }
        }
        close LF;
    }
#    else
#    {
#        print STDERR "'$logfile' doesn't exist, skipping it.\n";
#    }
}
foreach my $v (sort keys %virstat)
{
    my $p = ($virstat{$v}/$total)*100;
    $p = sprintf "%s:\t%5.2f%%", $v, $p;
    print "$p\n";
}
#---end of perl script ---
python:
import os
import string
import re
maillogs = [
    "/home/logs/maillog", "/home/logs/maillog.0.gz",
    "/home/logs/maillog.1.gz", "/home/logs/maillog.2.gz",
    "/home/logs/maillog.3.gz"
]
virstat={}
total=0.0 # keep this float
for logfile in maillogs:
    if os.path.isfile( logfile ):
        # is it compressed?
        if logfile[-3:] == '.gz':
            import gzip
            lf = gzip.GzipFile( logfile, "r" )
        else:
            if logfile[-4:] == '.bz2':
                import bz2
                lf = bz2.BZ2File( logfile, "r" )
            else:
                # uncompressed
                lf = open( logfile, "r" )
        for line in lf.readlines():
            if string.count( line, "INFECTED" ):
                vname = re.compile( "INFECTED \((.*)\)" ).search( line ).group(1)
                if string.count( vname, ", " ):
                    for vnam in string.split( vname, ", " ):
                        if vnam not in virstat:
                            virstat[vnam] = 1
                        else:
                            virstat[vnam] += 1
                        total += 1
                else:
                    if vname not in virstat:
                        virstat[vname] = 1
                    else:
                        virstat[vname] += 1
                    total += 1
        lf.close()
#    else:
#        print "logfile '%s' doesn't exist, skipping it." % logfile
for vname in virstat.keys():
    p = (virstat[vname]/total)*100
    print "%s: %5.2f%%" % (vname, p)
#--- End of python script ---
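The reporting stage differs slightly between the two scripts: the Perl version prints the viruses sorted by name with a tab after the colon, while the Python version iterates over `virstat.keys()` in arbitrary dictionary order. A Python 3 sketch that mirrors the Perl output using `collections.Counter` is below; the `virus_report` helper is hypothetical and not part of the original post, and no claim is made here about how it affects the timings.

```python
from collections import Counter


def virus_report(names):
    """Return "name:<TAB>percent%" lines for an iterable of virus names, sorted by name."""
    counts = Counter(names)
    total = sum(counts.values())
    lines = []
    for name in sorted(counts):
        # In Python 3, "/" is true division, so total need not be a float.
        percent = counts[name] / total * 100
        lines.append("%s:\t%5.2f%%" % (name, percent))
    return lines
```

For example, `virus_report(["Worm.A", "Worm.B", "Worm.A", "Eicar"])` returns `["Eicar:\t25.00%", "Worm.A:\t50.00%", "Worm.B:\t25.00%"]`.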
Thanks for any help you can provide,
Kind regards,
Axel
More information about the Python-list mailing list