Creating a dictionary from log file records
Stephen D Evans
stevee at recombinant.demon.co.uk
Fri Feb 16 16:18:34 EST 2001
Roy,
For an alternative Python solution (without resorting to C) you could try
using compile() to create code objects for eval or exec rather than using a
while or for loop. On my Intel Pentium 200 machine the exec and eval
methods in the following code ran about 4% faster than the for loop. Please
note that I have only tested this on Python 2.0.
####
filename = 'logfile.log'
# Old FORTRAN technique. Static allocation of list length.
# If an IndexError occurs, change the value and start again.
MAX_LEN = 20
# pre-create a list of code objects that return list 'L' as a dictionary
eval_code_object_list = [None] * MAX_LEN
exec_code_object_list = [None] * MAX_LEN
for i in range(0, MAX_LEN, 2): # assume list of even length
    text = ''
    for j in range(0, i, 2):
        text = '%sL[%d]:L[%d],' % (text, j, j+1)
    text = '{%s}' % text
    eval_code_object_list[i] = compile(text, '', 'eval')
    text = 'dict=%s' % text
    exec_code_object_list[i] = compile(text, '', 'exec')
import time # for some stats
start_time = time.clock()
# for Python 1.5.2
print 'Python readlines():',
lines = open(filename, 'rt').readlines()
print time.clock() - start_time
start_time = time.clock()
# an alternative method for Python 2.0 (not as fast), will fail on
# Python 1.5.2
print 'Python read().splitlines():',
lines = open(filename, 'rt').read().splitlines()
print time.clock() - start_time
# Python 2.1 could use the 'for line in open(filename).xreadlines():'
# method for large files
start_time = time.clock()
print 'for loop:',
## The list 'L' is converted to a dictionary 'dict' using a simple for loop.
for line in lines:
    L = line.split()
    dict = {}
    for i in range(0, len(L), 2):
        dict[L[i]] = L[i+1]
print time.clock() - start_time
start_time = time.clock()
print 'eval:',
## The list 'L' is converted to a dictionary 'dict' using an index into the
## list of code objects. Lookups and no iteration during the creation of 'dict'
for line in lines:
    L = line.split()
    dict = eval(eval_code_object_list[len(L)])
print time.clock() - start_time
start_time = time.clock()
print 'exec:',
## as above but use exec to produce dict
for line in lines:
    L = line.split()
    exec exec_code_object_list[len(L)]
    # dict is here !
print time.clock() - start_time
####
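The eval variant above relies on Python 2 syntax. For readers on Python 3, a minimal sketch of the same idea (one precompiled dictionary display per record length, selected by len(L)) might look like the following. The names build_eval_table, line_to_dict and EVAL_TABLE, and the '<record>' filename passed to compile(), are illustrative and not part of the original post; no timing claim is made for this version.

```python
# Python 3 sketch; build_eval_table, line_to_dict and EVAL_TABLE are
# hypothetical names chosen for illustration.
MAX_LEN = 20  # same limit as the post; longer records raise IndexError


def build_eval_table(max_len):
    """Return a list whose entry n is a code object turning L[0:n] into a dict."""
    table = [None] * max_len
    for n in range(0, max_len, 2):  # only even lengths hold complete pairs
        pairs = ','.join('L[%d]:L[%d]' % (j, j + 1) for j in range(0, n, 2))
        table[n] = compile('{%s}' % pairs, '<record>', 'eval')
    return table


EVAL_TABLE = build_eval_table(MAX_LEN)


def line_to_dict(line):
    """Split a record on whitespace and evaluate the precompiled dict display."""
    L = line.split()
    # The code object looks up the name 'L' in the namespace passed to eval().
    return eval(EVAL_TABLE[len(L)], {'L': L})


record = line_to_dict('interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2')
print(record['proto'])  # tcp
```

As in the original, a record with an odd number of fields has no code object in the table (the entry is None), so eval() fails for it instead of producing a partial dictionary.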
I normally do not bother with optimisations that give less than 10%
improvement.
There has been a request for a Python equivalent of the Perl split
mentioned somewhere in the Python world...
Stephen D Evans
Roy.Culley at switzerland.org wrote:
> I'm new to python and am trying to convert a perl script which analyses
> firewall logs to python as a learning exercise.
>
> The log files contain lines of multiple key / value pairs such as:
>
> interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2 service smtp \
> s_port 44008 len 44 rule 7
>
> Not all records are the same and the key / value pairs are not at
> fixed positions. In perl, assuming the line is in $_, I can do:
>
> %Rec = split
>
> Is there an equivalent simple way to do this with python? I've done
> it by converting the data into a list and using a while loop to set
> the dictionary entries. However, the log files have about 4 million
> entries per day so I need something that is fast.
>
> Any help / suggestions most appreciated.
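For comparison, later Python versions allow a much shorter rough counterpart to the Perl one-liner quoted above, using list slicing with a step together with zip() and dict(). This is a sketch that assumes a modern Python 3 interpreter; it relies on features that postdate the Python 2.0 discussed in this thread, and record_to_dict is a hypothetical name.

```python
def record_to_dict(line):
    """Pair up alternating whitespace-separated tokens as key/value entries."""
    tokens = line.split()
    # tokens[::2] are the keys, tokens[1::2] the values; zip() stops at the
    # shorter sequence, so a trailing key without a value is silently dropped.
    return dict(zip(tokens[::2], tokens[1::2]))


sample = ('interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2 service smtp '
          's_port 44008 len 44 rule 7')
rec = record_to_dict(sample)
print(rec['service'], rec['s_port'])  # smtp 44008
```

Unlike the precompiled-code approach, this needs no fixed maximum record length, but whether it is faster on a given log file would have to be measured.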