Creating a dictionary from log file records

Stephen D Evans stevee at recombinant.demon.co.uk
Fri Feb 16 16:18:34 EST 2001


Roy,

For an alternative Python solution (without resorting to C) you could try
using compile() to create code objects for eval or exec rather than using a
while or for loop. On my Intel Pentium 200 machine the exec and eval
methods in the following code ran about 4% faster than the for loop. Please
note that I have only tested this on Python 2.0.

####
filename = 'logfile.log'

# Old FORTRAN technique. Static allocation of list length.
# If an IndexError occurs, change the value and start again.
MAX_LEN = 20

# pre-create a list of code objects that return list 'L' as a dictionary
eval_code_object_list = [None] * MAX_LEN
exec_code_object_list = [None] * MAX_LEN
for i in range(0, MAX_LEN, 2): # assume list of even length
    text = ''
    for j in range(0, i, 2):
        text='%sL[%d]:L[%d],' % (text, j, j+1)
    text = '{%s}' % text

    eval_code_object_list[i] = compile(text, '', 'eval')

    text = 'dict=%s' % text
    exec_code_object_list[i] = compile(text, '', 'exec')


import time # for some stats

start_time = time.clock()

# for Python 1.5.2
print 'Python readlines():',
lines = open(filename, 'rt').readlines()

print time.clock() - start_time
start_time = time.clock()

# an alternative method for Python 2.0 (not as fast), will fail on
# Python 1.5.2
print 'Python read().splitlines():',
lines = open(filename, 'rt').read().splitlines()

print time.clock() - start_time

# Python 2.1 could use the 'for line in open(filename).xreadlines():'
# method for large files

start_time = time.clock()
print 'for loop:',

## The list 'L' is converted to a dictionary 'dict' using a simple
## for loop.
for line in lines:
    L = line.split()
    dict = {}
    for i in range(0, len(L), 2):
        dict[L[i]] = L[i+1]

print time.clock() - start_time
start_time = time.clock()
print 'eval:',

## The list 'L' is converted to a dictionary 'dict' using an index into
## the list of code objects. Lookups and no iteration during the creation
## of 'dict'.
for line in lines:
    L = line.split()
    dict = eval(eval_code_object_list[len(L)])

print time.clock() - start_time
start_time = time.clock()
print 'exec:',

## as above but use exec to produce dict
for line in lines:
    L = line.split()
    exec exec_code_object_list[len(L)]
    # dict is here !

print time.clock() - start_time

####
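
To make the generated code easier to picture, here is a minimal self-contained sketch of the same idea, run on one sample line instead of a log file. It is not the script above: it assumes Python 3 syntax, the names make_source, code_by_length and line_to_dict are made up for illustration, and it keeps the code objects in a dictionary keyed by token count rather than in a fixed-length list.

```python
# Sketch only. Assumes Python 3; helper names are hypothetical.
MAX_LEN = 20

def make_source(n):
    """Return the text of a dict display for a token list L of even length n."""
    pairs = ''.join('L[%d]:L[%d],' % (j, j + 1) for j in range(0, n, 2))
    return '{%s}' % pairs

# One precompiled expression per even length 0, 2, ..., MAX_LEN - 2.
code_by_length = {n: compile(make_source(n), '<generated>', 'eval')
                  for n in range(0, MAX_LEN, 2)}

def line_to_dict(line):
    L = line.split()
    # A line with an odd or too large token count raises KeyError here.
    return eval(code_by_length[len(L)], {'L': L})

sample = 'interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2'
record = line_to_dict(sample)
print(make_source(4))   # the expression text generated for a 4-token line
print(record)
```

For a 4-token line the generated expression is simply a dictionary display that indexes L directly, so no Python-level loop runs when the dictionary is built.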

I normally do not bother with optimisations that give less than 10%
improvement.

There has been a request for a Python equivalent of the Perl split
mentioned somewhere in the Python world...
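
For what it is worth, a close analogue of the Perl idiom can be sketched as below. This assumes a Python newer than the 2.0 I tested (one where dict() accepts a sequence of pairs and lists support stepped slices), and the function name pairs_to_dict is only an illustration. I have not timed it against the methods above.

```python
# Sketch only. Assumes a newer Python than 2.0; pairs_to_dict is a made-up name.
def pairs_to_dict(line):
    tokens = line.split()
    # Even-indexed tokens are keys, odd-indexed tokens are values.
    # zip() stops at the shorter sequence, so a trailing key without
    # a value is silently dropped.
    return dict(zip(tokens[::2], tokens[1::2]))

sample = ('interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2 service smtp '
          's_port 44008 len 44 rule 7')
rec = pairs_to_dict(sample)
print(rec['service'], rec['rule'])
```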

Stephen D Evans

Roy.Culley at switzerland.org wrote:

> I'm new to python and am trying to convert a perl script which analyses
> firewall logs to python as a learning exercise.
>
> The log files contain lines of multiple key / value pairs such as:
>
>     interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2 service smtp \
>         s_port 44008 len 44 rule 7
>
> Not all records are the same and the key / value pairs are not at
> fixed positions. In perl, assuming the line is in $_, I can do:
>
>     %Rec = split
>
> Is there an equivalent simple way to do this with python? I've done
> it by converting the data into a list and using a while loop to set
> the dictionary entries. However, the log files have about 4 million
> entries per day so I need something that is fast.
>
> Any help / suggestions most appreciated.