Creating a dictionary from log file records

Roy.Culley at switzerland.org
Fri Feb 16 13:11:18 EST 2001


In article <mailman.982341324.16674.python-list at python.org>,
	"Sean 'Shaleh' Perry" <shaleh at valinux.com> wrote:
>
> On 16-Feb-2001 Roy.Culley at switzerland.org wrote:
>> I'm new to python and am trying to convert a perl script which analyses
>> firewall logs to python as a learning exercise.
>> 
>> The log files contain lines of multiple key / value pairs such as:
>> 
>>     interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2 service smtp \
>>         s_port 44008 len 44 rule 7
>> 
>> Not all records are the same and the key / value pairs are not at
>> fixed positions. In perl, assuming the line is in $_, I can do:
>> 
>>     %Rec = split
>> 
>> Is there an equivalently simple way to do this in python? I've done
>> it by converting the data into a list and using a while loop to set
>> the dictionary entries. However, the log files have about 4 million
>> entries per day, so I need something that is fast.
>> 
> 
> So for every line in the file you create a temporary perl hash. What
> do you do with the hash? Generating a new hash for every line can only
> be so fast.
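For reference, the dictionary-from-split idiom asked about above is
nearly a one-liner in Python as well. A minimal sketch, assuming
whitespace-separated alternating key/value tokens as in the sample
line quoted above (slicing out even and odd tokens mirrors Perl's
%Rec = split):

    line = ("interface qfe0 proto tcp src 1.1.1.1 dst 2.2.2.2 "
            "service smtp s_port 44008 len 44 rule 7")
    fields = line.split()
    # Even-indexed tokens are keys, odd-indexed tokens are values
    rec = dict(zip(fields[0::2], fields[1::2]))
    print(rec['src'], rec['service'])   # -> 1.1.1.1 smtp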

On a Sun E250 with a single 300MHz processor the perl script can
process a day's log of 3.5 - 4 MB in a little over half an hour. From
the per-line hash I build lots of other hashes, which I then use to
produce a log report with many different kinds of summary data: for
example, services by IP source address and by destination address
(both accepted and dropped connections), connections by src addr +
dst addr + service, and so on, in many different combinations. These
hashes let us quickly spot scans (both of IP addresses and of ports),
see which services are used most (which helps order the firewall
rules), etc. It is quite a simple perl script, and most of the time
is spent sorting the hashes and producing the report; the initial
extraction of the data into the hashes is very fast.
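A sketch of that kind of summary pass in Python, under the same
parsing assumption as above (the file name and the particular
summaries are illustrative, not the actual script):

    # Illustrative summary tables; the real script builds many more
    by_service = {}        # service -> connection count
    by_src_dst_svc = {}    # (src, dst, service) -> connection count

    for line in open('fw.log'):        # 'fw.log' is a placeholder name
        fields = line.split()
        rec = dict(zip(fields[0::2], fields[1::2]))
        svc = rec.get('service', '?')
        by_service[svc] = by_service.get(svc, 0) + 1
        key = (rec.get('src'), rec.get('dst'), svc)
        by_src_dst_svc[key] = by_src_dst_svc.get(key, 0) + 1

    # Most-used services first, e.g. as a guide for ordering firewall rules
    for svc, n in sorted(by_service.items(), key=lambda kv: kv[1], reverse=True):
        print(svc, n)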


