tips requested for a log-processing script

George Sakkis george.sakkis at gmail.com
Sun Nov 5 17:12:58 EST 2006


Jaap wrote:

> Apart from this I have a configuration file, which contains the list of
> itemIDs I need to focus on per month. Not all itemIDs are relevant for
> each month, but for example only every second or third month. All
> records in the logfile with other itemIDs can be ignored. I have yet to
> define the format of this configuration file, but am thinking about a 0
> or 1 for each month, and then the itemID, like:
> "1 0 0 1 0 0 1 0 0 1 0 0 123456" for an itemID 123456 which only needs
> consideration in the first month of each quarter.

It's probably not necessary if your records are on the order of 100K,
but if you're dealing with millions and above, you can write your
config file in binary using the struct module and condense it down to 6
bytes per record (32 bits for the ID and 12 bits for the monthly
occurrences, stored in a 16-bit field). Filtering will also be faster,
as for each record you just have to do a bitwise AND with the
0..010...0 mask corresponding to a given month.
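
A minimal sketch of the idea, assuming the text format from your example
(twelve 0/1 flags followed by the itemID) and a little-endian "<IH"
layout, i.e. an unsigned 32-bit ID plus an unsigned 16-bit mask with
bit 0 standing for January. The function names are made up for
illustration, and it is written for current Python 3 (iter_unpack needs
3.4 or later):

```python
import struct

# Assumed record layout: unsigned 32-bit ID + unsigned 16-bit month mask.
# "<" disables alignment padding, so each record is exactly 6 bytes.
RECORD = struct.Struct("<IH")


def parse_line(line):
    """Turn '1 0 0 1 ... 123456' into (item_id, mask); bit 0 = January."""
    fields = line.split()
    flags, item_id = fields[:12], int(fields[12])
    mask = 0
    for month_index, flag in enumerate(flags):
        if flag == "1":
            mask |= 1 << month_index
    return item_id, mask


def pack_config(lines):
    """Pack text config lines into a bytes object, 6 bytes per record."""
    return b"".join(RECORD.pack(*parse_line(line)) for line in lines)


def load_config(data):
    """Unpack the binary config into a dict mapping item_id -> month mask."""
    return {item_id: mask for item_id, mask in RECORD.iter_unpack(data)}


def is_relevant(config, item_id, month):
    """True if item_id is flagged for the given month (1..12)."""
    return bool(config.get(item_id, 0) & (1 << (month - 1)))
```

With the config loaded into a dict, filtering a log record is one lookup
plus one AND, e.g. is_relevant(config, 123456, 4) for April. IDs that do
not appear in the config get a mask of 0 and are skipped.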

George



