fastest method

Thu Jun 21 09:59:01 EDT 2012

david.garvey at gmail.com wrote:
> I am looking for the fastest way to parse a log file.
>
>
> currently I have this... Can I speed this up any? The script is 
> written to be a generic log file parser so I can't rely on some 
> predictable pattern.
>
>
> def check_data(data,keywords):
>     #get rid of duplicates
>     unique_list = list(set(data))
>     string_list=' '.join(unique_list)
>     #print string_list
>     for keyword in keywords:
>         if keyword in string_list:
>             return True
>
>
> I am currently using file seek and maintaining a last byte count file:
>
> with open(filename) as f:
>     print "Here is filename:%s" %filename
>     f.seek(0, 2)
>     eof = f.tell()
>     print "Here is eof:%s" %eof
>     if last is not None and int(last) <= eof:
>         # if last is less than (or equal to) current: read only new data
>         last = int(last)
>         print "Here is last:%s" %last
>         f.seek(last)
>     else:
>         # if last doesn't exist or is greater than current
>         f.seek(0)
>         bof = f.tell()
>         print "Here is bof:%s" %bof
>     # always defined, and empty when nothing new was written
>     mylist = f.readlines()
>
>
>
> Thanks,
> -- 
> David Garvey
I have a log parser that takes action upon some log patterns.
I rely on my system's 'grep' program to do the hard work, i.e. finding 
occurrences.

Of course that means it is system dependent, but I don't think you can 
beat grep's speed.

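    def _grep(self, link, pattern):
        # return the number of lines in the file that match the pattern
        # (grep -c counts matching lines, not individual occurrences)
        # requires "import subprocess" at module level
        proc = subprocess.Popen(['grep', '-c', pattern, link],
                                stdout=subprocess.PIPE)
        return int(proc.communicate()[0])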
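
Where grep is not available, a pure-Python fallback for the call above is 
possible. The following is only a minimal sketch: count_matching_lines is a 
made-up name, and it assumes the log is UTF-8 text (undecodable bytes are 
replaced). Python's re syntax is not identical to grep's default regular 
expressions, so some patterns may need adjusting, and I would expect it to be 
slower than grep on large files.

```python
import io
import re


def count_matching_lines(path, pattern):
    """Count the lines in `path` that match `pattern`, similar to `grep -c`.

    Hypothetical helper, not part of any library.
    """
    regex = re.compile(pattern)
    count = 0
    # Iterate line by line so the whole file never has to fit in memory.
    with io.open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if regex.search(line):
                count += 1
    return count
```

Calling count_matching_lines(link, pattern) would then take the place of 
self._grep(link, pattern).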

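For the "last byte count" part of the original question, here is a minimal 
sketch of reading only what was appended since the previous run. The names 
read_new_lines and has_keyword are hypothetical. It assumes the offset is a 
byte count (so the file is opened in binary mode and keywords are bytes), that 
a file smaller than the saved offset has been rotated, and that a trailing 
line without a newline is still being written and should wait for the next 
run.

```python
import io
import os


def read_new_lines(path, last_offset=None):
    """Return (lines, new_offset) for complete lines added after last_offset."""
    size = os.path.getsize(path)
    # No saved offset, or the file shrank (assumed rotated): start over.
    if last_offset is None or last_offset > size:
        last_offset = 0
    with io.open(path, "rb") as f:
        f.seek(last_offset)
        data = f.read()
    # Keep only complete lines; a partial last line is left for the next run.
    end = data.rfind(b"\n") + 1
    data = data[:end]
    return data.splitlines(), last_offset + len(data)


def has_keyword(lines, keywords):
    """True if any keyword occurs in any line; stops at the first hit."""
    return any(k in line for line in lines for k in keywords)
```

The returned offset is what would be written to the last byte count file. 
Scanning line by line also avoids building the set and the joined string in 
check_data.
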
Cheers,

JM