Could you suggest optimisations ?

Barak, Ron Ron.Barak at lsi.com
Wed Jan 14 03:45:37 EST 2009


Hi Terry,

-----Original Message-----
From: Terry Reedy [mailto:tjreedy at udel.edu]
Sent: Wednesday, January 14, 2009 01:57
To: python-list at python.org
Subject: Re: Could you suggest optimisations ?

Barak, Ron wrote:
> Hi,
>
> In the attached script, the longest time is spent in the following
> functions (verified by psyco log):

I cannot help but wonder why, and whether, you really need all the rigmarole with file pointers, offsets, and tells instead of

for line in open(...):
   do your processing.

I'm building a database of the events found in the logs (the records between the first and last regexes in regex_array).
The user should then be able to navigate among these events (among other functionality).
That is why I need the tells and offsets: they tell me where in the logs each event starts and ends.
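
Something along these lines is what I have in mind, as a rough, untested sketch (it assumes the logs are opened in binary mode, so that len(line) equals the number of bytes consumed; calling tell() while iterating over a file can be misleading because of the read-ahead buffer):

    def iter_lines_with_offsets(path):
        # Yield (offset, line) pairs, where offset is the byte position
        # of the start of the line within the file.
        offset = 0
        for line in open(path, "rb"):
            yield offset, line.rstrip("\n")
            offset += len(line)

Each event could then be stored as a (start_offset, end_offset) pair, so the viewer can seek() straight back to it later.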

Bye,
Ron.

>
>     def match_generator(self,regex):
>         """
>         Generate the next line of self.input_file that
>         matches regex.
>         """
>         generator_ = self.line_generator()
>         while True:
>             self.file_pointer = self.input_file.tell()
>             if self.file_pointer != 0:
>                 self.file_pointer -= 1
>             if (self.file_pointer + 2) >= self.last_line_offset:
>                 break
>             line_ = generator_.next()
>             print "%.2f%%   \r" % (((self.last_line_offset - self.input_file.tell()) / (self.last_line_offset * 1.0)) * 100.0),
>             if not line_:
>                 break
>             else:
>                 match_ = regex.match(line_)
>                 groups_ = re.findall(regex,line_)
>                 if match_:
>                     yield line_.strip("\n"), groups_
>
>     def get_matching_records_by_regex_extremes(self,regex_array):
>         """
>         Function will:
>         Find the record matching the first item of regex_array.
>         Will save all records until the last item of regex_array.
>         Will save the last line.
>         Will remember the position of the beginning of the next line in
>         self.input_file.
>         """
>         start_regex = regex_array[0]
>         end_regex = regex_array[len(regex_array) - 1]
>
>         all_recs = []
>         generator_ = self.match_generator
>
>         try:
>             match_start,groups_ = generator_(start_regex).next()
>         except StopIteration:
>             return(None)
>
>         if match_start != None:
>             all_recs.append([match_start,groups_])
>
>             line_ = self.line_generator().next()
>             while line_:
>                 match_ = end_regex.match(line_)
>                 groups_ = re.findall(end_regex,line_)
>                 if match_ != None:
>                     all_recs.append([line_,groups_])
>                     return(all_recs)
>                 else:
>                     all_recs.append([line_,[]])
>                     line_ = self.line_generator().next()
>
>     def line_generator(self):
>         """
>         Generate the next line of self.input_file, and update
>         self.file_pointer to the beginning of that line.
>         """
>         while self.input_file.tell() <= self.last_line_offset:
>             self.file_pointer = self.input_file.tell()
>             line_ = self.input_file.readline()
>             if not line_:
>                 break
>             yield line_.strip("\n")
>
> I was trying to think of optimisations, so I could cut down on
> processing time, but got no inspiration.
> (I need the "print "%.2f%%   \r" ..." line for user's feedback).
>
> Could you suggest any optimisations ?
> Thanks,
> Ron.
>
>
> P.S.: Examples of processing times are:
>
>         * 2m42.782s  on two files with combined size of    792544 bytes
>           (no matches found).
>         * 28m39.497s on two files with combined size of 4139320 bytes
>           (783 matches found).
>
>     These times are quite unacceptable, as a normal input to the program
>     would be ten files with combined size of ~17MB.
>
>

