[Tutor] look back comprehensively

Mats Wichmann mats at wichmann.us
Sun Dec 23 18:40:31 EST 2018


On 11/14/18 3:01 PM, Steven D'Aprano wrote:
> On Tue, Nov 13, 2018 at 11:59:24PM -0500, Avi Gross wrote:
>> I have been thinking about the thread we have had where the job seemed to be
>> to read in a log file and if some string was found, process the line before
>> it and generate some report. Is that generally correct?
> 
> If that description is correct, then the solution is trivial: iterate 
> over the file, line by line, keeping the previous line:
> 
> previous_line = None
> for current_line in file:
>     process(current_line, previous_line)
>     previous_line = current_line
> 
> 
> No need for complex solutions, or memory-hungry solutions that require 
> reading the entire file into memory at once (okay for, say, a million 
> lines, but not if your logfile is 2GB in size). If you need the line 
> number:

Absolutely, let's not go reading everything in in bulk. Python has tried
very hard to build elegant iterators all over the place to avoid "doing
the whole thing" when you don't have to - and it has helped heaps with
what years ago used to be an indictment of Python as being "too slow":
not doing work you don't need to do is always a good thing.
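
If you need to look back more than one line, the same streaming idea
still works. Here is a minimal sketch, assuming a bounded
collections.deque is an acceptable look-back buffer; the names
with_context, matches and before are made up for illustration and are
not from the earlier thread:

```python
from collections import deque


def with_context(lines, matches, before=2):
    """Yield (context, line) for every line where matches(line) is true.

    context is a list of up to `before` lines that preceded the match.
    All names here are hypothetical, chosen only for this sketch.
    """
    history = deque(maxlen=before)  # old lines fall off the left end
    for line in lines:
        if matches(line):
            yield list(history), line
        history.append(line)


# Any iterable of lines works, including an open file object.
sample = ["a", "b", "c", "ERROR x", "d"]
hits = list(with_context(sample, lambda s: "ERROR" in s))
```

Only `before` lines are held in memory at any time, so the size of the
logfile doesn't matter.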

The general problem is pretty common, I think, and expands a bit beyond
the trivial case.  Log files may have a start and end marker for a case
you have to examine, and the number of lines between those may be fixed
(0, 1, 2, whatever - 0 being the most trivial case) or variable - I
think that's the situation that started this thread way back, and it
comes up a lot. If you search Stack{Exchange,Overflow}, you'll find a
non-trivial number of people have asked.
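
For the variable-length case, one possible shape is a small generator
that remembers whether it is currently inside a block. This is only a
sketch: blocks_between, is_start and is_end are hypothetical names, and
the BEGIN/END markers in the sample are invented for the example.

```python
def blocks_between(lines, is_start, is_end):
    """Yield each run of lines from a start marker through its end marker.

    Hypothetical helper for illustration; nesting and a line that is
    both a start and an end marker are deliberately not handled.
    """
    block = None  # None means "not currently inside a block"
    for line in lines:
        if block is None:
            if is_start(line):
                block = [line]
        else:
            block.append(line)
            if is_end(line):
                yield block
                block = None


# Invented sample data; real markers depend on the log format.
log = [
    "noise",
    "BEGIN job 1",
    "step a",
    "END job 1",
    "noise",
    "BEGIN job 2",
    "END job 2",
]
found = list(blocks_between(log,
                            lambda s: s.startswith("BEGIN"),
                            lambda s: s.startswith("END")))
```

Memory use is bounded by the longest single block rather than by the
whole file, and a block that never sees its end marker is silently
dropped, which may or may not be what you want.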

I just now have a different scenario, similar requirement... I happen to
want to scan a bunch of Python code to locate instances of the Python
idiom for ignoring certain possible/expected error conditions:

try:
    block of code
except SomeError:
    pass


to experiment with replacing those with contextlib.suppress and see if
the team of a particular project thinks that makes code more readable:

from contextlib import suppress
...

with suppress(SomeError):
    block of code

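To make the comparison concrete, here is a small runnable sketch of the
two forms side by side. The function names and the use of os.remove on
a missing file are just an illustration I picked, not code from the
project in question:

```python
import os
import tempfile
from contextlib import suppress


def remove_quietly_try(path):
    # Traditional idiom: ignore the "file is not there" case.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def remove_quietly_suppress(path):
    # Same behaviour expressed with contextlib.suppress.
    with suppress(FileNotFoundError):
        os.remove(path)


# A path inside a fresh temporary directory, so it cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), "no-such-file")
remove_quietly_try(missing)       # no exception escapes
remove_quietly_suppress(missing)  # no exception escapes
```

In both versions only FileNotFoundError is swallowed; any other
exception still propagates.
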

This is pretty similar - I want to identify a multi-line sequence that
starts with "try:", has one or more lines, then ends with, in this case,
a two-line sequence where the first line starts with "except" and is
immediately followed by "pass" - but to make it more exciting, it must
not then be followed by either "else" or "finally", because if the try
block has either of those clauses, it is not a candidate for using
suppress instead.  Regexes aren't necessarily helpful on multiline
patterns, even if you ignore the jokes about regexes ("now you have two
problems").
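
Since the input here happens to be Python source, one alternative to
line-by-line matching is to let the standard library's ast module do the
parsing and then inspect the try statements. To be clear, this is a
different technique from the text scanning described above, and only a
sketch: suppress_candidates is a name I made up, and the rules it
applies (no else, no finally, every handler names an exception and
contains only "pass") are my reading of the requirement.

```python
import ast


def suppress_candidates(source):
    """Return line numbers of try statements that look convertible.

    Hypothetical helper: a candidate has no else/finally clause, and
    every except clause names an exception and has a body of just `pass`.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Try):
            continue
        if node.orelse or node.finalbody:
            continue  # else/finally present: not a candidate
        if all(
            handler.type is not None          # skip bare "except:"
            and len(handler.body) == 1
            and isinstance(handler.body[0], ast.Pass)
            for handler in node.handlers
        ):
            hits.append(node.lineno)
    return sorted(hits)


# Invented sample; the called names never run, they are only parsed.
sample = '''\
try:
    a()
except KeyError:
    pass

try:
    b()
except KeyError:
    pass
else:
    c()

try:
    d()
except KeyError:
    log()
'''
candidates = suppress_candidates(sample)
```

Only the first try statement in the sample qualifies: the second has an
else clause and the third does real work in its handler.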

As common as this is, I suspect there are elegant solutions that go
beyond everyone rolling their own.  I'm thinking that maybe pyparsing
has the tools to help with this kind of problem...  I may take a look
into that over the next few days since I just ended up with a personal
interest.

