Python script for searching variable strings between two constant strings

Steve D'Aprano steve+python at pearwood.info
Fri Aug 26 21:44:55 EDT 2016


On Sat, 27 Aug 2016 08:33 am, ddream.merchantt at gmail.com wrote:

> My log file has several sections starting with ==== START ==== and ending
> with ==== END   ====. 

Um. Is this relevant? Are you saying that you only wish to search the file
between those lines, and ignore anything outside of them? If the file looks
like:

    xxxx
    xxxx
    xxxx --operation(): AutoAuthOSUserSubmit StartOperation
    xxxx
    xxxx
    ==== START ====
    xxxx
    xxxx
    xxxx
    ==== END ====
    xxxx
    xxxx
    xxxx --operation(): AutoAuthOSUserSubmit StartOperation
    xxxx


do you expect to say that nothing is found?


I'm going to assume that you wouldn't have mentioned this if it wasn't
important, so let's start by filtering out everything outside of
the ==== START ==== ... ==== END ==== sections. For that, we want a filter
that swaps between "ignore these lines" and "search these lines" depending
on whether you are inside or outside of a START...END section.

We'll use regular expressions for matching.


import re

START = r'''(?x)        (?# verbose mode; this flag must come first)
    ={2,}       (?# two or more equal signs)
    \s*         (?# any amount of whitespace)
    START       (?# the literal word START in uppercase)
    \s*         (?# more optional whitespace)
    ={2,}       (?# two or more equal signs)
    $           (?# end of the line)
'''

END = r'={2,}\s*END\s*={2,}$'  # Similar to above, without verbose mode.

START = re.compile(START)
END = re.compile(END)

def filter_sections(lines):
    outside = True
    for line in lines:
        line = line.strip()  # ignore leading and trailing whitespace
        if outside:
            # ignore all lines until we see START
            if re.match(START, line):
                outside = False
            else:
                pass  # just ignore the line
        else:
            # pass on every line until we see END
            if re.match(END, line):
                outside = True
            else:
                yield line
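
Here is a quick way to exercise the filter on in-memory data before
pointing it at a real file. This is only a sketch: it restates the two
patterns using the re.VERBOSE flag instead of an inline (?x), and the
sample lines and the name "inside" are made up for illustration.

```python
import re

# Same idea as the START and END patterns above, compiled with re.VERBOSE
# rather than an inline (?x) flag.
START = re.compile(r'''
    ={2,}   # two or more equal signs
    \s*     # any amount of whitespace
    START   # the literal word START
    \s*     # more optional whitespace
    ={2,}   # two or more equal signs
    $       # end of the line
''', re.VERBOSE)
END = re.compile(r'={2,}\s*END\s*={2,}$')

def filter_sections(lines):
    """Yield only the lines found between START and END markers."""
    outside = True
    for line in lines:
        line = line.strip()
        if outside:
            if START.match(line):
                outside = False
        elif END.match(line):
            outside = True
        else:
            yield line

# Made-up sample data, not taken from a real log.
sample = [
    "aaaa --operation(): Ignored StartOperation\n",
    "==== START ====\n",
    "bbbb\n",
    "cccc\n",
    "==== END   ====\n",
    "dddd\n",
    "==== START ====\n",
    "eeee\n",
    "==== END ====\n",
]

inside = list(filter_sections(sample))
print(inside)
```

Only the lines that sit between a START marker and the following END
marker should come out; the marker lines themselves are dropped.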
            


Now you need to test that this does what you expect:


with open("mylogfile.log") as f:
    for line in filter_sections(f):
        print(line)


should print *only* the lines between the START and END lines. Once you are
satisfied that this works correctly, move on to the next part: extracting
the relevant information from each line. There are three things you wish to
look for, so you want three regular expressions. I'm not being paid for
this, so here's one; the other two are up to you:

OPERATION = r'''(?x)    (?# verbose mode; this flag must come first)
    --operation\(\):    (?# literal string)
    \s*                 (?# optional whitespace)
    (.*?)               (?# anything at all, non-greedy, in a group)
    \s*                 (?# more optional whitespace)
    StartOperation      (?# another literal string)
    .*?$                (?# ignore everything to the end of the line)
'''

OPERATION = re.compile(OPERATION)

FOO = ...  # match second thing, similar to above
BAR = ...  # match third thing
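
One thing worth checking before going further: match() only tries the
pattern at the very start of the string, while search() scans the whole
string. If your log lines have anything in front of "--operation():", that
difference matters. A small sketch, assuming a line shaped like the example
earlier in this message (the variable names "anchored" and "anywhere" are
just for illustration):

```python
import re

OPERATION = re.compile(r'''
    --operation\(\):    # literal string
    \s*                 # optional whitespace
    (.*?)               # the operation name, matched non-greedily
    \s*                 # more optional whitespace
    StartOperation      # another literal string
''', re.VERBOSE)

# Sample line modelled on the example earlier in this message.
line = "xxxx --operation(): AutoAuthOSUserSubmit StartOperation"

anchored = OPERATION.match(line)    # only tries at position 0
anywhere = OPERATION.search(line)   # scans the whole line

print(anchored)             # None, because the line starts with "xxxx"
print(anywhere.group(1))    # AutoAuthOSUserSubmit
```

The non-greedy group also keeps the whitespace before StartOperation out
of the captured text.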


Now let's extract the data we want:

def extract(lines):
    for line in lines:
        line = line.strip()
        # search() rather than match(): the text we want may start mid-line
        mo = (re.search(OPERATION, line)
              or re.search(FOO, line)
              or re.search(BAR, line)
              )
        if mo:
            yield mo.groups()


with open('mylogfile.log') as f:
    for match in extract(filter_sections(f)):
        print(match)




By the way, the above code is untested.
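
For what it's worth, here is a compact, self-contained sketch of the same
pipeline that can be run against an in-memory stand-in for the log file.
It only handles the one OPERATION pattern, it uses search() on the
assumption that the interesting text may start mid-line, and the log
contents are invented, so treat it as a starting point rather than a
finished script.

```python
import io
import re

START = re.compile(r'={2,}\s*START\s*={2,}$')
END = re.compile(r'={2,}\s*END\s*={2,}$')
OPERATION = re.compile(r'--operation\(\):\s*(.*?)\s*StartOperation')

def filter_sections(lines):
    """Yield only the lines between START and END markers."""
    outside = True
    for line in lines:
        line = line.strip()
        if outside:
            # Stay outside until a START marker is seen.
            outside = not START.match(line)
        elif END.match(line):
            outside = True
        else:
            yield line

def extract(lines):
    """Yield the operation name from each line that mentions one."""
    for line in lines:
        mo = OPERATION.search(line)
        if mo:
            yield mo.group(1)

# Invented log contents standing in for mylogfile.log.
fake_log = io.StringIO("""\
1111 --operation(): OutsideOp StartOperation
==== START ====
2222 nothing interesting
3333 --operation(): AutoAuthOSUserSubmit StartOperation
==== END ====
4444 --operation(): AlsoOutside StartOperation
""")

found = list(extract(filter_sections(fake_log)))
print(found)
```

Swapping io.StringIO for open('mylogfile.log') should be the only change
needed to try it on a real file.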






-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



