[Tutor] Extracting data between strings

richard kappler richkappler at gmail.com
Wed May 27 15:26:15 CEST 2015


I'm writing a script that reads from an in-service log file in XML format
that can grow to a couple of gigabytes in 24 hours, then gets zipped out and
restarts at zero. My script must check whether new entries have been
made, find specific lines based on two different start tags, and from those
lines extract the data between the start and end tags (hopefully including
the tags) and write it to a file. So far the script reads the file, checks
whether it has grown, finds the appropriate lines and writes them to a file.
I still need to strip out just the data I need (between the open and close
tags) instead of writing the entire line, and also to reset eof when the
nightly zip / new log file creation occurs. I could use some guidance on
stripping out the data; at the moment I'm pretty lost. I've got an idea
about the nightly reset, but any comments on that would be welcome as well.
Oh, and the painful bit is that I can't use any modules that aren't included
in the initial Python install. My code is appended below.

regards, Richard


import time

# find the initial End Of File offset once, before the loop, so that it
# persists between polls instead of being reset on every pass
logfile = open('log.txt', 'r')
logfile.seek(0, 2)
eof = logfile.tell()
logfile.close()

while True:
    # open the log file containing the data
    with open('log.txt', 'r') as logfile:
        # find the current file size
        logfile.seek(0, 2)
        neweof = logfile.tell()
        # if the file is larger...
        if neweof > eof:
            # go back to last position...
            logfile.seek(eof)
            # open file to which the lines will be appended
            with open('newlog.txt', 'a') as f1:
                # read new lines in log.txt
                for line in logfile.readlines():
                    # check if line contains needed data
                    if "usertag1" in line or "SeMsg" in line:

############################################################
#### this should extract the data between usertag1 and  ####
#### /usertag1, and between SeMsg and /SeMsg,           ####
#### writing just that data to the new file for         ####
#### analysis. For now, write entire line to file       ####
############################################################

                        # if yes, send line to new file
                        f1.write(line)
            # remember how far we actually read, so nothing is read twice
            eof = logfile.tell()

###########################################################
#### need an elif for when neweof < eof (nightly       ####
#### reset of log file) that continues loop with reset ####
#### eof and neweof                                    ####
###########################################################
    # wait x number of seconds until back to beginning of loop
    # set at ten for dev and test, set to 300 for production
    time.sleep(10)
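
For the extraction step, one possible approach is the standard-library re
module, which ships with every Python install. The following is only a
minimal sketch under assumptions the post does not confirm: that the tags
appear literally as <usertag1>...</usertag1> and <SeMsg>...</SeMsg>
(optionally with attributes in the opening tag), and that each opening tag
and its closing tag sit on the same line. The names TAG_PATTERN and
extract_tagged are made up for illustration.

```python
import re

# Assumption: opening and closing tag are on the same line, and the tag
# names are exactly "usertag1" and "SeMsg" as in the script above.
# \1 refers back to whichever tag name matched, so <usertag1> is only
# closed by </usertag1>; .*? is non-greedy so each match stops at the
# first closing tag.
TAG_PATTERN = re.compile(r"<(usertag1|SeMsg)\b[^>]*>.*?</\1>")


def extract_tagged(line):
    """Return every tagged chunk found in line, tags included."""
    return [match.group(0) for match in TAG_PATTERN.finditer(line)]
```

In the loop, f1.write(line) could then be replaced by writing each element
of extract_tagged(line) followed by a newline. If an element can span
several lines, this line-by-line approach would not be enough.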

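For the nightly reset, one way to handle it is to treat a file that is
smaller than the remembered offset as a fresh log and start again from
zero. The sketch below is an illustration, not a tested production
solution: read_new_lines is a hypothetical helper, the log is assumed to be
UTF-8, and the file is opened in binary mode so that offsets are plain byte
counts. It also holds back a trailing partial line until the writer has
finished it.

```python
import os


def read_new_lines(path, last_offset):
    """Return (new_lines, new_offset) for the log file at path.

    If the file is now smaller than last_offset, assume it was rotated
    (zipped out and restarted) and read it again from the beginning.
    """
    size = os.path.getsize(path)
    if size < last_offset:
        # file shrank: treat this as the nightly reset and start at zero
        last_offset = 0
    if size == last_offset:
        return [], last_offset
    with open(path, 'rb') as logfile:
        logfile.seek(last_offset)
        data = logfile.read(size - last_offset)
    # keep only complete lines; a partial last line is picked up next poll
    end = data.rfind(b'\n') + 1
    complete = data[:end]
    # assumption: the log is UTF-8; undecodable bytes are replaced
    lines = complete.decode('utf-8', 'replace').splitlines(True)
    return lines, last_offset + end
```

The polling loop would keep the returned offset and pass it back in on the
next call. One limitation: if the new log grows past the old offset before
the next poll, the size check alone cannot tell that a reset happened.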
