Parsing by Line Data

Eddie Corns eddie at holyrood.ed.ac.uk
Thu Jun 17 13:43:56 EDT 2004


python1 <python1 at spamless.net> writes:

>Having slight trouble conceptualizing a way to write this script. The 
>problem is that I have a bunch of lines in a file, for example:

>01A\n
>02B\n
>01A\n
>02B\n
>02C\n
>01A\n
>02B\n
>.
>.
>.

>The lines beginning with '01' are the 'header' records, whereas the 
>lines beginning with '02' are detail. There can be several detail lines 
>to a header.

>I'm looking for a way to put the '01' and subsequent '02' line data into 
>one list, and breaking into another list when the next '01' record is found.

>How would you do this? I'm used to using 'readlines()' to pull the file 
>data line by line, but in this case, determining the break-point will 
>need to be done by reading the '01' from the line ahead. Would you need 
>to read the whole file into a string and use a regex to break where a 
>'\n01' is found?

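def gen_records(src):
    """Yield one list per record: an '01' header line plus its '02' details."""
    rec = []
    for line in src:
        if line.startswith('01'):
            # A new header closes off the previous record, if there is one.
            if rec:
                yield rec
            rec = [line]
        else:
            # Detail line: attach it to the current record.
            rec.append(line)
    # The last record has no following header, so emit it at end of input.
    if rec:
        yield rec

def do_something_to_list(record):
    # Placeholder: replace with whatever you need to do with one record.
    print(record)

# open() replaces Python 2's file(); the with-block closes the file afterwards.
with open('input-file') as inf:
    for record in gen_records(inf):
        do_something_to_list(record)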
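For comparison, the whole-file approach the question asks about (read everything into one string and break where a line starts with '01') could look roughly like the sketch below. It expresses the '\n01' idea as a regex lookahead so the header line is not consumed. The names `split_records` and `HEADER_START` are hypothetical, and splitting on a zero-width match like this assumes Python 3.7 or later.

```python
import re

# Multiline mode: ^ matches at the start of every line. The lookahead
# (?=01) splits *before* each header line without consuming it.
HEADER_START = re.compile(r'^(?=01)', re.MULTILINE)

def split_records(text):
    """Return a list of records; each record is a list of lines
    (newlines kept), starting with its '01' header line."""
    chunks = HEADER_START.split(text)
    # A split at position 0 yields a leading empty string; drop empties.
    return [chunk.splitlines(keepends=True) for chunk in chunks if chunk]

# Sample data taken from the question.
sample = "01A\n02B\n01A\n02B\n02C\n01A\n02B\n"
for record in split_records(sample):
    print(record)
```

On that sample this produces the same three groups as gen_records. The trade-off is memory: the regex version needs the entire file as one string (e.g. from `inf.read()`), while the generator reads one line at a time and needs no lookahead, because a record is only emitted once the next header (or end of input) shows it is complete.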

Eddie


