Parsing by Line Data

Thu Jun 17 18:41:21 EDT 2004

Eddie Corns wrote:
> python1 <python1 at spamless.net> writes:
> 
> 
>>Having slight trouble conceptualizing a way to write this script. The 
>>problem is that I have a bunch of lines in a file, for example:
> 
> 
>>01A\n
>>02B\n
>>01A\n
>>02B\n
>>02C\n
>>01A\n
>>02B\n
>>.
>>.
>>.
> 
> 
>>The lines beginning with '01' are the 'header' records, whereas the 
>>lines beginning with '02' are detail. There can be several detail lines 
>>to a header.
> 
> 
>>I'm looking for a way to put the '01' and subsequent '02' line data into 
>>one list, and to start another list when the next '01' record is found.
> 
> 
>>How would you do this? I'm used to using 'readlines()' to pull the file 
>>data line by line, but in this case, determining the break-point will 
>>need to be done by reading the '01' from the line ahead. Would you need 
>>to read the whole file into a string and use a regex to break where a 
>>'\n01' is found?
> 
> 
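> def gen_records(src):
>     # Collect lines into one list per '01' header record.
>     rec = []
>     for line in src:
>         if line.startswith('01'):
>             # A new header closes the previous record, if there is one.
>             if rec:
>                 yield rec
>             rec = [line]
>         else:
>             rec.append(line)
>     # Emit the final record once the input is exhausted.
>     if rec:
>         yield rec
> 
> # do_something_to_list is a placeholder for your own processing.
> with open('input-file') as inf:
>     for record in gen_records(inf):
>         do_something_to_list(record)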
> 
> Eddie

Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday :)
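
For anyone finding this thread later, here is a minimal sketch of that generator run against the sample lines from my original post. The in-memory io.StringIO source and the name `records` are my own assumptions for illustration; any iterable of lines, such as an open file, should behave the same way. Since only one record is held at a time, there is no need to read the whole file into a string or split on '\n01' with a regex.

```python
import io

def gen_records(src):
    """Yield lists of lines, starting a new list at each '01' header."""
    rec = []
    for line in src:
        if line.startswith('01'):
            # A new header closes the previous record, if there is one.
            if rec:
                yield rec
            rec = [line]
        else:
            rec.append(line)
    # Emit the final record once the input is exhausted.
    if rec:
        yield rec

# Sample data from the question, held in memory (assumption: stands in for the real file).
sample = io.StringIO("01A\n02B\n01A\n02B\n02C\n01A\n02B\n")

# Strip the trailing newlines so the grouping is easier to read.
records = [[line.rstrip('\n') for line in rec] for rec in gen_records(sample)]
print(records)
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```

One thing to be aware of: detail lines that appear before the first '01' header are not dropped; they come out as a headerless first record.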