file reading by record separator (not line by line)

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Thu May 31 09:41:57 EDT 2007


In <1180614374.027569.235540 at g4g2000hsf.googlegroups.com>, Lee Sander
wrote:

> Dear all,
> I would like to read a really huge file that looks like this:
> 
>> name1....
> line_11
> line_12
> line_13
> ...
>>name2 ...
> line_21
> line_22
> ...
> etc
> 
> where line_ij is just a free form text on that line.
> 
> how can i read file so that every time i do a "read()" i get exactly
> one record
> up to the next ">"

There was a thread just recently with an `itertools.groupby()` solution.
Something like this:

from itertools import groupby, imap
from operator import itemgetter

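def mark_records(lines):
    # Tag every line with the number of the record it belongs to.
    counter = 0
    for line in lines:
        if line.startswith('>'):
            # A line starting with '>' opens a new record.
            counter += 1
        yield (counter, line)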


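def iter_records(lines):
    fst = itemgetter(0)
    snd = itemgetter(1)
    # Group consecutive lines by record number, then strip the number again.
    # Each record is a lazy iterator; consume it before asking for the next.
    for dummy, record_lines in groupby(mark_records(lines), fst):
        yield imap(snd, record_lines)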


def main():
    source = """\
> name1....
line_11
line_12
line_13
...
> name2 ...
line_21
line_22
...""".splitlines()

    for record in iter_records(source):
        print 'Start of record...'
        for line in record:
            print ':', line


if __name__ == '__main__':
    main()
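
For anyone on Python 3, where `imap` is gone and `print` is a function, the same idea might look roughly like the sketch below. It is an adaptation, not the code from the earlier thread: it builds each record as a list rather than a lazy iterator (my assumption about what is convenient), and the shortened `sample` data is only for illustration.

```python
from itertools import groupby
from operator import itemgetter


def mark_records(lines):
    """Pair each line with the number of the record it belongs to."""
    counter = 0
    for line in lines:
        if line.startswith('>'):
            counter += 1  # a '>' header line opens a new record
        yield counter, line


def iter_records(lines):
    """Yield each record as a list of its lines, header line first."""
    for _, group in groupby(mark_records(lines), key=itemgetter(0)):
        # Build the list before groupby moves on to the next group;
        # the group iterator is no longer usable after that.
        yield [line for _, line in group]


sample = """\
> name1....
line_11
line_12
> name2 ...
line_21
""".splitlines()

records = list(iter_records(sample))
for record in records:
    print('Start of record...')
    for line in record:
        print(':', line)
```

Since `iter_records()` only iterates over its argument, an open file object should work in place of `sample`, so a huge file is read one line at a time and only the current record is held in memory. Lines read from a file keep their trailing newline, and any lines before the first '>' come out as a record of their own.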

Ciao,
	Marc 'BlackJack' Rintsch


