Read a record instead of a line from a file

Fri Aug 24 14:35:27 EDT 2001

So the generator stuff is just for fun, right?  I mean, this
can just as easily be expressed as a conventional buffer
object.  You lose the for loop application, but I believe you
possibly gain a little more flexibility in other respects.

import sys

class SepFile:
    def __init__(self, infile, sep = "\n\n"):
        self.fp = infile
        self.sep = sep
        self.text = ''
    def readline(self):
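        #  This function should eventually return '' on end of file.
        #  Note that an empty record (two separators in a row) also
        #  comes back as '', which the caller can't tell from end of file.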
        if self.text is None:
            return ''

        while 1:
            #  To include line ending in result, use find() and slice,
            #  instead of split().
            s = self.text.split(self.sep, 1)
            if len(s) > 1:
                ln, self.text = s
                return ln
            else:
                moretext = self.fp.read(10000)
                if not moretext:
                    #  Notice end of file.  Return the unterminated
                    #  data already here.  If that isn't empty, the
                    #  caller will come back for more, so set self.text
                    #  to short circuit the next read.
                    ln = self.text
                    self.text = None
                    return ln
                self.text = self.text + moretext

sf = SepFile(sys.stdin)
while 1:
    ln = sf.readline()
    if not ln:
        break
    #  Written so it prints the same thing under old and new Pythons.
    print('line: ' + repr(ln))
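
For comparison, here is roughly what the generator flavor might look
like, using find() and a slice as the comment in readline() suggests,
so the separator can optionally stay on the record.  This is only a
sketch in present-day Python syntax; the names records, keep_sep and
chunk_size are my own inventions for illustration, not anything from
the earlier posts.

```python
import io

def records(fp, sep="\n\n", keep_sep=False, chunk_size=10000):
    """Yield sep-delimited records from the text file object fp.

    Hypothetical helper: keep_sep and chunk_size are illustrative
    parameters, not part of the SepFile class above.
    """
    text = ""
    while True:
        # find() gives the separator's position, so we can slice on
        # either side of it instead of discarding it as split() does.
        pos = text.find(sep)
        if pos >= 0:
            end = pos + len(sep)
            yield text[:end] if keep_sep else text[:pos]
            text = text[end:]
            continue
        more = fp.read(chunk_size)
        if not more:
            # End of file: hand back any unterminated trailing data.
            if text:
                yield text
            return
        text += more

# Usage: the for loop application that the buffer object gives up.
for rec in records(io.StringIO("a\nb\n\nc\n\nd")):
    print("line:", repr(rec))
```

One difference worth noting: the generator ends by returning, not by
yielding '', so an empty record between two adjacent separators is
just another value and doesn't get mistaken for end of file.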

| If you want something that's really high speed, but uses the
| mxTextTools C extension, you can try my Martel parser, which
| is part of biopython.org.  The specific record readers are in
| http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Martel/RecordReader.py?cvsroot=biopython

mxTextTools rules.

	Donn Cave, donn at u.washington.edu