Read a record instead of a line from a file
Donn Cave
donn at u.washington.edu
Fri Aug 24 14:35:27 EDT 2001
Quoth "Andrew Dalke" <dalke at dalkescientific.com>:
| YMK wrote:
|> If I know the "Record Separator" of a flat file, how do I set it up to
|> read one record at a time?
|
| Here's something I just tried out using 2.2's 'yield' statement.
| (2.2 is currently in alpha release.) Warning: this is my first
| generator and I've also not fully tested it.
|
| from __future__ import generators
|
| def SepReader(infile, sep = "\n\n"):
|     text = infile.read(10000)
|     if not text:
|         return
|     while 1:
|         fields = text.split(sep)
|         for field in fields[:-1]:
|             yield field
|         text = fields[-1]
|         new_text = infile.read(10000)
|         if not new_text:
|             yield text
|             break
|         text += new_text
|
| It's used like this:
|
| for record in SepReader(open(fortunes), "%\n"):
|     print record
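For readers on a current Python, here is a minimal sketch of the same chunked-split generator in Python 3. Generators have not needed the `__future__` import since 2.3, and `yield from` hands off the complete records. The name `sep_reader` and the `chunk_size` parameter are illustrative choices, not part of the original post. One deliberate difference: if the input ends with the separator, this version does not yield a trailing empty record.

```python
import io

def sep_reader(infile, sep="\n\n", chunk_size=10000):
    """Yield records from infile split on sep (hypothetical Python 3 port)."""
    text = ""
    while True:
        new_text = infile.read(chunk_size)
        if not new_text:
            break
        text += new_text
        # Every piece but the last is a complete record. The last piece
        # may be cut off mid-record (or mid-separator), so keep it and
        # let the next chunk extend it.
        *records, text = text.split(sep)
        yield from records
    # Whatever remains after EOF is the final, unterminated record.
    if text:
        yield text

# Usage: a tiny chunk_size forces separators to straddle chunk boundaries.
data = io.StringIO("first\n%\nsecond\n%\nthird\n")
for record in sep_reader(data, "%\n", chunk_size=4):
    print(repr(record))
```

Because the unfinished tail is re-split together with the next chunk, a separator split across two reads is still found.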
So the generator stuff is just for fun, right? I mean, this
can just as easily be expressed as a conventional buffer
object. You lose the for loop usage, but I believe you may
gain a little more flexibility in other respects.
import sys

class SepFile:
    def __init__(self, infile, sep = "\n\n"):
        self.fp = infile
        self.sep = sep
        self.text = ''

    def readline(self):
        # This function should eventually return '' on end of file.
        if self.text is None:
            return ''
        while 1:
            # To include line ending in result, use find() and slice,
            # instead of split().
            s = self.text.split(self.sep, 1)
            if len(s) > 1:
                ln, self.text = s
                return ln
            else:
                moretext = self.fp.read(10000)
                if not moretext:
                    # Notice end of file.  Return the unterminated
                    # data already here.  If that isn't empty, the
                    # caller will come back for more, so set self.text
                    # to short circuit the next read.
                    ln = self.text
                    self.text = None
                    return ln
                self.text = self.text + moretext

sf = SepFile(sys.stdin)
while 1:
    ln = sf.readline()
    if not ln:
        break
    print 'line:', repr(ln)
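The buffer-object approach also translates directly to Python 3. The sketch below is an illustrative variant, not the original class: it uses find() and slicing as the comment in readline() suggests, adds a hypothetical `keep_sep` flag that leaves the separator on each record, and returns None rather than '' at end of file, so an empty record between two adjacent separators does not end the loop early. The method name `readrecord` and the `__iter__` method are additions for illustration; `__iter__` gives back the for loop usage.

```python
import io

class SepFile:
    """Hypothetical Python 3 variant of the SepFile buffer object."""

    def __init__(self, infile, sep="\n\n", chunk_size=10000, keep_sep=False):
        self.fp = infile
        self.sep = sep
        self.chunk_size = chunk_size
        self.keep_sep = keep_sep   # assumption: keep separator on records
        self.text = ""
        self.eof = False

    def readrecord(self):
        """Return the next record, or None once the input is exhausted."""
        while True:
            i = self.text.find(self.sep)
            if i >= 0:
                # find() and slice: we choose whether the separator
                # stays attached to the returned record.
                end = i + len(self.sep)
                record = self.text[:end] if self.keep_sep else self.text[:i]
                self.text = self.text[end:]
                return record
            if self.eof:
                # No separator left: hand back the unterminated tail once.
                if self.text:
                    record, self.text = self.text, ""
                    return record
                return None
            more = self.fp.read(self.chunk_size)
            if more:
                self.text += more
            else:
                self.eof = True

    def __iter__(self):
        # Lets callers write "for record in sf:" again.
        while True:
            record = self.readrecord()
            if record is None:
                return
            yield record
```

For example, `list(SepFile(io.StringIO("a\n\n\n\nb")))` should give `['a', '', 'b']`, whereas the readline() loop above would stop at the empty record.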
| If you want something that's really high speed, but uses the
| mxTextTools C extension, you can try my Martel parser, which
| is part of biopython.org. The specific record readers are in
| http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Martel/RecordReader.py?cvsroot=biopython
mxTextTools rules.
Donn Cave, donn at u.washington.edu