file reading by record separator (not line by line)
Tijs
tijs_news at bluescraper.com
Thu May 31 09:14:11 EDT 2007
Lee Sander wrote:
> I wanted to also say that this file is really huge, so I cannot
> just do a read() and then split on ">" to get a record
> thanks
> lee
Below is the easy solution. To get even better performance, or if '>' is not
always at the start of the line, you would have to implement the buffering
that is done by readline() yourself (see _fileobject in socket.py in the
standard lib for example).

def chunkreader(f):
    name = None
    lines = []
    while True:
        line = f.readline()
        if not line: break
        if line[0] == '>':
            if name is not None:
                yield name, lines
            name = line[1:].rstrip()
            lines = []
        else:
            lines.append(line)
    if name is not None:
        yield name, lines

if __name__ == '__main__':
    from StringIO import StringIO
    s = \
"""> name1
line1
line2
line3
> name2
line 4
line 5
line 6"""
    f = StringIO(s)
    for name, lines in chunkreader(f):
        print '***', name
        print ''.join(lines)

$ python test.py
*** name1
line1
line2
line3
*** name2
line 4
line 5
line 6
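
For the case where '>' is not always at the start of a line, the do-it-yourself
buffering mentioned above might look roughly like the sketch below. It is not
the code from socket.py, only a minimal illustration written for Python 3 and
a file opened in text mode. The names read_records, sep and bufsize are made up
for this example. It reads fixed-size blocks, splits on the separator, and
holds back the last piece because it may continue in the next block.

```python
from io import StringIO


def read_records(f, sep=">", bufsize=64 * 1024):
    """Yield the text between occurrences of sep, reading f block by block.

    Assumptions of this sketch: f is a text-mode file object, and any
    non-empty text before the first separator is yielded as a record too.
    """
    pending = ""
    while True:
        block = f.read(bufsize)
        if not block:
            break
        pending += block
        # Every piece except the last one is complete. The last piece may
        # still continue in the next block, so it is carried over.
        *complete, pending = pending.split(sep)
        for record in complete:
            if record:  # skip the empty piece in front of a leading separator
                yield record
    if pending:
        yield pending


if __name__ == "__main__":
    sample = StringIO("> name1\nline1\nline2\n> name2\nline 4")
    for record in read_records(sample, bufsize=8):
        # The first line of a record is its name, the rest is the body.
        name, _, body = record.partition("\n")
        print("***", name.strip())
        print(body)
```

Only the current record plus one block is held in memory at a time, so this
should cope with a huge file as long as the individual records are of moderate
size. A single very long record gets re-split on every block, so for that case
a smarter search for the separator would be needed.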
--
Regards,
Tijs