Reading variable length records...
Steve Holden
sholden at holdenweb.com
Tue Sep 18 08:53:45 EDT 2001
<salvasan at yahoo.com> wrote in message
news:9o0guu$i9p$1 at news.carib-link.net...
> "Bjorn Pettersen":
> > I'm trying to read records from a 2 GB datafile ... The records are
> > variable length and are separated by a five character delimiter.
>
> The following Python class ought to do what you need and it handles the
> partial delimiter problem using a buffer that grows as needed.
>
>
> # Python class for reading a variable length record file
> # with fixed multi-character delimiter
> class VLR_File:
>     def __init__(self, filename, delimiter):
>         self.fhand = open(filename)
>         self.buffer = ""
>         self.delim = delimiter
>         self.dlen = len(delimiter)
>         self.eof = 0
>
>     def read_record(self):
>         if self.eof: return ""
>         while 1:
>             # read one character at a time
>             c = self.fhand.read(1)
>             if not c:
>                 self.eof = 1
>                 # end of file -> return current buffer contents as last record
>                 return self.buffer
>             # append to buffer until delimiter is detected
>             self.buffer = self.buffer + c
>             if len(self.buffer) >= self.dlen \
>                and self.buffer[-self.dlen:] == self.delim:
>                 # flush buffer (the record keeps its trailing delimiter)
>                 record = self.buffer
>                 self.buffer = ""
>                 return record
>
>     def close(self):
>         self.fhand.close()
>
>
> # main program test
> f = VLR_File("vlr_stuff.txt", "+++++")
> rno = 0
> while 1:
>     line = f.read_record()
>     if not line: break
>     print("REC", rno)
>     print(line)
>     rno = rno + 1
> f.close()
>
>
For an example of code that finds a variable-length delimiter, possibly
spanning separate blocks, take a look at the asynchat library module, which
seems to take a sound approach to the problem. Look for the code that depends
on self.terminator being a string.
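
To make the "delimiter spanning separate blocks" case concrete, here is a
minimal sketch of a block-based reader. It is not asynchat's code; the names
read_records and blocksize are made up for illustration. It assumes a text-mode
file object and, unlike the class quoted above, it strips the delimiter from
each record. Any unconsumed tail stays in the buffer, so a delimiter split
across two reads is found once the next block is appended.

```python
import io


def read_records(fileobj, delimiter, blocksize=64 * 1024):
    """Yield records from fileobj, split on a multi-character delimiter."""
    buffer = ""
    while True:
        block = fileobj.read(blocksize)
        if not block:
            break  # end of file
        # Append to whatever was left over, so a delimiter that straddles
        # two blocks becomes visible in the combined buffer.
        buffer += block
        start = 0
        while True:
            pos = buffer.find(delimiter, start)
            if pos < 0:
                break  # no complete delimiter yet; read another block
            yield buffer[start:pos]
            start = pos + len(delimiter)
        # Keep only the unconsumed tail (possibly a partial delimiter).
        buffer = buffer[start:]
    if buffer:
        # Data after the last delimiter is returned as the final record.
        yield buffer


# A tiny block size forces delimiters to straddle block boundaries.
sample = io.StringIO("one+++++two+++++three")
for rno, record in enumerate(read_records(sample, "+++++", blocksize=4)):
    print("REC", rno, record)
```

Reading in blocks avoids one read() call per character, which matters on a
2 GB file. The sketch rescans the whole buffer after each block, so for very
long records it would be worth resuming the search near the end of the
previously scanned data.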
regards
Steve
--
http://www.holdenweb.com/