Reading variable length records...

Brian Quinlan BrianQ at ActiveState.com
Wed Sep 12 18:36:16 EDT 2001


> I'm trying to read records from a 2 GB datafile, but my brain has
> stopped working, so I was wondering if someone has already 
> solved this problem. The records are variable length and are 
> separated by a five character delimiter. I was trying to use 
> file.read(n) with a blocksize of ~1 MB, but got a serious 
> brainfart when trying to think of how to handle the case where 
> only part of the delimiter was read in the current block.

Here is some pseudo-code to get you started:

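DELIMITER = '12345'        # the five-character record separator
BLOCKSIZE = 1024 * 1024    # read about 1 MB per call

datafile = open('records.dat')  # hypothetical file name; use your own
data = ''
records = []

while True:
    readData = datafile.read(BLOCKSIZE)
    if not readData:
        break

    data += readData
    partialRecords = data.split(DELIMITER)
    records += partialRecords[:-1]  # Last piece may be incomplete
    data = partialRecords[-1]       # Carry it over to the next block

datafile.close()

if data:
    # Hmmm, there is still data left over, probably bad: the file does
    # not end with a delimiter, or it is truncated.
    pass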

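The basic idea is that you use split to collect as many records as
possible and just keep the leftover partial record for the next
round. Because that leftover is glued onto the front of the next
block before splitting again, a delimiter that straddles two reads
gets reassembled, so you never have to look for partial delimiters
yourself. Let me know if you need clarification.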
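If keeping every record of a 2 GB file in one list is a concern, the
same split-and-carry-over idea can be wrapped in a generator that hands
back one record at a time. This is a minimal sketch assuming Python 3;
iter_records and its parameters are hypothetical names, not part of any
library. It works for both text and binary files as long as the
delimiter has the same type as the data.

```python
import io

def iter_records(datafile, delimiter, blocksize=1024 * 1024):
    """Yield delimiter-separated records, reading blocksize at a time.

    (Hypothetical helper.) The trailing fragment of each block is carried
    over and re-split together with the next block, so a delimiter split
    across two reads is still recognised.
    """
    leftover = delimiter[:0]  # '' or b'', matching the delimiter's type
    while True:
        block = datafile.read(blocksize)
        if not block:
            break
        pieces = (leftover + block).split(delimiter)
        # Every piece except the last is a complete record.
        for record in pieces[:-1]:
            yield record
        leftover = pieces[-1]
    if leftover:
        # Data after the last delimiter: yielded rather than dropped,
        # so the caller can decide whether it is an error.
        yield leftover

# A tiny block size forces delimiters to be split across reads.
sample = io.StringIO("first12345second12345third12345")
print(list(iter_records(sample, "12345", blocksize=4)))
# ['first', 'second', 'third']
```

In real use you would pass the open data file, e.g.
`for rec in iter_records(datafile, '12345'):`, and only one block plus
the current partial record is held in memory at any time.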

Cheers,
Brian


More information about the Python-list mailing list