Question about struct.unpack

Steven D'Aprano steve at REMOVEME.cybersource.com.au
Mon Apr 30 04:41:33 EDT 2007


On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:

> Hi!
> I have a really long binary file that I want to read.
> The way I am doing it now is:
> 
> for i in xrange(N):  # N is about 10,000,000
>     time = struct.unpack('=HHHH', infile.read(8))
>     # do something
>     tdc = struct.unpack('=LiLiLiLi',self.lmf.read(32))

I assume that is supposed to be infile.read(32), not self.lmf.read(32).


>     # do something
> 
> Each loop takes about 0.2 ms in my computer, which means the whole for loop 
> takes 2000 seconds.

You're reading 400 million bytes (10,000,000 records of 40 bytes each), or
400MB, in about half an hour. Whether that's fast or slow depends on what
the "do something" lines are doing.
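
As a rough check of those figures, here is the arithmetic written out in Python 3. The numbers come from the quoted post; the variable names are only illustrative.

```python
# Figures from the thread: about 10,000,000 records of 8 + 32 bytes each,
# and about 0.2 ms (200 microseconds) per loop iteration.
n_records = 10_000_000
record_size = 8 + 32                 # bytes per record
microseconds_per_record = 200

total_bytes = n_records * record_size
total_seconds = n_records * microseconds_per_record / 1_000_000
throughput = total_bytes / total_seconds   # bytes per second

print(total_bytes)     # 400000000
print(total_seconds)   # 2000.0
print(throughput)      # 200000.0
```

That works out to roughly 200 kB per second, so much of the time is probably going into per-record overhead (many small reads and unpack calls, plus whatever "do something" costs) rather than into moving the bytes themselves.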


> I would like it to run faster.
> Do you have any suggestions?

Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.

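# UNTESTED!
import struct

rsize = 8 + 32  # record size in bytes
chunk = 1000    # number of records to read at once
remaining = N
while remaining > 0:
    data = infile.read(rsize * min(chunk, remaining))
    nrecs = len(data) // rsize  # whole records actually read
    if nrecs == 0:
        break  # end of file, or only a truncated record left
    for j in xrange(nrecs):  # process each record
        offset = j*rsize
        time = struct.unpack('=HHHH', data[offset:offset+8])
        # do something
        tdc = struct.unpack('=LiLiLiLi', data[offset+8:offset+rsize])
        # do something
    remaining -= nrecs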
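
The same chunked-read idea can be sketched in Python 3 with precompiled struct.Struct objects, which avoid re-parsing the format string on every call. This is a minimal sketch, not a measured speedup. The helper name iter_records, the chunk size of 1000, and the in-memory test data are all assumptions made for illustration. It also assumes a file opened in binary mode, where read(n) returns fewer than n bytes only at end of file.

```python
import io
import struct

# Precompiled formats. '=' selects native byte order with standard sizes,
# so TIME_FMT is 8 bytes and TDC_FMT is 32 bytes.
TIME_FMT = struct.Struct('=HHHH')
TDC_FMT = struct.Struct('=LiLiLiLi')
RSIZE = TIME_FMT.size + TDC_FMT.size  # 40 bytes per record


def iter_records(infile, chunk_records=1000):
    """Yield (time, tdc) tuples, reading chunk_records records per read().

    Hypothetical helper for illustration; not part of the original post.
    """
    while True:
        data = infile.read(RSIZE * chunk_records)
        nrecs = len(data) // RSIZE  # a truncated trailing record is ignored
        if nrecs == 0:
            break
        for j in range(nrecs):
            offset = j * RSIZE
            # unpack_from reads at an offset without slicing out a copy
            time = TIME_FMT.unpack_from(data, offset)
            tdc = TDC_FMT.unpack_from(data, offset + TIME_FMT.size)
            yield time, tdc


# Usage with an in-memory stand-in for the real file: 2500 made-up records.
record = TIME_FMT.pack(1, 2, 3, 4) + TDC_FMT.pack(5, -6, 7, -8, 9, -10, 11, -12)
fake_file = io.BytesIO(record * 2500)
records = list(iter_records(fake_file))
print(len(records))  # 2500
```

Because the loop stops when a read returns no whole record, a record count that is not a multiple of the chunk size is handled without special-casing.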


(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)


-- 
Steven D'Aprano 



