Question about struct.unpack

eC ericcoetzee at gmail.com
Tue May 1 04:22:49 EDT 2007


On Apr 30, 9:41 am, Steven D'Aprano <s... at REMOVEME.cybersource.com.au>
wrote:
> On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
> > Hi!
> > I have a really long binary file that I want to read.
> > The way I am doing it now is:
>
> > for i in xrange(N):  # N is about 10,000,000
> >     time = struct.unpack('=HHHH', infile.read(8))
> >     # do something
> >     tdc = struct.unpack('=LiLiLiLi',self.lmf.read(32))
>
> I assume that is supposed to be infile.read()
>
> >     # do something
>
> > Each loop takes about 0.2 ms in my computer, which means the whole for loop
> > takes 2000 seconds.
>
> You're reading 400 million bytes, or 400MB, in about half an hour. Whether
> that's fast or slow depends on what the "do something" lines are doing.
>
> > I would like it to run faster.
> > Do you have any suggestions?
>
> Disk I/O is slow, so don't read from files in tiny little chunks. Read a
> bunch of records into memory, then process them.
>
> # UNTESTED!
> rsize = 8 + 32  # record size
> for i in xrange(N//1000):
>     buffer = infile.read(rsize*1000) # read 1000 records at once
>     for j in xrange(1000): # process each record
>         offset = j*rsize
>         time = struct.unpack('=HHHH', buffer[offset:offset+8])
>         # do something
>         tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
>         # do something
>
> (Now I'm just waiting for somebody to tell me that file.read() already
> buffers reads...)
>
> --
> Steven D'Aprano

I think file.read() already buffers reads... :)
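
That said, most of the 0.2 ms per record is probably Python-level overhead (two struct.unpack calls plus a string slice each time) rather than raw disk I/O. Below is an untested sketch along the lines of Steven's batched reads, but using precompiled struct.Struct objects and unpack_from so the format strings are parsed once and no per-record slices are built. The names (process_records, RECORD_SIZE, CHUNK_RECORDS) are made up for illustration; only the two format strings come from the original post.

# UNTESTED sketch -- batched reads plus precompiled Structs.
import struct

RECORD_SIZE = 8 + 32      # '=HHHH' is 8 bytes, '=LiLiLiLi' is 32 bytes
CHUNK_RECORDS = 4096      # records pulled into memory per read; tune to taste

# Precompiled Struct objects skip re-parsing the format string on every call.
time_struct = struct.Struct('=HHHH')
tdc_struct = struct.Struct('=LiLiLiLi')

def process_records(infile, N):
    remaining = N
    while remaining > 0:
        buf = infile.read(RECORD_SIZE * min(CHUNK_RECORDS, remaining))
        n = len(buf) // RECORD_SIZE   # short read near EOF: use what we got
        if n == 0:
            break
        for j in xrange(n):
            offset = j * RECORD_SIZE
            time = time_struct.unpack_from(buf, offset)
            tdc = tdc_struct.unpack_from(buf, offset + 8)
            # do something with time and tdc
        remaining -= n

unpack_from reads straight out of the buffer at the given offset, so the two slices per record disappear, and over 10,000,000 iterations that kind of per-record overhead adds up. (struct.Struct and unpack_from need Python 2.5 or later.)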



