Question about struct.unpack
OhKyu Yoon
okyoon at stanford.edu
Tue May 1 11:20:41 EDT 2007
Wow, thank you all!
"Gabriel Genellina" <gagsl-py2 at yahoo.com.ar> wrote in message
news:op.trm6zrmbx6zn5v at furufufa-ec0e13.cpe.telecentro.com.ar...
> On Tue, 01 May 2007 05:22:49 -0300, eC <ericcoetzee at gmail.com> wrote:
>
>> On Apr 30, 9:41 am, Steven D'Aprano <s... at REMOVEME.cybersource.com.au>
>> wrote:
>>> On Mon, 30 Apr 2007 00:45:22 -0700, OhKyu Yoon wrote:
>
>>> > I have a really long binary file that I want to read.
>>> > The way I am doing it now is:
>>>
>>> > for i in xrange(N):  # N is about 10,000,000
>>> >     time = struct.unpack('=HHHH', infile.read(8))
>>> >     # do something
>>> >     tdc = struct.unpack('=LiLiLiLi', infile.read(32))
>>>
>>> Disk I/O is slow, so don't read from files in tiny little chunks. Read a
>>> bunch of records into memory, then process them.
>>>
>>> # UNTESTED!
>>> rsize = 8 + 32 # record size
>>> for i in xrange(N//1000):
>>>     buffer = infile.read(rsize*1000)  # read 1000 records at once
>>>     for j in xrange(1000):  # process each record
>>>         offset = j*rsize
>>>         time = struct.unpack('=HHHH', buffer[offset:offset+8])
>>>         # do something
>>>         tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
>>>         # do something
>>>
>>> (Now I'm just waiting for somebody to tell me that file.read() already
>>> buffers reads...)
>>
>> I think the file.read() already buffers reads... :)
>
> Now we need someone to actually measure it, to confirm the expected
> behavior... Done.
>
> --- begin code ---
> import struct,timeit,os
>
> fn = r"c:\temp\delete.me"
> fsize = 1000000
> if not os.path.isfile(fn):
>     f = open(fn, "wb")
>     f.write("\0" * fsize)
>     f.close()
> os.system("sync")
>
> def smallreads(fn):
>     rsize = 40
>     N = fsize // rsize
>     f = open(fn, "rb")
>     for i in xrange(N):
>         time = struct.unpack('=HHHH', f.read(8))
>         tdc = struct.unpack('=LiLiLiLi', f.read(32))
>     f.close()
>
>
> def bigreads(fn):
>     rsize = 40
>     N = fsize // rsize
>     f = open(fn, "rb")
>     for i in xrange(N//1000):
>         buffer = f.read(rsize*1000)  # read 1000 records at once
>         for j in xrange(1000):  # process each record
>             offset = j*rsize
>             time = struct.unpack('=HHHH', buffer[offset:offset+8])
>             tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
>     f.close()
>
> print "smallreads", timeit.Timer("smallreads(fn)","from __main__ import
> fn,smallreads,fsize").repeat(3,1)
> print "bigreads", timeit.Timer("bigreads(fn)", "from __main__ import
> fn,bigreads,fsize").repeat(3,1)
> --- end code ---
>
> Output:
> smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
> bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]
>
> So in this sample case, reading in big chunks is about 3 times faster than
> reading many tiny pieces.
>
> --
> Gabriel Genellina
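
For what it's worth, here is an untested follow-up sketch on top of Gabriel's bigreads: on Python 2.5+ the two formats can be precompiled with struct.Struct and read with unpack_from, which skips both the per-call format parsing and the buffer slicing. The record layout (8-byte header plus 32-byte body) and the 1000-records-per-read chunk size are simply carried over from the posts above, not anything new.

--- begin code ---
import struct

HEADER = struct.Struct('=HHHH')      # parsed once instead of on every unpack call
BODY = struct.Struct('=LiLiLiLi')
RSIZE = HEADER.size + BODY.size      # 40 bytes per record

def bigreads_unpack_from(fn, nrecords=1000):
    # Assumes the file holds only whole 40-byte records.
    f = open(fn, "rb")
    while True:
        buffer = f.read(RSIZE * nrecords)   # read up to 1000 records at once
        if not buffer:
            break
        for offset in xrange(0, len(buffer), RSIZE):
            time = HEADER.unpack_from(buffer, offset)
            tdc = BODY.unpack_from(buffer, offset + HEADER.size)
            # do something with time/tdc
    f.close()
--- end code ---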