help with binary file io, perhaps with generators?

Sat Apr 23 23:58:02 EDT 2005

"Marcus Goldfish" <magoldfish at gmail.com> wrote in message 
news:5e183f3d0504231924e989ea2 at mail.gmail.com...
>I need to write a "fast" file reader in python for binary files structured 
>as:
>
> … x[0] y[0] z[0] x[1] y[1] z[1]  …
>
> where c[k] is the k-th element from sequence c.  As mentioned, the
> file is binary -- spaces above are just for visualization.  Each
> element, c[k], is a 16-bit int.

You have a sequence of machine-format triples

>   (ii) how can I handle the 16-bit word aspect of the binary data?

that you want to unpack (see struct module) and unzip (your code).

>  I can assume I know the number of sequences in the file a priori.

Then you can pre-allocate x,y,z instead of growing them by appends:

x = n*[0]; ...
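
A minimal sketch of the unpack-and-unzip step with pre-allocated lists. It assumes the values are signed 16-bit ints in little-endian order (the post does not say whether they are signed), and the name unzip_triples is only illustrative:

```python
import struct

def unzip_triples(data, n):
    """Split n packed (x, y, z) triples of 16-bit ints into three lists."""
    # '<' = little-endian, 'h' = signed 16-bit (both are assumptions here).
    values = struct.unpack('<%dh' % (3 * n), data)
    # Pre-allocate instead of growing the lists by appends.
    x = n * [0]
    y = n * [0]
    z = n * [0]
    for k in range(n):
        x[k] = values[3 * k]
        y[k] = values[3 * k + 1]
        z[k] = values[3 * k + 2]
    return x, y, z
```

Here data would be the 6*n bytes read from the file opened in binary mode.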

> Files are stored and processed on a
> WinXP machine (in case Endian-ness matters).

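I believe the struct module takes care of this: by default it uses the
machine's native byte order, and a '<' or '>' prefix in the format string
forces little- or big-endian.  Read its doc.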
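
As a small illustration of how struct handles byte order, the sketch below unpacks the same two bytes three ways. The sample bytes are made up; files written on an x86 WinXP machine would normally be little-endian, so '<' is the likely choice there:

```python
import struct
import sys

raw = b'\x01\x00'  # the 16-bit value 1, stored low byte first

little = struct.unpack('<h', raw)[0]  # force little-endian
big = struct.unpack('>h', raw)[0]     # force big-endian
native = struct.unpack('=h', raw)[0]  # whatever sys.byteorder says
```

Spelling out '<' in the format keeps the reader correct even if the code later runs on a big-endian machine.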

>   (i) should I use generators for iterating over the file?

If you will only ever unpack when you also unzip to separate lists, then 
you might put the unpack code in the inner loop of the unzip method.  If 
you think that you might someday directly process the file as a sequence of 
scalar triples without creating intermediate x,y,z lists, then you should 
factor out the file read/unpack as you are thinking.
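
One way to factor out the read/unpack step is a generator, roughly as below. This is a sketch, not the only design: iter_triples is a hypothetical name, the '<3h' format assumes little-endian signed 16-bit values, and io.BytesIO stands in for a real file opened with mode 'rb':

```python
import io
import struct

TRIPLE = struct.Struct('<3h')  # assumed layout: three little-endian int16

def iter_triples(f):
    """Yield (x, y, z) tuples from a binary file object until EOF."""
    while True:
        chunk = f.read(TRIPLE.size)
        if len(chunk) < TRIPLE.size:
            break  # EOF; a trailing partial triple is ignored
        yield TRIPLE.unpack(chunk)

# In-memory stand-in for a file on disk.
demo = io.BytesIO(struct.pack('<6h', 1, 2, 3, 4, 5, 6))
triples = list(iter_triples(demo))
```

The unzip code can then loop over iter_triples(f), and any later code that wants the triples directly can use the same generator.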

>   (iii) ultimately, the data will need to be processed in chunks of
>         M-values at a time... I assume this means I need some
>         form of buffered io wrapper, but I'm not sure where to start
>         with this.

I presume this means M triplets or 3*M values.
In any case, Python is ahead of you ;-).

>>> help(file.read)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
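
Building on read(size), a chunked reader for M triples at a time might look like the sketch below. The name iter_chunks is hypothetical, the format again assumes little-endian signed 16-bit values, and it assumes a buffered binary file, where read() returns fewer bytes than requested only at EOF. In current Python, read() on a file opened with 'rb' returns bytes rather than a string:

```python
import io
import struct

def iter_chunks(f, m):
    """Yield tuples of up to 3*m values, reading m triples per call."""
    nbytes = 6 * m  # 3 values per triple, 2 bytes per value
    while True:
        buf = f.read(nbytes)
        usable = len(buf) - len(buf) % 6  # ignore a partial trailing triple
        if usable == 0:
            break
        yield struct.unpack('<%dh' % (usable // 2), buf[:usable])

# In-memory stand-in for a file: 4 triples, read 3 triples at a time.
demo = io.BytesIO(struct.pack('<12h', *range(12)))
chunks = list(iter_chunks(demo, 3))
```

The last chunk is shorter when the number of triples in the file is not a multiple of M.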

Terry J. Reedy