python and bit shifts and byte order, oh my!

Fri Sep 10 19:45:54 EDT 2004

On Fri, 10 Sep 2004 15:51:33 -0500, Reid Nichol <rnichol_rrc at yahoo.com> wrote:

>Jason Lai wrote:
>> If efficiency isn't important, you could forget about the whole 
>> byte-order thing and just read/write it byte-by-byte. Then you can think 
>> of the file as a bit-stream (everything gets written in order and read 
>> back in order), although you still have to read/write a whole 8-bit byte 
>> at a time.
>> 
>>  - Jason Lai
>
>Since the format can have:
>5bit
>24bit
>24bit
>
>I assumed that I would have to write byte by byte.  And I don't really 
>consider speed important so I think that it's viable to do it this way.
>
>@Grant
>This is what I meant.

I would suggest you define a class (e.g., subclass the builtin file type)
that serves as a convenient (for you) bit-wise interface to a binary file
(binary is important on Windows, or you will get EOL conversions when you write).
E.g., so you will be able to write code like:

    bf = BitFile('data/bitfile.dat', 'wb')
    bf.write(0xfa, 5)
    bf.write(whatever, 24)
    bf.close()

The class will have to take care of buffering, packing and unpacking, endianness, and
what to do with a file whose total length is not a multiple of 8 bits (if you are
defining the format, you could always append an extra byte on close that says how many
bits are valid in the last (preceding) data byte, so you could read back exactly the bits written).
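
A minimal sketch of the write side, under some assumptions that are not part of the
suggestion above: it wraps an already-open binary file object instead of subclassing
file (Python 3 has no builtin file type), it packs most-significant-bit first, and
close() only finishes the bit stream and leaves the wrapped object open. The trailer
byte follows the "how many bits in the last data byte" idea.

```python
import io


class BitFile:
    """Hypothetical MSB-first bit writer wrapping a binary file object."""

    def __init__(self, fileobj):
        self._f = fileobj   # anything with write(bytes), opened in binary mode
        self._acc = 0       # pending bits, held as an integer
        self._nbits = 0     # number of pending bits (below 8 between writes)
        self._count = 0     # total number of bits written so far

    def write(self, value, nbits):
        """Append the low `nbits` bits of `value`, most significant bit first."""
        value &= (1 << nbits) - 1                 # drop bits that do not fit
        self._acc = (self._acc << nbits) | value
        self._nbits += nbits
        self._count += nbits
        while self._nbits >= 8:                   # flush every complete byte
            self._nbits -= 8
            self._f.write(bytes([(self._acc >> self._nbits) & 0xFF]))
        self._acc &= (1 << self._nbits) - 1       # keep only the leftover bits

    def close(self):
        """Zero-pad the last byte, then append a byte counting its valid bits."""
        # 8 means the last data byte is full, 0 means there is no data at all.
        used = self._count % 8 or (8 if self._count else 0)
        if self._nbits:
            self._f.write(bytes([(self._acc << (8 - self._nbits)) & 0xFF]))
        self._f.write(bytes([used]))
        self._acc = self._nbits = 0


buf = io.BytesIO()
bf = BitFile(buf)
bf.write(0xfa, 5)         # only the low five bits, 0b11010, are kept
bf.write(0x123456, 24)
bf.close()
print(buf.getvalue().hex())
```

Values wider than the requested field are silently masked here; raising an error
instead would be an equally reasonable choice.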

You could also give the class properties for common bit field widths, so that the
above writes could be spelled

    bf.b5 = 0xfa
    bf.b24 = whatever

where the first assignment writes five bits (actually buffers them, since you have to do
that for fractional bytes anyway, and you will gain in i/o performance from writing larger
chunks). On the read side, you might want to distinguish between signed and unsigned
bitfields, e.g.,

    signed = bf.s5   # read next 5 bits as signed integer
    unsigned = bf.u5 # ditto, except unsigned
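
One way the read side might look, as a sketch only: the BitReader name, the in-memory
bytes source, the MSB-first bit order and the use of __getattr__ to synthesize u<n> and
s<n> attributes are all assumptions made for illustration. Signed fields are taken to
be two's complement.

```python
import re


class BitReader:
    """Hypothetical MSB-first bit reader over an in-memory bytes object."""

    _FIELD = re.compile(r"([su])(\d+)")

    def __init__(self, data):
        self._data = data
        self._pos = 0                       # current position, counted in bits

    def read(self, nbits, signed=False):
        """Read the next `nbits` bits as an integer."""
        value = 0
        for _ in range(nbits):
            byte = self._data[self._pos >> 3]           # IndexError past the end
            bit = (byte >> (7 - (self._pos & 7))) & 1
            value = (value << 1) | bit
            self._pos += 1
        if signed and nbits and value >> (nbits - 1):   # top bit set: negative
            value -= 1 << nbits                         # two's complement
        return value

    def __getattr__(self, name):
        # Only called for names not found normally, e.g. u5 or s24.
        m = self._FIELD.fullmatch(name)
        if m is None:
            raise AttributeError(name)
        return self.read(int(m.group(2)), signed=(m.group(1) == "s"))


bf = BitReader(bytes([0xD0, 0x91]))
signed = bf.s5      # read next 5 bits as signed integer
unsigned = bf.u5    # ditto, except unsigned
print(signed, unsigned)
```

Note that attribute access consumes bits here, which is compact but surprising; an
explicit read(nbits, signed=...) call does the same job without the side effect on
attribute lookup.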

Of course, packing bits together from a sequence of numbers into a string of bytes has
nothing necessarily to do with file i/o, so you might want to factor that out. For example,
you could take inspiration from struct to create something that works by bit fields:
say '.n' means pack n bits adjacent to the previously buffered bits, and ',n' means skip n
bits as if you were reading or writing them (filling in 0 by default if not re-writing).
Then use the struct type letters for alignment skips, e.g., 'h' to skip to the end of the
current short, or 'l' to skip to the end of the current long. Then

    pack('<.3,2.7h.24l', x, y, z)

could be a little-endian packing of size(short)+size(long) bits, with two fields
x and y of 3 and 7 bits respectively, separated by a 2-bit space, packed into a short,
followed by z packed into the bottom of a long, for six bytes total.
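
A rough sketch of one possible reading of that format, not a settled design: the
pack_bits name is hypothetical, only '<' is handled, the first field goes into the
lowest bits, and 'h' and 'l' pad to 16 and 32 bits counted from the end of the previous
alignment skip (which is what makes the example come out to six bytes).

```python
import re

_TOKEN = re.compile(r"([.,])(\d+)|([hl])")
_UNIT_BITS = {"h": 16, "l": 32}    # standard struct sizes for short and long


def pack_bits(fmt, *values):
    """Pack values per the hypothetical bit-field format; return (bytes, nbits)."""
    if not fmt.startswith("<"):
        raise ValueError("this sketch handles little-endian ('<') only")
    vals = iter(values)
    acc = 0           # all bits so far as one integer; bit 0 is the first bit
    pos = 0           # next free bit position
    unit_start = 0    # where the current short or long began
    i = 1
    while i < len(fmt):
        m = _TOKEN.match(fmt, i)
        if m is None:
            raise ValueError(f"bad format at offset {i}: {fmt[i:]!r}")
        i = m.end()
        kind, width, unit = m.groups()
        if kind == ".":                  # field: place n bits at pos
            n = int(width)
            acc |= (next(vals) & ((1 << n) - 1)) << pos
            pos += n
        elif kind == ",":                # gap: leave n zero bits
            pos += int(width)
        else:                            # pad to the end of the current unit
            size = _UNIT_BITS[unit]
            used = pos - unit_start
            pos = unit_start + max(size, -(-used // size) * size)
            unit_start = pos
    return acc.to_bytes((pos + 7) // 8, "little"), pos


data, nbits = pack_bits('<.3,2.7h.24l', 5, 0x55, 0xABCDEF)
print(data.hex(), nbits)
```

With these choices the result is byte-for-byte what struct.pack('<hl', ...) would give
for a short holding x and y and a long holding z, which seems like a useful sanity
check for whatever bit order is finally chosen.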

Probably pack should be a class so that you get back an object that has both data bytes
and total bit length and methods for convenient concatenation, so

    pack('.3', 10) + pack('.4', 15) == pack('.3.4', 10, 15)
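
To make the concatenation idea concrete, here is a small sketch of such an object. The
Bits name and its constructor are assumptions for illustration (it takes a value and a
width directly, not a format string), and it uses a little-endian layout with the first
field in the lowest bits.

```python
class Bits:
    """Hypothetical bit string: an integer payload plus a total bit length."""

    def __init__(self, value=0, nbits=0):
        self.nbits = nbits
        self.value = value & ((1 << nbits) - 1)   # bit 0 is the first bit packed

    def __add__(self, other):
        if not isinstance(other, Bits):
            return NotImplemented
        # The right operand's bits follow the left operand's, with no padding.
        return Bits(self.value | (other.value << self.nbits),
                    self.nbits + other.nbits)

    def __eq__(self, other):
        if not isinstance(other, Bits):
            return NotImplemented
        return (self.value, self.nbits) == (other.value, other.nbits)

    def __hash__(self):
        return hash((self.value, self.nbits))

    def __bytes__(self):
        # Whole bytes, little-endian, with the unused high bits left as zero.
        return self.value.to_bytes((self.nbits + 7) // 8, "little")

    def __repr__(self):
        return f"Bits({self.value:#x}, {self.nbits})"


both = Bits(10, 3) + Bits(15, 4)   # 10 is masked to its low three bits
print(both, bytes(both).hex())
```

Keeping the bit length alongside the data is what lets leading zero bits survive
concatenation; a plain integer or byte string alone would lose them.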

Sorry I don't have time to implement this now (actually, I have a strictly-little-endian
hack that I used for some music compression experiments a while back, maybe I can find
it later). API preferences could probably stand a little discussion anyway ;-)

Regards,
Bengt Richter